If Auto-Captions Told Me
but for some other reasonance to exact resemblance
it's an issue of academic Integrity[1][2]
way New York I love you but you're
bringing me down[3] with all these empty
words
shut and open so do Queens[4] shutters
itself completely in my head A Perfect
Image The Apparition of these faces in
the crowd petals on a wet black
bow[5] we don't remember Stein because she
wrote like
[Music]
this
buffalo buffalo buffalo buffalo buffalo
had had had had Hadad[6]
like AI wrs but gerud Stein[7] was a
[Music]
person
but for some other
reasonance[8] to exact resemblance the
exact resemblance as exact as a[9]
resemblance
The Human Experience
the nuances of Consciousness
the bulletin board Outside
The Faculty bathroom
the
plagiarism Spectrum
that
we have rules and procedures and uation[10]
marks and conventions that govern what
is appropriate
the
kpone the gray
album
the guy drinking Ocean Spray to
dreams[11] ma
glocky if I told him is Rec competence
(recombinance)
recombinant
(recombinance)
a
combination
(recombination)
when done by humans we call this
[Music]
thinking
when I say understand I mean that I can
(When I say "understand", I mean that I can)[12]
recognize patterns analyze language
(recognize patterns, analyze language,)
and respond in ways that are coherent
(and respond in ways that are coherent)
and contextually appropriate based on my
(and contextually appropriate based on my)
training I don't understand in the Deep
(training.
I don't understand in the deep)
conscious sense of the word like when
(conscious sense of the word--like when)
more[13] two weeks ago
you can put a
b Paper in and get A+ writing out
writing the thing that writing is is a
thing that AI cannot do
[Music]
listen what rating[14]
is telepathy of course
look
no mythy mountain [ __ ] real telepathy[15]
because it sounds sounds good obviously
beautifully they all
responded exactly like the elves[16]
you don't know these
people but I do we all carry around
more more
than a dozen conversations that start
out of nowhere
that I am
not out of my gourd
fails the touring test
flaccid fasile flaps of phrases[17] that
held nothing they're not wrong
one slay Moore
obviously it is a
recombination of lay Miz
its more
essential Source text
invoke curtains for zusa[18]
the baffling and sometimes impuscat nature
asked toid a jenzi existential crisis
someone who you could plausibly
mock
the
inscrutability of Youth
MOG baby grank the AI way[19]
the Magnificent couplet
these zenial[20]
where they were
irregular
there is no there
[Music]
there inscrutable
anything to be found by Plumbing
does that address what you're getting
at
gabrieli at all on fmis of
semantic memory processing
from Aaron reich's[21] bright-sided
forced this is of course ass backwards
shaped like
skillful Pros
pickle
bres Tut your tuba and the horn section
of humanity
nche in 1882
eching through generations has less on
societ and the forces that shape Our
Lives
for
guica
no basat renberg twam their
brother chat GPT
flumix by the request
a triv request
work for free
authentic complex baffling humans
collections of experiences and
consciousnesses the same stimulus
the singular Marvel[22]
rather than a sentence
[Music]
it's a space that holds warmth mystery
between Clarity and confusion
a feeling a sort of atmospheric spark
words without needing them to explain
toward expressing things that don't
something else we're both leaning toward
for meaning in what isn't immediately
[Music]
clear
[23]
pops explosives
socks I have been caught[24] in the whls and
edes of if I told him's
language bouncing from Sigma to gach[25]
my attempts to Raft those rivers
writing the real mythy mountain [ __ ]
but I pop my PIV
because I speak by forcing air
air that carries my intent
GPT pops explosives because it is
programmed to there is no air there
the other simply can't cannot be
there is no mind to meet
w
[Music]
The text of this poem is almost entirely from the auto-transcript of josh (with parentheses)'s video "You are a better writer than AI. (Yes, you.)". I didn't change its punctuation or capitalization. I kept its caption-like line breaks intact, adding an arbitrary number of spaces where I left words out. I guess it's like digital blackout poetry? Perhaps "paste-in" poetry.
Things in brackets like [Music] or [ __ ] are the auto-captions. Things in parentheses are mine, or, at least, arranged there by me. All of these footnotes are mine, in conversation with the auto-captions, as I try to correct them into more proper, human-readable captions.
[1] The title is a recombination of "If I Told Him" by Gertrude Stein, which features prominently in the video.
[2] In the style of a YouTube timestamps comment:
00:29 it's an issue of academic Integrity
05:13 but for some other / reasonance
09:25 when I say understand I mean that I can
11:15 listen what rating / is telepathy of course
14:27 they all / responded exactly like the elves
18:23 invoke curtains for zusa
22:07 where they were irregular
28:29 shaped like / skillful Pros
35:20 rather than a sentence
37:23 pops explosives
39:27 w / [Music]
[3] The auto-captioner better displays the intent: how Josh() completes the lyric with his own words, speaking with the line as a whole and then some. But captioning best practices -- which differentiate speakers and, if possible, identify the source of the music -- would have to sever that connection.
[4] I guess because it's so close to a mention of New York, it assumes we mean the NYC borough. The auto-captions can't context-switch the way humans do.
[5] Yeah, auto-captioner, I feel you here. Sometimes picking the right homophone is hard. But this one is on-screen -- not that you could possibly know that the way I can, as a human. It appears both in a screenshot of the short poem from the Poetry Foundation and as the YouTube handle on the Gertrude Stein recitation video. The latter connection is serendipity, it turns out.
[6] Correct number of buffalo's. Almost correct number of had's. I had to slow the audio down to 0.5x to count the had's -- 6, which my brain would love to correct to 7, so it finishes a phrase as displayed in the on-screen text. But the auto-captioner is right about one thing -- only six 'ad's.
[7] Imagine if this mangling were the whole thing. That's a lot of what reading auto-captions is like, especially on videos with specialized terminology (from science to D&D live plays), and especially in languages other than English. PSA: Edit your auto-captions, please!
[8] The auto-captioner blends Josh()'s 'reason' right into Stein's 'resemblance'.
[9] The auto-captioner diligently displays what Gertrude Stein actually says in that recording, even though the Poetry Foundation text reads differently (Stein's added words in brackets): "to exact resemblance the exact resemblance as exact [as a] resemblance".
[10] 🤣 yeah who needs punctuation anyway
[11] My human ears do not hear the 's' in what Josh() says. I think he says it, but it gets lost as the song lyric takes over. The song is, technically, "Dreams", with an 's'. I don't know if the auto-captioner hears the 's' or assumes it. It also knows to capitalize the 't' in YouTube, so I wouldn't be surprised if it somehow knows the correct titles of songs.
[12] ChatGPT is reading aloud what it generated, and the text is displayed on-screen, so I've added its displayed text parenthetically. The auto-captioner doesn't know what's on-screen, and thus rewrites the text.
An algorithm that considers only spoken language, to translate into written
-- in textual counterpoint with --
an algorithm that typically reads written language to write back.
This is about the point where I decided to figure out how to properly caption the video.
[13] Josh() doesn't actually say this 'more'; the sentence properly starts "Two weeks ago". I think the auto-captioner guesses the 'word' that is 'spoken' by the musical chord just before Josh() starts talking.
But it's amusing to find this 'more' here, just before 'two weeks ago', as Josh() confesses in the video description box that "two weeks ago" is really more like "four months ago". It's both wrong and not wrong.
[14] The 'listen' appears well after Josh() says it, and then, of course, the audiobook narrator correctly reads Stephen King's chapter heading "What Writing Is" and first sentence ("Telepathy, of course."), but the auto-captioner mangles 'writing' into 'rating'. Right after talking about B and A+ papers. Amazing.
[15] Real telepathy, like real captioning, doesn't swear.
oops I mean:
because it looks looks good obviously.
[16] I'm really fascinated by how words-within-words will trip up auto-captions when combined with the wide variety of human accents. To my human ears, Josh() clearly says "themselves".
[17] I was really hoping after "fasile"/facile that it would catch the alliteration and, on purpose, misspell phrases/"frases". But alas, it cannot intend; it can only do its best to decode.
[18] I just want the reader to know that this millennial still had to go to Urban Dictionary on all of these terms to confirm answers to questions like "is rizzler always capitalized?" ChatGPT swears by "Grimace Shake" having two capitalized words, but Urban Dictionary often capitalizes only Grimace. 🙃
During this section of correcting the transcript, I temporarily lost my ability to guess probable spellings accurately enough to look words up. It took me about four tries to realize that impuscat was 'obfuscatory'.
[19] omg it's the header image line 👋😊 Did you know Ghost's ALT text can only be 191 characters? Bluesky has spoiled me.
[20] Consider the line above ("the Magnificent couplet"), which in the video refers to a true couple of lines, but meditate instead on a couplet of words.
These: separating the us from them, instead of pulling toward connection.
Zenial: Zennial, but also, perhaps, denial.
[21] I still struggle to convey to others how truly English the digital world largely is. From Unicode struggling to codify script-based languages like Arabic, to simple audio recognition of names -- later on, Nietzsche gets rendered simply "nche". Hell, it gets "semantic memory processing" correct but not "et al.", one of the more frequent Latin abbreviations used in English.
In the Poetry Unbound podcast, there's an episode where host Pádraig Ó Tuama invites the poet Jake Skeets on to read and talk about his poem "Daybreak", and during the episode, Skeets talks of how Native languages were taken from their communities and replaced with English. Ó Tuama says, "Yeah, that’s a real taking of a soul. It’s an old technology of colonization, is to take the soul of a people, as well as the technologies of practicalities. But the soul-robbing of it in the first instance is an extraordinary violence." Skeets says, "That’s a nice way of putting it. You’re absolutely right. I feel like, not only that, but in addition, English as a language has always been transactional, rooted in an idea of capital. And so not only does it take away something inherent, like soul, it also reintroduces something manufactured, like product."
I can't stop thinking of that conversation every time I attempt to translate something from another language I know into my first language, English (US). I don't think all English is like this; of course, I hold my first language's literature dear. But with AI, writing is a product, and we must only produce it faster and cheaper. When I translate into or write in English, I do my best to take care of my language, to preserve the sense of person'ing going on, to not simply make it into a product.
The auto-captioner mangles Ehrenreich, the person's name, but it gets Bright-sided, the product, correct in all but capitalization.
[22] Again, why would it be the simple word instead of the brand, in our deeply corporate digital world? Except, if I leave it here in a poem line, the capital-M Marvel becomes almost Dickinsonian, reapproaching the great depths of its simply-a-word meaning.
[23] Ahem, we leaned toward something here. The four of us, that is: Josh() to prompt ChatGPT and make this video, ChatGPT to answer Josh(), the auto-captioner to mistype and arbitrarily choose line breaks, and me to recombine it here. But it took at least four of us, half of us (arguably most of us, if we include the training data) still from the horn section of humanity. Not much "improvement" by AI on any singular human poet.
[24] Every time I proofread this post, I accidentally say this bit like "socks have I been", as if 'socks' were the new 'Zounds!'.
[25] Like 'baby grank', it never once spelled gyatt correctly.