Wednesday, February 22, 2023

A.I. Smackdown — English Major vs. ChatGPT - Part 2


In my last post, I considered the writing prowess of ChatGPT, the A.I. text generation platform powered by OpenAI’s GPT-3 engine. A professor quoted in Vice magazine said GPT-3 could get a B or B- on an MBA final exam, which I figured had to be an exaggeration. So, I put ChatGPT through its paces, having it write paragraphs in the style of a scholastic essay, a magazine article, and a blog post. In this essay, I’ll tackle a final writing category: poetry. At least when it comes to very logical matters such as rhyme and meter, A.I. ought to do really well … right? Well, let’s see how it does. (Hint: very poorly. And, I suppose I owe you a trigger warning: ChatGPT seems to think violence to one’s genitals is funny.)

But before I get to all that, I will address a pressing question: who cares about any of this? And why should we? I’ll also address a few follow-up questions from the friend who prompted my last post.

Who cares? And why should we?

In response to my previous post, my software maven friend suggested that A.I. could be used effectively for composition if the user (i.e., the person who’s tasked ChatGPT with responding to a query) has enough expertise to evaluate the response and wisely choose what to use from it, and what to discard. In this way, my friend suggests, “this version [of ChatGPT] could make someone who knows what they’re doing more productive.” He continued, “I wonder if for your next blog you might consider how you might use it? Would you be willing to use it as a first draft in responding to a low performing colleague who asks trivial questions?”

I have two responses to this: a practical one and an ideological one. On the practical side, I can’t imagine starting a work email, proposal, or report with ChatGPT because in my experience so far, most of what the A.I. does is apply window dressing and rhetorical flourishes to the ideas I feed it, along with vague assertions that aren’t backed up (e.g., “[Dura-Ace’s] sleek and understated style has been well-received by riders and industry experts alike”). ChatGPT builds repetitive, junior-high-grade essays that fill out the page but don’t add much value to the original prompt. If I were to obtain my rough drafts from ChatGPT, I would have to prune most of the text to end up with something reasonably concise. It would be faster just to write my own missive from scratch.

I realize this may not be true for everyone, and I’ll grant that I have developed, through decades of practice, uncommon facility with writing (having composed over 1.5 million words for albertnet alone). Nevertheless, the ability to quickly draft a work email or brief report is, I believe, a capability that any adult ought to have, just like being able to fry an egg, drive a car, or brew a good cup of coffee. To my mind, increased efficiency should be a matter of personal development, not outsourcing.

The ideological matter is more complicated. If we decide that producing a work document is the kind of hassle that should be dispatched with as little time and energy as possible, like submitting an expense report or making travel arrangements, we are diminishing the assumed value of that activity. As we prepare the next generation for the workforce, this sense of diminishment would trickle down to our schools. We would be sending students a message that writing is a job for A.I. and that the higher-value human thought lies elsewhere.

Having majored in English in college, I naturally bristle at this idea. I believe that reading and writing, more so than perhaps any other endeavor, teach us how to think. I’ve blogged before (e.g., here) about how strongly I disagree with American society’s obsession with STEM, as opposed to the traditional liberal arts that are all but dismissed in modern education. To those who promote STEM, I’d like to ask, what would you think about discontinuing most math classes in school, since we have calculators and spreadsheets to do that crap for us? Of course you wouldn’t support this, and neither do I. (I took a Calculus class in college just for the hell of it.) Studying math is good for your brain, even though most of the specific math skills you learn will never be used. Studying the craft of writing is also good for your brain, and using words well is a skill we can use every day of our professional and personal lives. Writing is hard, and takes time, sure. But when we strive to write well, we understand better, and we think more deeply.

Here is an interesting quotation from the American philosopher Alasdair MacIntyre, quoted in the New Yorker, describing his misgivings about the Enlightenment:

It becomes impossible to settle moral questions or to enforce moral rules; the best we can do is agree to disagree. Such a world falls into the hands of managers and technocrats, who excel at the perfection of means but lack the tools with which to think deeply about ends.

I don’t have strong opinions about the Enlightenment, and MacIntyre was writing in the ’80s, but I think that last sentence fits modern tech companies, particularly in the realm of A.I. development, rather well. What we see in tech is a lot of people asking “how.” How do we monetize Internet traffic, how do we create algorithms that increase this traffic, how do we increase platform dwell time, how do we make people forward more stuff, how do we make social media products more addictive. I don’t see a lot of people asking “why” or, crucially, “why not.” If we outsource writing to A.I., how will people learn to think more deeply and broadly, a capacity we may need now more than ever?

Okay, on to my main topic now. As you shall see, ChatGPT’s attempts at poetry show, in a particularly stark way, just how totally overrated this platform is today.

Shakespearean sonnet

When poetry is good, it can be transporting (think Shakespeare, Dickinson, and Eminem). But when it’s bad, it can be appalling (think Vogon poetry, the feared method of torture described in The Hitchhiker’s Guide to the Galaxy). Obviously this is largely a subjective matter, so in my analysis I’ll look mainly at what can be objectively measured: proper execution of meter and rhyme. This is highly logical stuff, almost mathematical, so it ought to be right up A.I.’s alley. I started with a classic form: the Shakespearean sonnet.

Now, in case you’re not familiar with that poetic form, A.I. is here to help. At least, it’s supposed to be. I asked ChatGPT, “What is a Shakespearean sonnet?” It replied:

A Shakespearean sonnet … consists of 14 lines and follows a specific rhyme scheme and meter... The structure of a Shakespearean sonnet is as follows:
  • Three quatrains (four-line stanzas) with a rhyme scheme of ABAB CDCD EFEF
  • A final couplet (two-line stanza) with a rhyme scheme of GG
  • Iambic pentameter meter, which means each line has ten syllables with a pattern of unstressed and stressed syllables

This description is pretty good (after I truncated it), though ChatGPT doesn’t get any special points for providing this, since any Google search could lead you to such an explanation, or a far better one such as this. What was missing from ChatGPT’s definition is a more specific explanation of which syllables are stressed vs. unstressed. (Get this backwards and you’ve got trochaic pentameter, which is totally different.)

So next I asked ChatGPT, “Please explain to me how to write a poem in iambic pentameter.” It came back with this:

[Misc. prelude omitted here for brevity.] An iambic foot is a set of two syllables where the first syllable is unstressed, and the second syllable is stressed. Here's an example of a line in iambic pentameter:

“To be or not to be, that is the question.”

[Misc. pointless repetition omitted.] Count the syllables in each line. Each line should have exactly ten syllables. Vary the stress pattern slightly to avoid a monotonous rhythm. [More BS omitted.]

Here, ChatGPT presents probably the most famous line from Shakespeare, rather than a really appropriate one. The line above is a poor example because if you count the syllables, you’ll see there are actually eleven, not the “exactly ten” ChatGPT correctly said there should be. In that line, Shakespeare broke the rules on purpose, to cause the reader and/or theater audience to stumble. This was to draw attention to the word “question,” which in the context of existence is the whole point of the soliloquy. Great writing, but not a great example if you’re really trying to understand iambic pentameter. “Shall I compare thee to a summer’s day” would be a far better choice.

Okay, sure, that’s a bit nitpicky, but the instruction to “vary the stress pattern slightly to avoid a monotonous rhythm” is completely wrong. If you vary the stress pattern, it’s no longer iambic pentameter! And the notion of monotonous rhythm? Total bullshit! Consider Shakespeare’s Sonnet 18, here. It is absolutely flawless iambic pentameter—not a single syllable out of place—and it’s brilliant throughout, not monotonous whatsoever.

Now, you might wonder what ChatGPT meant about “ABAB” (etc.) rhyme scheme, so I asked it, “What is an ABAB rhyme scheme?” It replied:

An ABAB rhyme scheme is a specific pattern of rhyming words in a poem or song. It indicates that the last word in the first line of the poem or stanza rhymes with the last word in the third line, while the last word in the second line rhymes with the last word in the fourth line, and so on. In this rhyme scheme, the letters “A” and “B” represent different rhyming sounds. For example, in the following stanza:

I walked along the beach in the bright sun,
And listened to the waves as they did run.
The sand was warm and soft beneath my feet,
And seagulls flew overhead in the heat.

The words “sun” and “run” rhyme, so they are represented by the letter “A”, while the words “feet” and “heat” rhyme, so they are represented by the letter “B”. This pattern continues throughout the poem or stanza, creating a consistent and pleasing rhyme scheme.

Did you catch that? How full of crap ChatGPT was just there? If not, let me make it a little clearer: let’s take those A and B labels and apply them to the four lines: you’d have A for “sun,” and A for “run,” and B for “feet,” and B for “heat.” It’s AABB. It’s not ABAB. It doesn’t match the first paragraph of ChatGPT’s response. And it’s not the rhyme scheme of a Shakespearean sonnet. (It’s not even the rhyme scheme of a Petrarchan sonnet, which is ABBA. It’s just a rhyme scheme ChatGPT kind of made up, which it uses for every poem I’ve seen it write.)
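If you'd rather not take my word for the labeling, here is a minimal sketch in Python of how rhyme-scheme letters get assigned: each new end sound gets the next letter, and a repeated sound reuses its letter. The function name and the end-sound strings ("un", "eet") are my own hand-made labels for illustration, not the output of any real rhyme detector.

```python
from string import ascii_uppercase


def rhyme_scheme(end_sounds):
    """Label each line A, B, C... in order of first appearance of its end sound."""
    labels = {}   # end sound -> letter already assigned to it
    scheme = []
    for sound in end_sounds:
        if sound not in labels:
            # A new sound gets the next unused letter.
            labels[sound] = ascii_uppercase[len(labels)]
        scheme.append(labels[sound])
    return "".join(scheme)


# ChatGPT's beach stanza, lines ending in: sun, run, feet, heat
# (end sounds labeled by hand; this is an assumption, not automatic detection)
chatgpt_stanza = ["un", "un", "eet", "eet"]

# The same four end words reordered as sun, feet, run, heat
reordered = ["un", "eet", "un", "eet"]

print(rhyme_scheme(chatgpt_stanza))  # AABB
print(rhyme_scheme(reordered))       # ABAB
```

Only the second ordering, where the rhymes alternate, earns the ABAB label that ChatGPT claimed for its own stanza.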

We have established that ChatGPT totally blew it at describing a Shakespearean sonnet. So how does it do when actually writing one? I told it, “Write a quatrain in iambic pentameter, with an ABAB rhyme scheme, followed by a rhyming couplet, also in iambic pentameter, about post-urinal drip. Try to make it funny.” Here’s its response.

Inevitable, yet always a surprise,
Post-urinal drip can make one despise
The feeling of wetness that lingers on,
A sensation that’s not easily gone.
It’s a cruel trick of the human form,
To leave us damp after we perform.
But fear not, my friends, for there’s a way out,
Just shake it off and give it a good clout!

Oh. My. God. Is this the same priggish ChatGPT that, as mentioned in my last post, balked at writing humorously about how bad a laxative drink tastes, claiming “I cannot provide humorous anecdotes about sensitive medical procedures” and “I do not generate inappropriate or insensitive content”? It’s talking about violence to one’s genitals! I had to look up “clout” just to make sure there isn’t some alternate meaning that wouldn’t be this awful. A.I. is advocating, albeit jocularly, giving one’s penis “a heavy blow with the hand or a hard object.” WTF?! That’s not funny at all!

Meanwhile, ChatGPT simply didn’t follow my instructions. I asked for a quatrain followed by a couplet. That’s six lines total; it gave me eight (i.e., two quatrains). The rhyme scheme is also not what I asked for. As for iambic pentameter, this poem is a train wreck. Certain brief phrases (e.g., “To leave us damp,” “Just shake it off”) are iambic, but not a single line of the poem achieves this meter consistently. The very first word, “inevitable,” is fundamentally incompatible with iambic pentameter. ChatGPT just wrote whatever the hell it felt like in a very loosey-goosey meter of its own design. If you’re not getting my point, read the above poem aloud and see how hung up you get. Then compare it to this example of an actual iambic pentameter quatrain and couplet, penned by a human:

Relax, because you’ve earned your potty break;
Unburden your poor bladder of its pee.
And when you’re done you’ll shake and shake and shake;
An effort all in vain, it seems to me.
     For urine flow can never really stop,
     Until your undies drink the final drop.

Note that this second example doesn’t resort to crude ideas like clouting your penis. And note how naturally the poem gallops along because it truly follows the meter. Also notice that it doesn’t have vague throwaway notions like “A sensation that’s not easily gone.” Then consider that I wrote those lines (full poem is here) when I was only seventeen. I guess someone accustomed to the crap poems printed on Hallmark cards might be impressed at ChatGPT’s effort, but I’m not. It’s almost cynical how casually it breaks all the rules, including that of decency.

Dactylic trimeter

A.I. learns through being fed “training data” from the Internet. In case its struggle with the sonnet was based on bad information, the Shakespearean form being a very popular one, I decided to give it another shot, with dactylic trimeter, a much rarer form I have come to enjoy. I asked ChatGPT, “Please explain how to write in dactylic trimeter.” It responded:

Dactylic trimeter … consists of three dactyls, or three stressed syllables, followed by an unstressed syllable, in each line… Count the syllables in each line. Each line should have exactly three dactyls, or three stressed syllables, followed by an unstressed syllable. This means the line should contain a total of seven syllables. 

The description above is completely wrong. A dactyl is not a stressed syllable. It is a three-syllable foot in which the first syllable is stressed and the second two are not. A line of dactylic trimeter has three feet, with three syllables per foot, for nine syllables total, not seven. And even if a line did have three stressed syllables followed by an unstressed syllable, that would be four, not seven. Is ChatGPT just stupid? If I have to start reading work emails written by ChatGPT, I think I’ll just blow my brains out.
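The arithmetic is simple enough to sketch in a few lines of Python. This is just an illustration under my own assumptions: the dictionaries and function names are hypothetical, and stress is written as 1 for a stressed syllable and 0 for an unstressed one.

```python
# Stress pattern of one foot: 1 = stressed syllable, 0 = unstressed syllable.
FEET = {
    "iamb": (0, 1),       # unstressed, stressed
    "trochee": (1, 0),    # stressed, unstressed
    "dactyl": (1, 0, 0),  # stressed, unstressed, unstressed
}

# Number of feet per line for each line length.
LINE_LENGTHS = {"trimeter": 3, "pentameter": 5}


def stress_pattern(foot, length):
    """Repeat the foot's pattern once per foot in the line."""
    return FEET[foot] * LINE_LENGTHS[length]


def syllable_count(foot, length):
    """Syllables per line = syllables per foot x feet per line."""
    return len(stress_pattern(foot, length))


print(syllable_count("dactyl", "trimeter"))   # 9
print(syllable_count("iamb", "pentameter"))   # 10
print(stress_pattern("dactyl", "trimeter"))   # (1, 0, 0, 1, 0, 0, 1, 0, 0)
```

No combination of three dactyls yields the seven syllables ChatGPT described.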

Okay, let’s see how A.I. did at actually composing this type of poem. I told it, “Please write a 10-line poem in dactylic trimeter about bike riding up two very difficult climbs, called El Toyonal and Lomas Cantadas, celebrating caprice and radical freedom.” It produced this:

Pedaling up Toyonal, heart pumping fast
My mind is clear, free from the world’s contrast
With caprice and freedom, I take each turn
Pushing through pain, for what I will earn
Onward I climb, up Lomas Cantadas
My legs scream out, but my spirit never fades
The wind in my hair, the world fades away
A moment of peace, that I long to stay
And at last I reach the summit high
A sense of triumph, that no words can deny

You can see right away what an awful poem this is. It’s what Thomas Kinkade would produce if he were a writer instead of an “artist.” I mean, what does “free from the world’s contrast” even mean? And “the wind in my hair”? Seriously? What is this, a Harlequin romance novel? Lomas is a 10% grade, and I’m wearing a helmet! But what particularly stands out is that again, ChatGPT didn’t follow my prompt whatsoever. In the entire poem, only two of the feet are proper dactyls (“pushing through” and the first three syllables of “Lomas Cantadas”), which is surely just luck. As it did with the sonnet, ChatGPT just wrote whatever the hell it felt like. So why does everybody praise ChatGPT so much? It sucks! (For a proper poem on this topic, with actual dactylic trimeter, click here.)

One more thing

Okay, I can almost hear you now: “Oh, this particular chatbot is just using GPT-3! The technology is getting better all the time! All the glitches you’ve found will soon be fixed! The next version’s gonna be amazing!”

Well, maybe GPT-4 (etc.) will get better at poetic meter, and maybe it’ll learn how to be more concise. But I could also imagine its errors getting propagated further. Remember, GPT-3 learned mostly from training on massive amounts of human output from across the Internet, and (as I learned from my software maven friend) has over 100 billion parameters allowing it in some sense to memorize an enormous portion of its training set. Over time, as more people outsource their writing to A.I., its errors could be added to the pile of training data, and thus reinforced. Meanwhile, the content may stray ever further from that created by humans. The growing body of text on the Internet may come to have less and less to do with us (that is, with creators who have a soul, and a conscience). It’s tempting to hope that the works of great writers will one day be scored higher somehow, to help the A.I., but why would we expect this when politicians, the media, and academia are kicking liberal arts to the curb? Meanwhile, most social media platforms today seem to prize forwards and re-posts as the most valuable Internet currency, so if any scoring were to be applied to A.I.’s learning, it’s probably more likely to favor whatever gets a rise out of people (i.e., trolling and other bombastic vitriol).

As ChatGPT and its ilk gain ever more traction, what passes for writing could become, to borrow a phrase from Nabokov, the “copulation of clichés.” (He was talking about pornography, but the metaphor holds here, too.) As the data set A.I. uses becomes more and more generic, while the tool gets used by more and more people seeking to avoid engagement with the craft of writing, most real insight and individuality might gradually vanish from written correspondence. O brave new world!

Other albertnet posts on A.I. 

Email me here. For a complete index of albertnet posts, click here.
