Introduction
Two posts ago, I described what I think is a fundamental dichotomy between two central capabilities of modern AI chatbots: 1) helping with a nuts-and-bolts operation like coding software or scripting HTML, and 2) creating something original, like an essay or story. The first category involves being a resourceful researcher blessed with excellent natural language processing; the second is probably closer to what humans are (so far) uniquely capable of doing.
Earlier this year I did a whole post on the first category, “What is ChatGPT Great At (and Not)?” And last week I blogged about one aspect of the second category: writing a scholastic essay. To further explore AI’s ability to generate meaningful content, and to evaluate its ability to truly understand language, I turn this week to poetry. That is, I decided to have the three dominant chatbots—Gemini, ChatGPT, and Copilot—write a poem in an unusual meter: dactylic trimeter, a poetic form I learned in high school (details here). I chose this meter because, as described here, ChatGPT does a pretty good job at the classic Shakespearean sonnet in iambic pentameter, but I wonder if that’s just really good parroting since there’s such a vast amount of training data out there for that. I think this exercise really puts the chatbots through their paces, giving us insight into which is the closest to being truly intelligent. As you shall see, the differences in performance are not subtle.
(Custom art by Whisk. No rights reserved.)
Gemini’s effort
To start out, I quizzed Gemini about dactylic trimeter, to see if it knew what I’m even talking about. Gemini correctly stated that the rhythm of such a poem would be “DA-da-da | DA-da-da | DA-da-da,” and an example it created of the form was reasonably close. So far so good. But then, to make the rhythm better, I instructed the chatbot to add an extra trochee at the end of each line. A trochee is a two-syllable foot with the stress on the first syllable, as in the words “praises” and “spirits.” As an example of this modification to dactylic trimeter, I provided Gemini these lines (that I took from a poem you can read here, in this albertnet post):
Once in a while a voice will sing praises,
Something to levitate everyone’s spirits.
A really smart AI, I would argue, could reverse-engineer the meter from those two lines alone, but I went one better and described exactly what I wanted in technical terms. Gemini correctly stated that the rhythm would therefore be “DA-da-da | DA-da-da | DA-da-da | DA-da” but its initial attempts at it were totally screwed up. I gave it a lot of coaching. I guess this is okay; a human with actual intelligence might require this as well.
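For readers who think in code, here is a minimal Python sketch of the line structure described above: three dactyls followed by one trochee. The names (`DACTYL`, `TROCHEE`, `line_pattern`, and so on) are made up for illustration and don't come from any chatbot or library. The sketch only spells out the target rhythm and counts its syllables; it makes no attempt to scan real English words.

```python
# Hypothetical helpers for illustration only; not part of any library.
DACTYL = ["DA", "da", "da"]   # stressed, unstressed, unstressed
TROCHEE = ["DA", "da"]        # stressed, unstressed


def line_pattern(dactyls=3, extra_trochee=True):
    """Return the feet of one line: `dactyls` dactyls plus an optional trochee."""
    feet = [list(DACTYL) for _ in range(dactyls)]
    if extra_trochee:
        feet.append(list(TROCHEE))
    return feet


def render(feet):
    """Join syllables with hyphens and feet with ' | ', as in the post."""
    return " | ".join("-".join(foot) for foot in feet)


def syllable_count(feet):
    """Total number of syllables across all feet."""
    return sum(len(foot) for foot in feet)


if __name__ == "__main__":
    feet = line_pattern()
    print(render(feet))          # the target rhythm for one line
    print(syllable_count(feet))  # syllables per line
```

With the defaults, the rendered pattern is the same “DA-da-da | DA-da-da | DA-da-da | DA-da” rhythm quoted above, which works out to eleven syllables per line.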
Moving on, I prompted Gemini, “Now I would like to see if you can write such a poem based on an essay I provide. You can work in as much as you think works, understanding that not everything will fit. If I paste in a 1,600-word essay could you do that?” It said (in its usual obsequious way), “That sounds like a very interesting challenge! Yes, I can absolutely take a 1,600-word essay and distill its key themes, concepts, and emotional tones to create a poem in the specific Dactylic Trimeter + Trochee (11 syllables) structure.” I added that the poem should be ten lines long, with an ABBA rhyme scheme for the first two quatrains and then a rhyming couplet at the end. And I told it, “Focus on the idea of caprice: choosing the more difficult route simply to exercise one’s radical freedom and prove one’s mettle to oneself.”
Now, if you are a devoted follower of albertnet, you may recall that some years ago I wrote a poem in dactylic trimeter about my favorite cycling hill climb, the one-two punch of the steep, twisty roads El Toyonal and Lomas Cantadas. And, as with most of the poems on this blog, I provided copious footnotes and commentary, just as you’d get in an anthology of Shakespeare’s sonnets. To help AI write a good poem, the “essay” I uploaded was the compilation of my footnotes, so that Gemini had plenty of material to work with: the fleshed-out observations, feelings, etc. behind my Lomas poem. The idea here is to give you, reader, the ability to do a side-by-side comparison of two poems on the same topic—one written by your amateur blogger, the other by a much-ballyhooed AI. You can decide for yourself who did the better job. Here is my poem (and you can click here for the footnotes):
Ode to Lomas Cantadas
El Toyonal is a beast of an uphill;
Pedaling here is a bit self-defeating.
Lomas Cantadas will sweeten your beating:
Insult to injury, sweet overkill!
Most riders wisely take Wildcat Canyon.
Half the uphill, after all—and you’re tired!
Wisdom, alas, is a flaw when you’re mired
In glory, in notions of being a man.
Sometimes caprice is much better than ease:
Radical freedom is here to be seized!
Here is Gemini’s poem, representing its effort to incorporate all my footnotes and commentary:
Reckless Choice
The simple road beckons And calmly the wind calls,
We stand at the turn now, Deciding to stay free.
To choose the hard pulses Our own perversity.
We turn for the suffering feeling the high walls.
The logic demands the Less burdensome track-way,
But we must display the Radical self-might.
We take the bad option Defying the pure light.
To prove we are strong to Our own confidences.
Yet this is the price of The joy and the pain-fall,
A reckless decision That answers the soul’s call.
So … how would you rate Gemini’s poem? Is AI going to come after our poets and put them out of work? Frankly, I’m less than impressed. The meter is a crude approximation of what I asked for, but as you surely felt (whether or not you could articulate the precise problem), almost every line is too long by one syllable, and instead of the easy, rollicking rhythm of a proper Krooprian poem, Gemini’s ode staggers and stutters. The AI pretty much blew it.
In case you really want to geek out, here’s exactly where the AI went wrong: in nine out of the ten lines, for no apparent reason, it preceded the first dactyl with an unstressed syllable, violating the rule I set forth about dactylic trimeter. Meanwhile, words like “perversity,” “radical,” and “confidences” aren’t padded out properly so we either have to mispronounce them to preserve the meter, or sound them out naturally which disturbs the rhythm—and since we don’t know which to do, we’re stopped in our tracks.
As far as the content, you can sense the AI compromising sense in order to (try to) achieve the rhythm and rhyme. Why else would it use the word “pulses,” the phrase “high walls,” and the Franken-words “track-way,” “self-might,” and “pain-fall”? The chatbot is practically babbling.
So is that it? I give it, say, a D and we move on to the next AI? Not so fast. I made the rookie move of feeding my prompt to the default version, 2.5 Flash, instead of the flagship version, 2.5 Pro, which Gemini tells me is “more adept at tasks requiring complex reasoning, deep instruction following, and high-quality creative output.” The good news is, as the AI assured me, all the context of my chat with 2.5 Flash can be used by 2.5 Pro, so it was super easy to switch to Pro and have the chatbot try again. Here’s its second effort, using the 2.5 Pro version:
The Gauntlet Thrown
Sober thoughts point the logical straight-way,
Turning left proves a man is still willing,
Recklessly choosing what he must still be,
Turning from all that the reasonable say.
This is the freedom to throw all the rules about,
Knowing the payment and what it will cost him,
Twenty hard minutes of everything lost now,
Heeding the irrational gonzo-man’s shouting.
A purely perverse act of personal pride,
Proving the mettle he carries inside him.
Well, I suppose it’s a bit better. Only one line erroneously starts with an unaccented syllable, and three of the lines actually have the correct meter (a modest improvement over 2.5 Flash’s zero). There’s some nice alliteration in the penultimate line. I also appreciate that 2.5 Pro didn’t do weird stuff with capitalization in the middle of each line. But the rhyming is pretty janky, and we still have these weird bolted-together phrases “straight-way” and the truly bizarre “gonzo-man.” To top it off, the very last word of the poem violates the rhyming convention, so the poem goes out on a flat note. This second poem is better but not by much.
Conclusion: if Gemini offers to write my next albertnet ode for me, I’ma be like, “Naw, I’m good.”
On to our next contestant.
ChatGPT’s effort
I asked ChatGPT if it could write a poem in dactylic trimeter. It said, kind of curtly, “Definitely — I can write in dactylic trimeter.” Unlike Gemini, it didn’t bother to give me a sample straightaway. Maybe this is to save on computing cycles (and thus energy and water), which I can appreciate. But ironically, this is the one chatbot whose capability I have reason to doubt, because as you may recall, I tried this prompt before, almost verbatim, with version GPT-3.5, back in February of 2023. That chatbot crashed and burned with this poetic abomination:
Pedaling up Toyonal, heart pumping fast
My mind is clear, free from the world’s contrast
With caprice and freedom, I take each turn
Pushing through pain, for what I will earn
Onward I climb, up Lomas Cantadas
My legs scream out, but my spirit never fades
The wind in my hair, the world fades away
A moment of peace, that I long to stay
And at last I reach the summit high
A sense of triumph, that no words can deny
Ugh. I almost threw up in my mouth rereading that. As you can see, GPT-3.5 ignored my instructions about meter, made a very half-assed effort to weave the theme of caprice into its poem, and worst of all produced pure treacle, like the verbal equivalent of Thomas Kinkade. Needless to say I was eager, this evening, to see how the more modern version would do.
I gave GPT-5.1 the same prompts as Gemini, uploading the same 1,600-word essay, and here is what it produced:
Lomas Caprice
Turning toward Toyonal, reckless and laughing
Choosing the steep when the wise path is brighter
Mocking my reasons, the prudent, the lighter
Savoring folly my sane side is quashing
Freedom is choosing the climb that will break me
Just for the thrill of defying the sensible
Proving I’m bent for the wholly nonsensical
Riding the harder way just ’cause it shakes me
Pain is a dare that I answer with pleasure
Freely embracing the hurt as a measure
Well, ChatGPT has come a long way for sure, but GPT-5.1’s effort is only somewhat better than Gemini’s. Certainly the meter is better, with a majority of the lines being correct. But the content is really off, with a bunch of the words clearly chosen just to satisfy the technical requirements without adding much meaning. The bit about “wise path is brighter” really makes no sense and is clearly just there for the rhythm and rhyme, no more sophisticated than Hall & Oates’ “your kiss is on my list.” In the next line, who is doing the mocking? And how does “the lighter” fit into anything? Lighter sky? Lighter weight? Cigarette lighter? It’s just a random word dropped into the poem. And in the next line, the word “quashing” in no way rhymes with “laughing” and doesn’t make sense as an intransitive verb. (“What are you doing this weekend?” / “Oh, you know, I’ll just be at home, quashing.”)
Since AI does its best work when you iterate with increasingly refined and specific prompts, calling out what it did wrong in its previous attempt, I decided to give ChatGPT another chance, and told it, “I think it would be better if it didn’t assume what you and I know already about this climb. Consider that somebody encountering this poem for the first time wouldn’t know that Wildcat Canyon is the easier climb, and that choosing the 1-2 punch of El Toyonal and Lomas Cantadas makes no logical sense but appeals to one’s love of suffering and sense of caprice. So, please try again on the poem and give the reader enough background to grasp all this and thus to understand the choice.” It came back with a poem that was quite broken, with the same issue that Gemini’s first effort had: starting each line with an unstressed syllable. It also screwed up the rhyme in the second quatrain. I coached it repeatedly to fix these issues, and after several tries this ended up being its best effort:
Reckless Climb
Climbing the hills of green Berkeley foothills,
Pedaling hard as the thighs start to quiver,
Wheels weaving wild like a paperboy’s river,
Lungs heaving fire as the body fulfills.
Turning to torment, no reason persuades me,
Pain blooms in muscles yet joy is commanding,
Twists of the road, and the thrill never fades me,
Searing the legs, but the spirit is standing.
Pleasure is folly, the wholly absurd,
We choose what will hurt us, yet laugh at the hurt.
Right off the bat, the first line has three problems: it trips us up with a missing syllable; the hills are not always green; and hills/foothills is somehow both redundant and oxymoronic. The rest of the lines have correct meter, but a bunch of really clunky constructions. Wheels weaving wild? Really? And what could “like a paperboy’s river” possibly mean? “Lungs heaving fire”—what am I, a dragon? And the body fulfills what? (I just asked ChatGPT if “fulfill” can be an intransitive verb, and it replied, “No — “fulfill” is fundamentally a transitive verb, which means it requires a direct object.”) The mention of the twists of the road isn’t connected to anything, and “thrill never fades me” makes no sense—I mean, is the thrill like the sun, and I’m a curtain? There’s just an abundance of nonsense here, though I do like the last line. My last complaint is that the AI didn’t consistently follow the rhyme scheme I asked for: its second quatrain is ABAB, not ABBA. I asked it five times to fix this and it never did.
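The rhyme-scheme complaint is easy to make concrete. What follows is a minimal Python sketch, assuming each line's rhyme sound has already been labeled by hand; the code does not detect rhymes itself, and the function names are invented for this illustration. It also assumes each quatrain gets fresh rhyme sounds (ABBA, then CDDC, then a couplet EE), which is how “Ode to Lomas Cantadas” is laid out.

```python
# Hypothetical helpers; rhyme labels are assigned by a human reader, not computed.

def target_scheme(quatrains=2):
    """Enclosed-rhyme quatrains (ABBA, CDDC, ...) followed by one couplet."""
    letters = []
    code = ord("A")
    for _ in range(quatrains):
        a, b = chr(code), chr(code + 1)
        letters += [a, b, b, a]
        code += 2
    letters += [chr(code)] * 2  # closing couplet
    return "".join(letters)


def normalize(labels):
    """Relabel by order of first appearance so 'XYYX' compares equal to 'ABBA'."""
    mapping = {}
    out = []
    for label in labels:
        if label not in mapping:
            mapping[label] = chr(ord("A") + len(mapping))
        out.append(mapping[label])
    return "".join(out)


def follows_scheme(observed, quatrains=2):
    """True if the hand-labeled rhymes match the requested ten-line scheme."""
    return normalize(observed) == target_scheme(quatrains)


if __name__ == "__main__":
    print(target_scheme())               # the requested layout
    print(follows_scheme("ABBACDCDEE"))  # second quatrain alternates (ABAB-style)
```

Labeling “Reckless Climb” the way it is read above (an enclosed first quatrain, an alternating second quatrain, then a couplet) gives `"ABBACDCDEE"`, and the check reports that it does not match the requested `"ABBACDDCEE"`.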
In summary, I’d say this poem is marginally better than Gemini’s effort (I mean, it doesn’t have anything as glaring as “irrational gonzo-man’s shouting”) but it’s basically unusable. Maybe GPT-6 will do better.
Copilot’s effort
Copilot knew exactly what dactylic trimeter was, and its initial example was remarkably good. I suppose it could have plagiarized something, or has a bit of boilerplate at the ready, but in any case I was impressed. I gave it the same extra instruction about tacking on an extra trochee, and it got it almost right on the first try; a bit more coaching and it nailed it. After invoking Smart (GPT-5) mode, I fed Copilot the full prompt, with the same footnotes from my Lomas Cantadas poem. Here is what the chatbot came up with:
Caprice on the Hills
Turning away from the sensible roadway,
Madness compels me to suffer the steepness.
Reason protests, but I laugh at its weakness,
Choosing the torment that beckons me boldly.
Freedom is found in the reckless decision,
Mocking the logic that urges avoidance.
Pain is a proof of my will and its poignance,
Glory is born of capricious derision.
Taunted by Toyonal, daring and cruel,
Lomas Cantadas declares I’m no fool.
Wow. Just wow. It is so much better than what Gemini and ChatGPT came up with. The meter is actually perfect. (Full disclosure: the meter falls short of what’s in my poem, but only because I forgot to give the AI one more instruction—to shorten the final line of each quatrain, and both lines in the couplet, by one syllable, to give them more punch. To be honest I’d forgotten I did this with my ode.)
I think the AI did a pretty good job with the content as well. Calling my will “poignant” is a bit of a stretch perhaps, but we could chalk it up to poetic license. Meanwhile, the word “roadway” (like “poignance”) is nonstandard and evidently chosen to fit the meter, and also doesn’t rhyme very well with “boldly,” but this is nitpicking. My main issue with this poem, which is the same issue I had with the others, is that it doesn’t give the reader enough context to really understand some of the basics: that there’s an easy option (Wildcat Canyon) and a harder one (El Toyonal + Lomas Cantadas).
Perhaps as a victory lap, Copilot offered to write two more versions with slightly differing tones (the last being a drinking song). These weren’t actually as good as the first so I won’t bother dissecting them, but I did like the final couplet in one of them. I decided that since Copilot’s initial poem was good enough to warrant refining, I would point out its central error and let it try again (using the new couplet). Here’s its revised version:
Caprice on the Hills
Wildcat is gentle, a steady road climbing,
Toyonal rises with crueler intention.
Still I turn left in a rash intervention,
Mocking the voice that would counsel good timing.
Freedom is folly, a dare to my reason,
Choosing the torment that makes no good sense.
Pain is a jest and its cost the expense,
Glory is born of capricious decision.
Madness is mettle, a jest I renew,
Lomas Cantadas — I suffer for you.
Again, it’s not perfect. The bit about “turn left” doesn’t really make sense without more context. And what is meant by “counsel good timing”? There is no timing here. In addition, the phrase “its cost the expense” seems redundant. But those matters aside, the meter is solid, the rhyming consistent, and that final couplet is a real banger.
These AI chatbots always seem to want to extend the dialogue and provide more and more and more, which is kind of a double-edged sword. On the one hand, as human beings we should always be working to limit our time online and get out there in the world, right? On the other hand, refining what we get from chatbots is pretty key to making them an effective tool. So when Copilot asked if I’d like it to craft a prose introduction to the poem, I suddenly had another idea: what if I asked it to now create its own footnotes? This post is long enough already so I won’t post them here, but let me say that Copilot did a pretty good job on that.
And here is where I see this AI having a role with a real human writer (at least at the student or blogger level): it could probably help with writer’s block simply by producing something worth polishing. It kills me to concede this, actually, and I am far too proud to ever resort to this kind of “Hamburger Helper” approach to my own writing. But honestly, a cyclist who would like to compose a ride-themed poem in dactylic trimeter, replete with footnotes, could do worse than to start with Copilot. (Neither poem above truly passes muster, but taking the best of each, and from perhaps a few more attempts, and then replacing all the weak parts with our own lines, would be easier than—albeit still inferior to—starting from scratch.) The output of such an exercise might actually have some value, versus the writer getting frustrated, giving up, and producing nothing.
Crucially, the thing the AI will never be able to do is go on the bike ride, have that experience, and grasp what is important about it. So a human could start there and then get some help from AI in expressing himself or herself, since not everyone has the luxury of a liberal education. If AI is called upon to bridge that gap, the current Copilot is far better poised than Gemini and ChatGPT, I think we can now conclude.
If you read my last post, you may recall that Copilot did the best job of these three chatbots at writing a scholastic essay as well. Keep an eye on this one … Microsoft, through its partnership with OpenAI (the maker of ChatGPT) as well as its own resources, seems to be ascendant.
Other albertnet posts on A.I.
- I, Chatbot
- Almost Intelligent – Part I
- Almost Intelligent – Part II
- Smartphones & Artificial Stupidity
- A.I. Smackdown – Moto vs. Cortana vs. Siri
- Could Artificial Intelligence Replace Writers? – Part 1
- Could Artificial Intelligence Replace Writers? – Part 2
- Could Artificial Intelligence Replace Writers? – Part 3
- A.I. Smackdown – English Major vs. ChatGPT – Part 1
- A.I. Smackdown – English Major vs. ChatGPT – Part 2
- Schooling ChatGPT
- Will A.I. Steal Our Jobs?
- Tech Check-In - How Good is the Latest A.I.? - Part I
- Tech Check-In - How Good is the Latest A.I.? - Part II
- What is ChatGPT Great At (and Not)?
- Tech Reflection – Two Sides of AI
- AI Smackdown – ChatGPT vs. Copilot vs. Gemini
Email me here. For a complete index of albertnet posts, click here.