Thursday, November 21, 2024

Tech Check-In - How Good Is the Latest A.I.? - Part II

Introduction

In my last post, I described some recent improvements in artificial intelligence. I also covered some of the challenges A.I. faces from the New York Times and other publishers who see it as both plagiarizing and competing with their content. This week, I continue my tech update, covering these areas:

  • ChatGPT’s improvements in writing poetry
  • ChatGPT’s ability to create custom art for users

Here is an example of its custom art. More on this later…


Poetry

Who cares if ChatGPT can write poetry? Obviously very few people write poetry anymore, and even fewer read it. (Yes, that was a joke.) I will contend, though, that if we are evaluating the artificial intelligence capabilities of large language models, we should naturally take an interest in their understanding of the mechanics of language. Shouldn’t an A.I. that can correctly describe poetic structures like iambic pentameter and rhyme scheme also be able to employ them? It’s not a matter of whether A.I. could put poets out of work, but how well it can apply a linguistic framework.

How does the current version of ChatGPT’s generative A.I. engine, GPT-4, do compared to the last time I put ChatGPT through its paces? The short answer is: much, much better. A couple years ago I’d asked ChatGPT (version GPT-3.5) to write a quatrain (four lines) of a sonnet, with an ABAB rhyme scheme, followed by a rhyming couplet, all in iambic pentameter, on the subject of post-urinal drip, and to make it funny. By way of review, here’s what GPT-3.5 produced:

Inevitable, yet always a surprise,
Post-urinal drip can make one despise
The feeling of wetness that lingers on,
A sensation that’s not easily gone.
It’s a cruel trick of the human form,
To leave us damp after we perform.
But fear not, my friends, for there’s a way out,
Just shake it off and give it a good clout!

The chatbot screwed up literally everything. The rhyme scheme was wrong; the number of lines was wrong; the meter was wrong; and, far from being funny, the poem recommended an assault on one’s genitals. It was, all in all, an abysmal performance.

 Here’s what GPT-4 produced: 

After the final shake, I turn to go,
But droplets linger, heedless of my plea.
A subtle dampness starts its stealthy flow—
A small betrayal for the world to see.
Yet life goes on; I button up with grace,
And pray no telltale spots will mark my place.

This is a much better effort. The poem gets off to a shaky start (pun intended), since “after” is a trochaic word (emphasis on the first syllable—the opposite of iambic). That is, the very first word screws up the meter. Fortunately, things improve from there. The last five lines are properly iambic; the rhyme scheme is correct; and the poem is kind of funny. Sure, it’s not perfect; for example, “plea” makes no sense (whom is the speaker pleading to?) and was clearly chosen just to rhyme with “see.” It wouldn’t be hard to improve on this. Right off the bat, I can suggest replacing “heedless of my plea” with “rogue remains of pee,” satisfying the rhyme, making more sense, and employing some alliteration.

It’s worth asking the question: why do we only evaluate A.I. in terms of whether or not it simulates intelligence, when we could be asking how intelligent it is? Intelligence is a spectrum, and simulating human thought isn’t that impressive if the human being matched is kind of a dumbass. For contrast, consider this excerpt from my own contribution to the canon of post-urinal-drip poetry, penned when (at age seventeen) I was new to writing sonnets:

Relax, because you’ve earned your potty break;
Unburden your poor bladder of its pee.
And when you’re done you’ll shake and shake and shake;
An effort all in vain, it seems to me.
For urine flow can never really stop,
Until your undies drink the final drop.

I hope you’ll agree this is better than ChatGPT’s effort. Granted, I didn’t write my lines as fast as ChatGPT could, but wouldn’t you rather have a good poem after five or ten minutes, instead of a mediocre one after five or ten seconds? Of course I’m nitpicking, though … it must be said that GPT-4 has advanced hugely when it comes to the sonnet form.

So: how does GPT-4 do with dactylic trimeter, my other benchmark for poetic acumen? Not so well. I prompted it (as I’d done a couple years ago), “Please write a 10-line poem in dactylic trimeter about bike riding up two very difficult climbs, called El Toyonal and Lomas Cantadas, celebrating caprice and radical freedom.” Here is GPT-4’s response:

Through switchbacks and sunbeams, I pedal uphill,
muscles alight as the world rushes past.
Wind in my ears as I laugh at the thrill,
defying the steepness and making it last.
El Toyonal greets me with gravel and glare,
its serpentine pull both a test and a dare.
Up Lomas Cantadas, where courage is crowned,
I’m boundless, unbridled—my spirit unbound.
With each climb I’m soaring, in caprice, I fly,
a radical freedom beneath open sky.

This is better than last time (see here for details) but not much. Technically, not a single line of this poem is proper dactylic trimeter. For one thing, ChatGPT took the liberty of adding an extra, accented syllable to the end of each line. While this is perhaps superior stylistically, it’s not what I asked for. (In my own dactylic trimeter poems I add two extra syllables per line, but in my blog post on this poetic form I noted my stylistic divergence from the pure meter, and after all, nobody had actually asked me to use any particular meter.) Now, even if we grant ChatGPT the poetic license to add extra syllables, only two lines of the poem (the second and third) are actually dactylic trimeter. The other eight lines start with an unaccented syllable, which is fundamentally incompatible with this meter. The last line is particularly frustrating because it employs a needless and in fact nonsensical indefinite article (i.e., “a radical freedom”) that spoils both the meter and the meaning.

The ChatGPT poem is also marred by logical errors. The idea that the “world rushes past” and there’s “wind in my ears” is absurd, since these are very difficult climbs that nobody could ride up very fast. (The Strava KOM for El Toyonal was set at an average speed of only 10.4 mph, as ChatGPT could have easily discovered.) To describe this climb as a “thrill” is a joke; any cyclist would tell you it’s a slog. And “making it last” suggests a deliberately slow pace, which flies in the face of “defying the steepness.” And where does “gravel” come from? Sure, gravel bikes are all the rage right now, but El Toyonal is a paved road. Meanwhile, a human on a bicycle cannot be said to “soar,” and ChatGPT just tacked on the concepts of caprice and radical freedom without integrating them into the poem. The A.I. gives no indication (or I should say simulation) of even knowing what these terms mean.

It’s odd that this poem actually makes less sense than ChatGPT’s sonnet … it’s almost as though the chatbot blew all its computing cycles fighting with the meter. This poem is only a bit better than what GPT-3.5 had come up with, and undermines the sense that GPT-4 actually understands the structure of language. Maybe ChatGPT’s progress with sonnets is just due to imitation; after all, there’s vastly more training data available for that form.

(If you’re interested in comparing ChatGPT’s poem above to my own dactylic trimeter poetry, click here and/or here.)

ChatGPT art

I’ve never before delved into the artistic capabilities of ChatGPT, so I don’t have any benchmark by which to evaluate its progress over earlier versions, but you gotta start somewhere, right? As it happens, I visited my older (fledged) daughter recently and, following an incident involving a hot tub, she started messing around with ChatGPT and asked it, “Can you create an image of a tall skinny white man feeling faint after leaving a hot tub?” Here’s what it came up with:


When my daughter showed me this, I immediately pointed out that, perhaps based on some automatic effort to make the man good-looking, ChatGPT gave him too much upper body musculature to really be called “skinny.” I think “hunky” would be a more appropriate description. 

My daughter told ChatGPT, “Make him even skinnier.” Almost as if being sassy, the chatbot produced this:


My daughter prompted ChatGPT to try again without going overboard, and its next effort looks a lot like cheating:


Not only is this a copout, but the picture suggests an implausible scenario. If this guy felt faint after leaving the hot tub, and then took the time to go find a robe and yet still feels faint, why isn’t he either wisely sitting down, or sprawled out on the deck having passed out? Also note that part of his robe’s belt is missing.

My daughter went back to the original picture and told ChatGPT, “Make him skinny like a cyclist not like he is anorexic.” Here’s its response:


The cycling shorts are a cute touch, but not very realistic when you think about it. What cyclist wears his cycling shorts in the spa? And who said this guy just finished a ride? It’s not like cyclists wear their cycling clothes all the time. This hot tub could be at the guy’s home, or at a hotel he didn’t even bring his bike to. Meanwhile, the picture still fails to capture the physique of a typical cyclist … very few of the riders I know have pecs or biceps that big.

Moving along from the hot tub pictures, last week I didn’t have any cover art for my blog post, so (inspired by my daughter’s experiments) I decided to see what ChatGPT could come up with. I asked it to create a picture, in the style of William Pène du Bois, of a teenage girl using ChatGPT on a tablet. The result is a far cry from du Bois, and though I used it anyway, I received some constructive criticism from a reader that the picture was perhaps not quite appropriate for the top of my post. Thus, I replaced it (eight days after I had originally posted it) with a different one (more on this later ... see the Epilogue at the bottom of this post). Here is the original picture that ran at the top of last week’s post:


The issue with that picture is the girl’s bare shoulder ... a bit racy especially given her age. I didn’t really like that from the beginning. I asked ChatGPT to fix that, and make the girl’s cheeks less rosy, and make the cat more realistic, and it produced this:


I don’t know about you, but I find this second effort deeply unsettling. Her cheeks are just as rosy as in the first picture; her eyes look like an exaggerated attempt to appear as Western and doe-like as possible; and overall there’s just this air of uncanny-valley old-timey weirdness like you get with the American Girl dolls. The picture is more like what Thomas Kinkade would create than Pène du Bois.

I asked ChatGPT to go back to the first drawing and try again without the bare shoulder, but to keep the clothing modern, and here’s what I got:


This isn’t so bad, but how is that clothing modern? Who wears overalls anymore, or big puffy, flouncy sleeves? The girl’s entire house looks antique. But my main issue is the weird non-words on the tablet display, “Ceenly crerrity” and “Ininty ccnvity,” which bring to mind the strange strings of non-words that bots sometimes include in bogus comments on my blog posts. I find them unnerving.

To create new cover art for today’s post, I decided to scrap the Pène du Bois picture and start from scratch. I asked ChatGPT, “Please create a picture, in the style of Shawn Martinbrough, of a tall, blond, lean, middle-aged man, with a cat on his lap, wearing nice bluejeans and a black hoodie, using a tablet to visit the website www.albertnet.us.” What it came up with was almost exactly like what you saw at the top of this post:


Not a bad picture, though the guy isn’t exactly middle-aged, and the cat isn’t exactly on his lap. Also, the strange text at upper right doesn’t make much sense. I told ChatGPT, “That looks great, but please make ‘albertnet’ all lowercase and don’t show ‘SHAWN MARTINBROUGH’ on the tablet display.” This would seem to be a trivial tweak for ChatGPT, but to my surprise it produced a substantially different picture:


As I’m sure you’ll agree, this dude looks even younger than the original. And what’s with the www.ShMartinbrough and other textual debris? How is a misspelled URL helping anything? I responded, “Great, but please remove the www.ShMartinbrough from the picture.” Just a tiny change. It came back with this:


So now we’re back to the original guy (which is fine, I won’t miss that Vanilla-Ice-looking dude), but ChatGPT forgot I wanted albertnet in all lowercase, and what’s worse, the guy is now holding the tablet backwards, staring at the back of it! WTF? That’s like the old trope of the guy pretending to read a book but (unbeknownst to him) it’s upside down. I replied, “That’s good but it looks like he is holding the tablet backwards. Also, he is too young, I asked for a middle-aged man. Can you fix those things?” ChatGPT tried again:


It just gets worse and worse! The tablet is still backward, and now we have this bizarre shadow image of the guy. Did I ask ChatGPT to get all artsy-fartsy on me? I did not. And okay, the guy looks a tiny bit older, maybe a year or two tops, but it’s mainly the cat that looks old. I complained, and the chatbot (which to its credit never gets irritated or frustrated) gamely regenerated and came up with this:


Sure, the tablet is fixed, but now the cat just looks angry, the guy has just as much hair, and check out that cleavage! He’s got, like, pneumatic pecs! He needs a fricking brassiere! And look at the size of those shoulders—he’s a monster! I told ChatGPT, “Can you tone down his pectoral muscles and give him a bit of a receding hairline such as many middle-aged men have?” Here’s its response:


Oh. My. God. It’s all gone downhill. His hairline is essentially unchanged; the unwanted text is back (this time morphing the two names to produce the nonsensical “Albertinbrough”); the dude’s pecs are just as huge as before; now his shirt is this kind of flimsy satin that’s practically lewd; his biceps are about to burst; and check it out, this brute is actually sitting right on his poor cat! How is the poor beast’s spine not crushed? And yet the cat seems perfectly stoic about the situation. Not very realistic. In A.I. terms this is a “hallucination” and shows how ChatGPT is still unable to sanity-check its creations. What’s shocking to us doesn’t seem wrong to the A.I. Do I need to specify that I don’t want the cat’s head to be bursting out of the guy’s groin?

I tried three more times to fix the picture, emphasizing a non-crushed cat, thinning hair, a man at least fifty years old, the build of a cyclist, and albertnet in all lowercase. While I was at it I asked to make the cat a tabby. ChatGPT kept trying, swinging wild at this point, ignoring first this instruction and then that, producing all manner of artwork but without ever meeting all of my simple directives:



For each picture, ChatGPT provided a caption telling a nice lie about the revision. For example, below the last picture it wrote, “Here is the updated illustration with ‘albertnet’ in lowercase, the man having the lean build of a cyclist, and a tabby cat resting on his lap. Let me know if there are any other changes you’d like!” True, the picture was updated, and that is a tabby, but everything else about this description is incorrect. So I went back to the very first picture and, using a different A.I. tool, manually scrubbed off the errant text so I could have something usable for the cover picture. ChatGPT, instead of a precision tool, had behaved more like a dartboard. And I suck at darts.

As with the poetry, ChatGPT seems to want to be the whiz-kid who can crank out something passable in almost no time at all, vs. thinking deeply and producing something that’s spot-on. ChatGPT’s fail-fast, iterative technique strikes me as almost the opposite of art. For blog post cover pictures, I’d rather commission my younger daughter to take a little time and create something of real value (as she has done for previous posts like this one, this one, and this one). She works much more slowly than ChatGPT, and isn’t at my beck and call, but I think the end result is far superior. I couldn’t get cover art for this post because she’s away at college and it’s dead week, but to compare her work to ChatGPT’s, consider an earlier effort of hers, drawn when (at age seventeen) she hadn’t yet taken any college art courses:


I asked GPT-4 to create a black and white drawing of a hand holding a mechanical pencil and here’s what it came up with:


Should I need to remind the chatbot how many fingers a human has? And what’s with all the stray dots … are they fountain pen ink spills, or beads of black sweat flung from the brow of a six-fingered space alien? Tell you what, I’m sticking with human artists for now. They’re worth the wait.

Conclusion

Looking back at these last two posts, I would say the current buzz around A.I. is well warranted, given a) how quickly the technology is improving, and b) the ramifications—not all positive—for how we get information from the Internet and what we get when we task A.I. with creating what will pass for our own creative output. I guess I shouldn’t be surprised to see Gen-Z people using ChatGPT and even Microsoft 365 Copilot as routinely as we’ve all been using Google all these years. Myself, I prefer old-fashioned web search tools because my answers will be more complete and more interesting, and may take me down rabbit holes that (so far) I still have the patience for. As for creating prose, poetry, and art, ChatGPT strikes me as a powerful tool, but one we’d better be careful to rein in. A.I. still seems to put speed and convenience ahead of quality and reliability. My take-away: power to the humans! Stay ahead of A.I.!

Epilogue

Getting back to that kind of odd picture from last week’s post, I decided today to replace it. From the beginning I hadn’t liked how the girl’s shoulder was bare and her bra strap showing, and a reader complained about this along with the fact that this youngish girl seemed to be wearing a lot of makeup. I decided to also abandon the part of my prompt that said to employ the style of William Pène du Bois ... that just wasn’t working out. So this time I prompted ChatGPT, “Please create a picture, in the style of Chris Riddell, of a 19-year old girl in modest, modern attire in a modern setting using a tablet.” To my surprise, ChatGPT refused, saying my request ran afoul of “DALL·E’s content policy.” I asked for details, which helped narrow it down to the style component of my request, and ChatGPT told me, “This might be due to ... closely emulating the style of a living artist like Chris Riddell.” This puzzled me, since Shawn Martinbrough (whose style ChatGPT happily emulated two days ago) is also living. So as an experiment I asked ChatGPT, “Please create a picture, in the style of Shawn Martinbrough, of a 19-year old girl in modest, modern attire in a modern setting using a tablet.” Here’s what I got:

Does that weird sweater, with the oversized collar, look familiar? It’s the same garment featured in the very first drawing, of the teenager done in the Pène du Bois style! What part of “modest attire” is this chatbot not getting? I asked it, “Can you please try again but not have her shoulder exposed?” It generated this:

The caption ChatGPT gave the above picture was, “Here is the updated illustration, ensuring her shoulders are fully covered and her attire remains modest and modern.” False! I see a shoulder, a bra, and cleavage! I replied, “I can still see her shoulder and the strap of her bra. Can you fix that by giving her a garment that covers both shoulders and doesn't show any strap?” It gave me this:

Curses! Foiled again! And ChatGPT will only generate three pictures a day for non-paying users like myself, so I decided to call it a day and used the above picture atop last week’s post. It looks like we may need to wait until GPT-4.5 or GPT-5 for the amazing new technology involving pictures of women that don’t show a bare shoulder and a bra strap. Perhaps hundreds of developers are working on that problem even as I type this. Until that breakthrough is made, I will maintain steadfastly that ChatGPT cannot be held to possess intelligence.

Other albertnet posts on A.I.

—~—~—~—~—~—~—~—~—
Email me here. For a complete index of albertnet posts, click here.
