Over a year ago, I came across this fascinating article about whether or not Artificial Intelligence could write a New Yorker article. The answer was essentially “no” or “not yet,” but it got me pretty riled up anyway. Ever since, I’ve been evaluating A.I.’s ability to correctly suggest even a word or phrase as it parses my text. In this post I wade into that history as I examine the question of A.I.’s (supposed) ascendance.
The New Yorker article
The New Yorker article, “The Next Word,” was in the October 14, 2019 issue. The writer, John Seabrook, talked with A.I. experts at Google about their “Smart Compose” feature, which predicts how your sentence ought to end and suggests the remaining words, so you can just hit tab to accept the suggestion. (If you use Gmail, you’re already familiar with this.) Seabrook also talked with the folks at another company, OpenAI, about their GPT-2 engine, which composes entire sentences and even complete paragraphs, with made-up quotations no less, in the voice of a real writer it “learns” and then mimics. GPT-2 is still under development; OpenAI claimed its release had been delayed because it’s “too good at writing” and they fear society isn’t ready. (Yeah, right.) Seabrook tried it out, and in the online version of the article you can see its attempted contributions to his story.
Seabrook included an extended quotation from Steven Pinker, a Harvard psycholinguist, to which text generated by GPT-2 had been appended, and challenged the reader to figure out where the real quote ended and A.I. picked it up. (You can take the challenge in the online article.) I found this exercise really easy, but Seabrook reported that almost everybody he tried the “Pinker test” on “failed to distinguish Pinker’s prose from the machine’s gobbledygook” and concluded, “The A.I. had them Pinkered.”
Does this scare you? It sure scares me. I have no doubt that great literature will always be written by real writers, no matter how good the A.I. gets, but run-of-the-mill journalism and magazine writing, which mainly exist to serve up ads anyway, might someday be written by a clueless A.I. that has no more grasp of insight and fact than do certain famous politicians. As Seabrook puts it, “One can envision machines like GPT-2 spewing superficially sensible gibberish, like a burst water main of babble, flooding the Internet with so much writing that it would soon drown out human voices, and then training on its own meaningless prose, like a cow chewing its cud.”
Hoping against hope that this A.I. capability is overrated, I have been paying close attention to how well it has done across the devices I use, and recording its more salient failures, over the past year. In Internet time, a year is a pretty huge span—in theory I should have seen marked improvement in that period. Well, here’s what I found.
I have to confess, I haven’t grabbed many snapshots of Google’s Smart Compose behavior because I only use Gmail at work, and I’m fairly religious about separating work and play. I did grab a few examples though, because they flew in the face of how the function was reputed to work. Seabrook quotes Paul Lambert, who manages this feature for Google, as saying, “If you write ‘Have a’ on a Friday, it’s much more likely to predict ‘good weekend’ than if it’s on a Tuesday.”
Weirdly, this didn’t work for me at all. Check out these samples of what Smart Compose suggested to me on a Friday morning:
Given that Father’s Day was more than seven months away, this suggestion struck me as totally moronic. So I added an “r” after the “F” to see how it would recover:
This would make sense if Chris had wished me a happy Friday … but he hadn’t. I decided to shoot for “weekend” to see how that would go:
In the year I’ve kept an eye on Smart Compose, I haven’t again seen anything as egregiously inept. Mostly what I notice is that it doesn’t suggest words or phrases all that often … perhaps my work emails are too technical or otherwise cryptic. (The A.I. that powers Smart Compose was trained on millions of real emails, but none from Google’s business customers.) The A.I. is pretty good about really basic stuff, like suggesting “you have any questions” after I type “Please let me know if,” but that’s about it. As an experiment, I composed this blog post in Gmail and it didn’t suggest anything. It’s like I overwhelmed it somehow. So much for that.
Gboard predictive text
Perhaps more useful, day-to-day, than Smart Compose is Google’s predictive text for the Gboard virtual keyboard, which is bundled with their Android operating system. Predictive text seems to come into play with every app on my phone that relies on typed input. Frankly, I don’t like typing much on the phone so I do most of my writing on the computer. The main thing I type on my phone? Text messages, which I mostly trade with my older daughter who is off at college. (Alas, texting seems to be her generation’s preferred method of communication, at least where their parents are concerned, and I’ve decided to humor my daughter in this.)
My experience? Naturally, predictive text comes in handy, usually in the context of completing words I’ve mostly typed. I hasten to point out this is a lot different from A.I. actually composing anything. If I type “has,” it’s going to suggest “has,” “was,” and “hasn’t” because those are the most likely candidates, and most of the time I’ll accept one of those. If I type “hast” it suggests “hast,” “host,” and “hash,” and if I type “haste” it suggests “haste,” “taste,” and “waste.” (After all, “haste makes waste,” we all know that.) Android is not going to suggest “hasten” because apparently not too many people use that word. This is the bulk of how predictive text behaves, and though it’s not as sophisticated as Smart Compose (much less GPT-2), it works a lot of the time. It also fails a lot.
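The ranking behavior described above can be sketched in a few lines. This is a toy illustration only, with invented frequency counts; a real keyboard learns its statistics from enormous corpora and uses far more context than this. The point is just that a pure frequency ranking naturally buries a rarer word like “hasten”:

```python
# Toy sketch of frequency-ranked word completion. The counts below are
# invented for illustration; they are not real corpus statistics.
WORD_FREQ = {
    "has": 9000, "was": 8500, "hasn't": 3000,
    "hast": 40, "host": 2000, "hash": 1500,
    "haste": 300, "taste": 1200, "waste": 1100,
    "hasten": 25,
}

def suggest(typed, freq=WORD_FREQ, k=3):
    """Return the k highest-frequency candidates: exact-prefix matches
    plus same-length words that differ by a single letter."""
    prefix = [w for w in freq if w.startswith(typed)]
    fuzzy = [w for w in freq
             if len(w) == len(typed)
             and sum(a != b for a, b in zip(w, typed)) == 1]
    candidates = sorted(set(prefix + fuzzy), key=freq.get, reverse=True)
    return candidates[:k]

print(suggest("has"))    # → ['has', 'was', "hasn't"]
print(suggest("haste"))  # "hasten" loses out to commoner words
```

With these made-up numbers, “hasten” never cracks the top three for any prefix of itself, which mirrors the behavior described above: unusual words simply don’t get offered.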
If we’re really going to count on A.I. to create content for us at any point, I see three things it absolutely has to get right. First, it needs to not make any grammar or spelling errors, obviously, since you can’t have it making the putative human author look stupid, or burdening an editor with fifty times the errors a real writer would make. Second, the A.I. can’t commit any serious gaffes that would render the text offensive or at least laughably ignorant. Finally, the A.I. will have to really understand context if it’s to reach its intended audience (well, our intended audience, since A.I. can’t really have anything like intention). An A.I.-written article for the slightly racy GQ or Men’s Health magazine better not sound like Good Housekeeping; Hunter S. Thompson shouldn’t come off like Heloise.
So here’s how the A.I. on my phone has stacked up in these areas over the last year.
Grammar and spelling
It’s kind of remarkable that anybody thinks we’re on the brink of A.I. being able to compose anything when it still doesn’t really do so well with grammar and spelling in its predictive text. I could supply countless examples of errors in this realm, but that would get dull, so I’m providing one example each of the main types of errors I see.
First, it breaks very basic rules about capitalization, failing to capitalize proper nouns or the first word of a sentence. (Predictive text’s cousin, voice recognition, screws this up quite a bit as well.) If the human fails to capitalize a word, A.I. should fix it, rather than expecting us to bother with the shift key a lot. Here’s an example:
It also screws up predicting subject/verb agreement, so lots of its word suggestions wouldn’t work without my having to backspace and add an “s,” which is clunkier than just typing the word right to begin with. I fight with this many times a day. Here’s an example:
I mean, come on! “We’re huge fan.” That’s not very helpful. “We’re huge favor.” Look, Android, the verb is “are.” It needs a plural predicate nominative. This is not rocket science.
One of the most annoying things predictive text (and its sibling, auto-correct) does is to “fix” my errors for me on the fly, without asking. Usually I end up sending the text before I notice the problem, and then have to explain to the recipient that it’s not my error, which is way more work than for me to just type everything myself with no “help.” (Yes, I know that youngsters these days have no problem sending messages that are utterly littered with errors, but remember, we’re talking about A.I.’s ability to compose text one day ... the bar needs to be higher.) Look at this travesty:
Another category of failure is when predictive text doesn’t grasp what part of speech the next word needs to be. Consider this example where “very” has been set up, within the sentence, to be an adverb modifying another adverb. There is zero benefit in predictive text suggesting an adjective here.
It doesn’t take an English major to grasp that “Text messages don’t convey irony very groovy” simply doesn’t make sense.
Finally, suggesting anything that isn’t really a word is pretty pointless. One of the three choices typically offered up is the fragment of a word you’ve already typed. Why offer this? It’s a waste of screen real estate. And then I’ve seen suggestions that either aren’t words, or basically aren’t words. Look at this example:
Obviously “trifec” isn’t a word, so why give me the option of accepting it? If I really wanted it, I could just hit the spacebar. And “triger”? It’s basically not a word. It’s not in Google’s own spell-checker dictionary; it’s not in the American Heritage Dictionary; and it’s not in the Wiktionary. Okay, I found “triger process” in the online Merriam-Webster dictionary, so maybe Android was setting me up for that phrase, which means “a method of sinking through water-bearing ground in which a shaft is lined with tubbing and provided with an air lock so that work proceeds under air pressure.” But what are the odds this is what I was writing about? Exactly zero. Android obviously should have guessed “trifecta.” If A.I. starts to write articles, will we have to suffer through tedious asides about digging through waterlogged ground?
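The fix here seems almost trivially easy: only complete against entries that are actual dictionary words, ranked by how common they are. A minimal sketch, with an invented three-word dictionary and made-up frequency counts:

```python
# Sketch of dictionary-filtered completion: only real words are offered,
# so a fragment like "trifec" is never suggested back to the user.
# The dictionary and counts are invented for illustration.
TOY_DICT = {"trifecta": 120, "trigger": 900, "triage": 200}

def complete(typed, dictionary=TOY_DICT, k=3):
    hits = [w for w in dictionary if w.startswith(typed)]
    return sorted(hits, key=dictionary.get, reverse=True)[:k]

print(complete("trifec"))  # → ['trifecta']
```

Under this scheme, “trifec” yields “trifecta” and nothing else, because nothing else in the dictionary starts that way.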
Some seemingly phonetic goofs
Sometimes the A.I. seems to be working phonetically and makes a suggestion that almost makes sense—but of course almost doesn’t cut it when it’s supposed to save you work while maintaining (or ideally improving) accuracy. Check this out:
Any human could have guessed “all its splendor and glory” and Android almost got it. But “all its splendor and Gloria”? Really? (Could’ve been worse, I guess … it could have changed “its” to “it’s” again.) Here’s another failure:
Many American morons have called COVID-19 a hoax, but I doubt any have called it a Hoke. (If you’re wondering where it even got “Hoke,” I can tell you I’ve used that word in eight texts, referring to a character, Hoke Moseley, who appears in four Charles Willeford novels. Among book characters he resembles a virus in no way whatsoever.)
Here’s a final example of the A.I. seeming to fail via phonetic bumbling:
It’s almost as though the predictive text software heard somebody say “thirsty” and thought it heard “Thursday.” But of course that didn’t happen … you can see where I typed “thirs.” And how could anyone be Thursday? It makes no sense. On the other hand, if you told somebody to say the first thing that popped into their head when you gave them a prompt, and the prompt was “hungry and …” I’ll bet nine out of ten would say “thirsty.” (One out of ten would be somebody on a diet who might say something like “bitter.”)
It is hard to make a case that these goofs are truly phonetic in nature. But is it feasible these errors are simply random? Well … how the hell should I know? I never said I was a brain scientist or computer technologist. But I have a couple of theories, around context and … oops, unfortunately I seem to be out of space here.
Tune in next week for Part 2 of this essay, where I’ll explore some more ways A.I. can go wrong, with a number of wince-worthy predictive-text FAILs.
Email me here. For a complete index of albertnet posts, click here.