NOTE: This post is rated R for mild strong language.
Introduction
“Almost
intelligent” might be a good name for somebody’s biography (or autobiography)
but here I’m talking about artificial intelligence. My last post described my experience
chatting with an application called Cleverbot that tried to simulate human
dialog convincingly. Here, I’ll tackle
the subject of AI language more generally, looking at speech recognition, natural
language, and translation.
Do we care?
If you
really don’t care about AI at all, go read something else—or, better yet, read
on to see why maybe you should care.
On the one
hand, AI is very exciting. As computers
have become “smarter,” and easier to use, they’ve gotten so useful it’s hard to
imagine how we ever did without them.
I’m thinking about Google, GPS and other mapping applications, package
tracking, e-mail spam filters … the list goes on and on.
On the other
hand, AI is a bit scary, and as a human I prefer to believe I could never be
replaced by a computer. I shudder at the
thought that human behavior could be so unvarying and predictable that one day
we’ll barely be better than a really good computer program. I want my computer applications to get smart,
but not too smart.
Voice recognition and natural language
There’s a
button on the side of my smartphone that, when pressed, startles me by
causing the speakerphone to say, “Say a command!” I’m vaguely aware that my phone will respond
to voice commands but have no interest in issuing them. Most of the cool features of smartphones
involve the silent, non-speech stuff you can do—e-mail, Internet browsing,
etc.—as you’ll notice on the subway when half the people are silently tapping
away. (The popularity of texting—a way
to privately communicate without being eavesdropped on by the person you’re
ostensibly talking to face-to-face—is a classic example of how phones are
becoming increasingly mute.)
That said, the
iPhone’s voice-recognition application, Siri, seems to be making a bit of a
splash. (Nobody I know uses Siri yet,
but I’m sure some will.) This demo shows how Siri is pretty good at understanding speech and figuring out what you
want it to do. (I played with a Droid
phone recently and it was also very good at typing for me as I spoke.) The reviewer asks Siri, “Where can I have
lunch?” Siri replies, “I found fourteen restaurants
whose reviews mention lunch. Twelve of
them are close to you.” This seems
easier than typing into Google on a little phone. But the natural language feature isn’t
perfect; the reviewer says, “How about downtown?” and Siri replies, “I don’t
know what you mean by ‘how about downtown.’”
Perhaps Siri’s
communication isn’t “connection-oriented”—that is, it doesn’t consider “how about downtown?” in the context of “Where
can I have lunch?” but takes the two queries as totally discrete and
unrelated. If so, this is a major
shortcoming.
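To make the "context" idea concrete, here is a minimal sketch in Python. It is purely a toy of my own: the names (`KNOWN_PLACES`, `parse`, `Conversation`) are hypothetical, and nothing here reflects how Siri is actually built. It only shows the difference between treating each query as discrete and carrying what was learned from the previous one forward.

```python
# Toy illustration only; every name here is hypothetical.
KNOWN_PLACES = {"downtown", "uptown"}


def parse(utterance):
    """Return only the slots this one utterance mentions by itself."""
    text = utterance.lower()
    slots = {}
    if "lunch" in text:
        slots["intent"] = "find_lunch"
    for place in KNOWN_PLACES:
        if place in text:
            slots["place"] = place
    return slots


class Conversation:
    """Remembers slots from earlier turns and merges new ones in."""

    def __init__(self):
        self.context = {}

    def hear(self, utterance):
        # Newer information overrides older; everything else is kept.
        self.context.update(parse(utterance))
        return dict(self.context)


chat = Conversation()
chat.hear("Where can I have lunch?")
with_context = chat.hear("How about downtown?")
without_context = parse("How about downtown?")
```

With context, the second query still knows the user wants lunch and now also knows where. Without it, "downtown" arrives with no intent attached, which is roughly the failure in the demo.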
The reviewer
tries again: “I want to have lunch
downtown.” Siri replies, “I found 3
restaurants matching ‘downtown.’”
Useless! Siri knows where the
user is, geographically, but does not realize that “downtown” in this context
pertains to location, not a restaurant’s name.
Here, Siri starts to look like a mere forwarder of requests, always
passing the buck to Google instead of applying intelligence to the request.
Simple conversion
of speech to text looks pretty good on Siri.
The reviewer dictated a message to it, and almost everything came out correctly.
The notable exception was how Siri transcribed the reviewer’s spoken
comment “I need to make some videos about the iPhone 4S.” Siri typed, “I need to make some videos about
the iPhone 4 ass.” The reviewer doesn’t
notice this gaffe, telling the YouTube viewer, “There it is. It figured out exactly what I wanted to say.”
Dangerous, don’t you think? What
if the reviewer meant to e-mail the text “S as in Sam” but actually e-mailed
“ass as in Sam,” to his boss, Sam?
Not that
Siri doesn’t try hard. When the reviewer
says, “Set a timer for 3 minutes,” Siri replies, “OK, I started a three-minute
timer. Don’t overcook that egg.” Not bad.
Actually, it is bad. For one
thing, “that egg,” when spoken by Siri, comes out “ditek.” Without the text on the screen you’d never
understand what it said. Meanwhile, it’s
obvious that Siri is trying to be funny, and completely failing. There’s nothing witty about Siri making a
lame guess as to what the timer is for.
What’s worse, Siri could create the impression that three minutes is actually
how long you should cook an egg. In fact that’s not nearly enough time,
and everybody knows an undercooked egg presents a salmonella risk.
A fundamental problem
Of course
I’m nitpicking with the egg timer example, and (to a lesser extent) with the
“ass” example, but they bring up an important point: language, as one of the primary interfaces between
humans, requires far more than just understanding what is heard and forming
sentences in response. Having a
sanity-check reflex that keeps you from using words like “ass” in mixed
company, and knowing whether your joke is actually funny, are complicated
processes. Verbal communication can be a
minefield, especially for a computer application that stabs around in the dark.
Consider,
for example, the old joke about the Texan who gets into Harvard. While touring the campus, he asks a student,
“Excuuuse me, can you tell me where the library’s at?” The student replies haughtily, “Here at
Haaarvard, we never end a sentence with a preposition.” The Texan replies, “Okay, can ya tell me
where the library’s at, asshole?”
Upon
inspection, this exchange, though brief, is quite complex. The Harvard student’s response to the Texan’s
query shows a decision that might not occur to an AI application—that is, to a)
not answer the question, and b) use the opportunity to deliver a scornful
message about class and intellect. The
Texan’s comeback makes a statement about a) his refusal to be cowed, and b) the
difference between cultivation and innate intelligence. Meanwhile, the joke as a whole counts on the
listener enjoying an opportunity to feel superior to both Harvard students and
Texans, while exulting in the surprise and wit of the punch line. Worlds away from “Enjoy ditek.”
Maybe you
think I’m overreaching here, that such nuance will never be expected of
AI. Maybe AI is just a tool to make
machines more useful to humans, and little gaffes don’t matter much. When a woman asks her husband, “Do these
pants make my butt look fat?” he is instantly plunged into a terribly
complicated interaction, because of his relationship to the woman. So much hinges on his response. If he says “yes” he’s obviously dead. If he says “no” too vociferously, he seems
patronizing. He could try the
reverse-psychology approach and say, “No, your butt makes your butt look fat,” but she’d better have a sense of
humor and thick skin. Or, he could
ignore the question, or say, “Look, krill!”
Or he could say “yeahhh” lecherously (note that imbuing this single
syllable with the sense of “I want some of that!”
is far beyond the current state of the art in AI voice synthesis). But when a human asks Siri “Am I fat?”
and gets back, “Here’s your a.m. alarm” and “I found 8 fitness centers fairly
close to you,” he or she can more easily blow it off.
This
idea—that computers don’t have to play nice when “talking” to humans—is
strongly supported by a scene in “The Terminator” when the evil cyborg,
confronted by his landlord—“Hey buddy, you got a dead cat in there, or what?”—scans
through a menu of possible responses—“YES/NO; OR WHAT; GO AWAY; PLEASE COME
BACK LATER; FUCK YOU, ASSHOLE; FUCK YOU”—and chooses the penultimate one. Of course when you’re the size of Arnold
Schwarzenegger you don’t have to have a friendly user interface.
That said, I
would argue that, to the extent humans are to embrace AI when using electronic
devices, precision and nuance do matter.
We have to trust these devices not to turn “S” into “ass,” not to waste
our time with lists of restaurants we’d never eat at, and not to infuriate us
with messages like “cannot undo.” Even
if you’ve never found yourself yelling profanities at your computer, I’m sure
you’ve seen others do it.
Consider
this cautionary tale. My dad bought one
of the first consumer-oriented computers in history, the Hewlett-Packard Model 85. This was 1980, a year before the IBM
PC. The HP-85 was about as far from Siri
(or at least the design intent of Siri) as you can get. There was no software for it; you had to
program it yourself. Meanwhile, its
version of BASIC was proprietary, diverging from the industry standard (e.g.,
you used the command “DISP” instead of “PRINT”). I had my brother try out one of my first
programs. It prompted him to type his
name. With great hesitation—he was
greatly fearful of doing something wrong and damaging our dad’s expensive machine—he
typed “Max.” Then he sat there waiting
for something to happen. Nothing did,
because my program didn’t say anything about hitting the Enter key when done. Max looked a bit nervous. “It’s not working! It’s not doing anything!” he cried. I told him to hit Enter. When he did, the computer promptly displayed
the message “Max is a jerk” (the whole point of my program). Max got really angry and flustered and to this
day does not use a computer. This probably
isn’t just because of my program; the HP-85 was less than user-friendly and
doubtless gave Max the wrong impression of where home computing was going.
Translation
Here is
where the AI picture is, to me, much rosier.
Early attempts at translation, like Alta Vista’s Babelfish, were a
joke. You pasted the foreign-language
text into a window, gave it the language to translate it into, and then were
presented with a salad of translated words (with un-translated ones sprinkled
like croutons) that made no sense at all.
The only real use for this tool was translating things into Tristan.
What’s
Tristan? Well, I used to have a
colleague, a computer programmer, whose native-tongue language skills were so
poor it was impossible to understand a thing he wrote. His e-mails always gave my colleagues and me
a laugh, and in his honor we invented a language and named it after him. (It wasn’t really called Tristan, because his
last name wasn’t really Tristan; I’ve changed it to protect him from possible
embarrassment.) To translate something
into Tristan, you’d type normal text, translate it into French using Babelfish,
and then translate it back to English.
The results were pure comedy, with not a shred of sense left intact.
I think
people are naturally forgiving of poor translation, because we’ve studied
grammar and foreign languages in school and can really appreciate how difficult
a task this is. Plus, the results are so
often funny, they put us in a good mood.
Consider the urban legend that “Coca-Cola,” when first translated into Chinese, came out meaning “bite
the wax tadpole.” (To this day I’ll
complain about something by saying it bites the wax tadpole.) Brian Hayes, writing in “American Scientist,”
makes an interesting comment about AI efforts to parse grammatical
constructions when translating text: “The
failure of this approach is sometimes dramatized with the tale of the English →
Russian → English translation that began with ‘The spirit is willing but the
flesh is weak’ and ended with ‘The vodka is strong but the meat is
rotten.’”
More
recently, online translation engines such as Google Translate have gotten much,
much better. As Hayes describes, “The
idea is to ignore the entire hierarchy of syntactic and semantic structures—the
nouns and verbs, the subjects and predicates, even the definitions of words—and
simply tabulate correlations between words in a large collection of bilingual
texts.” At first, this strikes me as a
“brute force” approach that is further from artificial intelligence than earlier
efforts, however hapless, to actually parse a sentence grammatically. But as Hayes points out, the modern technique
is actually a lot closer to how humans learn to talk. (It’s also more similar to how we would learn
a foreign language if we had the good fortune to go live in another country,
versus making our way with a textbook and classes.)
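For the curious, here is a toy sketch in Python of what "tabulating correlations" might look like at the smallest possible scale. To be clear about my assumptions: the four sentence pairs are made up, the scoring rule (a Dice coefficient over sentence-level co-occurrence) is my own simple stand-in, and none of this is Google's actual method. The point is only that the program never sees a grammar rule or a dictionary, yet still pairs up words.

```python
from collections import Counter, defaultdict

# Hypothetical toy "bilingual corpus" (English, French), invented for this sketch.
PAIRS = [
    ("the package", "le colis"),
    ("the office", "le bureau"),
    ("the package is here", "le colis est ici"),
    ("the office is here", "le bureau est ici"),
]


def tabulate(pairs):
    """Count how often each English word shares a sentence pair with each French word."""
    cooc = defaultdict(Counter)
    en_count = Counter()
    fr_count = Counter()
    for en, fr in pairs:
        en_words = set(en.split())
        fr_words = set(fr.split())
        en_count.update(en_words)
        fr_count.update(fr_words)
        for e in en_words:
            for f in fr_words:
                cooc[e][f] += 1
    return cooc, en_count, fr_count


def best_match(word, cooc, en_count, fr_count):
    """Pick the French word most strongly correlated with `word` (Dice score)."""
    scores = {
        f: 2 * together / (en_count[word] + fr_count[f])
        for f, together in cooc[word].items()
    }
    return max(scores, key=scores.get)


cooc, en_count, fr_count = tabulate(PAIRS)
guess_package = best_match("package", cooc, en_count, fr_count)
guess_office = best_match("office", cooc, en_count, fr_count)
```

On this tiny corpus, "package" lines up with "colis" and "office" with "bureau," purely because they keep showing up together. Words that always travel as a pair (like "is" and "here" in my examples) can't be told apart, which is one reason a real system needs a vastly larger collection of texts.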
I first
tried Google Translate when I was trying to track a package that was being
shipped to me from a web merchant in France.
I have studied French for years, but understanding statements about
logistics and customs offices would be difficult in any language. I was presented with this: “Votre colis est sorti du bureau
d'échange. Il est en cours d'acheminement
dans le pays de destination.” This would
have totally tripped up the original Babelfish, but Google served up an
entirely comprehensible translation: “Your
package is out of the office of exchange. It is in transit in the country of
destination.” (Not only was I delighted
with how clear this was, I was relieved my package wasn’t stuck in
customs.) Translating this English back
into French, and then back into English, I get “Your package is out of the
office of exchange. It is in transit to the destination country.” Very little of the “Tristan effect.” (There’s some fuzziness around “to” vs. “in” with
regard to the destination country, but I can live with that.)
To reassure
myself that the Man of Letters wouldn’t be replaced by a machine anytime soon,
I tried some poetry:
But the Raven still beguiling all my sad soul into smiling,
Straight I wheeled a cushioned seat in front of bird and bust and door;
Then, upon the velvet sinking, I betook myself to linking
Fancy unto fancy, thinking what this ominous bird of yore—
What this grim, ungainly, ghastly, gaunt, and ominous bird of yore
Meant in croaking “Nevermore.”
When I fed
this into the new version of Babelfish (which works similarly to Google’s), and
translated it into French and back, the response was this:
But the Raven seductive yet all my sad soul into smiling,
Straight I wheeled a seat padded before the bird and bust and door;
Then, on the Velvet sinking, I hauled myself to tie Fancy: fancy,
Think what this bird threatening of antan - the sad bird, awkward, frightening,
Ghent and disturbing past Meant in croaking “Nevermore.”
Aha! Gibberish!
I was about to feel all smug about the superiority of humans over AI,
but then tried Google Translate with the same English → French → English task:
But the raven still beguiling all my sad soul into smiling,
I wheeled a cushioned seat in front of bird and bust and door;
Then, upon the velvet sinking me, I betook myself to linking
Fancy unto fancy, thinking what this ominous bird of yore -
What this grim, ungainly, ghastly bird, gaunt, and ominous of yesteryear
Meant in croaking “Nevermore.”
Wow. That’s so good it’s creepy. But before you despair and decide the
computers will ultimately render the human race unnecessary, be sure to check
out my next albertnet post, wherein I examine how well AI does playing games—another
classic measure of its progress.