Final Jeopardy
Man vs. Machine and the Quest to Know Everything
Stephen Baker
Acknowledgments
A year ago, I was anxiously waiting for a response to a book proposal. I had high hopes for it, and was disappointed when my marvelous editor at Houghton Mifflin Harcourt, Amanda Cook, told me to look for another project. We'd find something better, she said. It turned out she was right. I'm thankful for her guidance in this book. She's had a clear vision for it all along. Her notes in the margins of the manuscript are snippets of pure intelligence. Not long ago I scanned one of these Amanda-infested pages and e-mailed it to a few friends just to show them how a great editor works, and how fortunate I am to have one.
I applaud the entire team at Houghton, which turned itself inside out to publish this book on a brutal schedule and to innovate with the e-book. If it had settled for the lollygagging schedule I outlined in my proposal, this book would be showing up in stores six months after Watson's televised Jeopardy match. Thanks to Laura Brady, Luise Erdmann, Taryn Roeder, Ayesha Mizra, Bruce Nichols, Lori Glazer, Laurie Brown, Brian Moore, Becky Saikia-Wilson, Nicola Fairhead, and all the other people at Houghton who helped produce this book in record time. Thanks also to my wonderful agent, Jim Levine, and the entire team at Levine-Greenberg.
I remember calling Michael Loughran at IBM on a winter evening and suggesting that this Jeopardy machine might make a good book. He was receptive that night, and remained so throughout. He was juggling four or five jobs at the same time and tending to a number of constituencies, from the researchers in IBM's War Room to the various marketing teams in Manhattan and the television executives in Culver City. Yet he found time for me and made this book possible. Thanks, too, to his colleagues at IBM, including Scott Brooks, Noah Syken, Ed Barbini, and my great friend and former BusinessWeek colleague Steve Hamm. I also appreciate the help and insights from the team at Ogilvy & Mather, especially David Korchin and Miles Gilbert, who brought Watson's avatar to life for me.
The indispensable person, of course, was David Ferrucci. If it's not clear in the book how open, articulate, and intelligent he is, I failed as a writer. He was my guide, not only to Watson's brain, but to the broader world of knowledge. He was generous with his time and his team. I'm thankful to all of them for walking me through every aspect of their creation. My questions had to try their patience, yet they never let it show.
Harry Friedman welcomed me to the fascinating world of Jeopardy and introduced me to a wonderful cast of characters, including Rocky Schmidt and the unflappable Alex Trebek. Thanks to them all and to Grant Loud, who was always there to answer my calls. I owe a load of New Jersey hospitality to my California hosts, Natalie and Jack Isquith, and my niece Claire Schmidt.
Scores of people, in the tech world and academia, lent me their expertise and their time. I'm especially grateful to my friends at Carnegie Mellon for opening their doors to me, once again, and to MIT. Thanks, too, to Peter Norvig at Google, Prasanna Dhore at HP, Anne Milley at SAS, and the sharpest mind I know in Texas, Neil Iscoe.
And for her love, support, and help in maintaining a sense of balance, I give thanks to my wife, Jalaire. She'd see the forty Jeopardy shows stored on TiVo and say, "Let's watch something else."
Notes
It was a September morning: Like Yahoo! and a handful of other businesses, the official name of the quiz show in this story ends in an exclamation point: Jeopardy! Initially, I tried using that spelling, but I thought it made reading harder. People see a word like this! and they think it ends a sentence. Since I use the name Jeopardy more than two hundred times in the book, I decided to eliminate that distraction. My apologies to the Jeopardy! faithful, many of whom are sticklers for this kind of detail.
pressing the button: A few months before the final match, I was talking to the Jeopardy champion Ken Jennings in Los Angeles. Discussing Watson, he suddenly stopped himself. "What do you call it?" he asked. "Him? It?" The question came up all the time, and even among the IBM researchers the treatment wasn't consistent. When they were programming or debugging the machine, they naturally referred to it as a thing. But when Watson was playing, it would turn into a "he." And occasionally David Ferrucci was heard referring to it as "I." In the end, I opted for calling the machine "it." That's what it is, after all.
He was the closest thing: For narrative purposes, I focused on a handful of researchers in the Jeopardy project, including Jennifer Chu-Carroll, James Fan, David Gondek, Eric Brown, and Eddie Epstein. But they worked closely with groups of colleagues too numerous to mention in the telling of the story. Here are the other members of IBM's Jeopardy challenge team: Bran Boguraev, Chris Welty, Adam Lally, Anthony (Tony) Levas, Aditya Kalyanpur, James (Bill) Murdock, John Prager, Michael McCord, Jon Lenchner, Gerry Tesauro, Marshall Schor, Tong Fin, Pablo Duboue, Bhavani Iyer, Burn Lewis, Jerry Cwiklik, Roberto Sicconi, Raul Fernandez, Bhuvana Ramabhadran, Andrew Rosenberg, Andy Aaron, Matt Mulholland, Karen Ingraffea, Yuan Ni, Lei Zhang, Hiroshi Kanayama, Kohichi Takeda, David Carmel, Dafna Sheinwald, Jim De Piante, and David Shepler.
most books had too many words: For more technical details on the programming of Watson, see AI Magazine (vol. 31, no. 3, Fall 2010). The entire issue is devoted to Q-A technology and includes lots of information about the Jeopardy project.
smarter Watson wouldn't have: One of the reasons the fast version of Watson is so hard to manage and update is its data. In order to speed up the machine's processing of its 75 gigabytes of data, the IBM team processed it all beforehand. This meant that instead of the machine figuring out on the fly the subjects and objects of sentences, this work was done in advance. Watson didn't need to parse a sentence to conclude that the apple fell on Isaac Newton's head and not vice versa. Looking at it from a culinary perspective, the researchers performed for Watson the job that pet food manufacturers like Purina carry out for animals: They converted a rich, varied, and complex diet into the informational equivalent of kibbles. "When we want to run a question," Ferrucci said, "the evidence is already analyzed. It's already parsed. The people are found, the locations are found." This multiplied Watson's data load by a factor of 6, to 500 gigabytes. But it also meant that to replicate the speed of Watson in other domains, the data would likely have to be already processed. This makes answering machines less flexible and versatile.
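The trade-off in this note can be illustrated with a toy sketch. It is not IBM's code: the two-sentence corpus, the crude `naive_parse` splitter, and the `build_index` and `what` helpers are all hypothetical names invented for illustration. The idea is only that sentences are parsed once, ahead of time, so that answering a question becomes a lookup rather than a fresh parse.

```python
from collections import defaultdict

# Hypothetical two-sentence corpus standing in for the source text.
CORPUS = [
    "the apple fell on Newton",
    "Newton described gravity",
]

def naive_parse(sentence):
    """Toy stand-in for a real parser: first word is the subject,
    second word is the verb, last word is the object."""
    words = sentence.split()
    # Drop a leading article so "the apple" yields the subject "apple".
    if words[0].lower() in {"the", "a", "an"}:
        words = words[1:]
    return words[0], words[1], words[-1]

def build_index(corpus):
    """Done once, in advance: parse every sentence and store who did
    what to whom, keyed by (verb, object)."""
    index = defaultdict(list)
    for sentence in corpus:
        subject, verb, obj = naive_parse(sentence)
        index[(verb, obj)].append(subject)
    return dict(index)

# The expensive step happens before any question is asked.
INDEX = build_index(CORPUS)

def what(verb, obj):
    """Question time: a dictionary lookup, with no parsing at all."""
    return INDEX.get((verb, obj), [])

print(what("fell", "Newton"))  # the apple fell on Newton
print(what("fell", "apple"))   # nothing on record fell on the apple
```

The index holds more than the raw text did, which mirrors the growth in data described above, and it has to be rebuilt whenever the corpus changes, which is the loss of flexibility the note points to.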
a huge knowledge base: NELL has a human-instructed counterpart. Called Cyc, it's a universal knowledge base painstakingly assembled and organized since 1984 by Cycorp, of Austin, Texas. In its scope, Cyc was as ambitious as the eighteenth-century French encyclopedists, headed by Denis Diderot, who attempted to catalogue all of modern knowledge (which had grown significantly since the days of Aristotle). Cyc, headed by a computer scientist named Douglas Lenat, aspired to fill a similar role for the computer age. It would lay out the relationships of practically everything, from plants to presidents, so that intelligent machines could make inferences. If they knew, for example, that Ukraine produced wheat, that wheat was a plant, and that plants died without water, they could infer that a historic drought in Ukraine would curtail wheat production. By 2010, Cyc had grown to nearly half a million terms, from plants to presidents. It links them together with some fifteen thousand types of relations. A squirrel, just to pick one example, has scores of relationships: trees (climbed upon), rats (cousins of), cars (crushed by), hawks (hunted by), acorns (food), and so on. The Cyc team has now accumulated five million facts, or assertions, relating all of the terms to one another. Cyc represents more than six hundred researcher-years but is still limited in its scope. And in the age of information, the stratospheric growth of knowledge seems sure to outstrip the efforts of humans to catalogue it manually.
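The Ukraine example can be written out as a minimal sketch of inference over hand-entered facts. This is not Cyc's actual representation or rule language. The triples, the relation names (`produces`, `is_a`, `needs`, `has`, `curtails`), and the two rules are assumptions made up for illustration.

```python
# Hypothetical hand-entered assertions as (subject, relation, object) triples.
FACTS = {
    ("Ukraine", "produces", "wheat"),
    ("wheat", "is_a", "plant"),
    ("plant", "needs", "water"),
    ("Ukraine", "has", "drought"),
}

def infer(facts):
    """Apply two hand-written rules repeatedly until nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        # Rule 1: a thing inherits the needs of its category
        # (wheat is a plant, plants need water, so wheat needs water).
        for (x, r1, category) in facts:
            for (a, r2, need) in facts:
                if r1 == "is_a" and r2 == "needs" and category == a:
                    new.add((x, "needs", need))
        # Rule 2: a region in drought curtails whatever it produces
        # that needs water.
        for (region, r1, crop) in facts:
            if (r1 == "produces"
                    and (region, "has", "drought") in facts
                    and (crop, "needs", "water") in facts):
                new.add((region, "curtails", crop))
        if new <= facts:  # no new conclusions, so stop
            return facts
        facts |= new

conclusions = infer(FACTS)
print(("Ukraine", "curtails", "wheat") in conclusions)
```

Every fact and every rule here was typed in by a person, which is the kind of manual effort the note says is unlikely to keep pace with the growth of knowledge.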