Designing Voice User Interfaces: Principles of Conversational Experiences
Cathy Pearl
Beijing Boston Farnham Sebastopol Tokyo
To my friend Karen Kaushansky, who always encourages me to take the meeting.
Praise for Designing Voice User Interfaces
Voice has been core to human interaction since well before history. But what is old is now new: voice is becoming core to how we interact with computers. Pearl has done a brilliant job of distilling her 17 years of experience into a gem of a book. Valuable lessons, clear thinking, and insightful observations frame a core argument about how to design for voice. A completely new approach to an ancient interaction.
MARK STEPHEN MEADOWS, AUTHOR, ARTIST, AND PRESIDENT OF BOTANIC.IO
This book is a great resource for learning the fundamentals of voice user interface design. More and more designers are going to be expected to design usable voice experiences and Designing Voice User Interfaces can help you to learn how to do just that.
CHRIS MAURY, FOUNDER, CONVERSANT LABS
Practical and comprehensive, Cathy Pearl's book about VUI design clearly originates from her vast amount of hands-on experience. This book passes on her years of lessons learned so you can start your own adventures with speech interfaces from an advantaged position.
REBECCA NOWLIN GREEN, NUANCE COMMUNICATIONS, BUSINESS CONSULTING
Sharing with lively swagger her lifelong passion for machines that listen and talk, Pearl ushers in the new era of VUI design with impressively broad and practical coverage. Since designing for speech has special challenges and implications that elude even industry insiders, this book promises to be worthwhile as well for business decision-makers and developers who work in this space. With multimodal apps now a cultural fixture, chatbots on the horizon, and virtual assistance making its revival (remember Wildfire and General Magic's Portico of the 90s?), this release couldn't be more timely.
JAMES GIANGOLA, CREATIVE LEAD, CONVERSATION DESIGN & DIRECTION, GOOGLE
Pearl's Designing Voice User Interfaces is a refreshing and much-needed update on how to design effective VUIs. The book is brimming with practical advice from experts and packed with examples that reference leading-edge technology. The book deserves a place on any VUI designer's desk.
JENNIFER BALOGH, PH.D., COAUTHOR OF VOICE USER INTERFACE DESIGN
Preface
WE LIVE IN A MAGICAL TIME. While lounging on my living room sofa, using only my voice I can order a pound of gummy bears to be delivered to my door within two hours. (Whether or not it's a good thing that I can do this is a discussion for another book.)
The technology of speech recognition (having a computer understand what you say to it) has grown in leaps and bounds in the past few years. In 1999, when I began my career in voice user interface (VUI) design at Nuance Communications, I was amazed that a computer could understand the difference between me saying "checking" versus "savings." Today, you can pick up your mobile phone (another magical device) and say, "Show me coffee shops within two miles that have WiFi and are open on Sundays," and get directions to all of them.
In the 1950s, when computers were beginning to spark people's imaginations, the spoken word was considered to be a relatively easy problem. After all, it was thought, even a two-year-old can understand language!
As it turns out, comprehending language is quite complex. It's filled with subtleties and idiosyncrasies that take humans years to master. Decades were spent trying to program computers to understand the simplest of commands. Some believed that only an entity that lived in the physical world could ever truly understand language, because without context it is impossible to understand the meaning behind the words.
Speech recognition was around in science fiction long before it came to exist in real life. In the 1968 film 2001: A Space Odyssey, the HAL 9000 unit is an intelligent computer that responds to voice commands (although it didn't always do what was asked). The movie, and HAL 9000, made a strong impression on moviegoers. Even now, people like to test VUIs and chatbots with the famous line, "Open the pod bay doors, HAL."
In the movie Star Trek IV: The Voyage Home (1986), the crew of the Enterprise travels back in time to 1986, and when Chief Engineer Scotty is given a computer to work with, he addresses it by voice, saying "Computer!" When the computer doesn't respond, Doctor McCoy hands him the mouse, which Scotty attempts to use as a microphone. Finally, when told to use the keyboard, he comments, "How quaint." No doubt someday keyboards really will seem quaint, but we're not there yet. However, we're as close to the science fiction of voice recognition as we've ever been. In 2017, online retailer ThinkGeek will release a Star Trek ComBadge: just like in the TV series from the 1980s, it allows users to tap the badge and speak voice commands, which are sent via Bluetooth to their smartphone.
I find the existence of this product quite significant. Although telephone-based speech systems have been around for 20 years and mobile phone VUIs for almost 10, this badge signifies coming full circle to the original vision of what voice technology could truly offer. It's life imitating imagination.
Why Write This Book?
So, if we're already there, if we're already at Star Trek levels of human-computer voice interactions, why do we need this book?
If you have ever had difficulty with a poorly designed thermostat, or turned on the wrong burner on a stove (I personally still do this with my own stove after 13 years of use), or tried to pull on a door when it should have been pushed, you know that without good design, technology is difficult or even impossible to use.
Having speech recognition with high accuracy only solves part of the problem. What do you do with this information? How do you go from recognizing the words to doing what someone actually wants?
The ability of today's smartphones to understand what you say and then act on it is a combination of two important technologies: automated speech recognition (ASR) and natural-language understanding (NLU). If someone spoke to you in a language you didn't understand, you could probably write down, phonetically, what they said. That's the ASR piece. But you would have no idea what it meant.
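To make the split between ASR and NLU concrete, here is a minimal Python sketch built on loose assumptions. The transcript string stands in for what an ASR engine would hand back (the words, with no meaning attached), and the NLU step turns those words into an intent plus slots. `toy_nlu`, `Interpretation`, and the intent names are hypothetical, and the keyword rules are a deliberately simplistic stand-in for the statistical models real NLU systems use.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Hypothetical NLU result: an intent name plus any extracted slots."""
    intent: str
    slots: dict = field(default_factory=dict)


def toy_nlu(transcript: str) -> Interpretation:
    """Map an ASR transcript (just words) to a meaning using keyword rules.

    Illustrative only: real NLU does not work from a handful of keywords.
    """
    text = transcript.lower()
    if "coffee" in text:
        slots = {}
        # Capture the word before "mile"/"miles", e.g. "two" in "two miles".
        match = re.search(r"\b(\w+) miles?\b", text)
        if match:
            slots["distance"] = match.group(1)
        if "wifi" in text:
            slots["amenity"] = "wifi"
        return Interpretation("find_coffee_shop", slots)
    if "checking" in text:
        return Interpretation("select_account", {"account": "checking"})
    if "savings" in text:
        return Interpretation("select_account", {"account": "savings"})
    return Interpretation("unknown")


if __name__ == "__main__":
    # Pretend this string is what the ASR engine returned.
    transcript = "Show me coffee shops within two miles that have WiFi and are open on Sundays"
    print(toy_nlu(transcript))
```

The transcript alone is the "phonetic write-down" from the paragraph above; only after the NLU step does the system know the user wants coffee shops, within a distance, with WiFi.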
One of the most important aspects of good VUI design is to take advantage of known conversational principles. Your users have been speaking out loud and engaging in conversations with others since they were toddlers. You can ask a young child, "Please get the green ball out of the red box and bring it to me," and she knows you mean the ball, not the box (this is called