Praise for The Elements of Voice First Style
Rare is a book that can teach beginners and experts so deeply and practically. These lessons will grow alongside voice technology for years to come.
Julia Anderson, conversation designer & writer
You knew the What, now here's the detailed step-by-step How to do Conversation Design. A first!
Maria Aretoulaki, principal consultant CX design (voice & conversational AI) at GlobalLogic and director at DialogCONNECTION
The Elements of Voice First Style establishes the foundations for a new wave of applications designed to truly delight users. Fortunately, technology has finally enabled the nuance and sophistication that Bouzid and Ma so artfully postulate.
Corey Miller, ASR research manager at Rev.com
As a long-time specialist in conversational technologies, I've often asked how the principles espoused in The Elements of Style, the venerated writer's guide by William Strunk, Jr., and E.B. White, could be adopted by designers of voicebots and Voice First applications. This practical guide by Bouzid and Ma is the answer to that question. It is an homage to Strunk and White that provides a very accessible, yet comprehensive set of guidelines for aspiring designers for intelligent voice assistants.
Dan Miller, founder of Opus Research
The Elements of Voice First Style: A Practical Guide to Voice User Interface Design is informative, brilliant, and a must read for those in the industry to those wanting to learn from the best!
Audrey Arbeeny, CEO/founder/executive producer at Audiobrain
The book offers precious voicebot design best practices!
Giorgio Robino, conversational AI technical leader at Almawave.it
If you are building voice-based apps, this book is a must read. It shares the essential fundamentals for beginners and practical guidance for experts who are interested in gaining a deeper understanding of building high quality voicebots.
Rajiv Bammi, senior engineering leader
Voice first technology needs honest perspectives to show the way forward. Ahmed and Weiye's book provides exactly that; new ways to think about old problems, how to make improvements, when voice isn't a good solution, and what's wrong with the status quo. Burst the hype bubble: read this book!
Benjamin McCulloch, conversation designer (with audio super powers)
The Elements of Voice First Style
by Ahmed Bouzid and Weiye Ma
Copyright © 2022 Ahmed Bouzid and Weiye Ma. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Amanda Quinn
Development Editor: Jill Leonard
Production Editor: Kate Galloway
Copyeditor: nSight, Inc.
Proofreader: Amnet Systems LLC
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
Revision History for the First Edition
- 2022-05-16: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098119591 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. The Elements of Voice First Style, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Weiye Ma's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions, or viewpoints expressed by the author.
978-1-098-11959-1
[LSI]
To our parents.
Preface
This book has been almost twenty years in the making. During those years, the running line among practitioners in the speech technology field was, and for many still remains: "Speech is just around the corner." Meaning, by this time next year, God willing, speech technology will finally deliver on its promise and at long last be adopted as a reliable way for humans to retrieve and create information, as well as do other things; instead of typing, pushing buttons, tapping and swiping, they will just speak and listen.
In the early days, the proposition that "Speech is just around the corner" was an earnest aspiration. There was exuberance (this was the 1990s after all) and, for the most part, the prediction was hope-filled. In hindsight, the proposition looks almost irrational, given the state of the technology's usability at the time, its cost, and its basic performance (slow and inaccurate). But then, as the years wore on, the prediction turned into a healthy mix of self-deprecation ("How could we have been so arrogant?"), stubborn defiance ("But, we will make it happen!"), and a sober aversion to anything that smacks of hype ("And when it does seem to be happening, we will keep our skeptical eyes wide open").
In voice telephony systems, otherwise known as the unloved interactive voice response (IVR) applications, humans call a phone number intending to speak to another human only to be unpleasantly met by a system that tries to speak and listen. Those were the first interactive speech technologies deployed for mainstream use, and they eventually did go mainstream in the early 2000s. And although they did deliver undeniable value, notwithstanding the justified grousing from users, they somehow didn't count as the fulfillment of the "speech is just around the corner" aspiration. It was not until the launch of the iPhone 4S on October 4, 2011 (one day before the death of Steve Jobs), that one could arguably say that speech had arrived: Siri was born and interactive speech was now available, on demand, for the tens of millions of people who owned an iPhone at the time.
The arrival of Siri was a watershed moment not only because interactive speech was now available on a smartphone, but also because the type of speech-based interactions that it delivered were fundamentally different from the ones that users were encountering in IVR systems. The key difference lies in the fact that when someone calls a phone number, they usually want to speak with a human being, and when they encounter an automated system instead, they have to decide whether or not they want to interact, immediately ask for an agent (or zero-out), or hang up. With Siri, in contrast, the user is willingly engaging in self-service speech automation. When they press and hold the home button (as the first interactions with Siri had the user do), they are not expecting to speak to a human; they