Bayesian Speech and Language Processing
With this comprehensive guide you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing.
A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models, and latent topic models, along with applications including automatic speech recognition, speaker verification, and information retrieval. Approximate Bayesian inference methods based on maximum a posteriori (MAP), evidence, asymptotic, variational Bayes (VB), and Markov chain Monte Carlo (MCMC) approximations are provided, as well as full derivations of calculations, useful notations, formulas, and rules.
The authors address the difficulties of straightforward applications and provide detailed examples and case studies to demonstrate how you can successfully use practical Bayesian inference methods to improve the performance of information systems.
This is an invaluable resource for students, researchers, and industry practitioners working in machine learning, signal processing, and speech and language processing.
Shinji Watanabe received his Ph.D. from Waseda University in 2006. He has been a research scientist at NTT Communication Science Laboratories, a visiting scholar at the Georgia Institute of Technology, and a senior principal member at Mitsubishi Electric Research Laboratories (MERL), as well as an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing and an elected member of the IEEE Speech and Language Processing Technical Committee. He has published more than 100 papers in journals and conferences, and has received several awards, including the Best Paper Award from IEICE in 2003.
Jen-Tzung Chien is with the Department of Electrical and Computer Engineering and the Department of Computer Science at the National Chiao Tung University, Taiwan, where he is now the University Chair Professor. He received the Distinguished Research Award from the Ministry of Science and Technology, Taiwan, and the Best Paper Award of the 2011 IEEE Automatic Speech Recognition and Understanding Workshop. He currently serves as an elected member of the IEEE Machine Learning for Signal Processing Technical Committee.
This book provides an overview of a wide range of fundamental theories of Bayesian learning, inference, and prediction for uncertainty modeling in speech and language processing. The uncertainty modeling is crucial in increasing the robustness of practical systems based on statistical modeling under real environment, such as automatic speech recognition systems under noise, and question answering systems based on limited size of training data. This is the most advanced and comprehensive book for learning fundamental Bayesian approaches and practical techniques.
Sadaoki Furui, Tokyo Institute of Technology
Bayesian Speech and Language Processing
SHINJI WATANABE
Mitsubishi Electric Research Laboratories
JEN-TZUNG CHIEN
National Chiao Tung University
University Printing House, Cambridge CB2 8BS, United Kingdom
Cambridge University Press is part of the University of Cambridge.
It furthers the University's mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107055575
© Cambridge University Press 2015
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2015
Printed in the United Kingdom by Clays, St Ives plc
A catalog record for this publication is available from the British Library
Library of Congress Cataloging in Publication data
Watanabe, Shinji (Communications engineer) author.
Bayesian speech and language processing / Shinji Watanabe, Mitsubishi Electric Research Laboratories; Jen-Tzung Chien, National Chiao Tung University.
pages cm
ISBN 978-1-107-05557-5 (hardback)
1. Language and languages -- Study and teaching -- Statistical methods. 2. Bayesian statistical decision theory. I. Title.
P53.815.W38 2015
410.1'51--dc23
2014050265
ISBN 978-1-107-05557-5 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Preface
In general, speech and language processing involves extensive knowledge of statistical models. This book mainly introduces the acoustic model based on hidden Markov models and the language model based on n-grams. Both are important parts of modern speech recognition systems, where the models learned from real-world data are full of complexity, ambiguity, and uncertainty. Uncertainty modeling is crucial for tackling the lack of robustness in speech and language processing.
This book addresses fundamental theories of Bayesian learning, inference, and prediction for uncertainty modeling. Unlike standard textbooks on fundamental Bayesian approaches, this book focuses on practical methods that make these approaches applicable to actual speech and language problems. We (the authors) have been studying these topics for a long time with a strong belief that Bayesian approaches could solve robustness issues in speech and language processing, which are the most difficult problem and the most serious shortcoming of real systems based on speech and language processing. In our experience, the most difficult issue in applying Bayesian approaches is how to choose an appropriate technique from among the many Bayesian techniques proposed in statistics and machine learning so far. One of our answers to this question is to present approximate Bayesian inference methods rather than to attempt to cover the full range of Bayesian techniques. We group the Bayesian approaches into five categories: maximum a posteriori estimation, evidence approximation, asymptotic approximation, variational Bayes, and Markov chain Monte Carlo. We also describe the speech and language processing applications within this categorization so that readers can choose the appropriate approximate Bayesian techniques for their problems.
This book is part of our long-term cooperative efforts to promote Bayesian approaches in speech and language processing. We have been pursuing this goal for more than ten years, and part of our efforts was to organize a tutorial lecture on this theme at the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in Kyoto, Japan, in March 2012. The success of this tutorial lecture prompted the idea of writing a textbook on this theme. We strongly believe in the importance of Bayesian approaches, and we sincerely encourage researchers who work on Bayesian speech and language processing.
Acknowledgments
First, we want to thank all of our colleagues and research friends, especially members of NTT Communication Science Laboratories, Mitsubishi Electric Research Laboratories (MERL), National Cheng Kung University, IBM T. J. Watson Research Center, and National Chiao Tung University (NCTU). Some of the studies in this book were conducted while the authors were working at these institutes. We would also like to thank the many people who read a draft and gave us valuable comments that greatly improved this book, including Tawara Naohiro, Yotaro Kubo, Seong-Jun Hahm, Yu Tsao, and all of the students from the Machine Learning Laboratory at NCTU. We are very grateful for support from Anthony Vetro, John R. Hershey, and Jonathan Le Roux at MERL, and Sin-Horng Chen, Hsueh-Ming Hang, Yu-Chee Tseng, and Li-Chun Wang at NCTU. The great efforts of the editors at Cambridge University Press, Phil Meyler, Sarah Marsh, and Heather Brolly, are also appreciated. Finally, we would like to thank our families for supporting us throughout our research lives.