Automated Machine Learning in Action
Qingquan Song
Haifeng Jin
Xia Hu
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
© 2022 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Development editor: | Toni Arritola |
Technical development editor: | Kostas Passadis |
Review editor: | Aleksandar Dragosavljević |
Production editor: | Andy Marinkovich |
Copy editor: | Pamela Hunt |
Proofreader: | Keri Hales |
Technical proofreaders: | Karsten Strøbæk, Ninoslav Čerkez |
Typesetter: | Dennis Dalinnik |
Cover designer: | Marija Tudor |
ISBN: 9781617298059
front matter
preface
The goal of automated machine learning (AutoML) is to make machine learning (ML) accessible to everyone, including physicians, civil engineers, material scientists, and small business owners, as well as statisticians and computer scientists. This long-term vision resembles that of Microsoft Office, which lets ordinary users easily create documents and prepare reports. It also resembles that of smartphone cameras, which make it convenient to take photos anywhere at any time. The ML community has devoted considerable R&D effort to pursuing this goal. Even so, our collaborations with domain experts and data scientists showed us a strong demand for revealing the magic behind AutoML, including its fundamental concepts, algorithms, and tools.
To begin, we would like to share the steps that brought us here. (Okay, you can skip ahead to the main content if you want, but hey, who doesn't like a good story?)
We started our data science and ML journey many years ago and have been researching and developing ML algorithms and systems from scratch ever since. In the early days, like many of you, we were tortured by complicated equations, unstable results, and hard-to-understand combinations of hyperparameters. Later, more and more advanced algorithms were developed, and open source implementations became available. Unfortunately, training an effective machine learning or deep learning model is still very much like alchemy, and it takes years of practice to become a capable alchemist... yes, we are certified alchemists.
Over the years, many domain experts approached us wanting to try out the magical tool called machine learning because of its superior performance on many tasks (or simply because everyone was talking about it). Not surprisingly, it worked well on many datasets and improved on traditional rule-based and heuristic-based methods. We worked with many people on similar tasks (classification, clustering, and prediction) again and again. After a while, we were not only tired of applying ML tools but also felt strongly that we could do something to democratize ML for all. AutoML, here we go!
Since then, we have worked on a DARPA-supported project called Data-Driven Discovery of Models (D3M) and started the open source project AutoKeras. We were happy to see many people take an interest in the software we developed, and they gave us copious feedback on it, both positive and harsh. At the same time, we had the chance to get to know and collaborate with great researchers and engineers working on similar problems. Everything was going in the right direction!
Our vision evolved as we worked with more and more data scientists and ML engineers. Initially, we just wanted to help people quickly make use of ML with a few lines of code. As we faced more and more downstream tasks and problems, however, we gradually realized that we had a long way to go to achieve this goal. Most pressing, many practitioners were building their own AutoML systems, which worked well on their own internal, small-scale problems, such as automated outlier detection, automated recommender systems, and automated feature engineering. Our goal then became making ML accessible to everyone. Oops! That sounds just like our original plan! To better achieve this goal, we decided to spend a big chunk of our time writing this book. We want to help you make better use of AutoML tools and easily develop your own.
We hope you enjoy the book and look forward to your feedback!
acknowledgments
We would like to thank everyone who helped us while we wrote this book; without you, it would not have been possible. The first person on this list is François Chollet. He not only provided valuable guidance and feedback on the content of our book but also made major contributions to the design and implementation of KerasTuner and AutoKeras, which make these libraries so delightful to use. We also greatly appreciate his amazing work on Keras, which laid a solid foundation for the hyperparameter tuning and AutoML work built on top of it.
Thank you to all the open source contributors to KerasTuner and AutoKeras who provided valuable feedback, and even code contributions, to these open source libraries. Although we have not met all of you, your code became an indispensable part of this large ecosystem, which has helped thousands (or maybe even millions) of people.
We send a heartfelt thank-you to our lab mates at the DATA Lab at Texas A&M University, who helped us while we wrote this book. We are especially grateful to Yi-Wei Chen, who helped us write the examples in chapter 9, which made this book even better.
To all the reviewers: Alain Couniot, Amaresh Rajasekharan, Andrei Paleyes, David Cronkite, Dewayne Cushman, Didier Garcia, Dimitris Polychronopoulos, Dipkumar Patel, Gaurav Kumar Leekha, Harsh Raval, Howard Bandy, Ignacio Ruiz, Ioannis Atsonios, Lucian Mircea Sasu, Manish Jain, Marco Carnini, Nick Vazquez, Omar El Malak, Pablo Roccatagliata, Richard Tobias, Richard Vaughan, Romit Singhai, Satej Kumar Sahu, Sean Settle, Sergio Govoni, Sheik Uduman Ali M, Shreesha Jagadeesh, Stanley Anozie, Steve D Sussman, Thomas Joseph Heiman, Venkatesh Rajagopal, Viton Vitanis, Vivek Krishnan, Walter Alexander Mata López, Xiangbo Mao, and Zachery Beyel, your careful reviews and suggestions gave us the incentive to keep polishing the book.