Praise for Building Machine Learning Pipelines
I wish this book had existed when I started working in production ML! It's an outstanding resource for getting a comprehensive view of production ML systems in general, and TFX in particular. Hannes and Catherine have worked directly with the TensorFlow team to get the most accurate information available for including in this book, and then explained it in clear, concise explanations and examples.
Robert Crowe, TensorFlow Developer Advocate, Google
The data science practitioner knows that real-world machine learning involves more than just machine learning model training. This book demystifies the hidden technical debt in modern machine learning workflows such that you can put the lab and factory data science patterns into production as repeatable workflows.
Josh Patterson, CEO, Patterson Consulting, Coauthor of Deep Learning: A Practitioner's Approach and Kubeflow Operations Guide
This is definitely the book to read if you would like to understand how to build ML pipelines that are automated, scalable, and reproducible! You will learn something useful from it whether you are a data scientist, machine learning engineer, software engineer, or DevOps. It also covers the latest features of TFX and its components.
Margaret Maynard-Reid, Machine Learning Engineer, Tiny Peppers, ML GDE (Google Developer Expert), GDG Seattle Lead Organizer
Wonderfully readable, Building Machine Learning Pipelines serves not only as a comprehensive guide to help data scientists and ML engineers build automated and reproducible ML pipelines, but it is also the only authoritative book on the subject. The book provides an overview of the clearly defined components needed to architect ML pipelines successfully and walks you through hands-on code examples in a practical manner.
Adewale Akinfaderin, Data Scientist, Amazon Web Services
I really enjoyed reading Building Machine Learning Pipelines. Having used TFX for several years internally at Google as it was growing, I must say I wish I had your book back then instead of figuring this all out on my own. You would have saved me many months of effort and confusion. Thanks for writing such a high quality guide!
Lucas Ackerknecht, Machine Learning Specialist, Anti-Abuse Machine Learning, Google
We all have some of these amazing prototype models lying around. This book will introduce you to the tools and techniques that will help you take that prototype to production. Not only that but you will also build a complete end-to-end pipeline around it so that any future enhancements get delivered automatically and smoothly. This is a great book for beginners in ML ops who want to take their skills to the next level and collaborate with larger teams to help realize the values of innovative new models.
Vikram Tiwari, Cofounder, Omni Labs, Inc.
As a person who had only used TensorFlow as a framework for training deep learning models, when reading this book I was amazed at the pipeline capabilities that the TensorFlow ecosystem has to offer. This book is a great guide to all of the tools for analyzing and deploying models available with TFX, and is easy to read and use for people looking to make their first machine learning pipeline with TensorFlow.
Dr. Jacqueline Nolis, Principal Data Scientist, Brightloom and Coauthor of Build a Career in Data Science
This book is an exceptional deep-dive into Machine Learning Engineering. You will find cogent and practical examples of what it takes to build production-ready ML infrastructure. I would consider this required reading for any engineer or data scientist who intends to apply ML to real-world problems.
Leigh Johnson, Staff Engineer, Machine Learning Services, Slack
Building Machine Learning Pipelines
by Hannes Hapke and Catherine Nelson
Copyright 2020 Hannes Hapke and Catherine Nelson. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Jonathan Hassell
Developmental Editors: Amelia Blevins, Nicole Tach
Production Editor: Katherine Tozer
Copyeditor: Tom Sullivan
Proofreader: Piper Editorial, LLC
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
- August 2020: First Edition
Revision History for the First Edition
- 2020-07-13: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492053194 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Building Machine Learning Pipelines, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-05319-4
[LSI]
Foreword
When Henry Ford's company built its first moving assembly line in 1913 to produce its legendary Model T, it cut the time it took to build each car from 12 to 3 hours. This drastically reduced costs, allowing the Model T to become the first affordable automobile in history. It also made mass production possible: soon, roads were flooded with Model Ts.
Since the production process was now a clear sequence of well-defined steps (aka, a pipeline), it became possible to automate some of these steps, saving even more time and money. Today, cars are mostly built by machines.
But it's not just about time and money. For many repetitive tasks, a machine will produce much more consistent results than humans, making the final product more predictable, consistent, and reliable. Lastly, by keeping humans away from heavy machinery, safety is greatly improved, and many workers went on to perform higher-level jobs (although to be fair, many others just lost their jobs).
On the flip side, setting up an assembly line can be a long and costly process. And it's not ideal if you want to produce small quantities or highly customized products. Ford famously said, "Any customer can have a car painted any color that he wants, so long as it is black."
The history of car manufacturing has repeated itself in the software industry over the last couple of decades: every significant piece of software nowadays is typically built, tested, and deployed using automation tools such as Jenkins or Travis. However, the Model T metaphor isn't sufficient anymore. Software doesn't just get deployed and forgotten; it must be monitored, maintained, and updated regularly. Software pipelines now look more like dynamic loops than static production lines. It's crucial to be able to quickly update the software (or the pipeline itself) without ever breaking it. And software is much more customizable than the Model T ever was: