Praise for Reliable Machine Learning
I don't care how much data science work you've done in the past, or how expert you are on the statistical foundations of machine learning. I don't care if you have read every line of the Tensorflow source code, or implemented your own distributed ML training from scratch. Before you ever put a real system based on machine learning into deployment, you will benefit from reading this book. This is what is needed for the thousands of upcoming ML deployments where their usefulness is a double-edged sword. The more useful, the higher the stakes around safety, security, paying customers who are counting on you, fairness, or policy decisions that will be made on the basis of your system. This book thoroughly surveys the operations you need to be running if you have this level of responsibility, and you can rest assured that it comes from combined decades of hard-won experience.
Andrew Moore, VP and General Manager Google Cloud AI
MLOps wouldn't be nearly as painful if we, the people who do machine learning, applied software engineering best practices. This is a well-written and comprehensive book on these engineering best practices from some of the world's top experts.
Chip Huyen, author of Designing Machine Learning Systems
Reliable Machine Learning is a must-read for people building real-world machine learning systems. It provides a blueprint for thinking about the complex and nuanced issues of developing machine learning enabled products.
Brian Spiering, Data Science Instructor
Reliable Machine Learning
by Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, and Todd Underwood
Copyright 2022 Capriole Consulting Inc., Niall Richard Murphy, Kranti Parisa, D. Sculley, and Todd Underwood. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: John Devins
- Development Editor: Angela Rufino
- Production Editor: Ashley Stussy
- Copyeditor: Sharon Wilkey
- Proofreader: Charles Roumeliotis
- Indexer: nSight, Inc.
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
- September 2022: First Edition
Revision History for the First Edition
- 2022-09-19: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098106225 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Reliable Machine Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-10622-5
[LSI]
Foreword
Machine learning (ML) is at the heart of a tremendous wave of technological innovation that has only just begun. Picking up where the data-driven wave of the 2000s left off, ML enables a new era of model-driven decision making that promises to improve organizational performance and enhance customer experiences by allowing machines to make near-instantaneous, high-fidelity decisions, at the point of interaction, based on the most current information available.
To support the productive use of ML models, the practice of machine learning has had to evolve rapidly from a primarily academic pursuit to a fully fledged engineering discipline. What was once the sole domain of researchers, research scientists, and data scientists is now, at least equally, the responsibility of ML engineers, MLOps engineers, software engineers, data engineers, and more.
Part of what we see in the evolution of machine learning roles is a healthy shift in focus from simply trying to get models to work to ensuring that they work in a way that meets the needs of the organization. This means building systems that allow the organization to produce and deliver them efficiently, hardening them against failure, enabling recovery from any failures that do happen, and most importantly doing all this in the context of a learning loop that helps the organization improve from one project to the next.
Fortunately, the machine learning community hasn't had to bootstrap the knowledge required to accomplish all this from scratch. Practitioners of what has come to be called MLOps have had the benefit of a vast array of knowledge that was developed through the practice of DevOps for traditional software projects.
The first wave of MLOps focused on the application of technology and process discipline to the development and deployment of models, resulting in a greater ability for organizations to move models from the lab to the factory, as well as an explosion of tools and platforms for supporting those stages of the ML lifecycle.
But what about the "ops" in MLOps? Here again we stand to benefit from progress made operating traditional software systems. A significant contributor to maturing the operational side of DevOps was that community's broader awareness and application of site reliability engineering (SRE), a set of principles and practices developed at Google and many other organizations that sought to apply engineering discipline to the challenges of operating large-scale, mission-critical software systems.
The application of methodologies from software engineering to machine learning is not a simple lift and shift, however. While one has much to learn from the other, the concerns, challenges, and solutions can differ quite significantly in practice. That is where this book comes in. Rather than leaving it to each individual or team to identify how to apply SRE principles to their machine learning workflow, the authors of this book aim to give you a head start by sharing what has worked for them at Google, Apple, Microsoft, and other organizations.
To say that the authors are well qualified for their task is an understatement. My work has been deeply informed and influenced by several of them over the years.
In the fall of 2019, I organized the first TWIMLcon: AI Platforms conference to provide a venue for the then-nascent MLOps community to share experiences and advance the practice of building processes, tooling, and platforms for supporting the end-to-end machine learning workflow. Among us insiders it became a bit of a running joke just how many of the presentations at the event included a rendition of the real-world ML systems diagram from D. Sculley's seminal paper, "Hidden Technical Debt in Machine Learning Systems."