inside front cover
Machine Learning Engineering in Action
BEN WILSON
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
© 2022 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
| Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964 |
Development editor: | Patrick Barb |
Technical development editor: | Marc-Philippe Huget |
Review editor: | Aleksandar Dragosavljević |
Production editor: | Keri Hales |
Copy editor: | Sharon Wilkey |
Proofreader: | Melody Dolab |
Technical proofreader: | Ninoslav Čerkez |
Typesetter: | Dennis Dalinnik |
Cover designer: | Marija Tudor |
ISBN: 9781617298714
front matter
preface
Even as a young boy, I was stubborn. When people would suggest simple ways of doing things, I would ignore advice, choosing to always do things the hard way. Decades later, not much changed as I shifted through increasingly challenging careers, eventually landing in the realm of data science (DS) and machine learning (ML) engineering, and now ML software development. As a data scientist in industry, I always felt the need to build overly complex solutions, working in isolation to solve a given problem in the way that I felt was best.
I had some successes but many failures, and generally left a trail of unmaintainable code in my wake as I moved from job to job. It's not something that I'm particularly proud of. I've been contacted by former colleagues, years after leaving a position, to have them tell me that my code is still running every day. When I've asked each one of them why, I've gotten the same demoralizing answer that has made me regret my implementations: "No one can figure it out to make changes to it, and it's too important to turn off."
I've been a bad data scientist. I've been an even worse ML engineer. It took me years to learn why that is. That stubbornness and resistance to solving problems in the simplest way created a lot of headaches for others, both in the sheer number of cancelled projects while I was at companies and in the unmaintainable technical debt that I left in my wake.
It wasn't until my most recent job, working as a resident solutions architect at Databricks (essentially a vendor field consultant), that I started to learn where I had gone wrong and to change how I approached solving problems. Likely because I was now working as an advisor to help others who were struggling with data science problems, I was able to see my own shortcomings through the abstract reflection of what they were struggling through. Over the past few years, I've helped quite a few teams avoid many pitfalls that I've experienced (and created through my own stubbornness and hubris). I figured that writing down some of this advice that I give people regularly could benefit a broader audience, beyond my individual conversations with isolated teams in the context of my job.
After all, applying machine learning to a real-world use case is hard enough when following along with examples and books on the concepts of applied ML. When you introduce the staggering complexity of end-to-end project work (which is the focus of this book), it comes as little surprise that many companies fail to realize the potential of ML in their businesses. It's just hard. It's easier if you have a guide, though.
This book doesn't aim to be a guide to applied ML. We're not going to be covering algorithms or theories on why one model is better than another for a particular use case, nor will we delve into all the details to solve individual problems. Rather, this book is a guide to avoid the pitfalls that I've seen so many teams fall into (and ones that I've had to claw my way out of as a practitioner). It is a generalized approach to using DS techniques to solve problems in a way that you, your customers (the internal ones at your company), and your peers will not regret. It's a guide to help you avoid making some of the really stupid mistakes that I've made.
In the words of two of my relatively recently acquired favorite proverbs:
Ask the experienced rather than the learned.
Arab proverb
It is best to learn wisdom by the experience of others.
Latin proverb
acknowledgments
There's absolutely no way that this book would have been possible without the support of my truly staggeringly amazing wife, Julie. She's had to endure countless evenings of me toiling away in my office well past midnight, hammering away at drafts, edits, and code refactoring. I'm not sure if you'll ever get the chance to meet her, but she's truly incredible. Not only is she my soulmate, but she's one of the few people on this planet capable of making me genuinely laugh and is a constant inspiration to me. I could argue that most of the wisdom that I've learned about how to influence and interact with people in a positive manner comes directly from me observing her.
I'd like to thank Patrick Barb, my development editor at Manning, for this book. He's been invaluable in getting this into the state that it's in, has consistently challenged me to reduce my verbosity, and has been a great resource for helping me distill the points I've tried to make throughout the book. Along with Brian Sawyer, my acquisitions editor, and Marc-Philippe Huget, my technical development editor, the three of them have been an immense help throughout this entire process. In addition, a sincere thank you to Sharon Wilkey, the copy editor for this book, for incredible insight and fantastic skill in making the tone and flow of the book much better, and to all of the Manning team for their hard work in producing this book.