Practical Machine Learning for Computer Vision
by Valliappa Lakshmanan , Martin Grner , and Ryan Gillard
Copyright 2021 Valliappa Lakshmanan, Martin Grner, and Ryan Gillard. All rights reserved.
Printed in the United States of America.
Published by OReilly Media, Inc. , 1005 Gravenstein Highway North, Sebastopol, CA 95472.
OReilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com .
- Acquisition Editor: Rebecca Novack
- Development Editor: Amelia Blevins and Shira Evans
- Production Editor: Katherine Tozer
- Copyeditor: Rachel Head
- Proofreader: Piper Editorial Consulting, LLC
- Indexer: Ellen Troutman-Zaig
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Robert Romano
Revision History for the First Edition
- 2021-07-21: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098102364 for release details.
The OReilly logo is a registered trademark of OReilly Media, Inc. Practical Machine Learning for Computer Vision, the cover image, and related trade dress are trademarks of OReilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publishers views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-10236-4
[LSI]
Preface
Machine learning on images is revolutionizing healthcare, manufacturing, retail, and many other sectors. Many previously difficult problems can now be solved by training machine learning (ML) models to identify objects in images. Our aim in this book is to provide intuitive explanations of the ML architectures that underpin this fast-advancing field, and to provide practical code to employ these ML models to solve problems involving classification, measurement, detection, segmentation, representation, generation, counting, and more.
Image classification is the hello world of deep learning. Therefore, this book also provides a practical end-to-end introduction to deep learning. It can serve as a stepping stone to other deep learning domains, such as natural language processing.
You will learn how to design ML architectures for computer vision tasks and carry out model training using popular, well-tested prebuilt models written in TensorFlow and Keras. You will also learn techniques to improve accuracy and explainability. Finally, this book will teach you how to design, implement, and tune end-to-end ML pipelines for image understanding tasks.
Who Is This Book For?
The primary audience for this book is software developers who want to do machine learning on images. It is meant for developers who will use TensorFlow and Keras to solve common computer vision use cases.
The methods discussed in the book are accompanied by code samples available at https://github.com/GoogleCloudPlatform/practical-ml-vision-book. Most of this book involves open source TensorFlow and Keras and will work regardless of whether you run the code on premises, in Google Cloud, or in some other cloud.
Developers who wish to use PyTorch will find the textual explanations useful, but will probably have to look elsewhere for practical code snippets. We do welcome contributions of PyTorch equivalents of our code samples; please make a pull request to our GitHub repository.
How to Use This Book
We recommend that you read this book in order. Make sure to read, understand, and run the accompanying notebooks in the books GitHub repositoryyou can run them in either Google Colab or Google Clouds Vertex Notebooks. We suggest that after reading each section of the text you try out the code to be sure you fully understand the concepts and techniques that are introduced. We strongly recommend completing the notebooks in each chapter before moving on to the next chapter.
Google Colab is free and will suffice to run most of the notebooks in this book; Vertex Notebooks is more powerful and so will help you run through the notebooks faster. The more complex models and larger datasets of Chapters will benefit from the use of Google Cloud TPUs. Because all the code in this book is written using open source APIs, the code should also work in any other Jupyter environment where you have the latest version of TensorFlow installed, whether its your laptop, or Amazon Web Services (AWS) Sagemaker, or Azure ML. However, we havent tested it in those environments. If you find that you have to make any changes to get the code to work in some other environment, please do submit a pull request in order to help other readers.
The code in this book is made available to you under an Apache open source license. It is meant primarily as a teaching tool, but can serve as a starting point for your production models.
Organization of the Book
The remainder of this book is organized as follows:
In are generic and thus dont work particularly well on images, but the concepts introduced in this chapter are essential for the rest of the book.
In , we introduce some machine learning models that do work well on images. We start with transfer learning and fine-tuning, and then introduce a variety of convolutional models that increase in sophistication as we get further and further into the chapter.
In .
In Chapters .
In into an end-to-end, containerized ML pipeline, then we try out a no-code image classification system that can serve for quick prototyping and as a benchmark for more custom models. Finally, we show how to build explainability into image model predictions.
In Chapters , we demonstrate how the basic building blocks of computer vision are used to solve a variety of problems, including image generation, counting, pose detection, and more. Implementations are provided for these advanced use cases as well.
Conventions Used in This Book
The following typographical conventions are used in this book:
ItalicIndicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, data types, environment variables, statements, and keywords.
Constant width bold
Used for emphasis in code snippets, and to show command or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.