MATH FOR DEEP LEARNING
What You Need to Know to Understand Neural Networks
by Ronald T. Kneusel
San Francisco
MATH FOR DEEP LEARNING. Copyright © 2022 by Ronald T. Kneusel.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN-13: 978-1-7185-0190-4 (print)
ISBN-13: 978-1-7185-0191-1 (ebook)
Publisher: William Pollock
Production Manager: Rachel Monaghan
Production Editors: Dapinder Dosanjh and Katrina Taylor
Developmental Editor: Alex Freed
Cover Illustrator: James L. Barry
Cover and Interior Design: Octopod Studios
Technical Reviewer: David Gorodetzky
Copyeditor: Carl Quesnel
Proofreader: Emelie Battaglia
For information on book distributors or translations, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 415.863.9900; fax: 415.863.9950;
Library of Congress Control Number: 2021939724
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
In memory of Tom "Fitz" Fitzpatrick (1944–2013), the best math teacher I ever had. And to all the math teachers out there: they receive far too little appreciation for all of their hard work.
About the Author
Ron Kneusel has been working with machine learning in industry since 2003 and completed a PhD in machine learning from the University of Colorado, Boulder, in 2016. Ron has three other books: Practical Deep Learning: A Python-Based Introduction (No Starch Press), Numbers and Computers (Springer), and Random Numbers and Computers (Springer).
About the Technical Reviewer
David Gorodetzky is a research scientist who works at the intersection of remote sensing and machine learning. Since 2011 he has led a small research group within a large government-services engineering firm that develops deep learning solutions for a wide variety of problems in remote sensing. David began his career in planetary geology and geophysics, detoured into environmental consulting, then studied paleoclimate reconstruction from polar ice cores in graduate school, before settling into a career in satellite remote sensing. For more than 15 years he was a principal consultant for a software services group developing image analysis and signal processing algorithms for clients across diverse fields, including aerospace, precision agriculture, reconnaissance, biotech, and cosmetics.
FOREWORD
Artificial intelligence (AI) is ubiquitous. You need look no further than the device in your pocket for evidence: your phone now offers facial recognition security, obeys simple voice commands, digitally blurs backgrounds in your selfies, and quietly learns your interests to give you a personalized experience. AI models are being used to analyze mountains of data to efficiently create vaccines, improve robotic manipulation, build autonomous vehicles, harness the power of quantum computing, and even adjust to your proficiency in online chess. Industry is adapting to ensure state-of-the-art AI capabilities can be integrated into its domain expertise, and academia is building curriculum that exposes concepts of artificial intelligence to each degree-based discipline. An age of machine-driven cognitive autonomy is upon us, and while we are all consumers of AI, those expressing an interest in its development need to understand what is responsible for its substantial growth over the past decade. Deep learning, a subcategory of machine learning, leverages very deep neural networks to model complicated systems that have historically posed problems for traditional, analytical methods. A newfound practical use of these deep neural networks is directly responsible for this surge in development of AI, a concept that most would attribute to Alan Turing back in the 1950s. But if deep learning is the engine for AI, what is the engine for deep learning?
Deep learning draws on many important concepts from science, technology, engineering, and math (STEM) fields. Industry recruiters continue to seek a formal definition of its constituents as they try to attract top talent with more descriptive job requisitions. Similarly, academic program coordinators are tasked with developing the curriculum that builds this skill set as it permeates across disciplines. While inherently interdisciplinary in practice, deep learning is built on a foundation of core mathematical principles from probability and statistics, linear algebra, and calculus. The degree to which an individual must embrace and understand these principles depends on the level of intimacy one expects to have with deep learning technologies.
For the implementer, Math for Deep Learning acts as a troubleshooting guide for the inevitable challenges encountered in deep neural network implementation. This individual is typically concerned with efficient implementation of preexisting solutions with tasks including identification and procurement of open source code, setting up a suitable work environment, running any available unit tests, and finally, retraining with relevant data for the application of interest. These deep neural networks may contain tens or hundreds of millions of learnable parameters, and assuming adequate user proficiency, successful optimization relies on sensitive hyperparameter selection and access to training data that sufficiently represents the population. The first (and second, and third) attempt at implementation often requires a daunting journey into neural network interrogation, which requires dissection into and higher-level understanding of the mathematical drivers presented here.
At some point, the implementer usually becomes the integrator. This level of expertise requires some familiarity with the desired application domain and a lower-level understanding of the building blocks that enable deep learning. In addition to the challenges faced in basic implementation, the integrator needs to be able to generalize core concepts to mold a mathematical model to the desired domain. Disaster strikes again! Perhaps the individual experiences the exploding-gradient problem. Maybe the integrator desires a more representative loss function that may pose differentiability issues. Or maybe, during training, the individual recognizes that the selected optimization strategy is ineffective for the problem.