© 2023 Daniel P. Friedman and Anurag Mendhekar
All rights reserved. With the exception of completed code examples in solid boxes, no part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. The completed code examples that are presented in solid boxes are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
This book was set in Computer Modern Unicode by the authors using LaTeX.
Library of Congress Cataloging-in-Publication Data is available.
ISBN 978-0-262-54637-9
10 9 8 7 6 5 4 3 2 1
To Mary, from the title of the song, this is "Dedicated to the One I Love."
What a fabulous ride you have taken me on!
With all my love and admiration.
Danny
To the shining stars in my life, Aruna, Rishma, and Aria Nina, and my constant 4-legged companions Chikki and Heera.
Without you there would only be darkness.
Anurag
We ran into each other on Friday, the 13th of April, 2018, at the overcrowded official opening of the Luddy School of Informatics, Computing, and Engineering, and we decided to write a book on machine learning based on this very deep conversation directly following the close of the event.
Anurag: I want to write a little book with you.
Dan: Let's do it!
a few seconds later
Dan: What's the topic?
Anurag: Machine learning.
Dan: Now, that will be a worthy challenge!
And the rest of the time we reminisced
Contents
Foreword
by Guy L. Steele Jr.
This book is exactly right.
Dan Friedman, with his able and expert co-authors, has been writing books in his unique Little style for over four decades. Every one is a gem, explaining deep and important ideas in computer science in bite-sized chunks. Dan and his co-authors have raised the programmed learning question-and-answer format to an art form, to a conversational style that seems almost breezy. This very volume introduces two innovations, nuggets and revision charts, that further streamline the presentation of chunks of program code and their behavior.
Regarding the fundamental ideas behind machine learning
This book presents exactly the right ideas
in exactly the right format
and
in exactly the right order
All you need to do is read the book in order (don't skip ahead!)
The authors themselves remark:
Little books are all about packaging ideas neatly into little boxes.
What ideas are in this book? The mathematics and computational techniques of machine learning, of course: you'll learn about successive approximation, stochastic gradient descent, neural networks, and automatic back-propagation. However, as a programming languages guy, I am also interested in how the authors use language to frame the mathematics. To me, a big overarching idea here is how the authors use higher-order functions (that is, functions that take other functions as arguments and/or return other functions as results) to explain the data structures and computations of machine learning.
The fundamental data structure is the tensor, which at first glance looks like an ordinary array or matrix; but the authors explain that tensor operations have the additional property of automatically using a higher-order mapping function when appropriate, and this (almost magically, it seems to me) enables a function apparently written for scalars (single numbers), or for a tensor of a specific dimension, to be applied generally to all kinds and sizes of arrays, vectors, and matrices with no additional effort.
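That automatic extension can be sketched in a few lines. This is not the book's Scheme code, just a hypothetical Python illustration of the idea, with tensors modeled as nested lists:

```python
def ext(f):
    """Extend a scalar function so it also applies, element-wise,
    to nested lists ("tensors") of any depth."""
    def extended(t):
        if isinstance(t, list):
            # recur into the structure, mapping over each element
            return [extended(e) for e in t]
        # base case: an ordinary scalar
        return f(t)
    return extended

double = ext(lambda x: 2 * x)
double(3)                  # works on a scalar: 6
double([1, 2, 3])          # and on a vector: [2, 4, 6]
double([[1, 2], [3, 4]])   # and on a matrix: [[2, 4], [6, 8]]
```

The same `double`, written once for scalars, handles every shape with no additional effort.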
Another application of higher-order functions is currying, which allows you to give a function some of its arguments now and others later: when you give it some of its arguments now, it returns another function that can be applied to the other arguments later to get the final answer. The presentation in this book uses currying in a clear (and, to me, pleasantly surprising) way to explain the difference between arguments and parameters in machine learning, and why they need to be presented in a specific order, some now and some later. (A third sort of argument, hyperparameters, is also explained using yet another programming-language mechanism. If you're familiar with the buzzphrase "dynamic scoping" you are in for a treat; if you are not, no worries: hyperparameters and their behavior are clearly explained by example.)
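The now-versus-later idea can be sketched as follows; this is a hypothetical Python illustration with invented names, not the book's own presentation:

```python
def l2_loss(target):
    # the function to fit is supplied first...
    def with_data(xs, ys):
        # ...the data set next...
        def with_theta(theta):
            # ...and the parameters last, so the same partially
            # applied loss can be re-invoked as theta is revised
            preds = [target(x, theta) for x in xs]
            return sum((p - y) ** 2 for p, y in zip(preds, ys))
        return with_theta
    return with_data

def line(x, theta):
    return theta[0] * x + theta[1]

# give the function and the data now...
loss_for_data = l2_loss(line)([1.0, 2.0], [3.0, 5.0])
# ...and the parameters later
loss_for_data([2.0, 1.0])  # a perfect fit: 0.0
```

Each application fixes one layer of arguments and returns a function awaiting the next, which is exactly why the order of presentation matters.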
The third use of higher-order functions in this book is to structure the composition of large neural networks from smaller building blocks, and to explain the behavior and training of these networks.
A fourth use of higher-order functions is to provide for data abstraction. At first, parameters are always simple numbers, but the code for processing them is so deftly defined using just a few interface functions (exactly the right ones) that the higher-order code does not need to be changed when the representation of parameters is extended. (A key idea is projection: provide two functions, one that projects data into an alternate representation that is easier to compute on, and another that pulls a computed result back into the original representation. Then a function that accepts two such functions as arguments can be used in a very general way.) Similarly, scalars are simple numbers throughout most of the book, but when it becomes necessary to generalize them to duals in appendix A, higher-order functions make the task simple.
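The projection idea in the parenthetical can be sketched generically (again a hypothetical Python illustration, with names of my own choosing):

```python
import math

def via(project, inject):
    # lift a function that works on the alternate representation
    # back to a function on the original representation
    def lifted(f):
        return lambda x: inject(f(project(x)))
    return lifted

# compute on magnitudes via logarithms:
# doubling a number becomes adding log 2 to its logarithm
double = via(math.log, math.exp)(lambda l: l + math.log(2.0))
double(8.0)  # 16.0 (up to floating-point rounding)
```

`via` accepts the two representation-changing functions as arguments, so the same machinery works for any pair of project/inject functions, which is the generality the foreword points out.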
So if you are interested in big-picture programming-language ideas
Keep these applications of higher-order functions in mind as you read
You may enjoy spotting them as they go by
but if you don't care about higher-order functions
Please ignore everything I have just said
Immerse yourself in the story of machine learning
This book needs no introduction; it is exactly right for its purpose.
I have read all the books in the Little series; each time I have said to myself, "This is the best one ever!" This time, Dan and Anurag have done it again: this is the best. It stands on its own; you don't need to have read earlier Little books to understand this one, and you don't need to understand Scheme or any other programming language ahead of time. The dozen or so programming-language ideas you need are explained along the way, each exactly when you need it, and with plenty of examples. Give it time and enjoy the journey.
Guy L. Steele Jr.
Lexington, Massachusetts
August 2022
Foreword
by Peter Norvig
Hi, I'm Peter Norvig, a long-time researcher and practitioner of machine learning. I've had the pleasure of reading this book, and was asked to make some comments on it. I'm going to do that in the form of a dialog with my esteemed colleague, Typical Reader. Welcome to this foreword, Mx. Reader. How are you?

Thanks, I'm happy to be here, even if I am imaginary. And please, call me Tipi.

Okay, Tipi. I can say that I thoroughly enjoyed the book and appreciated the way it carefully developed the key concepts. How did you find the book?

To be honest, I haven't read it all yet. So far I've only skimmed it. It looks interesting, but I'm trying to decide if it is worth the time and effort to work through the whole book.