Deep Learning from Scratch
by Seth Weidman
Copyright 2019 Seth Weidman. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Development Editor: Melissa Potter
- Acquisitions Editors: Jon Hassell and Mike Loukides
- Production Editor: Katherine Tozer
- Copyeditor: Arthur Johnson
- Proofreader: Rachel Monaghan
- Indexer: Judith McConville
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
- September 2019: First Edition
Revision History for the First Edition
- 2019-09-06: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492041412 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Deep Learning from Scratch, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-04141-2
[LSI]
Preface
If you've tried to learn about neural networks and deep learning, you've probably encountered an abundance of resources, from blog posts to MOOCs (massive open online courses, such as those offered on Coursera and Udacity) of varying quality and even some books; I know I did when I started exploring the subject a few years ago. However, if you're reading this preface, it's likely that each explanation of neural networks that you've come across is lacking in some way. I found the same thing when I started learning: the various explanations were like blind men describing different parts of an elephant, but none describing the whole thing. That is what led me to write this book.
These existing resources on neural networks mostly fall into two categories. Some are conceptual and mathematical, containing both the drawings one typically finds in explanations of neural networks (circles connected by lines with arrows on the ends) and extensive mathematical explanations of what is going on, so you can understand the theory. A prototypical example of this is the very good book Deep Learning by Ian Goodfellow et al. (MIT Press).
Other resources have dense blocks of code that, if run, appear to show a loss value decreasing over time and thus a neural network learning. For instance, the following example from the PyTorch documentation does indeed define and train a simple neural network on randomly generated data:
import torch

# Setup needed to run this snippet: use the CPU and 32-bit floats
dtype = torch.float
device = torch.device("cpu")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Explanations like this, of course, don't give much insight into what is really going on: the underlying mathematical principles, the individual neural network components contained here, how they work together, and so on.
What would a good explanation of neural networks contain? For an answer, it is instructive to look at how other computer science concepts are explained. If you want to learn about sorting algorithms, for example, there are textbooks that will contain: