Contents
18.2 Stochastic Maximum Likelihood and Contrastive Divergence
Website
www.deeplearningbook.org
This book is accompanied by the above website. The website provides a variety of supplementary material, including exercises, lecture slides, corrections of mistakes, and other resources that should be useful to both readers and instructors.
Acknowledgments
This book would not have been possible without the contributions of many people.
We would like to thank those who commented on our proposal for the book and helped plan its contents and organization: Guillaume Alain, Kyunghyun Cho, Çağlar Gülçehre, David Krueger, Hugo Larochelle, Razvan Pascanu and Thomas Rohee.
We would like to thank the people who offered feedback on the content of the book itself. Some offered feedback on many chapters: Martín Abadi, Guillaume Alain, Ion Androutsopoulos, Fred Bertsch, Olexa Bilaniuk, Ufuk Can Biçici, Matko Bosnjak, John Boersma, Greg Brockman, Alexandre de Brebisson, Pierre Luc Carrier, Sarath Chandar, Pawel Chilinski, Mark Daoust, Oleg Dashevskii, Laurent Dinh, Stephan Dreseitl, Jim Fan, Miao Fan, Meire Fortunato, Frederic Francis, Nando de Freitas, Çağlar Gülçehre, Jurgen Van Gael, Javier Alonso García, Jonathan Hunt, Gopi Jeyaram, Chingiz Kabytayev, Lukasz Kaiser, Varun Kanade, Akiel Khan, John King, Diederik P. Kingma, Yann LeCun, Rudolf Mathey, Matias Mattamala, Abhinav Maurya, Kevin Murphy, Oleg Murk, Roman Novak, Augustus Q. Odena, Simon Pavlik, Karl Pichotta, Kari Pulli, Roussel Rahman, Tapani Raiko, Anurag Ranjan, Johannes Roith, Mihaela Rosca, Halis Sak, Cesar Salgado, Grigory Sapunov, Yoshinori Sasaki, Mike Schuster, Julian Serban, Nir Shabat, Ken Shirriff, Andre Simpelo, Scott Stanley, David Sussillo, Ilya Sutskever, Carles Gelada Saez, Graham Taylor, Valentin Tolmer, An Tran, Shubhendu Trivedi, Alexey Umnov, Vincent Vanhoucke, Marco Visentini-Scarzanella, David Warde-Farley, Dustin Webb, Kelvin Xu, Wei Xue, Ke Yang, Li Yao, Zygmunt Zając and Ozan Çağlayan.
We would also like to thank those who provided us with useful feedback on individual chapters:
Notation: Zhang Yuanhang.
Chapter 1, Introduction: Yusuf Akgul, Sebastien Bratieres, Samira Ebrahimi, Charlie Gorichanaz, Brendan Loudermilk, Eric Morris, Cosmin Parvulescu and Alfredo Solano.
Chapter 2, Linear Algebra: Amjad Almahairi, Nikola Banic, Kevin Bennett, Philippe Castonguay, Oscar Chang, Eric Fosler-Lussier, Andrey Khalyavin, Sergey Oreshkov, Istvan Petras, Dennis Prangle, Thomas Rohee, Colby Toland, Massimiliano Tomassoli, Alessandro Vitale and Bob Welland.
Chapter 3, Probability and Information Theory: John Philip Anderson, Kai Arulkumaran, Vincent Dumoulin, Rui Fa, Stephan Gouws, Artem Oboturov, Antti Rasmus, Alexey Surkov and Volker Tresp.
Chapter 4, Numerical Computation: Tran Lam An, Ian Fischer, and Hu Yuhuang.
Chapter 5, Machine Learning Basics: Dzmitry Bahdanau, Nikhil Garg, Makoto Otsuka, Bob Pepin, Philip Popien, Emmanuel Rayner, Kee-Bong Song, Zheng Sun and Andy Wu.
Chapter 6, Deep Feedforward Networks: Uriel Berdugo, Fabrizio Bottarel, Elizabeth Burl, Ishan Durugkar, Jeff Hlywa, Jong Wook Kim, David Krueger and Aditya Kumar Praharaj.
Chapter 7, Regularization for Deep Learning: Kshitij Lauria, Inkyu Lee, Sunil Mohan and Joshua Salisbury.
Chapter 8, Optimization for Training Deep Models: Marcel Ackermann, Rowel Atienza, Andrew Brock, Tegan Maharaj, James Martens, Klaus Strobl and Martin Vita.
Chapter 9, Convolutional Networks: Martin Arjovsky, Eugene Brevdo, Konstantin Divilov, Eric Jensen, Asifullah Khan, Mehdi Mirza, Alex Paino, Eddie Pierce, Marjorie Sayer, Ryan Stout and Wentao Wu.
Chapter 10, Sequence Modeling: Recurrent and Recursive Nets: Gökçen Eraslan, Steven Hickson, Razvan Pascanu, Lorenzo von Ritter, Rui Rodrigues, Dmitriy Serdyuk, Dongyu Shi and Kaiyu Yang.
Chapter 11, Practical Methodology: Daniel Beckstein.
Chapter 12, Applications: George Dahl and Ribana Roscher.
Chapter 15, Representation Learning: Kunal Ghosh.
Chapter 16, Structured Probabilistic Models for Deep Learning: Minh Le and Anton Varfolom.
Chapter 18, Confronting the Partition Function: Sam Bowman.
Chapter 19, Approximate Inference: Yujia Bao.
Chapter 20, Deep Generative Models: Nicolas Chapados, Daniel Galvez, Wenming Ma, Fady Medhat, Shakir Mohamed and Gregoire Montavon.
Bibliography: Lukas Michelbacher and Leslie N. Smith.
We also want to thank those who allowed us to reproduce images, figures or data from their publications. We indicate their contributions in the figure captions throughout the text.
We would like to thank Lu Wang for writing pdf2htmlEX, which we used to make the web version of the book, and for offering support to improve the quality of the resulting HTML.
We would like to thank Ian's wife Daniela Flori Goodfellow for patiently supporting Ian during the writing of the book as well as for help with proofreading.
We would like to thank the Google Brain team for providing an intellectual environment where Ian could devote a tremendous amount of time to writing this book and receive feedback and guidance from colleagues. We would especially like to thank Ian's former manager, Greg Corrado, and his current manager, Samy Bengio, for their support of this project. Finally, we would like to thank Geoffrey Hinton for encouragement when writing was difficult.
Notation
This section provides a concise reference describing the notation used throughout this book. If you are unfamiliar with any of the corresponding mathematical concepts, this notation reference may seem intimidating. However, do not despair: we describe most of these ideas in chapters 2-4.
Numbers and Arrays
$a$ | A scalar (integer or real)
$\boldsymbol{a}$ | A vector
$\boldsymbol{A}$ | A matrix
$\mathsf{A}$ | A tensor
$\boldsymbol{I}_n$ | Identity matrix with n rows and n columns
$\boldsymbol{I}$ | Identity matrix with dimensionality implied by context
$\boldsymbol{e}^{(i)}$ | Standard basis vector $[0, \dots, 0, 1, 0, \dots, 0]$ with a 1 at position i
$\text{diag}(\boldsymbol{a})$ | A square, diagonal matrix with diagonal entries given by $\boldsymbol{a}$
$\mathrm{a}$ | A scalar random variable
$\mathbf{a}$ | A vector-valued random variable
$\mathbf{A}$ | A matrix-valued random variable
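The identity matrix, standard basis vector, and diag entries above can be made concrete with a minimal sketch in plain Python. The helper names `identity`, `basis_vector`, and `diag` are hypothetical and chosen only for this illustration. The explicit length argument `n` for the basis vector is also an assumption, since the notation leaves the length implied by context.

```python
# Hypothetical helpers illustrating I_n, e^(i), and diag(a) with nested lists.

def identity(n):
    """Return the n-by-n identity matrix I_n as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def basis_vector(i, n):
    """Return e^(i): a length-n vector with a 1 at position i (1-based)."""
    if not 1 <= i <= n:
        raise ValueError("position i must satisfy 1 <= i <= n")
    return [1 if k == i else 0 for k in range(1, n + 1)]


def diag(a):
    """Return the square diagonal matrix whose diagonal entries are given by a."""
    n = len(a)
    return [[a[r] if r == c else 0 for c in range(n)] for r in range(n)]


I3 = identity(3)          # 3-by-3 identity matrix
e2 = basis_vector(2, 4)   # 1 at position 2 of a length-4 vector
D = diag([5, 7])          # 2-by-2 matrix with 5 and 7 on the diagonal
```

Under these definitions, row i of `identity(n)` coincides with `basis_vector(i, n)`, which is one way to see how the two notations relate.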
Sets and Graphs
$\mathbb{A}$ | A set
$\mathbb{R}$ | The set of real numbers
$\{0, 1\}$ | The set containing 0 and 1
$\{0, 1, \dots, n\}$ | The set of all integers between 0 and n
$[a, b]$ | The real interval including a and b
$(a, b]$ | The real interval excluding a but including b
$\mathbb{A} \backslash \mathbb{B}$ | Set subtraction, i.e., the set containing the elements of $\mathbb{A}$ that are not in $\mathbb{B}$
$\mathcal{G}$ | A graph
$Pa_{\mathcal{G}}(\mathrm{x}_i)$ | The parents of $\mathrm{x}_i$ in $\mathcal{G}$
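Indexing
$a_i$ | Element i of vector $\boldsymbol{a}$, with indexing starting at 1
$a_{-i}$ | All elements of vector $\boldsymbol{a}$ except for element i
$A_{i,j}$ | Element i, j of matrix $\boldsymbol{A}$
$\boldsymbol{A}_{i,:}$ | Row i of matrix $\boldsymbol{A}$
$\boldsymbol{A}_{:,i}$ | Column i of matrix $\boldsymbol{A}$
$\mathsf{A}_{i,j,k}$ | Element (i, j, k) of a 3-D tensor $\mathsf{A}$
$\mathsf{A}_{:,:,i}$ | 2-D slice of a 3-D tensor
$\mathrm{a}_i$ | Element i of the random vector $\mathbf{a}$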
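Because the indexing notation starts at 1 while many programming languages start at 0, it can help to see the mapping spelled out. The following is a minimal sketch in plain Python using nested lists. The function names are hypothetical and exist only for this illustration, and the nested-list layout of the 3-D tensor is an assumption.

```python
# The notation is 1-based; Python is 0-based, so index i maps to i - 1.

def element(a, i):
    """a_i: element i of vector a (1-based)."""
    return a[i - 1]


def all_but(a, i):
    """a_{-i}: all elements of vector a except for element i (1-based)."""
    return a[:i - 1] + a[i:]


def row(A, i):
    """A_{i,:}: row i of matrix A."""
    return list(A[i - 1])


def column(A, i):
    """A_{:,i}: column i of matrix A."""
    return [r[i - 1] for r in A]


def slice_2d(T, i):
    """A_{:,:,i}: 2-D slice of a 3-D tensor, fixing the third index to i."""
    return [[cell[i - 1] for cell in r] for r in T]


a = [10, 20, 30]
A = [[1, 2, 3],
     [4, 5, 6]]
# Assumed layout: T[j][k][l] holds tensor element (j + 1, k + 1, l + 1).
T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]

first = element(a, 1)        # a_1
rest = all_but(a, 2)         # a_{-2}
second_row = row(A, 2)       # A_{2,:}
third_col = column(A, 3)     # A_{:,3}
last_slice = slice_2d(T, 2)  # A_{:,:,2}
```

Keeping the `i - 1` conversion inside small helpers like these is one way to avoid off-by-one mistakes when translating formulas into code.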
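Linear Algebra Operations
$\boldsymbol{A}^\top$ | Transpose of matrix $\boldsymbol{A}$
$\boldsymbol{A}^+$ | Moore-Penrose pseudoinverse of $\boldsymbol{A}$
$\boldsymbol{A} \odot \boldsymbol{B}$ | Element-wise (Hadamard) product of $\boldsymbol{A}$ and $\boldsymbol{B}$
$\det(\boldsymbol{A})$ | Determinant of $\boldsymbol{A}$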
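As a small worked illustration of these four operations, consider two 2-by-2 matrices chosen arbitrarily for this example (they do not come from the text). Because the chosen $\boldsymbol{A}$ is square and invertible, its Moore-Penrose pseudoinverse coincides with its ordinary inverse. This is a property of this particular example, not of the pseudoinverse in general.

```latex
\[
\boldsymbol{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
\qquad
\boldsymbol{B} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]
\[
\boldsymbol{A}^\top = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix},
\qquad
\boldsymbol{A} \odot \boldsymbol{B} = \begin{bmatrix} 1 \cdot 0 & 2 \cdot 1 \\ 3 \cdot 1 & 4 \cdot 0 \end{bmatrix}
= \begin{bmatrix} 0 & 2 \\ 3 & 0 \end{bmatrix}
\]
\[
\det(\boldsymbol{A}) = 1 \cdot 4 - 2 \cdot 3 = -2,
\qquad
\boldsymbol{A}^{+} = \boldsymbol{A}^{-1}
= \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}
= \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}
\]
```

Multiplying $\boldsymbol{A}$ by the matrix obtained for $\boldsymbol{A}^{+}$ gives the identity matrix, which confirms the last line for this example.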