Lucas Pinheiro Cinelli
Program of Electrical Engineering - COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Matheus Araújo Marins
Program of Electrical Engineering - COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Eduardo Antônio Barros da Silva
Program of Electrical Engineering - COPPE / Department of Electronics - Poli, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Sérgio Lima Netto
Program of Electrical Engineering - COPPE / Department of Electronics - Poli, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
ISBN 978-3-030-70678-4 e-ISBN 978-3-030-70679-1
https://doi.org/10.1007/978-3-030-70679-1
© Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book has its origins in the first author's profound interest in uncertainty as a natural way of thinking. Indeed, behavioral studies support that humans perform nearly optimal Bayesian inference, integrating multi-sensory information while remaining energetically efficient. At the same time, modern deep learning methods are still prone to overfitting and lack uncertainty estimation, even though they achieve human-level results in many tasks. The Bayesian framework fits elegantly as a way to tackle both issues at once, while offering a principled mathematical ground.
While Bayesian ML and approximate inference are rather broad topics, each spawning entire books, the present text is a self-contained introduction to modern variational methods for Bayesian Neural Networks (BNNs). Even within this realm, research is fortunately sprouting at a rate that is difficult to follow, and many existing algorithms are being reinterpreted through Bayesian lenses. We focus on practical BNN algorithms that are either relatively easy to understand or fast to train. We also address one specific use of a variational technique for generative modeling.
The target audience is readers already familiar with ML and modern NNs. Basic knowledge of calculus, linear algebra, and probability theory is a must to comprehend the concepts and derivations herein, but it should also be enough. We explicitly avoid matrix calculus: the material may be challenging by itself, and this added difficulty would not really aid in understanding the book and might actually intimidate the reader. Furthermore, we do not assume the reader to be familiar with statistical inference, and thus explain the necessary concepts throughout the text.
Most introductory texts cover either modern NNs or general Bayesian methods for ML, with little work to date dedicated to both simultaneously. Information is scattered across research blog posts and the introductions of published papers, the sole in-depth work being Neal's excellent Ph.D. thesis from 1996, which does not cover modern variational approximations. This scenario makes the leap from NNs to BNNs hard from a theoretical point of view: the reader must either learn Bayesian methods first or decide alone what matters and which algorithms to learn, the former being cumbersome and the latter troublesome in a self-study setting.
The present book has the mission of filling this gap and helping others cross from one area to the other with not only a working knowledge but also an understanding of the theoretical underpinnings of the Bayesian approach.
Prior to any trending ML technique, we introduce in Chap. 2 the required statistical tools that many students lack nowadays. We discuss what a model is, how information is measured, and what the Bayesian approach is, as well as two cornerstones of statistical inference: estimation and hypothesis testing. Even those already familiar with the subject may benefit from the refresher while acclimating to the notation.
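As a foretaste of that chapter, the cornerstone of the Bayesian approach is Bayes' theorem, which updates a prior belief $p(\theta)$ about parameters $\theta$ into a posterior after observing data $\mathcal{D}$; the symbols below are generic placeholders rather than the book's exact notation:

$$ p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}, \qquad p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, \mathrm{d}\theta . $$

The normalizing integral $p(\mathcal{D})$ is precisely what becomes intractable in large models, motivating the approximate inference methods of the later chapters.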
In Chap. 3, we introduce the building blocks of Model-Based Machine Learning (MBML). We explain what it is and discuss its main enabling techniques: Bayesian inference, graphical models, and, more recently, probabilistic programming. We then explain approximate inference and broach deterministic distributional approximation methods, focusing on Variational Bayes, Assumed Density Filtering, and Expectation Propagation, going through their derivations, advantages, issues, and modern extensions.
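To give a flavor of Variational Bayes ahead of the chapter, a standard identity (again in generic notation) decomposes the log evidence into the evidence lower bound (ELBO) plus a KL term, so that maximizing the ELBO over an approximating family $q(\theta)$ implicitly minimizes the KL divergence to the intractable posterior:

$$ \log p(\mathcal{D}) = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log \frac{p(\mathcal{D}, \theta)}{q(\theta)}\right]}_{\text{ELBO}} + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\right). $$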
In Chap. 4, we introduce the concept and advantages of Bayesian Neural Networks (BNNs). We scrutinize four of the most popular algorithms in the area: Bayes by Backpropagation, Probabilistic Backpropagation, Monte Carlo Dropout, and Variational Adam, covering their derivations, benefits, and issues. We finish by comparing the algorithms on a 1-D example as well as in more complex scenarios.
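To illustrate how lightweight some of these methods are in practice, the minimal sketch below shows Monte Carlo Dropout-style predictive uncertainty in PyTorch; the architecture, dropout rate, and number of passes are arbitrary choices of ours, not the book's experimental setup:

```python
import torch
import torch.nn as nn

# Toy regression network with dropout; sizes and rate are illustrative only.
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)
model.train()  # keep dropout stochastic at prediction time (the MC Dropout trick)

x = torch.linspace(-3, 3, 100).unsqueeze(1)
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(50)])  # 50 stochastic passes

mean = preds.mean(dim=0)  # approximate predictive mean
std = preds.std(dim=0)    # spread across passes acts as an uncertainty proxy
```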
In Chap. 5, we introduce generative models, focusing on the Variational Autoencoder (VAE) family, a well-known class of deep generative models. The ability to model the process that generates the observed data empowers us to simulate new data, create world models, grasp underlying generative factors, and learn with little to no supervision. Starting with a simple example, we build the vanilla VAE, pointing out its shortcomings and presenting various extensions that address them, such as the Conditional VAE, the β-VAE, the Categorical VAE, and others. We end the chapter with numerous VAE experiments on two image data sets and an illustrative example of semi-supervised learning with VAEs.
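As a taste of the machinery behind the vanilla VAE, the fragment below sketches the reparameterization trick and the closed-form Gaussian KL term of its objective in PyTorch; the tensors stand in for the outputs of a hypothetical encoder and are not the book's code:

```python
import torch

# Stand-ins for encoder outputs q(z|x) = N(mu, diag(exp(logvar))) on a batch of 8.
mu, logvar = torch.randn(8, 2), torch.randn(8, 2)

# Reparameterization trick: express z as a deterministic, differentiable
# function of (mu, logvar) and parameter-free noise eps ~ N(0, I).
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps

# Closed-form KL(q(z|x) || N(0, I)) for diagonal Gaussians, one value per sample.
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
```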
We take this opportunity to thank our professors and colleagues who helped us in writing this book. In particular, we thank Dr. Leonardo Nunes and Professor Luís Alfredo de Carvalho, who first came up with its conceptual idea. We also thank our loved ones for putting up with us during the challenging and interesting times of turning this book into a reality.