Fundamentals of Machine Learning

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries.
© Oxford University Press 2020
The moral rights of the author have been asserted
First Edition published in 2020
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2019945424
ISBN 9780198828044
ebook ISBN 9780192563095
DOI: 10.1093/oso/9780198828044.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Acknowledgements
The material in this book has been inspired by several great resources. I would specifically like to acknowledge the influence of Andrew Ng's lecture notes on several parts of this book, especially some basic Bayesian formulations and examples. François Chollet's excellent book on deep learning with Python is very much recommended, both for its wonderful explanation of deep learning techniques and for its implementations in Keras. We follow much of the presentation of automatic differentiation in the review by Baydin, Pearlmutter, Radul, and Siskind (JMLR 2018), which we recommend for further insights.
Many colleagues have contributed considerably to this book. In particular, I would like to thank Paul Hollensen, Patrick Connor, and Hossein Parvar, whose generous help with deep reinforcement learning has shaped much of the corresponding chapter. Finally, a very special thanks to all my students who took my classes over the last several years, who have challenged me to think more deeply about machine learning and to investigate the roots of the assumptions that we make.
Preface
Machine learning is exploding, both in research and in industrial applications. Although many of the ideas in machine learning have been around for years, the latest breakthroughs rest on several advances. One is the availability of large labeled datasets. Another is the availability of fast specialized processors such as graphics processing units (GPUs). In addition, progress is fueled by a deeper understanding of building models and learning from data, as well as some new techniques that brought everything together.
There are now a variety of wonderful books and online resources available on machine learning. So why another book? There are several reasons why I felt compelled to offer my contribution here. Many recent books focus on specific aspects of machine learning, in particular deep learning on the one hand and Bayesian methods on the other. In this book I try to build a bridge, a mutual understanding, between what are often viewed as two opposite ends of machine learning. I would like to argue that both approaches are important, that each has specific strengths in specific application areas, and that a combined view of machine learning and scientific modeling is useful. While this book places some focus on general machine learning methods, I believe that the insight and rigor of probabilistic modeling approaches aid general understanding, which in turn helps in applying machine learning techniques more effectively.
Another reason I hope this book is appreciated is that I like to keep explanations brief while still conveying the deeper reasoning behind the methods. It is important to keep this style in mind, as treatments and examples are minimal by design, and most explanations are deliberately briefer than in more traditional teaching books. My hope is to motivate and guide the reader sufficiently to consult further resources for advanced studies. I find this particularly important in an age where wonderful resources are available on the Internet. I do not claim to cover all details of machine learning, but my hope is to provide the fundamentals for a good understanding that can help guide further studies.
This book tries to strike a balance between the rigor of mathematical arguments and the outlining of general principles. I use mathematical notation mainly as a descriptor, to keep presentations brief and to show the general form of some equations. For the most part, this book does not include rigorous mathematical proofs or derivations, but I hope to give enough detail to see how results can be derived. I know that some readers tend to avoid mathematical notation, but I would like to encourage these individuals to see it as a short form of a story. By contrast, other readers might find my simplifications debatable in a strict mathematical context. However, I think mathematical tools are useful at the level intended here to communicate ideas.
This book includes a brief overview of some older machine learning techniques such as support vector machines and decision trees. While these approaches might be considered shallow or old-fashioned compared with deep learning, they still have important practical applications, as they can provide solutions for problems that do not require the increased complexity of deep models. We will not delve deeply into the theory of these traditional methods, even though some of the stated formulas seem complex. However, I hope that mentioning some of these ideas, such as kernel methods or Lagrange methods for optimization with constraints, will add to the foundation for studying more theoretical aspects that are often assumed in modern research papers. This is particularly the case for support vector machines, for which there is a rich theory.
Since I have a personal interest in how the brain works, I have included some comments on the relations between machine learning and the brain. The brain is often cited as inspiration for machine learning methods such as neural networks. On the other hand, machine learning is also inspirational for neuroscience, by suggesting possible information processing principles that could be at work in the brain, or by highlighting differences.
In the first chapter we will tour the main ideas of machine learning and show how to use sklearn and Keras to implement some of the methods.
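To give a first taste of the sklearn workflow mentioned here, the following minimal sketch fits a simple classifier on a toy dataset; this specific example (dataset and model choice) is an illustration of the library's usage pattern, not an excerpt from the book.

```python
# A minimal sklearn sketch: load labeled data, split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset and hold out part of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a decision tree classifier and report accuracy on held-out data
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The same fit/predict pattern carries over to the other sklearn models discussed in the first chapter.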
The second part of the book, comprising the following four chapters, is intended to take a deeper look into the foundations of machine learning and scientific modeling in general. This includes a formalization of regression and gradient descent optimization, and discussions of the probabilistic aspects of modeling. The final section of the book comprises the last three chapters, which are dedicated to three important and hopefully interesting advanced aspects of machine learning. The first is recurrent neural networks, which capture temporal aspects in modeling; the second is reinforcement learning, which captures learning by agents and is hence a much more general setting for learning machines; and the last chapter offers some brief thoughts on the impact of machine learning on our society.