MEAP VERSION 7
Welcome
Thank you for purchasing the MEAP version of Bayesian Optimization in Action!
While an incredibly interesting topic (albeit from my biased perspective), Bayesian optimization may appear elusive to many machine learning practitioners. The main culprit is the accessibility and quality of available resources: a full, in-depth treatment of the topic can only be found in textbooks and research papers, while online tutorials and blog posts lack the depth, structure, and consistency necessary for a solid understanding. I remember that when I first learned about Bayesian optimization, it took me a long time to synthesize what I had read from these various sources and to understand how it all fit together.
This book sets out to address this problem for newcomers who'd like to learn about Bayesian optimization; after all, every book is what the author wishes had existed when they first learned a topic. In the text, you will find a practical guide to what Bayesian optimization is, how it facilitates decision-making under uncertainty to optimize expensive processes, what different variations of Bayesian optimization there are, and, last but not least, how to implement them with code. To hit the ground running immediately, you will need a firm grasp of core concepts in machine learning, statistics, and Bayesian probability, as well as experience in Python programming.
Presented in accessible language, each chapter tells the story of one component of the Bayesian optimization framework. These components range from building a machine learning model, to equipping the model with assumptions and prior knowledge, to using that model to make effective decisions. The book as a whole makes up a comprehensive narrative that starts from the ground up and introduces you to state-of-the-art techniques. Aiming to be a practical, hands-on resource, the text includes many code examples and exercises that solidify the theoretical discussions with concrete use cases.
As you will see, one of the recurring themes of this book is that effective decision-making relies on intelligently collecting information relevant to our goal. The goal of this platform is to make the book better, and I rely on you, the readers, to provide that relevant information, that is, your feedback, to steer the book in the right direction. I look forward to seeing your comments, questions, and suggestions in the liveBook discussion forum!
Quan Nguyen
1 Introduction to Bayesian optimization
This chapter covers
- What motivates Bayesian optimization and how it works
- Real-life examples of Bayesian optimization problems
- A toy example of Bayesian optimization in action
I'm very happy that you are reading this book and excited for your upcoming journey. At a high level, Bayesian optimization is an optimization technique that may be applied when the function we are trying to optimize (or, in general, any process that gives you an output when an input is passed in) is a black box and is expensive to evaluate in terms of time, money, or other resources. This setup encompasses many important tasks, including hyperparameter tuning (which we will define shortly). Using Bayesian optimization can accelerate such a search and help us locate the optimum of the function as quickly as possible.
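To make this setup concrete, here is a minimal sketch of what an expensive, black-box objective might look like in code. The specific formula and the one-second delay are hypothetical stand-ins of my choosing; in a real application, the inner computation is hidden from us entirely, and each evaluation may take hours or days.

```python
import math
import time

def expensive_black_box(x):
    """A stand-in for an expensive, black-box process.

    We may only pass in an input and observe the output; no formula or
    gradient information is available to the optimizer.
    """
    time.sleep(1)  # placeholder for an evaluation that costs time or money
    return -((x - 2.0) ** 2) + math.sin(5 * x)  # hidden from the caller

# Each call costs us real resources, so we want to make as few as possible.
value = expensive_black_box(1.5)
```

Because every call is costly, the central question becomes: given what we have observed so far, which input should we evaluate next? Answering that question well is exactly what Bayesian optimization is designed to do.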
As a machine learning practitioner, you might have heard the term Bayesian optimization from time to time, or you might never have encountered it before. While Bayesian optimization has enjoyed enduring interest from the machine learning (ML) research community, it's not as commonly used and talked about as other ML topics in practice. Why? Some might say Bayesian optimization has a steep learning curve: you need to understand calculus, use some probability, and overall be an experienced ML researcher to use Bayesian optimization in your application. Our goal for this book is to dispel the notion that Bayesian optimization is difficult to use and to show that the technology is more intuitive and accessible than one would think.
Throughout this book, we will see a lot of illustrations, plots, and, of course, code, which will help make whatever topic is currently being discussed more straightforward and concrete. You will learn how each component of Bayesian optimization works at a high level and how to implement the components using state-of-the-art libraries in Python. I also hope the accompanying code helps you hit the ground running with your own projects, as the Bayesian optimization framework is very general and "plug-and-play." The exercises are also helpful in this regard.
Generally, I hope this book will be useful for your machine learning needs and, overall, a fun read. Before we dive into the actual content, let's take some time to motivate the problem that Bayesian optimization sets out to solve.
1.1 Finding the optimum of an expensive, black-box function is a difficult problem
As mentioned above, hyperparameter tuning in ML is one of the most common applications of Bayesian optimization. We will explore this problem, as well as a couple of others, in this section as an example of the general problem of black-box optimization. This will help us understand why Bayesian optimization is needed.
1.1.1 Hyperparameter tuning as an example of an expensive black-box optimization problem
Say you want to train a neural network on a large data set, but you are not sure how many layers this neural net should have. You know that the architecture of a neural net is a make-or-break factor in deep learning, so you perform some initial testing and obtain the results shown in table 1.1.
Table 1.1. An example of a hyperparameter tuning task. Our task is to decide how many layers the neural network should have in the next trial in the search for the highest accuracy. It's difficult to decide which number we should try next.
| Number of layers | Accuracy on the test set |
| --- | --- |
|  | 0.72 |
|  | 0.81 |
|  | 0.75 |
The best accuracy you have found, 81%, is good, but you think you can do better with a different number of layers. Unfortunately, your boss has set a deadline for you to finish implementing the model. And since training a neural net on your large data set takes several days, you only have a few trials remaining before you have to decide how many layers your network should have. With that in mind, you want to know what other values you should try so that you can find the number of layers giving the highest possible accuracy.
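As a rough illustration of this workflow, the sketch below runs one tuning trial per candidate depth. It assumes scikit-learn is available and uses a small synthetic data set, an arbitrary list of candidate depths, and MLPClassifier as stand-ins for your large data set and model; on real data, each call to evaluate is the step that takes days.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small synthetic stand-in for the large data set in the example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def evaluate(num_layers):
    """Run one trial: train a network of the given depth and score it."""
    model = MLPClassifier(
        hidden_layer_sizes=(32,) * num_layers,  # layer width is a placeholder
        max_iter=500,
        random_state=0,
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # accuracy on the test set

# With only a few trials left, which depth should we try next?
for num_layers in [1, 2, 3]:
    print(num_layers, evaluate(num_layers))
```

Bayesian optimization replaces the fixed list of candidate depths in the loop above with a model-guided decision about which trial is most worth running next.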
This task is typically called hyperparameter tuning in ML, where you want to find the best setting (hyperparameter values) for your model so as to optimize some performance metric such as predictive accuracy. In our example, the hyperparameter of our neural net is its depth (the number of layers). If you are working with a decision tree, common hyperparameters are the maximum depth, the minimum number of points per node, and the split criterion. With a support-vector machine, you could tune the regularization term and the kernel. Since the performance of a model very much depends on its hyperparameters, hyperparameter tuning is an important component of any ML pipeline.
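For reference, here is how the hyperparameters just mentioned appear in scikit-learn; the particular values are hypothetical and only illustrate what a "setting" of each model looks like.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Decision tree: maximum depth, minimum points per node, split criterion.
tree = DecisionTreeClassifier(max_depth=5, min_samples_split=10, criterion="gini")

# Support-vector machine: regularization term and kernel.
svm = SVC(C=1.0, kernel="rbf")
```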