Adaptive Computation and Machine Learning
Francis Bach, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors
A complete list of books published in the Adaptive Computation and Machine Learning series appears at the back of this book.
Introduction to Online Convex Optimization
Second Edition
Elad Hazan
The MIT Press
Cambridge, Massachusetts
London, England
© 2022 Elad Hazan
This work is subject to a Creative Commons CC BY-NC-ND license.
Subject to such license, all rights are reserved.
The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.
Library of Congress Cataloging-in-Publication Data
Names: Hazan, Elad, 1975– author.
Title: Introduction to online convex optimization / Elad Hazan.
Description: Second edition. | Cambridge: The MIT Press, 2022. | Series: Adaptive computation and machine learning series | Includes bibliographical references and index.
Identifiers: LCCN 2021050689 | ISBN 9780262046985 (hardcover)
Subjects: LCSH: Mathematical optimization. | Convex domains.
Classification: LCC QA402.5.H297 2022 | DDC 519.7/6—dc23/eng/20211130
LC record available at https://lccn.loc.gov/2021050689
To my family: Dana, Hadar, Yoav, Oded, and Deluca
you shall research and meditate therein day and night
Yehoshua 1:8
Contents
- List of Figures
- List of Tables
Preface
This book serves as an introduction to the expanding theory of online convex optimization (OCO). It was written as an advanced textbook to serve as a basis for a graduate course, and/or as a reference to researchers diving into this fascinating world at the intersection of optimization and machine learning.
Such a course was given at the Technion in 2010–2014, with slight variations from year to year, and later at Princeton University in 2015–2020. The core material in these courses is fully covered in this book, along with exercises that allow students to complete parts of proofs, or that were found illuminating and thought-provoking by those taking the course. Most of the material is given with examples of applications, which are interlaced throughout various topics. These include prediction from expert advice, portfolio selection, matrix completion and recommendation systems, and support vector machine (SVM) training.
Our hope is that this compendium of material and exercises will be useful to you: the researcher and/or educator.
Placing This Book in the Machine Learning Library
The broad field of machine learning, along with its subdisciplines of online learning, boosting, regret minimization in games, universal prediction, and other related topics, has seen a plethora of introductory textbooks in recent years. We can hardly do justice to all of them here, but we point to the books most closely related to our main focus: the intersection of machine learning, learning in games, and optimization.
The most closely related book, which served as an inspiration to the current text, and indeed an inspiration to the entire field of learning in games, is the wonderful text of Cesa-Bianchi and Lugosi (2006). From the literature on mathematical optimization theory, there are numerous introductory texts on convex optimization and convex analysis, to name only a few (Boyd and Vandenberghe 2004; Nesterov 2004; Nemirovski and Yudin 1983; Nemirovski 2004; Borwein and Lewis 2006; Rockafellar 1997). The author fondly recommends the text from which he learned mathematical optimization theory (Nemirovski 2004). Broader texts on machine learning are too numerous to list here.
The primary purpose of this book is to serve as an educational textbook for a dedicated course on OCO and the convex optimization approach to machine learning. OCO has already had enough impact to appear in several surveys and introductory texts (Hazan 2011; Shalev-Shwartz 2011; Rakhlin 2009; Rakhlin and Sridharan 2014). We hope this compilation of material and exercises will further enrich the literature.
Book's Structure
This book is intended to serve as a reference for a self-contained course for graduate students in computer science/electrical engineering/operations research/statistics and related fields. As such, its organization follows the structure of the course Decision Analysis, taught at the Technion, and later Theoretical Machine Learning, taught at Princeton University.
Each chapter should take one or two weeks of classes, depending on the depth and breadth of the intended course. Chapter 1 is designed to be a teaser for the field, and thus it is less rigorous than the rest of the book.
Roughly speaking, the book can be conceived of as three units. The first, chapters 2 through 4, contains the basic definitions, framework, and core algorithms for OCO. The second, chapters 5 through 7, contains more advanced algorithms, an in-depth analysis of the framework, and its extensions to other computational and information-access models. The rest of the book deals with more advanced algorithms, more difficult settings, and relationships to well-known machine learning paradigms.
This book can assist educators in designing a complete course on the topic of OCO, or it can serve as a component in a comprehensive course on machine learning. An accompanying manual of solutions to selected exercises given in the book is available for educators only.
New in the Second Edition
The main additions to the second edition of this book include the following:
- Expanded coverage of optimization in chapter 2, with a unified gradient descent analysis of the Polyak stepsize.
- Expanded coverage of learning theory in chapter 9, with an introduction to compression and its use in generalization theory.
- An expanded chapter 4, with discussion of the exponentially weighted optimizer for exp-concave loss functions.
- A revised chapter 5, with the addition of mirror descent analysis, as well as a revised section on adaptive gradient methods.
- New chapter 10 on the notion of adaptive regret and algorithms for OCO with near-optimal adaptive regret bounds.
- New chapter 11 on boosting and its relationship to OCO, including a derivation of boosting algorithms from regret minimization.
- New chapter 12 on online boosting.
- New chapter 13 on Blackwell approachability and its strong connection to OCO.
In addition, numerous typos have been fixed, exercises have been corrected, and solutions to several questions have been made available in a separate manual for educators.
Acknowledgments
First Edition
First of all, I gratefully acknowledge the numerous contributions and insights from the students of the course Decision Analysis, given at the Technion during 2010–2014, as well as the students of Theoretical Machine Learning, a course taught at Princeton University during 2015–2016.