(Expanded from the 1st Edition)
Stochastic approximation was introduced in a 1951 article in the Annals of Mathematical Statistics by Robbins and Monro. Originally conceived as a tool for statistical computation, an area in which it retains a place of pride, it has come to thrive in a totally different discipline, viz. that of engineering. The entire area of adaptive signal processing in communication engineering has been dominated by stochastic approximation algorithms and variants, as is evident from even a cursory look at any standard text on the subject. Then there are more recent applications to adaptive resource allocation problems in communication networks. In control engineering too, stochastic approximation is the main paradigm for online algorithms for system identification and adaptive control.
This is not accidental. The key word in most of these applications is adaptive. Stochastic approximation has several intrinsic traits that make it an attractive framework for adaptive schemes. It is designed for uncertain (read stochastic) environments, where it allows one to track the average or typical behavior of such an environment. It is incremental; i.e., it makes small changes in each step, which ensures a graceful behavior of the algorithm. This is a highly desirable feature of any adaptive scheme. Furthermore, it usually has low computational and memory requirements per iterate, another desirable feature of adaptive systems. Finally, it conforms to our anthropomorphic notion of adaptation: It makes small adjustments so as to improve a certain performance criterion based on feedback received from the environment.
For these very reasons, there has been a resurgence of interest in this class of algorithms in several new areas of engineering. One of these, viz. communication networks, is already mentioned above. Yet another major application domain has been artificial intelligence, where stochastic approximation has provided the basis for many learning or parameter tuning algorithms in machine learning. Notable among these are the algorithms for training neural networks and the algorithms for reinforcement learning, a popular learning paradigm for autonomous software agents with applications in e-commerce, robotics, etc.
Yet another fertile terrain for stochastic approximation has been in the area of economic theory, for reasons not entirely dissimilar to those mentioned above. On one hand, they provide a good model for collective phenomena, where micromotives (to borrow a phrase from Thomas Schelling) of individual agents aggregate to produce interesting macrobehavior. The nonlinear urn scheme analyzed by Arthur and others to model increasing returns in economics is a case in point. On the other hand, their incrementality and low per iterate computational and memory requirements make them an ideal model of a boundedly rational economic agent, a theme which has dominated their application to learning models in economics, notably to learning in evolutionary games.
This flurry of activity, while expanding the application domain of stochastic approximation, has also thrown up interesting new issues, some of them dictated by technological imperatives. Consequently, it has spawned interesting new theoretical developments as well. When the previous edition of the book was written, the time seemed ripe for a book pulling together old and new developments in the subject with an eye on the aforementioned applications. There are, indeed, several excellent texts already in existence, many of which will be referenced later in this book. But they tend to be comprehensive texts: excellent for the already initiated, but rather intimidating for someone who wants to make quick inroads, hence a need for a bite-sized text. The first edition of this book was an attempt at one, though it did stray into a few esoteric themes in some places.
Having decided to write a book, there was still a methodological choice. Stochastic approximation theory has two somewhat distinct strands of research. One, popular with statisticians, uses the techniques of martingale theory and associated convergence theorems for analysis. The second, popular more with control engineers, treats the algorithm as a noisy discretization of an ordinary differential equation and analyzes it as such. The latter approach was preferred, because the kind of intuition that it offers is an added advantage in many of the engineering applications.
Of course, this was not the first book expounding this approach. There are several predecessors such as the excellent texts by BenvenisteMetivierPriouret, Duflo, and KushnerYin referenced later in the book. These are, however, what we have called