Dedicated to my friends, who helped me to understand a little over the years.
Foreword
Over the years I've noticed that many people working on performance, capacity planning, and configuration of large-scale systems think differently, and are particularly effective because they have a background in physics. Personally I have a degree in Applied Physics and enjoy the way Mark introduces scientific concepts, and applies them to computer systems, in this book. Unlike many books in the general area of systems administration, which are tied to specific tools or platforms, and which date over time, the fundamental scientific concepts described here aren't going to change. As time passes, more people are finding the path that leads in search of certainty to the ideas around promise theory, and, over time, I believe this book will be seen as an important landmark in the development of our craft.
Mark is both a practitioner and an academic, and he has brought rigor, clarity and a deep understanding of the physics of complex systems to the world of systems administration. Where the state of the art used to be "type in the commands in the installation manual or run book by hand", he built CFEngine to repeatably automate configuration steps, and inspired several generations of tools that create infrastructure with code. Moving on from "hack it until it appears to work and don't touch it until it breaks again", he implemented the idea that a large collection of systems should actively maintain a desired configuration. Now, with this updated second edition of In Search of Certainty, he sets out a physics-based foundation for reasoning about the complex state of large-scale distributed systems. By inverting the concept of externalized service level agreement obligations into decentralized local promises, Promise Theory provides a scalable and robust approach to systems management. There is no such thing as certainty in a distributed system, but there are good and bad approximations to certainty, and this book is a valuable guide to those of us who want to build large-scale distributed systems that promise to behave themselves most of the time.
In 2009 and 2010 I was lucky to be part of a very experienced team of managers and engineers at Netflix who went back to first principles to come up with an architecture that would be very agile, automated, self-service, highly available, and cloud-based. We internalized the principles that Amazon Web Services (AWS) was promoting, and synthesized ideas from our backgrounds at Google, eBay, Yahoo and Sun, along with anti-patterns from our experience at Netflix that we wanted to avoid. One of the recurring problems in our data-center deployment was that individual machine failures could break the service. In addition, intermittent problems were traced to configuration drift, where supposedly identical installations weren't. Our solution for this was to move up a level, and ban individual "snowflake" or "pet" machines. Instead, any change we made was baked into a machine image and replicated into a herd of identical instances. To ensure that these instances remained ephemeral, with immutable code, and didn't maintain local state between requests, we also created a "chaos monkey" process that would delete them from time to time, and used the AWS auto-scaler to replace them automatically. In effect, in our own search for certainty, we were able to make strong assertions, or promises, that a group of machines were always executing identical code, and immunize the group so that it would self-repair.
In 2015 many of the radical ideas and patterns we argued about five years ago have become well established and now have names. Using a DevOps organization to deliver auto-scaled microservices, based on immutable containers, is becoming a common pattern. As we model and reason about the behavior of large-scale collections of microservices, this book on certainty, and how promise theory came about, is an important addition to our bookshelf.
Adrian Cockcroft, Los Gatos 2015.
Author preface to second edition
In preparing the second edition of the book, I have made as few changes as possible to the text. My goal is not so much to update or correct it as to repair some deficiencies in the first attempt at explaining an intricate topic. I prefer that the book remain a cultural document, a child of its time, however naive that might prove to be in the long run.
Some obvious points failed to come through in the text at the first attempt, especially with regard to the material structure of infrastructure in . I have also added a summary of points from each chapter at the end of the book, as a comprehension aid to all readers, and perhaps as a study aid for college students.
This book is an ambitious project, and there is a lot to swallow in its pages. However, like the books that were just a little beyond my reach (and hence inspired me to learn) as a teenager, I hope that this will be a book readers can revisit multiple times to discover new insights, over several years. No one can absorb the entirety of such a story in one sitting.