Chaos Engineering
Site reliability through controlled disruption
Mikolaj Pawlikowski
Forewords by Casey Rosenthal and Dave Rensin
Manning
Shelter Island
For more information on this and other Manning titles go to
manning.com
Copyright
For online information and ordering of these and other Manning books, please visit manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
© 2021 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Development editor: Toni Arritola
Technical development editor: Nick Watts
Review editor: Mihaela Batinić
Production editor: Deirdre S. Hiam
Copy editor: Sharon Wilkey
Proofreader: Melody Dolab
Technical proofreader: Karsten Strøbæk
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor
ISBN: 9781617297755
dedication
To my father, Maciej, who always had this inexplicable faith in my abilities.
I miss you, man.
front matter
foreword
As is often the case with new and technical areas, Chaos Engineering is a simple title for a rich and complex topic. Many of its principles and practices are counterintuitive, starting with its name, which makes it doubly challenging to explain. The early days of a new topic, however, are precisely the time when we need to find and distribute the easy-to-understand explanations.
I'm very pleased to say this book does exactly that.
An oft-repeated scientific dictum is that if you can't explain it simply, then you don't really understand it. I can safely say to you that Mikolaj clearly understands chaos engineering because in these pages he explains its principles and practices with a simplicity and practical use that is uncommon for technical books.
This, however, brings us to the main question. Why on earth would any reasonable person want to introduce chaos into their systems? Things are complicated enough already in our lives, so why go looking for trouble?
The short answer is that if you don't look for trouble, you won't be prepared when it comes looking for you. And eventually, trouble comes looking for all of us.
Testing, at least as we have all understood the term, will not be of much help. A test is an activity you run to make sure that your system behaves in a way that you expect under a specific set of conditions.
The biggest source of trouble, however, is not from the conditions we were expecting, but from the conditions that never occurred to us. No amount of testing will save us from emergent properties and behaviors. For that, we need something new.
We need chaos engineering.
If this is your first book on chaos engineering, you have chosen wisely. If not, then take solace in the fact that you are about to begin a journey that will fill in the gaps of your understanding and help you glue it all together in your mind.
When you are finished, you will feel more comfortable (and excited) about applying chaos engineering to your systems, and probably more than a little anxious about what you will find.
I am very pleased to have been invited to write these words and grateful to have a book like this available the next time someone asks me, "What is chaos engineering?"
David K. Rensin, Google
foreword
If Miko didn't write this book, someone else would have to. That said, it would be difficult to find someone with Miko's history and experience with chaos engineering to put such a practical approach into writing. His background with distributed systems and particularly the critical and complex systems at Bloomberg, combined with his years of work on PowerfulSeal, give him a unique perspective. Not many people have the time and skill of working in the trenches on chaos engineering at an enterprise level.
This perspective is apparent in Miko's pragmatic approach. Throughout the chapters, we see a recurring theme that ties back to the value proposition of doing chaos engineering in the first place: risk and contract verification, holistic assessment of an entire system, and discovery of emergent properties.
One of the most common questions we hear with respect to chaos engineering is "Is it safe?" The second question is usually "How do I get started with chaos engineering?" Miko brilliantly answers both by including a virtual machine (VM) with all the examples and code used in the book. Anyone with basic knowledge of running an application can ease into common and then more advanced chaos engineering scenarios. Mess something up? No worries! Just turn off the VM and reload a new copy. You can now get started with chaos engineering, and do so safely, as Miko facilitates your learning journey from basic service outages (killing processes) to cache and database issues through OS- and application-level experiments, being mindful of the blast radius all the while.
Along the way, you'll get introduced to more advanced topics in system analysis, like the sections on Berkeley Packet Filter (BPF), sar, strace, and tcptop, and even virtual machines and containers. Beyond just chaos engineering, this book is a broad education in SRE and DevOps practices.
The book provides examples of chaos engineering experiments across the application layer, at the operating system level, into containers, on hardware resources, on the network, and even in a web browser. Each of these areas alone is worthy of an entire chapter, if not book; you get the benefit of exploring the full breadth of possible experiments with an experienced facilitator to guide you through. Miko hits different ways each area can be affected in just the right level of detail to give you confidence to try it yourself in your own stack.