Preface
I've been fortunate to get hired into medium-sized operations teams at large technology companies. All ops teams (a customary term for operations teams) share two interesting characteristics: compared to other engineering departments, they work under more pressure, and they attract bad attention much more easily than good attention. Digital firefighting is the nature of the job. We might get noticed when things go awry and we fix them. If we don't react fast enough, we definitely get noticed. If you know anyone in network operations, ask if that's the way he or she feels about the job. I bet you're going to get an answer along those lines.
Working in ops is all about effectiveness: there is no time for re-engineering. We must get things right the first time and we have to act fast. We go through a lot of reprioritizing and context-switching. There is relatively little room for creativity, at least the kind that doesn't love constraints. All this makes operations a great place to learn and grow.
This book is based on experiences of working in ops. I was extremely lucky to work with some of the smartest people in the industry. I would like this book to be a tribute to all these invisible ops guys who struggle daily to maintain the highest standards of service availability.
In my career, I've stared at all sorts of timeseries plots, a lot of them. At one point it was my full-time job, no kidding. With time, I learned to extract meaning from data point fluctuations just by a brief glance, without having to study their origin. It's a funny kind of intuition that system engineers develop in the course of their jobs, and one that probably saves us a lot of time. Some of us are unaware of it, and it's definitely not something we brag about. It is a very useful skill, nevertheless, and in this book I attempt to verbalize it in order to assist you, dear Reader, to absorb it in a more conscious way than I did, possibly saving you weeks or months of getting up to speed.
Some people on my team believed that putting in motion the ideas described here led to a visible paradigm shift. I must agree that in a relatively short period of time, the work caused by our alerting configuration went from mundane to effortless.
This book focuses on monitoring and alerting in the context of distributed information systems, but I'm hoping that the principles presented here will also be applicable to timeseries and datasets generated by all sorts of complex systems. The book does not focus on any particular software package. Rather, it attempts to extract and summarize regularities that system engineers come across in their daily work. You won't find many long code listings here, but you'll definitely find ideas: ones that I hope you'll be able to relate to and apply either at work or in a research project.
Enjoy!
Who Should Read This Book
The main audience of this book is system operators: those who fight the daily battle of delivering the best performance at the lowest cost, as well as those who use monitoring as a means and not an end. Read it if you work extensively with monitoring and plan alerting configurations. If keeping high availability and continuity of service is your job, read on. If monitoring and alerting bring up unpleasant associations, that's an even more valid reason to read the book. If you're trying to quantify the effectiveness of your alerting configurations, the book might have good answers.
Administrators who are setting up a monitoring or alerting configuration with a potential to grow big might also find the book useful. The ideas presented here have been tested on large alerting configurations with a high degree of success. By large, I mean thousands of monitors and hundreds of alarms. The book should help you replicate this setup in your environment.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
Tip
This icon signifies a tip, suggestion, or general note.
Caution
This icon indicates a warning or caution.
Using Code Examples
This book is here to help you get your job done. In general, if this book includes code examples, you may use the code in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Effective Monitoring and Alerting by Slawek Ligus (O'Reilly). Copyright 2013 Slawek Ligus, 978-1-449-33352-2.
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .
Safari Books Online
Note
Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world's leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O'Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)