Praise for Implementing Service Level Objectives
SLIs and SLOs are core practices of the discipline of SRE, but they're trickier than they look. Alex and his merry band of SRE luminaries have a metric ton of experience and are here to help.
David N. Blank-Edelman, Curator/Editor of Seeking SRE and Cofounder of SREcon
Practical examples of software reliability are hard to come by, but this book has done it... A must-read for ensuring that your end users are happy and successful.
Robert Ross, CEO at FireHydrant
An approachable, clear guide that enables normal companies to achieve Google SRE quality monitoring. I can't recommend this book enough!
Thomas A. Limoncelli, SRE Manager, Stack Overflow, Inc.
Implementing Service Level Objectives
by Alex Hidalgo
Copyright 2020 Alex Hidalgo. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: John Devins
- Development Editor: Corbin Collins
- Production Editor: Deborah Baker
- Copyeditor: Rachel Head
- Proofreader: Piper Editorial, LLC
- Indexer: nSight, Inc.
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: O'Reilly Media, Inc.
- September 2020: First Edition
Revision History for the First Edition
- 2020-08-04: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492076810 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Implementing Service Level Objectives, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-07681-0
[GP]
Foreword
Reliability is a conversation.
It is a conversation we have with our infrastructure, our systems, and our services as we attempt to operate them. It is a conversation we have with complexity, security, scalability, and velocity in the hopes they will emerge in the way we need them. It is a conversation we have with ethics, privacy, and justice as we attempt to do the right thing for the people who depend on us. And finally, it is a conversation we have with our colleagues so we can work together to build what matters to us.
If there is anything the world needs right now (either the now of this writing or the now when you are reading this), it is better conversations. They are not easy. We can use all of the help we can get with them.
And that's where SLIs and SLOs come into the picture. For me, they offer a tool, a practice, a model (whatever you want to call it) for having better reliability conversations. Conversations that put the humans first. SLIs and SLOs help us think about, communicate, and interact with reliability in a new way. They aren't actors' scripts for some David Mamet-esque play telling us exactly what to say: where to speak or where to put the pauses. I'm pretty sure we wouldn't want them if they were.
Instead, SLIs and SLOs give us a little guidance when we need it. "Hmm, your customers' latency might be a bit better if you zigged there instead of zagging" or "Are you sure you want to deploy a new version now?" or "Oh, so that's what is important to our users; maybe we'd better start paying attention to that..." And given all of the different conversations mentioned earlier that we are responsible for navigating, this guidance is gold.
If we were to play the old good news/bad news game, the bad news is just as conversations about reliability can be hard at times, conversations about conversations about reliability can be less straightforward than we'd like. SLIs and SLOs in theory are potentially easy, but in practice, not always so much.
The other piece of soggy news is that just as reliability conversations (at least the good ones) never really end, so too it is with SLIs and SLOs. They don't finish. As Rilke said: "Live the questions now."
The good (albeit slightly less poetic) news is that you have this book. Alex and the other contributors have already lived some of the questions, and they are ready to share what they've learned with you. This can help you mine the gold and get the most from what SLIs and SLOs have to offer.
I don't want to keep you any longer from reading the rest of this book, but I will use up the free "Dear reader" card you get when you agree to write a book foreword:
Dear reader, please use all of the advice in this book (and any other tool you encounter) to have better conversations. I'm counting on you.
David N. Blank-Edelman
Curator/Editor of Seeking SRE
and Cofounder of SREcon
Preface
On the surface, this book is about service level objectives (SLOs). But on a deeper level, this book is about people. All the theory, philosophy, and approaches outlined in the pages that follow really only exist to make people's lives easier, and therefore hopefully better.
We're going to be discussing a lot of topics. Some of them are going to be fairly philosophical, and some will be heavy with math and formulae. Some will focus on software, while others will focus on processes. But all of them are ultimately about people, and I want to start with a true story about that.
You Don't Have to Be Perfect
Shortly after agreeing to write this book, I was getting a haircut in New York City. My stylist was someone who had lived in Richmond, Virginia, at the same time I had. We never met while we lived there, but we quickly realized we both used to hang out at all the same places. It only took us about three minutes before we realized we also knew many of the same people. We hit it off immediately.
Molly is a great stylist, and I always had a good time catching up while getting my hair cut. The haircut relevant to this story, however, was the last one I'd get from her before she moved away from New York to open a coffee shop in Detroit. I had built so much trust in her, I didn't find another stylist or get another haircut for four months.
During this final haircut, I told her that I had signed a book deal, and she asked me to tell her what it was about. So I laid it out in the same simple terms I do in the first chapter: you can't be perfect, no one needs you to be perfect anyway, it's too expensive to try to be perfect, and everyone is really happier at the end of the day if you accept those facts.
She responded with an anecdote of her own. When she first started cutting hair, she was so focused on making sure everything was absolutely perfect that it would sometimes take her an hour to do a simple men's haircut that should have taken 30 minutes. Cosmetology school had ingrained in her that everything had to be as even and measured as possible. Trying to be perfect caused too many haircuts to run over in time, which upset the clients who had later appointments booked.