About This eBook
ePUB is an open, industry-standard format for eBooks. However, support of ePUB and its many features varies across reading devices and applications. Use your device or app settings to customize the presentation to your liking. Settings that you can customize often include font, font size, single or double column, landscape or portrait mode, and figures that you can click or tap to enlarge. For additional information about the settings and features on your reading device or app, visit the device manufacturer's Web site.
Many titles include programming code or configuration examples. To optimize the presentation of these elements, view the eBook in single-column, landscape mode and adjust the font size to the smallest setting. In addition to presenting code and configurations in the reflowable text format, we have included images of the code that mimic the presentation found in the print book; therefore, where the reflowable format may compromise the presentation of the code listing, you will see a Click here to view code image link. Click the link to view the print-fidelity code image. To return to the previous page viewed, click the Back button on your device or app.
The Practice of Cloud System Administration
Designing and Operating Large Distributed Systems
Volume 2
Thomas A. Limoncelli
Strata R. Chalup
Christina J. Hogan
Upper Saddle River, NJ Boston Indianapolis San Francisco
New York Toronto Montreal London Munich Paris Madrid
Capetown Sydney Tokyo Singapore Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at or (800) 382-3419.
For government sales inquiries, please contact .
For questions about sales outside the United States, please contact .
Visit us on the Web: informit.com/aw
Library of Congress Cataloging-in-Publication Data
Limoncelli, Tom.
The practice of cloud system administration : designing and operating large distributed systems /
Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan.
volumes cm
Includes bibliographical references and index.
ISBN-13: 978-0-321-94318-7 (volume 2 : paperback)
ISBN-10: 0-321-94318-X (volume 2 : paperback)
1. Computer networks--Management. 2. Computer systems. 3. Cloud computing. 4. Electronic data
processing--Distributed processing. I. Chalup, Strata R. II. Hogan, Christina J. III. Title.
TK5105.5.L529 2015
004.6782068--dc23 2014024033
Copyright © 2015 Thomas A. Limoncelli, Virtual.NET Inc., Christina J. Lear née Hogan
All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.
ISBN-13: 978-0-321-94318-7
ISBN-10: 0-321-94318-X
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, September 2014
Preface
Which of the following statements are true?
The most reliable systems are built using cheap, unreliable components.
The techniques that Google uses to scale to billions of users follow the same patterns you can use to scale a system that handles hundreds of users.
The more risky a procedure is, the more you should do it.
Some of the most important software features are the ones that users never see.
You should pick random machines and power them off.
The code for every feature Facebook will announce in the next six months is probably in your browser already.
Updating software multiple times a day requires little human effort.
Being on call doesn't have to be a stressful, painful experience.
You shouldn't monitor whether machines are up.
Operations and management can be conducted using the scientific principles of experimentation and evidence.
Google has rehearsed what it would do in case of a zombie attack.
All of these statements are true. By the time you finish reading this book, you'll know why.
This is a book about building and running cloud-based services on a large scale: internet-based services for millions or billions of users. That said, every day more and more enterprises are adopting these techniques. Therefore, this is a book for everyone.
The intended audience is system administrators and their managers. We do not assume a background in computer science, but we do assume experience with UNIX/Linux system administration, networking, and operating system concepts.
Our focus is on building and operating the services that make up the cloud; this is not a guide to using cloud-based services.
Cloud services must be available, fast, and secure. At cloud scale, this is a unique engineering feat. Therefore, cloud-scale services are engineered differently than your typical enterprise service. Being available is important because the Internet is open 24 × 7 and has users in every time zone. Being fast is important because users are frustrated by slow services, so slow services lose out to faster rivals. Being secure is important because, as caretakers of other people's data, we are duty-bound (and legally responsible) to protect people's data.
These requirements are intermixed. If a site is not secure, by definition, it cannot be made reliable. If a site is not fast, it is not sufficiently available. If a site is down, by definition, it is not fast.
The most visible cloud-scale services are web sites. However, there is a huge ecosystem of invisible internet-accessible services that are not accessed with a browser. For example, smartphone apps use API calls to access cloud-based services.
For the remainder of this book we will tend to use the term distributed computing rather than cloud computing. Cloud computing is a marketing term that means different things to different people. Distributed computing describes an architecture where applications and services are provided using many machines rather than one.
This is a book of fundamental principles and practices that are timeless. Therefore, we don't make recommendations about which specific products or technologies to use. We could provide a comparison of the top five most popular web servers or NoSQL databases or continuous build systems. If we did, then this book would be out of date the moment it was published. Instead, we discuss the qualities one should look for when selecting such things. We provide a model to work from. This approach is intended to prepare you for a long career where technology changes over time but you are always prepared. We will, of course, illustrate our points with specific technologies and products, but not as an endorsement of those products and services.