
Jurg van Vliet - Resilience and Reliability on AWS

Year: 2013. Publisher: O'Reilly Media. Genre: Computer / Science.


Resilience and Reliability on AWS: summary, description and annotation

Cloud services are just as susceptible to network outages as any other platform. This concise book shows you how to prepare for potentially devastating interruptions by building your own resilient and reliable applications in the public cloud. Guided by engineers from 9apps, an independent provider of Amazon Web Services and Eucalyptus cloud solutions, you'll learn how to combine AWS with open source tools such as PostgreSQL, MongoDB, and Redis.

This isn't a book on theory. With detailed examples, sample scripts, and solid advice, software engineers with operations experience will learn specific techniques that 9apps routinely uses in its cloud infrastructures.

• Build cloud applications with the rip, mix, and burn approach
• Get a crash course on Amazon Web Services
• Learn the top ten tips for surviving outages in the cloud
• Use elasticsearch to build a dependable NoSQL data store
• Combine AWS and PostgreSQL to build an RDBMS that scales well
• Create a highly available document database with MongoDB Replica Set and SimpleDB
• Augment Redis with AWS to provide backup/restore, failover, and monitoring capabilities
• Work with CloudFront and Route 53 to safeguard global content delivery

Resilience & Reliability on AWS
Jurg van Vliet
Flavia Paganelli
Jasper Geurtsen
Published by O'Reilly Media, Inc.
Foreword
Jeremy Edberg, Information Cowboy, December 2012

In mid-2008, I was handling operations for reddit.com, an online community for sharing and discussing links, serving a few tens of millions of page views per month. At the time, we were hosting the whole site on 21 1U HP servers (in addition to four of the original servers for the site) in two racks in a San Francisco data center. Around that time, Steve, one of the founders of reddit, came to me and suggested I check out this AWS thing that his buddies at Justin.tv had been using with some success; he thought it might be good for us, too. I set up a VPN; we copied over a set of our data, and started using it for batch processing.

In early 2009, we had a problem: we needed more servers for live traffic, so we had to make a choice: build out another rack of servers, or move to AWS. We chose the latter, partly because we didn't know what our growth was going to look like, and partly because it gave us enormous flexibility for resiliency and redundancy by offering multiple availability zones, as well as multiple regions if we ever got to that point. Also, I was tired of running to the data center every time a disk failed, a fan died, a CPU melted, etc.

When designing any architecture, one of the first assumptions one should make is that any part of the system can break at any time. AWS is no exception. Instead of fearing this failure, one must embrace it. At reddit, one of the things we got right with AWS from the start was making sure that we had copies of our data in at least two zones. This proved handy during the great EBS outage of 2011. While we were down for a while, it was for a lot less time than most sites, in large part because we were able to spin up our databases in the other zone, where we kept a second copy of all of our data. If not for that, we would have been down for over a day, like all the other sites in the same situation.

During that EBS outage, I, like many others, watched Netflix, also hosted on AWS. It is said that if you're on AWS and your site is down, but Netflix is up, it's probably your fault you are down. It was that reputation, among other things, that drew me to move from reddit to Netflix, which I did in July 2011. Now that I'm responsible for Netflix's uptime, it is my job to help the company maintain that reputation.

Netflix requires a superior level of reliability. With tens of thousands of instances and 30 million plus paying customers, reliability is absolutely critical. So how do we do it? We expect the inevitable failure, plan for it, and even cause it sometimes. At Netflix, we follow our monkey theory: we simulate things that go wrong and find things that are different. And thus was born the Simian Army, our collection of agents that constructively muck with our AWS environment to make us more resilient to failure.

The most famous of these is the Chaos Monkey, which kills random instances in our production account, the same account that serves actual, live customers. Why wait for Amazon to fail when you can induce the failure yourself, right? We also have the Latency Monkey, which induces latency on connections between services to simulate network issues. We have a whole host of other monkeys too (most of them available on GitHub).
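The core idea of the Chaos Monkey, killing one randomly chosen instance, can be sketched in a few lines. This is only an illustration, not Netflix's implementation: the function names `pick_victim` and `unleash`, the instance IDs, and the `terminate` callback are all hypothetical, and the "fleet" is an in-memory set rather than a real AWS account.

```python
import random


def pick_victim(instances, rng=random):
    """Return one instance ID chosen uniformly at random, or None if there are none."""
    if not instances:
        return None
    # Sort first so the choice is reproducible for a seeded rng.
    return rng.choice(sorted(instances))


def unleash(instances, terminate, rng=random):
    """Pick a random instance and hand it to the (hypothetical) terminate callback."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        terminate(victim)
    return victim


if __name__ == "__main__":
    # Hypothetical fleet; in this sketch "terminating" just removes the ID from the set.
    fleet = {"i-web-1", "i-web-2", "i-web-3"}
    killed = unleash(fleet, fleet.discard)
    print("terminated:", killed, "remaining:", sorted(fleet))
```

Passing the termination step in as a callback keeps the random selection separate from whatever actually stops a machine, so the selection logic can be exercised safely against a fake fleet.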

The point of the Monkeys is to make sure we are ready for any failure modes. Sometimes it works, and we avoid outages, and sometimes new failures come up that we haven't planned for. In those cases, our resiliency systems are truly tested, making sure they are generic and broad enough to handle the situation.

One failure that we weren't prepared for was in June 2012. A severe storm hit Amazon's complex in Virginia, and they lost power to one of their data centers (a.k.a. Availability Zones). Due to a bug in the mid-tier load balancer that we wrote, we did not route traffic away from the affected zone, which caused a cascading failure. This failure, however, was our fault, and we learned an important lesson. This incident also highlighted the need for the Chaos Gorilla, which we successfully ran just a month later, intentionally taking out an entire zone's worth of servers to see what would happen (everything went smoothly). We ran another test of the Chaos Gorilla a few months later and learned even more about what we are doing right and where we could do better.
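Routing traffic away from an affected zone can be illustrated with a small, self-contained sketch. The foreword does not describe how Netflix's load balancer works, so everything here is an assumption made for illustration: the backend records, the `min_healthy_fraction` threshold, and the function names `healthy_zones` and `routable` are hypothetical.

```python
from collections import defaultdict


def healthy_zones(backends, min_healthy_fraction=0.5):
    """Return the zones in which at least min_healthy_fraction of backends are healthy."""
    by_zone = defaultdict(list)
    for backend in backends:
        by_zone[backend["zone"]].append(backend)
    return {
        zone
        for zone, members in by_zone.items()
        if sum(1 for m in members if m["healthy"]) / len(members) >= min_healthy_fraction
    }


def routable(backends, min_healthy_fraction=0.5):
    """Return healthy backends, skipping every backend in a zone judged unhealthy."""
    zones = healthy_zones(backends, min_healthy_fraction)
    return [b for b in backends if b["healthy"] and b["zone"] in zones]


if __name__ == "__main__":
    # Hypothetical fleet: zone "b" has lost most of its backends.
    fleet = [
        {"id": "a1", "zone": "a", "healthy": True},
        {"id": "a2", "zone": "a", "healthy": True},
        {"id": "b1", "zone": "b", "healthy": True},
        {"id": "b2", "zone": "b", "healthy": False},
        {"id": "b3", "zone": "b", "healthy": False},
    ]
    print([b["id"] for b in routable(fleet)])
```

In this sketch a zone is abandoned as a whole once too few of its backends respond, rather than continuing to send traffic to the few survivors, which is one simple way to limit the kind of cascading failure described above.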

A few months later, there was another zone outage, this time due to the Elastic Block Store. Although we generally don't use EBS, many of our instances use EBS root volumes. As such, we had to abandon an availability zone. Luckily for us, our previous run of Chaos Gorilla gave us not only the confidence to make the call to abandon a zone, but also the tools to make it quick and relatively painless.

Looking back, there are plenty of other things we could have done to make reddit more resilient to failure, many of which I have learned through ad hoc trial and error, as well as from working at Netflix. Unfortunately, I didn't have a book like this one to guide me. This book outlines in excellent detail exactly how to build resilient systems in the cloud. From the crash course in systems to the detailed instructions on specific technologies, this book includes many of the very same things we stumbled upon as we flailed wildly, discovering solutions to problems. If I had had this book when I was first starting on AWS, I would have saved myself a lot of time and headache, and hopefully you will benefit from its knowledge after reading it.

This book also teaches a very important lesson: to embrace and expect failure, and if you do, you will be much better off.

Preface

Thank you (again) for picking up one of our books! If you have read Programming Amazon EC2, you probably have some expectations about this book.

The idea behind this book came from Mike Loukides, one of our editors. He was fascinated with the idea of resilience and reliability in engineering. At the same time, Amazon Web Services (AWS) had been growing and growing.

As is the case for other systems, AWS does not go without service interruptions. The underlying architecture and available services are designed to help you deal with this. But as outages have shown, this is difficult, especially when you are powering the majority of the popular web services.

So how do we help people prepare? We already have a good book on the basics of engineering on AWS. But it deals with relatively simple applications, composed solely of AWS's infrastructural components. What we wanted to show is how to build service components yourself and make them resilient and reliable.

The heart of this book is a collection of services we run in our infrastructures. We'll show things like Postgres and Redis, but also elasticsearch and MongoDB. But before we talk about these, we will introduce AWS and our approach to Resilience and Reliability.

We want to help you weather the next (AWS) outage!

Audience

If Amazon Web Services is new to you, we encourage you to pick up a copy of Programming Amazon EC2. Familiarize yourself with the many services AWS offers. It certainly helps to have worked (or played) with many of them.

Even though many of our components are nothing more than a collection of scripts (bash, Python, Ruby, PHP), don't be fooled. The lack of a development environment does not make it easier to engineer your way out of many problems.

Therefore, we feel this book is probably well-suited for software engineers. We use this term inclusively: not every programmer is a software engineer, and many system administrators are software engineers. But you at least need some experience building complex systems. It helps to have seen more than one programming language. And it certainly helps to have been responsible for operations.
