Betsy Beyer (editor) - Site Reliability Engineering: How Google Runs Production Systems

Book:
Site Reliability Engineering: How Google Runs Production Systems
Author:
Betsy Beyer (editor) / Chris Jones (editor) / Jennifer Petoff (editor) / Niall Richard Murphy (editor) / Kavita Guliani (editor) / Carmela Quinito (editor) / Benjamin Treynor Sloss / JC van Winkel / Marc Alvidrez / Mark Roth / Cody Smith / John Wilkes
Genre:
Books / Computer
Year:
2016

Site Reliability Engineering

Foreword

Google's story is a story of scaling up. It is one of the great success stories of the computing industry, marking a shift towards IT-centric business. Google was one of the first companies to define what business-IT alignment meant in practice, and went on to inform the concept of DevOps for a wider IT community. This book has been written by a broad cross-section of the very people who made that transition a reality.

Google grew at a time when the traditional role of the system administrator was being transformed. It questioned system administration, as if to say: we can't afford to hold tradition as an authority, we have to think anew, and we don't have time to wait for everyone else to catch up. In the introduction to Principles of Network and System Administration, I claimed that system administration was a form of human-computer engineering. This was strongly rejected by some reviewers, who said "we are not yet at the stage where we can call it engineering." At the time, I felt that the field had become lost, trapped in its own wizard culture, and could not see a way forward. Then, Google drew a line in the silicon, forcing that fate into being. The revised role was called SRE, or Site Reliability Engineer. Some of my friends were among the first of this new generation of engineer; they formalized it using software and automation. Initially, they were fiercely secretive, and what happened inside and outside of Google was very different: Google's experience was unique. Over time, information and methods have flowed in both directions. This book shows a willingness to let SRE thinking come out of the shadows.

Here, we see not only how Google built its legendary infrastructure, but also how it studied, learned, and changed its mind about the tools and the technologies along the way. We, too, can face up to daunting challenges with an open spirit. The tribal nature of IT culture often entrenches practitioners in dogmatic positions that hold the industry back. If Google overcame this inertia, so can we.

This book is a collection of essays by one company, with a single common vision. The fact that the contributions are aligned around a single company's goal is what makes it special. There are common themes, and common characters (software systems) that reappear in several chapters. We see choices from different perspectives, and know that they correlate to resolve competing interests. The articles are not rigorous, academic pieces; they are personal accounts, written with pride, in a variety of personal styles, and from the perspective of individual skill sets. They are written bravely, and with an intellectual honesty that is refreshing and uncommon in industry literature. Some claim "never do this, always do that," others are more philosophical and tentative, reflecting the variety of personalities within an IT culture, and how that too plays a role in the story. We, in turn, read them with the humility of observers who were not part of the journey, and do not have all the information about the myriad conflicting challenges. Our many questions are the real legacy of the volume: Why didn't they do X? What if they'd done Y? How will we look back on this in years to come? It is by comparing our own ideas to the reasoning here that we can measure our own thoughts and experiences.

The most impressive thing of all about this book is its very existence. Today, we hear a brazen culture of "just show me the code." A culture of "ask no questions" has grown up around open source, where community rather than expertise is championed. Google is a company that dared to think about the problems from first principles, and to employ top talent with a high proportion of PhDs. Tools were only components in processes, working alongside chains of software, people, and data. Nothing here tells us how to solve problems universally, but that is the point. Stories like these are far more valuable than the code or designs they resulted in. Implementations are ephemeral, but the documented reasoning is priceless. Rarely do we have access to this kind of insight.

This, then, is the story of how one company did it. The fact that it is many overlapping stories shows us that scaling is far more than just a photographic enlargement of a textbook computer architecture. It is about scaling a business process, rather than just the machinery. This lesson alone is worth its weight in electronic paper.

We do not engage much in self-critical review in the IT world; as such, there is much reinvention and repetition. For many years, there was only the USENIX LISA conference community discussing IT infrastructure, plus a few conferences about operating systems. It is very different today, yet this book still feels like a rare offering: a detailed documentation of Google's step through a watershed epoch. The tale is not for copying, though perhaps for emulating, but it can inspire the next step for all of us. There is a unique intellectual honesty in these pages, expressing both leadership and humility. These are stories of hopes, fears, successes, and failures. I salute the courage of authors and editors in allowing such candor, so that we, who are not party to the hands-on experiences, can also benefit from the lessons learned inside the cocoon.

Mark Burgess

Preface

Software engineering has this in common with having children: the labor before the birth is painful and difficult, but the labor after the birth is where you actually spend most of your effort. Yet software engineering as a discipline spends much more time talking about the first period as opposed to the second, despite estimates that 40–90% of the total costs of a system are incurred after birth. The popular industry model that conceives of deployed, operational software as being stabilized in production, and therefore needing much less attention from software engineers, is wrong. Through this lens, then, we see that if software engineering tends to focus on designing and building software systems, there must be another discipline that focuses on the whole lifecycle of software objects, from inception, through deployment and operation, refinement, and eventual peaceful decommissioning. This discipline uses, and needs to use, a wide range of skills, but has separate concerns from other kinds of engineers. Today, our answer is the discipline Google calls Site Reliability Engineering.

So what exactly is Site Reliability Engineering (SRE)? We admit that it's not a particularly clear name for what we do: pretty much every site reliability engineer at Google gets asked what exactly that is, and what they actually do, on a regular basis.

Unpacking the term a little, first and foremost, SREs are engineers. We apply the principles of computer science and engineering to the design and development of computing systems: generally, large distributed ones. Sometimes, our task is writing the software for those systems alongside our product development counterparts; sometimes, our task is building all the additional pieces those systems need, like backups or load balancing, ideally so they can be reused across systems; and sometimes, our task is figuring out how to apply existing solutions to new problems.

Next, we focus on system reliability. Ben Treynor Sloss, Google's VP for 24/7 Operations, originator of the term SRE, claims that reliability is the most fundamental feature of any product: a system isn't very useful if nobody can use it! Because reliability
