Edge Cloud Operations: A Systems Approach
Peterson, Baker, Bavier, Williams and Davie
Table of Contents
Foreword
Preface
Intended Audience
Guided Tour of Open Source
Acknowledgements
Chapter 1: Introduction
1.1 Terminology
1.2 Disaggregation
1.3 Cloud Technology
1.4 Future of the Sysadmin
Chapter 2: Architecture
2.1 Edge Cloud
2.2 Hybrid Cloud
2.3 Stakeholders
2.4 Control and Management
2.5 DevOps
Chapter 3: Resource Provisioning
3.1 Physical Infrastructure
3.2 Infrastructure-as-Code
3.3 Platform Definition
Chapter 4: Lifecycle Management
4.1 Design Overview
4.2 Testing Strategy
4.3 Continuous Integration
4.4 Continuous Deployment
4.5 Versioning Strategy
4.6 Managing Secrets
4.7 What about GitOps?
Chapter 5: Runtime Control
5.1 Design Overview
5.2 Implementation Details
5.3 Modeling Connectivity Service
5.4 Revisiting GitOps
Chapter 6: Monitoring and Telemetry
6.1 Metrics and Alerts
6.2 Logging
6.3 Distributed Tracing
6.4 Integrated Dashboards
6.5 Observability
About The Book
Read the Book
Build the Book
Contribute to the Book
About The Authors
Read The Latest!
Foreword
First the applications all moved to the cloud. And now they're being torn apart. Let me explain what I mean by that.
As markets grow, the unit of function around which one can build a business shrinks. A classic example of this can be seen in the history of the automotive industry. The Ford River Rouge Complex was built in the late 1920s. At the time, mass-produced cars were relatively new, and the market was relatively small. And so factories like the River Rouge Complex had to build all the subcomponents too. Roughly, in one side of the factory went water, rubber, and iron ore, and out the other side came full automobiles. Of course, as the market for cars grew, so did a massive ecosystem of suppliers of car components: wheels, seats, floor mats, and the like. Today the large car companies are more akin to integrators than auto parts makers.
The same dynamic is happening with the application. In the 1970s the same manufacturer would build the chips, the circuit boards, the system form factor, the operating system, and each of the applications. Over time as the market has grown, the system has disaggregated. The hardware and software separated and spawned multiple independent companies. And then companies started to be built around independent applications.
The market hasn't stopped growing, and over the last few years we've seen the application itself disaggregate. Commonly used subcomponents of applications are being pulled out, and entire companies and projects are being built around them. Today, if you're building an application, there are third-party APIs available for authenticating users, sending texts or email, streaming videos, authorizing access to resources, and many other useful functions.
So what does this have to do with the book you're about to read? While the last decade was a consolidation of applications into the cloud, the next decade is largely going to be about the explosion of applications and application components away from it. Now that subcomponents of workloads have been largely decoupled from having to sit with the application, they can be run anywhere. And in particular they can be run on infrastructure that's purposely built and optimized for them! In fact, we are starting to see what can only be described as an anti-cloud trend where large companies are choosing to pull some workloads back from large clouds to their own optimized infrastructure. And we're even seeing startups choosing to build their own infrastructure from the get-go because they understand the cost and performance advantages of doing so.
In Edge Cloud Operations: A Systems Approach, the authors provide a detailed overview of not just cloud operations (which are so last decade) but operations in this new era of distributed clouds. In many ways, the cloud era was a low point of systems, because so much below the application layer was buried deep within the engineering organizations of the three large cloud providers. But that's changing, and to change with it, you need to understand how it all works. And that's exactly why you need to read this book.
Martin Casado
General Partner, a16z
Preface
The cloud is ubiquitous. Everyone uses the cloud to either access or deliver services, but not everyone will build and operate a cloud. So why should anyone care about how to turn a pile of servers and switches into a 24/7 service delivery platform? That's what Google, Microsoft, Amazon and the other cloud providers do for us, and they do a perfectly good job of it.
The answer, we believe, is that the cloud is becoming ubiquitous in another way, as distributed applications increasingly run not just in large, central datacenters but at the edge. As applications are disaggregated, the cloud is expanding from hundreds of datacenters to tens of thousands of enterprises. And while it is clear that the commodity cloud providers are eager to manage those edge clouds as a logical extension of their datacenters, they do not have a monopoly on the know-how for making that happen.
This book lays out a roadmap that a small team of engineers followed over the course of a year to stand up and operationalize an edge cloud and then operate it 24/7. This edge cloud spans a dozen enterprises, and hosts a non-trivial cloud native service (5G connectivity in our case, but that's just an example). The team was able to do this by leveraging 20+ open source components, but selecting those components is just a start. There were dozens of technical decisions to make along the way, and a few thousand lines of configuration code to write. We believe this is a repeatable exercise, which we report in this book. The code for those configuration files is open source, for those who want to pursue the topic in more detail.
What do we mean by an edge cloud? We're drawing a distinction between clouds run by the hyperscale cloud providers in their massive data centers, which we think of as the core, and those run by enterprises (or managed for them) at the edge. The edge is where the real, physical world meets the cloud. For example, it is the place where data from sensors is likely to be gathered and processed, and where services that need to be close to the end user for reasons of latency or bandwidth are delivered.
Our roadmap may not be the right one for all circumstances, but it does shine a light on the fundamental challenges and trade-offs involved in operationalizing a cloud. As we can attest based on our experience, it's a complicated design space with an overabundance of terminology and storylines to untangle.
Intended Audience
We hope this book makes valuable reading for anyone who is trying to stand up and operationalize their own edge cloud infrastructure, but we also aim to provide useful information for at least two other broad groups.
First, there will be a set of readers who need to evaluate the options available, particularly to decide between using the cloud services offered by one of the hyperscalers or building their own edge cloud (or some combination of these). We hope to demystify the landscape of edge clouds for this audience to help inform those decisions.