Preface
At a recent meeting of network admins, the talk turned to uptime, and some bragged about the high availability of services in their network; they had 100% uptime. Wow, this normally is unthinkable. After more discussion, the truth came out. The figure did not take into account outages in the network that made the service unavailable, because the service itself was still up, though totally unreachable. The same admins also admitted that they really didn't keep records of actual outages. In their opinion, they had no reliability issues, but their customers would disagree with that.
This book is not about reliability theory. Theory addresses the full range of possibilities. This book is for those of us who have to keep the network working. It is a guide for students of the art of creating self-sustaining continuous systems.
As such, we fought to keep the book grounded not necessarily in what you can do, but more importantly, in what you should do as an administrator to protect availability and to keep the customers, internal or external, connected and happy. Most of the chapters include case studies that show you how things work and provide pointers on where you might investigate if your results differ. The topologies included are realistic and in many cases reflective of actual networks that we, the authoring team, have worked with at some point in our careers.
There are four authors on this book, and while we tried to homogenize the writing, you will see different styles and different approaches. Ultimately, we think that's a good thing. It's like working with your peers who are also maintaining the same network and who have different methods of working. The team shares a common goal and the variation in approaches brings strength through diversity.
Ultimately, this book is about Juniper Networks JUNOS Software and Juniper Networks boxes. You need to design a continuous system, and you need the right mix of equipment placed ideally on your topology, but eventually you come back to the network OS. And our chapters all come back to roost with JUNOS.
What Is High Availability?
How often in your life have you picked up a phone and not heard a dial tone? Not very often, right? Every time you did, it was certainly a cause for concern. This is a classic example of the definition of availability. People do not expect the network to be in use constantly, 365 days a year, but they do expect the network to be available for use every time they try to use it. With a high number of users expecting availability as needed, we begin to approach the point of constant availability. But is that realistic? Statistically speaking, no; over a long enough timeline every system eventually fails. So, what is a realistic solution for systems whose purpose means they can't be allowed to fail?
A classic concern with high availability was the difficulty in measurement. The notion was that any measurement tool had to be more available than the system being measured. Otherwise, the tool would potentially fail before the system being measured. These days the most highly available systems are processing constant and ever-increasing volumes of user traffic, such as credit card transactions, calls connected, and web page hits. Any disruption in service would immediately be noticed and felt by end users. The users themselves have become the most effective availability monitoring tool.
Five 9s is easily dismissed as a marketing term, but the math behind the term is sound and wholly nonmarketing. The 9s concept is a measure of availability over a span of a year. It is a percentage of time during the year that the system is guaranteed to be functional. The following table is often drawn to describe the concept:
Availability | Downtime in one year |
---|---|
90% | 876 hours |
99% | 87.6 hours |
99.9% | 8.76 hours |
99.99% | 52.6 minutes |
99.999% | 5.26 minutes |
99.9999% | 31.5 seconds |
Note
In this book we cite five 9s as a concept rather than as the recommended target. In financial enterprises, five 9s could be unacceptable and the target may instead be seven 9s, or eight 9s. Whenever you see 9s in this book, whether your target is five, seven, or even nine 9s, please read it as a measurement of a continuous system rather than as a figurative number we recommend for all networks.
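The downtime figures in the table follow from simple arithmetic: the unavailable fraction of the year multiplied by the length of the year. The following is a minimal sketch of that calculation, assuming a 365-day year (8,760 hours). The function names `downtime_hours_per_year` and `describe` are illustrative only and do not come from this book or from JUNOS.

```python
# Downtime per year implied by an availability percentage.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a 365-day year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def describe(hours: float) -> str:
    """Format a duration in the largest unit that keeps the value at 1 or more."""
    if hours >= 1:
        return f"{hours:.3g} hours"
    minutes = hours * 60
    if minutes >= 1:
        return f"{minutes:.3g} minutes"
    return f"{minutes * 60:.3g} seconds"


# Reproduce the rows of the 9s table.
for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {describe(downtime_hours_per_year(pct))}")
```

The same function can be used to see what a more demanding target implies, for example by passing 99.99999 for seven 9s.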
The table about 9s gets the message across, but it doesn't really tell the story of where availability should be measured. This book talks about dependencies within redundancy schemes: redundant components protect chassis, redundant chassis protect systems, redundant systems protect services, and redundant services protect the enterprise. Some vendors would have you believe that availability should be measured at the chassis level. Others tout the availability of specific components in their chassis.
User experience is reality. This reality means that neither component nor system levels are appropriate points to measure availability. Relying on hardware availability as a measure of system, service, and enterprise availability ignores the importance of network architecture planning and site design, effective monitoring, and a highly trained and proactive support staff. In the modern world of constant transactions, it is the services and the enterprise that must be available 99.999999% of the time. This is the approach we've taken in this book.
So, are we saying that the component and chassis availability are irrelevant? Hardly. The strength and resilience of components are critical to the chassis. The availability of chassis is critical to the availability of services. The point is that even with best-in-class components and chassis it is possible to make poor design and configuration decisions. The fact that you have chosen to buy Juniper means that you have already secured best-in-class components and chassis. The purpose of this book is to help you make the most of this investment and build truly continuous systems and services.
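To illustrate why chassis availability alone does not determine service availability, here is a rough sketch using the standard series and parallel availability formulas. It assumes failures are independent, which real networks rarely guarantee, and every figure in it is hypothetical rather than a measured or vendor-published value. The helper names `series` and `parallel` are ours, not part of any Juniper tool.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every element must be up for the path to work, so availabilities multiply."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Redundant copies fail as a group only when every copy is down at once."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures, chosen only for illustration.
chassis = 0.99999  # a five 9s chassis
link = 0.999       # a three 9s link between two chassis

# One path through the network: chassis, link, chassis.
single_path = series(chassis, link, chassis)

# Two such paths in parallel, assuming they share no failure modes.
redundant_path = parallel(single_path, 2)

print(f"single path:    {single_path:.6f}")
print(f"redundant pair: {redundant_path:.8f}")
```

With these assumed numbers, the single path falls below three 9s even though both chassis are rated at five 9s, and adding a second independent path recovers most of the loss. The service-level result depends on the design, not only on the hardware.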
How to Use This Book
We are assuming a certain level of knowledge from the reader. This is important. If you are not familiar with any of the assumptions in the following list, this book will occasionally veer over your head. The JUNOS documentation site is a great place to start. It's thorough, well written, and free.