MySQL High Availability
Charles Bell
Mats Kindahl
Lars Thalmann
Foreword for the Second Edition
Yashwanth Nelapati
Pinterest
In 2011, Pinterest started growing. Some say we grew faster than any other startup to date. In the earliest days, we were up against a new scalability bottleneck every day that could slow down the site or bring it down altogether. We remember having our laptops with us everywhere. We slept with them, we ate with them, we went on vacation with them. We even named them. We have the sound of the SMS outage alerts imprinted in our brains.
When the infrastructure is constantly being pushed to its limits, you cant help but wish for an easy way out. During our growth, we tried no less than five well-known database technologies that claimed to solve all our problems, but each failed catastrophically. Except MySQL. The time came around September 2011 to throw all the cards in the air and let them resettle. We re-architected everything around MySQL, Memcache, and Redis with just three engineers.
MySQL? Why MySQL? We laid out our biggest concerns with any technology and started asking the same questions for each. Heres how MySQL shaped up:
Does it address our storage needs? Yes, we needed mappings, indexes, sorting, and blob storage, all available in MySQL.
Is it commonly used? Can you hire somebody for it? MySQL is one of the most common database choices in production today. Its so easy to hire people who have used MySQL that we could walk outside in Palo Alto and yell out for a MySQL engineer and a few would come up. Not kidding.
Is the community active? Very active. There are great books available and a strong online community.
How robust is it to failure? Very robust! Weve never lost any data even in the most dire of situations.
How well does it scale? By itself, it does not scale beyond a single box. Wed need a sharding solution layered on top. (Thats a whole other discussion!)
Will you be the biggest user? Nope, not by far. Bigger users included Facebook, Twitter, and Google. You dont want to be the biggest user of a technology if you can help it. If you are, youll trip over new scalability problems that nobody has had a chance to debug yet.
How mature is it? Maturity became the real differentiator. Maturity to us is a measure of the blood, sweat, and tears that have gone into a program divided by its complexity. MySQL is reasonably complex, but not nearly so compared to some of the magic autoclustering NoSQL solutions available. Additionally, MySQL has had 28 years of the best and the brightest contributing back to it from such companies as Facebook and Google, who use it at massive scale. Of all the technologies we looked at, by our definition of maturity, MySQL was a clear choice.
Does it have good debugging tools? As a product matures, you naturally get great debugging and profiling tools since people are more likely to have been in a similar sticky situation. Youll find yourself in trouble at 3 A.M. (multiple times). Being able to root cause an issue and get back to bed is better than rewriting for another technology by 6 A.M.
Based on our survey of 10 or so database technologies, MySQL was the clear choice. MySQL is great, but it kinda drops you off at your destination with no baggage and you have to fend for yourself. It works very well and you can connect to it, but as soon as you start using it and scaling, the questions starting flying:
My query is running slow, now what?
Should I enable compression? How do I do it?
What are ways of scaling beyond one box?
How do I get replication working? How about master-master replication?
REPLICATION STOPPED! NOW WHAT?!
What are options for durability (fsync speeds)?
How big should my buffers be?
There are a billion fields in mysql.ini. What are they? What should they be set to?
I just accidentally wrote to my slave! How do I prevent that from happening again?
How do I prevent running an UPDATE
with no where
clause?
What debugging and profiling tools should I be using?
Should I use InnoDB, MyISAM, or one of several other flavors of storage engine?
The online community is helpful for answering specific questions, finding examples, bug fixes, and workarounds, but often lacks a strong cohesive story, and deeper discussions about architecture are few and far between. We knew how to use MySQL at small scale, but this scale and pace were insane. High Availability MySQL provided insights that allowed us to squeeze more out of MySQL.
One new feature in MySQL 5.6, Global Transaction Handlers, adds a unique identifier to every transaction in a replication tree. This new feature makes failover and slave promotion far easier. Weve been waiting for this for a long time and its well covered in this new edition.
During our grand re-architecture to a sharded solution, we referred to this book for architectural decisions, such as replication techniques and topologies, data sharding alternatives, monitoring options, tuning, and concerns in the cloud. It gave us a deeper understanding of how MySQL works underneath the hood, which allowed us to make better informed choices around the high level queries, access patterns, and structures wed be using, as well as iterate on our design afterward. The resulting MySQL architecture still serves Pinterests core data needs today.
Foreword for the First Edition
Mark Callaghan
A lot of research has been done on replication, but most of the resulting concepts are never put into production. In contrast, MySQL replication is widely deployed but has never been adequately explained. This book changes that. Things are explained here that were previously limited to people willing to read a lot of source code and spend a lot of timeincluding a few late-night sessionsdebugging it in production.
Replication enables you to provide highly available data services while enduring the inevitable failures. There are an amazing number of ways for things to fail, including the loss of a disk, server, or data center. Even when hardware is perfect or fully redundant, people are not. Database tables will be dropped by mistake. Applications will write incorrect data. Occasional failure is assured. But with reasonable preparation, recovery from failure can also be assured. The keys to survival are redundancy and backups. Replication in MySQL supports both.
But MySQL replication is not limited to supporting failure recovery. It is frequently used to support read scale-out. MySQL can efficiently replicate to a large number of servers. For applications that are read-mostly, this is a cost-effective strategy for supporting a large number of queries on commodity hardware.
And there are other interesting uses for MySQL replication. Online data definition language (DDL) is a very complex feature to implement in a relational database management system. MySQL does not support online DDL, but through the use of replication, you can implement something that is frequently good enough. You can get a lot done with replication if you are willing to be creative.
Replication is one of the features that made MySQL wildly popular. It is also the feature that allows you to convert a popular MySQL prototype into a successful business-critical deployment. Like most of MySQL, replication favors simplicity and ease of use. As a consequence, it is occasionally less than perfect when running in production. This book explains what you need to know to successfully use MySQL replication. It will help you to understand how replication has been implemented, what can go wrong, how to prevent problems, and how to fix them whendespite your best attempts at preventionthey crop up.