This book is dedicated to my sweetheart, Alison Brown.
I can hear the sound of violins, long before it begins.
E.H.
J.C.
Foreword
Cassandra was open-sourced by Facebook in July 2008. This original version of Cassandra was written primarily by an ex-employee from Amazon and one from Microsoft. It was strongly influenced by Dynamo, Amazon's pioneering distributed key/value database. Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful column family data model.
I became involved in December of that year, when Rackspace asked me to build them a scalable database. This was good timing, because all of today's important open source scalable databases were available for evaluation. Despite initially having only a single major use case, Cassandra's underlying architecture was the strongest, and I directed my efforts toward improving the code and building a community.
Cassandra was accepted into the Apache Incubator, and by the time it graduated in March 2010, it had become a true open source success story, with committers from Rackspace, Digg, Twitter, and other companies that wouldn't have written their own database from scratch, but together built something important.
Today's Cassandra is much more than the early system that powered (and still powers) Facebook's inbox search; it has become "the hands-down winner for transaction processing performance," to quote Tony Bain, with a deserved reputation for reliability and performance at scale.
As Cassandra matured and began attracting more mainstream users, it became clear that there was a need for commercial support; thus, Matt Pfeil and I cofounded Riptano in April 2010. Helping drive Cassandra adoption has been very rewarding, especially seeing the uses that don't get discussed in public.
Another need has been a book like this one. Like many open source projects, Cassandra's documentation has historically been weak. And even when the documentation ultimately improves, a book-length treatment like this will remain useful.
Thanks to Eben for tackling the difficult task of distilling the art and science of developing against and deploying Cassandra. You, the reader, have the opportunity to learn these new concepts in an organized fashion.
Jonathan Ellis
Project Chair, Apache Cassandra, and Cofounder and CTO, DataStax
Foreword
I am so excited to be writing the foreword for the new edition of Cassandra: The Definitive Guide. Why? Because there is a new edition! When the original version of this book was written, Apache Cassandra was a brand new project. Over the years, so much has changed that users from that time would barely recognize the database today. It's notoriously hard to keep track of fast-moving projects like Apache Cassandra, and I'm very thankful to Jeff for taking on this task and communicating the latest to the world.
One of the most important updates to the new edition is the content on modeling your data. I have said this many times in public: a data model can be the difference between a successful Apache Cassandra project and a failed one. A good portion of this book is now devoted to understanding how to do it right. Operations folks, you haven't been left out either. Modern Apache Cassandra includes things such as virtual nodes and many new options to maintain data consistency, which are all explained in the second edition. There's so much ground to cover, it's a good thing you got the definitive guide!
Whatever your focus, you have made a great choice in learning more about Apache Cassandra. There is no better time to add this skill to your toolbox. Or, for experienced users, maintaining your knowledge by keeping current with changes will give you an edge. As recent surveys have shown, Apache Cassandra skills are some of the highest paying and most sought after in the world of application development and infrastructure. This also shows a very clear trend in our industry. When organizations need a highly scaling, always-on, multi-datacenter database, you can't find a better choice than Apache Cassandra. A quick search will yield hundreds of companies that have staked their success on our favorite database. This trust is well founded, as you will see as you read on. As applications are moving to the cloud by default, Cassandra keeps up with dynamic and global data needs. This book will teach you why and how to apply it in your application. Build something amazing and be yet another success story.
And finally, I invite you to join our thriving Apache Cassandra community. Worldwide, the community has been one of the strongest non-technical assets for new users. We are lucky to have a thriving Cassandra community, and collaboration among our members has made Apache Cassandra a stronger database. There are many ways you can participate. You can start with simple things like attending meetups or conferences, where you can network with your peers. Eventually you may want to make more involved contributions like writing blog posts or giving presentations, which can add to the group intelligence and help new users following behind you. And, the most critical part of an open source project, make technical contributions. Write some code to fix a bug or add a feature. Submit a bug report or feature request in a JIRA. These contributions are a great measurement of the health and vibrancy of a project. You don't need any special status; just create an account and go! And when you need help, refer back to this book, or reach out to our community. We are here to help you be successful.