Graphs Are Everywhere, or the Birth of Graph Databases as We Know Them
It was 1999 and everyone worked 23-hour days. At least it felt that way. It seemed like each day brought another story about a crazy idea that just got millions of dollars in funding. All our competitors had hundreds of engineers, and we were a 20-ish person development team. As if that was not enough, 10 of our engineers spent the majority of their time just fighting the relational database.
It took us a while to figure out why. As we drilled deeper into the persistence layer of our enterprise content management application, we realized that our software was managing not just a lot of individual, isolated, and discrete data items, but also the connections between them. And while we could easily fit the discrete data in relational tables, the connected data was more challenging to store and tremendously slow to query.
Out of pure desperation, my two Neo cofounders, Johan and Peter, and I started experimenting with other models for working with data, particularly those that were centered around graphs. We were blown away by the idea that it might be possible to replace the tabular SQL semantic with a graph-centric model that would be much easier for developers to work with when navigating connected data. We sensed that, armed with a graph data model, our development team might not waste half its time fighting the database.
Surely, we said to ourselves, we can't be unique here. Graph theory has been around for nearly 300 years and is well known for its wide applicability across a number of diverse mathematical problems. Surely, there must be databases out there that embrace graphs!
Well, we Altavista'd[] around the young Web and couldn't find any. After a few months of surveying, we (naively) set out to build, from scratch, a database that worked natively with graphs. Our vision was to keep all the proven features from the relational database (transactions, ACID, triggers, etc.) but use a data model for the 21st century. Project Neo was born, and with it graph databases as we know them today.
The first decade of the new millennium has seen several world-changing new businesses spring to life, including Google, Facebook, and Twitter. And there is a common thread among them: they put connected data (graphs) at the center of their business. It's 15 years later and graphs are everywhere.
Facebook, for example, was founded on the idea that while there's value in discrete information about people (their names, what they do, etc.), there's even more value in the relationships between them. Facebook founder Mark Zuckerberg built an empire on the insight to capture these relationships in the social graph.
Similarly, Google's Larry Page and Sergey Brin figured out how to store and process not just discrete web documents, but how those web documents are connected. Google captured the web graph, and it made them arguably the most impactful company of the previous decade.
Today, graphs have been successfully adopted outside the web giants. One of the biggest logistics companies in the world uses a graph database in real time to route physical parcels; a major airline is leveraging graphs for its media content metadata; and a top-tier financial services firm has rewritten its entire entitlements infrastructure on Neo4j. Virtually unknown a few years ago, graph databases are now used in industries as diverse as healthcare, retail, oil and gas, media, gaming, and beyond, with every indication of accelerating their already explosive pace.
These ideas deserve a new breed of tools: general-purpose database management technologies that embrace connected data and enable graph thinking, which are the kind of tools I wish had been available off the shelf when we were fighting the relational database back in 1999.
I hope this book will serve as a great introduction to this wonderful emerging world of graph technologies, and I hope it will inspire you to start using a graph database in your next project so that you too can unlock the extraordinary power of graphs. Good luck!
[] For the younger readers, it may come as a shock that there was a time in the history of mankind when Google didn't exist. Back then, dinosaurs ruled the earth and search engines with names like Altavista, Lycos, and Excite were used, primarily to find ecommerce portals for pet food on the Internet.
Preface
Graph databases address one of the great macroscopic business trends of today: leveraging complex and dynamic relationships in highly connected data to generate insight and competitive advantage. Whether we want to understand relationships between customers, elements in a telephone or data center network, entertainment producers and consumers, or genes and proteins, the ability to understand and analyze vast graphs of highly connected data will be key in determining which companies outperform their competitors over the coming decade.
For data of any significant size or value, graph databases are the best way to represent and query connected data. Connected data is data whose interpretation and value require us first to understand the ways in which its constituent elements are related. More often than not, to generate this understanding, we need to name and qualify the connections between things.
Although large corporates realized this some time ago and began creating their own proprietary graph processing technologies, we're now in an era where that technology has rapidly become democratized. Today, general-purpose graph databases are a reality, enabling mainstream users to experience the benefits of connected data without having to invest in building their own graph infrastructure.
What's remarkable about this renaissance of graph data and graph thinking is that graph theory itself is not new. Graph theory was pioneered by Euler in the 18th century, and has been actively researched and improved by mathematicians, sociologists, anthropologists, and others ever since. However, it is only in the past few years that graph theory and graph thinking have been applied to information management. In that time, graph databases have helped solve important problems in the areas of social networking, master data management, geospatial, recommendations, and more. This increased focus on graph databases is driven by twin forces: by the massive commercial success of companies such as Facebook, Google, and Twitter, all of whom have centered their business models around their own proprietary graph technologies; and by the introduction of general-purpose graph databases into the technology landscape.