Chapter 1. Beyond Relational Databases
Whats Wrong with Relational Databases?
I f I had asked people what they wanted, they would have said faster horses.
Henry Ford
I ask you to consider a certain model for data, invented by a small team at a company with thousands of employees. It was accessible over a TCP/IP interface and was available from a variety of languages, including Java and web services. This model was difficult at first for all but the most advanced computer scientists to understand, until broader adoption helped make the concepts clearer. Using the database built around this model required learning new terms and thinking about data storage in a different way. But as products sprang up around it, more businesses and government agencies put it to use, in no small part because it was fastcapable of processing thousands of operations a second. The revenue it generated was tremendous.
And then a new model came along.
The new model was threatening, chiefly for two reasons. First, the new model was very different from the old model, which it pointedly controverted. It was threatening because it can be hard to understand something different and new. Ensuing debates can help entrench people stubbornly further in their viewsviews that might have been largely inherited from the climate in which they learned their craft and the circumstances in which they work. Second, and perhaps more importantly, as a barrier, the new model was threatening because businesses had made considerable investments in the old model and were making lots of money with it. Changing course seemed ridiculous, even impossible.
Of course Im talking about the Information Management System (IMS) hierarchical database, invented in 1966 at IBM.
IMS was built for use in the Saturn V moon rocket. Its architect was Vern Watts, who dedicated his career to it. Many of us are familiar with IBMs database DB2. IBMs wildly popular DB2 database gets its name as the successor to DB1the product built around the hierarchical data model IMS. IMS was released in 1968, and subsequently enjoyed success in Customer Information Control System (CICS) and other applications. It is still used today.
But in the years following the invention of IMS, the new model, the disruptive model, the threatening model, was the relational database.
In his 1970 paper A Relational Model of Data for Large Shared Data Banks, Dr. Edgar F. Codd, also at IBM, advanced his theory of the relational model for data while working at IBMs San Jose research laboratory. This paper, still available at http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf, became the foundational work for relational database management systems.
Codds work was antithetical to the hierarchical structure of IMS. Understanding and working with a relational database required learning new terms that must have sounded very strange indeed to users of IMS such as relations, tuples, and normal form. It presented certain key advantages over its predecessor, such as the ability to express complex relationships between multiple entities, well beyond what could be represented by hierarchical databases..
While these ideas and their application have evolved in four decades, the relational database still is clearly one of the most successful software applications in history. Its used in the form of Microsoft Access in sole proprietorships, and in giant multinational corporations with clusters of hundreds of finely tuned instances representing multi-terabyte data warehouses. Relational databases store invoices, customer records, product catalogues, accounting ledgers, user authentication schemesthe very world, it might appear. There is no question that the relational database is a key facet of the modern technology and business landscape, and one that will be with us in its various forms for many years to come, as will IMS in its various forms. The relational model presented an alternative to IMS, and each has its uses.
So the short answer to the question, Whats wrong with relational databases? is Nothing.
There is, however, a rather longer answer, which says that every once in a while an idea is born that ostensibly changes things, and engenders a revolution of sorts. And yet, in another way, such revolutions, viewed structurally, are simply historys business as usual. IMS, RDBMS, NoSQL. The horse, the car, the plane. They each build on prior art, they each attempt to solve certain problems, and so theyre each good at certain thingsand less good at others. They each coexist, even now.
So lets examine for a moment why, at this point, we might consider an alternative to the relational database, just as Codd himself four decades ago looked at the Information Management System and thought that maybe it wasnt the only legitimate way of organizing information and solving data problems, and that maybe, for certain problems, it might prove fruitful to consider an alternative.
We encounter scalability problems when our relational applications become successful and usage goes up. Joins are inherent in any relatively normalized relational database of even modest size, and joins can be slow. The way that databases gain consistency is typically through the use of transactions, which require locking some portion of the database so its not available to other clients. This can become untenable under very heavy loads, as the locks mean that competing users start queuing up, waiting for their turn to read or write the data.