Chapter 1. Connections Are Everything
A Note for Early Release Readers
With Early Release ebooks, you get books in their earliest form, the authors' raw and unedited content as they write, so you can take advantage of these technologies long before the official release of these titles.
This will be the 1st chapter of the final book.
If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at .
In an extreme view, the world can be seen as only connections, nothing else. We think of a dictionary as the repository of meaning, but it defines words only in terms of other words. I liked the idea that a piece of information is really defined only by what it's related to, and how it's related. There really is little else to meaning. The structure is everything.
Tim Berners-Lee, in Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web (1999), p. 14
The twentieth century demonstrated how much we could achieve with spreadsheets and relational databases. Tabular data ruled. The twenty-first century has already shown us that that isn't enough. Tables flatten our perspective, showing connections in only two dimensions. In the real world, things are related to and connected to a myriad of other things, and those relationships shape what is and what will happen. To gain full understanding, we need to model these connections.
Personal computers were introduced in the 1970s, but they didn't take off until they found their first killer apps: financial spreadsheets. VisiCalc on the Apple II and then Lotus 1-2-3 on the IBM PC automated the laborious and error-prone calculations that bookkeepers had been doing by hand ever since the invention of writing and arithmetic: adding up rows and columns of figures, and then perhaps performing even more complex statistical calculations.
In 1970, E.F. Codd published his seminal paper on the relational database model. In these early days of databases, a few models were bouncing around, including the network database model. Codd's relational model was built on something that everyone could identify with and was easy to program: the table.
Moreover, matrix algebra and many statistical methods are also ready-made to work with tables. Both physicists and business analysts used matrices to define and find the optimal solutions to everything from nuclear reactor design to supply chain management. Tables lend themselves to parallel processing; just partition the workload vertically or horizontally. Spreadsheets, relational databases, and matrix algebra: the tabular approach seemed to be the solution to everything.
Then the World Wide Web happened and everything changed.
Connections Change Everything
The web is more than the internet. The internet began in the early 1970s as a data connection network between selected US research institutions. The World Wide Web, invented by CERN researcher Tim Berners-Lee in 1989, is a set of technologies running on top of the internet that make it much easier to publish, access, and connect data in a format easy for humans to consume and interact with. Browsers, hyperlinks, and web addresses are also hallmarks of the web. At the same time that the web was being developed, governments were loosening their controls on the internet and allowing private companies to expand it. We now have billions of interconnected web pages, connecting people, multimedia, facts, and opinions, at a truly global scale. Having data isn't enough. How the data is structured matters.
What Is a Graph?
As the word "web" started to take on new connotations, so did the word "graph." For most people, "graph" was synonymous with a line chart that could show something such as a stock's price over time. Mathematicians had another meaning for the word, however, and as networks and connections started to matter to the business world, the mathematical meaning started to come to the fore.
A graph is an abstract data structure consisting of vertices (or nodes) and connections between vertices called edges. That's it. A graph is the idea of a network, constructed from these two types of elements. This abstraction allows us to study networks (or graphs) in general, to discover properties and to devise algorithms to solve general tasks. Graph theory and graph analytics provided organizations with the tools they needed to leverage the sudden abundance of connected data.
Figure 1-1. A graph showing some key players and connections in early Star Wars films.
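To make the definition concrete, here is a minimal Python sketch of a graph as nothing more than a set of vertices and a set of edges. The class name, method names, and the four people in the example are hypothetical choices made for illustration; they do not reproduce Figure 1-1, and a real graph database stores much more than this.

```python
from collections import defaultdict


class Graph:
    """A minimal undirected graph: vertices plus the edges that connect them."""

    def __init__(self):
        self.vertices = set()
        # Maps each vertex to the set of vertices it shares an edge with.
        self.adjacency = defaultdict(set)

    def add_edge(self, u, v):
        """Connect u and v, adding either vertex if it is new."""
        self.vertices.update((u, v))
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)

    def neighbors(self, v):
        """Return the vertices directly connected to v, in sorted order."""
        return sorted(self.adjacency.get(v, set()))

    def edge_count(self):
        """Count edges; each one is stored twice (assumes no self-loops)."""
        return sum(len(n) for n in self.adjacency.values()) // 2


# Hypothetical example data: four people and four connections.
g = Graph()
g.add_edge("Ann", "Bob")
g.add_edge("Bob", "Cho")
g.add_edge("Ann", "Cho")
g.add_edge("Cho", "Dev")

print(g.neighbors("Cho"))
print(len(g.vertices), g.edge_count())
```

Because the structure is this general, the same few methods work whether the vertices stand for people, web pages, or products.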
Why Graphs Matter
The web showed us that sometimes we accomplish more by having varied data that is linked together, rather than trying to merge it all into a few rigid tables. It also showed us that connections themselves are a form of information. We have a limitless number of types of relationships: parent-child, purchaser-product, friend-friend, and so on. As Berners-Lee observed, we get meaning from connections. When we know someone is a parent, we can infer that they have had certain life experiences and have certain concerns. We can also guess at how the parent and child will interact with one another.
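One way to picture relationship types as information is to label each edge with its type. The following Python sketch assumes a simple (source, relationship type, target) triple format; the people, the product, the relationship names, and the helper functions are all hypothetical and chosen only for illustration.

```python
# Each edge is a (source, relationship_type, target) triple.
# All names below are hypothetical example data.
edges = [
    ("Maria", "parent_of", "Leo"),
    ("Maria", "purchased", "Bicycle"),
    ("Maria", "friend_of", "Sam"),
    ("Sam", "purchased", "Bicycle"),
]


def related(edges, source, relationship_type):
    """Return the targets reached from source over one type of relationship."""
    return [t for s, r, t in edges if s == source and r == relationship_type]


def is_parent(edges, person):
    """Infer a fact about a person purely from the edges they take part in."""
    return any(s == person and r == "parent_of" for s, r, _ in edges)


print(related(edges, "Maria", "purchased"))
print(is_parent(edges, "Maria"), is_parent(edges, "Sam"))
```

Nothing in the data states outright that Maria is a parent; the fact is inferred from the kind of connection she has, which is the sense in which meaning comes from connections.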