Preface
When I show visualization projects to an audience, one of the most common questions is, "How do you do this?" Other books about data visualization do exist, but the most prominent ones are often collections of academic papers; in any case, few explain how to actually build representations. Books from the field of design that offer advice for creating visualizations see the field only in terms of static displays, ignoring the possibility of dynamic, software-based visualizations. A number spend most of their time dissecting what's wrong with given representations, sometimes providing solutions but more often not.
In this book, I wanted to offer something for people who want to get started building their own visualizations, something to use as a jumping-off point for more complicated work. I don't cover everything, but I've tried to provide enough background so that you'll know where to go next.
I wrote this book because I wanted to have a way to make the ideas from Computational Information Design, my Ph.D. dissertation, more accessible to a wider audience. More specifically, I wanted to see these ideas actually applied, rather than limited to an academic document on a shelf. My dissertation covered the process of getting from data to understanding; in other words, from considering a pile of information to presenting it usefully, in a way that can be easily understood and interacted with. This process is covered in , and used throughout the book as a framework for working through visualizations.
Most of the examples in this book are written from scratch. Rather than relying on toolkits or libraries that produce charts or graphs, you learn how to create them using a little math, some lines and rectangles, and bits of text. Many readers may have tried some toolkits and found them lacking, particularly because they want to customize the display of their information. A tool that has generic uses will produce only generic displays, which can be disappointing if the displays do not suit your data set. Data can take many interesting forms that require unique types of display and interaction; this book aims to open up your imagination in ways that collections of bar and pie charts cannot.
This book uses Processing (http://processing.org), a simple programming environment and API that I co-developed with Casey Reas of UCLA. Processing's programming environment makes it easy to sit down and "sketch" code to produce visual images quickly. Once you outgrow the environment, it's possible to use a regular Java IDE to write Processing code because the API is based on Java. Processing is free to download and open source. It has been in development since 2001, and we've had about 100,000 people try it out in the last 12 months. Today Processing is used by tens of thousands of people for all manner of work.
When I began writing this book, I debated which language and API to use. It could have been based on Java, but I realized I would have found myself re-implementing the Processing API to make things simple. It could have been based on ActionScript and Flash, but Flash is expensive to buy and tends to break down when dealing with larger data sets. Other scripting languages such as Python and Ruby are useful, but their execution speeds don't keep up with Java. In the end, Processing was the right combination of cost, ease of use, and execution speed.
The Audience for This Book
In the spring of 2007, I co-taught an Information Visualization course at Carnegie Mellon. Our 30 students ranged from a freshman in the art school to a Ph.D. candidate in computer science. In between were graduate students from the School of Design and various other undergrads. Their skill levels were enormously varied, but that was less important than their level of curiosity. Students who were curious and willing to put in some work managed to overcome the technical difficulties (for the art and design students) or the visual demands (for those with an engineering background).
This book is targeted at a similar range of backgrounds, if less academic. I'm trying to address people who want to ask questions, play with data, and gain an understanding of how to communicate information to others. For instance, the book is for web designers who want to build more complex visualizations than their tools will allow. It's also for software engineers who want to become adept at writing software that represents data, work that calls on them to try out new skills even if they have some background in building UIs. None of this is rocket science, but it isn't always obvious how to get started.
Fundamentally, this book is for people who have a data set, a curiosity to explore it, and an idea of what they want to communicate about it. The set of people who visualize data is growing extremely quickly as we deal with more and more information. Even more important, the audience has moved far beyond those who are experts in visualization. By making these ideas accessible to a wide range of people, we should see some truly amazing things in the next decade.
Background Information
Because the audience for this book includes both programmers and nonprogrammers, the material varies in complexity. Beginners should be able to pick it up and get through the first few chapters, but they may find themselves lost as we get into more complicated programming topics. If you're looking for a gentler introduction to programming with Processing, other books are available (including one written by Casey Reas and me) that are more suited to learning the concepts from scratch, though they don't cover the specifics of visualizing data. can be understood by someone without any programming background, but the later chapters quickly become more difficult.
You'll be most successful with this book if you have some familiarity with writing code, whether it's Java, C++, or ActionScript. This is not an advanced text by any means, but a little background in writing code will go a long way toward understanding the concepts.
Overview of the Book
, covers the process for developing a useful visualization, from acquiring data to interacting with it. This is the framework we'll use as we attack problems in later chapters.
, is a basic introduction to the Processing environment and syntax. It provides a bit of background on the structure of the API and the philosophy behind the project's development.
cover example projects that get progressively more complicated.
, plots data points on a map, our first introduction to reading data from the disk and representing it on the screen.
, covers several methods of plotting charts that represent how data changes over time.
web site and produces an image correlating player salaries and team performance over the course of a baseball season. It's an in-depth example illustrating how to scrape data from a web site that lacks an official API. These techniques can be applied to many other projects, even if you're not interested in baseball.
, answers the question, "How do zip codes relate to geography?" by developing a project that allows users to progressively refine a U.S. map as they type a zip code.
, discusses trees and hierarchies. It covers recursion, an important topic when dealing with tree structures, and treemaps, a useful representation for certain kinds of tree data.