Supplemental files and examples for this book can be found at http://examples.oreilly.com/9780596801717/. Please use a standard desktop web browser to access these files, as they may not be accessible from all ereader devices.
All code files or examples referenced in the book will be available online. For physical books that ship with an accompanying disc, whenever possible, we've posted all CD/DVD content. Note that while we provide as much of the media content as we are able via free download, we are sometimes limited by licensing restrictions. Please direct any questions or concerns to .
Preface
It's been 10 years since I was first introduced to R. Back then, I was a young product development manager at DoubleClick, a company that sells advertising software for managing online ad sales. I was working on inventory prediction: estimating the number of ad impressions that could be sold for a given search term, web page, or demographic characteristic. I wanted to play with the data myself, but we couldn't afford a piece of expensive software like SAS or MATLAB. I looked around for a little while, trying to find an open source statistics package, and stumbled on R. Back then, R was a bit rough around the edges, and was missing a lot of the features it has today (like fancy graphics and statistics functions). But R was intuitive and easy to use; I was hooked. Since that time, I've used R to do many different things: estimate credit risk, analyze baseball statistics, and look for Internet security threats. I've learned a lot about data, and matured a lot as a data analyst.
R, too, has matured a great deal over the past 10 years. R is used at the world's largest technology companies (including Google, Microsoft, and Facebook), the largest pharmaceutical companies (including Johnson & Johnson, Merck, and Pfizer), and at hundreds of other companies. It's used in statistics classes at universities around the world and by statistics researchers to try new techniques and algorithms.
Why I Wrote This Book
This book is designed to be a concise guide to R and a good desktop reference. It's not intended to be a book about statistics or an exhaustive guide to R. In this book, I tried to show all the things that R can do and to give examples showing how to do them.
I wrote this book because I like R. R is fun and intuitive in ways that other solutions are not. You can do things in a few lines of R that could take hours of struggling in a spreadsheet. Similarly, you can do things in a few lines of R that could take pages of Java code (and hours of Java coding). There are some excellent books on R, but I couldn't find an inexpensive book that gave an overview of everything you could do in R. I hope this book helps you use R.
When Should You Use R?
I think R is a great piece of software, but it isn't the right tool for every problem. Clearly, it would be ridiculous to write a video game in R, but it's not even the best tool for all data problems.
R is very good at plotting graphics, analyzing data, and fitting statistical models using data that fits in the computer's memory. It's not as good at storing data in complicated structures, efficiently querying data, or working with data that doesn't fit in the computer's memory.
Typically, I use a tool like Perl to preprocess large files before using them in R. It's technically possible to use R for these problems (by reading files one line at a time and using R's regular expression support), but it's pretty awkward. To hold large data files, I usually use a database like MySQL, PostgreSQL, SQLite, or Oracle (when someone else is paying the license fee).
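To give a sense of what the line-at-a-time approach looks like, here is a minimal sketch. The function name filter_lines, the sample log lines, and the pattern are all made up for illustration; the sketch only assumes base R's file connections, readLines, and grepl.

```r
# Hypothetical helper: copy only the lines matching a regular expression
# from one text file to another, holding one line in memory at a time.
filter_lines <- function(infile, outfile, pattern) {
  input <- file(infile, open = "r")
  on.exit(close(input), add = TRUE)
  output <- file(outfile, open = "w")
  on.exit(close(output), add = TRUE)
  kept <- 0
  repeat {
    line <- readLines(input, n = 1, warn = FALSE)
    if (length(line) == 0) break  # end of file
    if (grepl(pattern, line)) {
      writeLines(line, output)
      kept <- kept + 1
    }
  }
  kept  # number of lines written
}

# Usage with a small made-up log file
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("GET /index.html 200", "GET /missing 404", "POST /form 200"),
           infile)
n_kept <- filter_lines(infile, outfile, " 200$")
```

This works, but the explicit loop is much slower and wordier than the equivalent Perl one-liner, which is why I usually preprocess outside of R.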
R License Terms
R is an open source software package, licensed under the GNU General Public License (GPL). [1] This means that you can install R for free on most desktop and server machines. (Comparable commercial software packages sell for hundreds or thousands of dollars.) If R were a poor substitute for the commercial software packages, this might have limited appeal. However, I think R is better than its commercial counterparts in many respects.
Capability
You can find implementations for hundreds (maybe thousands) of statistical and data analysis algorithms in R. No commercial package offers anywhere near the scope of functionality available through the Comprehensive R Archive Network (CRAN).
Community
There are now hundreds of thousands (if not millions) of R users worldwide. By using R, you can be sure that you're using the same software that your colleagues are using.
Performance
R's performance is comparable, or superior, to most commercial analysis packages. R requires you to load data sets into memory before processing. If you have enough memory to hold the data, R can run very quickly. Luckily, memory is cheap. You can buy 32 GB of server RAM for less than the cost of a single desktop license of a comparable piece of commercial statistical software.
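If you want a rough idea of whether a data set will fit in memory, one option is to measure a small sample and scale up. The sketch below is only a back-of-the-envelope estimate; the row and column counts are made-up numbers, and it assumes every column is numeric (8 bytes per value).

```r
# One million double-precision values take about 8 bytes each
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))
print(format(object.size(x), units = "MB"))

# Hypothetical data set: 50 million rows by 20 numeric columns
n_rows <- 5e7
n_cols <- 20
estimated_gb <- n_rows * n_cols * 8 / 1024^3
print(estimated_gb)  # approximate size in gigabytes
```

Keep in mind that many operations make temporary copies of the data, so you will generally want more memory than the raw estimate suggests.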
[1] There is some controversy about GPL-licensed software, and what it means to you as a corporate user. Some users are afraid that any code that they write in R will be bound by the GPL. If you are not writing extensions to R, you do not need to worry about this issue. R is an interpreter, and the GPL does not apply to a program just because it is executed on a GPL-licensed interpreter.
If you are writing extensions to R, they might be bound by the GPL. For more information, see the GNU foundation's FAQ on the GPL: http://www.gnu.org/licenses/gplfaq. However, for a definitive answer, or if you are worried about a specific application, see an attorney.
Examples
I have tried to provide many unique examples in this book, illustrating how to use different functions in R. I deliberately decided to use new and original examples, and not to rely on the data sets included with R. When I'm trying to solve a problem, I try to find examples of similar solutions. There are already good examples for many functions in the R help files. I tried to provide new examples to help users figure out how to solve their problems quickly. The examples are available from O'Reilly Media at http://oreilly.com/catalog/9780596801700.
The example data is also available through CRAN as an R package. To install the nutshell package, type the following command on the R console:
install.packages("nutshell")
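Once the package is installed, you would typically load it and look at what it contains. The following is a minimal sketch that assumes the nutshell package is still available on CRAN and installed on your machine; the variable names are only illustrative.

```r
# Check whether the package is installed before trying to load it
nutshell_available <- requireNamespace("nutshell", quietly = TRUE)

if (nutshell_available) {
  library(nutshell)
  # List the names of the data sets shipped with the package
  data_sets <- data(package = "nutshell")$results[, "Item"]
  print(head(data_sets))
} else {
  message("nutshell is not installed; run install.packages(\"nutshell\") first")
}
```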
How This Book Is Organized
I've broken this book into five parts: