R Cookbook
Paul Teetor
Copyright 2011 Paul Teetor
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. R Cookbook, the image of a harpy eagle, and related trade dress are trademarks of O'Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

O'Reilly Media
Preface
R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 2,000 available add-ons, and R is a serious rival to all commercial statistical packages.
But R can be frustrating. It's not obvious how to accomplish many tasks, even simple ones. The simple tasks are easy once you know how, yet figuring out that "how" can be maddening.
This book is full of how-to recipes, each of which solves a specific problem. Each recipe includes a quick introduction to the solution followed by a discussion that aims to unpack the solution and give you some insight into how it works. I know these recipes are useful and I know they work, because I use them myself.
The range of recipes is broad. It starts with basic tasks before moving on to input and output, general statistics, graphics, and linear regression. Any significant work with R will involve most or all of these areas.
If you are a beginner, this book will get you started faster. If you are an intermediate user, this book is useful for expanding your horizons and jogging your memory ("How do I do that Kolmogorov–Smirnov test again?").
The book is not a tutorial on R, although you will learn something by studying the recipes. It is not a reference manual, but it does contain a lot of useful information. It is not a book on programming in R, although many recipes are useful inside R scripts.
Finally, this book is not an introduction to statistics. Many recipes assume that you are familiar with the underlying statistical procedure, if any, and just want to know how it's done in R.
The Recipes
Most recipes use one or two R functions to solve a specific problem. It's important to remember that I do not describe the functions in detail; rather, I describe just enough to solve the immediate problem. Nearly every such function has additional capabilities beyond those described here, and some have amazing capabilities. I strongly urge you to read the function's help page. You will likely learn something valuable.
Each recipe presents one way to solve a particular problem. Of course, there are likely several reasonable solutions to each problem. When I knew of multiple solutions, I generally selected the simplest one. For any given task, you can probably discover several alternative solutions yourself. This is a cookbook, not a bible.
In particular, R has literally thousands of downloadable add-on packages, many of which implement alternative algorithms and statistical methods. This book concentrates on the core functionality available through the basic distribution, so your best source of alternative solutions may be searching for an add-on package.
A Note on Terminology
The goal of every recipe is to solve a problem and solve it quickly. Rather than laboring in tedious prose, I occasionally streamline the description with terminology that is correct but not precise. A good example is the term "generic function." I refer to print(x) and plot(x) as generic functions because they work for many kinds of x, handling each kind appropriately. A computer scientist would wince at my terminology because, strictly speaking, these are not simply functions; they are polymorphic methods with dynamic dispatching. But if I carefully unpacked every such technical detail, the essential solution would be buried in the technicalities. So I just call them functions, which I think is more readable.
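To make the idea of a "generic function" concrete, here is a small sketch of how print() chooses what to do. In base R, print() is an S3 generic whose body just calls UseMethod("print"), which looks for a method named after the class of its argument. The class "temperature" and the method print.temperature are hypothetical, invented only for this illustration.

```r
# print() is generic: its body simply calls UseMethod("print"),
# which dispatches on the class of the first argument.
print(body(print))

# Hypothetical class "temperature", invented for this illustration only.
print.temperature <- function(x, ...) {
  cat("Temperature:", unclass(x), "degrees\n")
  invisible(x)  # print methods conventionally return their argument invisibly
}

reading <- structure(21.5, class = "temperature")

print(reading)     # dispatches to our print.temperature method
print(c(1, 2, 3))  # an ordinary numeric vector is handled by print.default
```

The same call, print(...), produces different output depending on what it is given; that is all "generic" means in this book.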
Another example, taken from statistics, is the complexity surrounding the semantics of statistical hypothesis testing. Using the strict language of probability theory would obscure the practical application of some tests, so I use more colloquial language when describing each statistical test. The book's discussion of hypothesis testing explains more about how the tests are presented in the recipes.
My goal is to make the power of R available to a wide audience by writing readably, not formally. I hope that experts in their respective fields will understand if my terminology is occasionally informal.
Software and Platform Notes
The base distribution of R has frequent and planned releases, but the language definition and core implementation are stable. The recipes in this book should work with any recent release of the base distribution.
Some recipes have platform-specific considerations, and I have carefully noted them. Those recipes mostly deal with software issues, such as installation and configuration. As far as I know, all other recipes will work on all three major platforms for R: Windows, OS X, and Linux/Unix.
Other Resources
On the Web
The mother ship for all things R is the R project site. From there you can download binaries, add-on packages, documentation, and source code as well as many other resources.
Beyond the R project site, I recommend using an R-specific search engine when searching the Web.
Reading blogs is a great way to learn about R and stay abreast of leading-edge developments. There are surprisingly many such blogs, so I recommend following two blogs-of-blogs: R-bloggers, created by Tal Galili, and PlanetR. By subscribing to their RSS feeds, you will be notified of interesting and useful articles from dozens of websites.
R books
There are many, many books about learning and using R; listed here are a few that I have found useful. Note that the R project site contains an extensive bibliography of books related to R.
I recommend An Introduction to R, by William Venables et al. (Network Theory Limited). It covers many topics and is useful for beginners. You can download the PDF for free from CRAN; or, better yet, buy the printed copy because the profits are donated to the R project.
R in a Nutshell, by Joseph Adler (O'Reilly), is the quick tutorial and reference you'll keep by your side. It covers many more topics than this Cookbook.
Anyone doing serious graphics work in R will want R Graphics by Paul Murrell (Chapman & Hall/CRC). Depending on which graphics package you use, you may also want Lattice: Multivariate Data Visualization with R by Deepayan Sarkar (Springer) and ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer).
Modern Applied Statistics with S (4th ed.), by William Venables and Brian Ripley (Springer), uses R to illustrate many advanced statistical techniques. The book's functions and datasets are available in the