R Cookbook
by J.D. Long and Paul Teetor
Copyright 2019 J.D. Long and Paul Teetor. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Nicole Tache and Melissa Potter
- Production Editor: Kristen Brown
- Copyeditor: Rachel Monaghan
- Proofreader: Rachel Head
- Indexer: Ellen Troutman-Zaig
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
- March 2011: First Edition
- July 2019: Second Edition
Revision History for the Second Edition
- 2019-06-21: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492040682 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. R Cookbook, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-04068-2
[LSI]
Welcome to the R Cookbook, 2nd Edition
R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 10,000 available add-on packages, and R is a serious rival to all commercial statistical packages.
But R can be frustrating. It's not obvious how to accomplish many tasks, even simple ones. The simple tasks are easy once you know how, yet figuring out that "how" can be maddening.
This book is full of how-to recipes, each of which solves a specific problem. Each recipe includes a quick introduction to the solution followed by a discussion that aims to unpack the solution and give you some insight into how it works. We know these recipes are useful and we know they work, because we use them ourselves.
The range of recipes is broad. It starts with basic tasks before moving on to input and output, general statistics, graphics, and linear regression. Any significant work with R will involve most or all of these areas.
If you are a beginner, then this book will get you started faster. If you are an intermediate user, this book will be useful for expanding your horizons and jogging your memory ("How do I do that Kolmogorov-Smirnov test again?").
The book is not a tutorial on R, although you will learn something by studying the recipes. It is not a reference manual, but it does contain a lot of useful information. It is not a book on programming in R, although many recipes are useful inside R scripts.
Finally, this book is not an introduction to statistics. Many recipes assume that you are familiar with the underlying statistical procedure, if any, and just want to know how it's done in R.
The Recipes
Most recipes use one or two R functions to solve a specific problem. It's important to remember that we do not describe the functions in detail; rather, we describe just enough to solve the immediate problem. Nearly every such function has additional capabilities beyond those described here, and some have amazing capabilities. We strongly urge you to read the functions' help pages. You will likely learn something valuable.
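As a minimal sketch of what reading a help page looks like in practice, the following uses the base function mean purely as a stand-in; any function name can be substituted. Only standard base R calls are used, and the choice of mean is our assumption for illustration, not something a particular recipe depends on.

```r
# Open the help page for a function; the two forms are equivalent.
help("mean")
?mean

# Print just the argument list, without opening the full help page.
args(mean)

# Run the examples listed at the bottom of the help page.
example("mean")
```

The help page is usually the quickest way to find out about optional arguments that a recipe does not mention.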
Each recipe presents one way to solve a particular problem. Of course, there are likely several reasonable solutions to each problem. When we knew of multiple solutions, we generally selected the simplest one. For any given task, you can probably discover several alternative solutions yourself. This is a cookbook, not a bible.
In particular, R has literally thousands of downloadable add-on packages, many of which implement alternative algorithms and statistical methods. This book concentrates on the core functionality available through the basic distribution combined with several important packages known collectively as the tidyverse.
The most concise definition of the tidyverse comes from Hadley Wickham, its originator and one of its core maintainers:
The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command. The best place to learn about all the packages in the tidyverse and how they fit together is R for Data Science.
A Note on Terminology
The goal of every recipe is to solve a problem and solve it quickly. Rather than laboring in tedious prose, we occasionally streamline the description with terminology that is correct but not precise. A good example is the term "generic function." We refer to print(x) and plot(x) as generic functions because they work for many kinds of x, handling each kind appropriately. A computer scientist would wince at our terminology because, strictly speaking, these are not simply "functions"; they are polymorphic methods with dynamic dispatching. But if we carefully unpacked every such technical detail, the essential solutions would be buried in the technicalities. So we just call them functions, which we think is more readable.
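To make the idea of a generic function concrete, here is a small sketch. The same print call is applied to three kinds of object, and R picks a suitable method for each. The class name "reading" and the method print.reading are hypothetical, invented here for illustration only; they are not part of R or of any recipe.

```r
# Three objects of different kinds.
x_num <- c(1.5, 2.5, 3.5)                        # numeric vector
x_fac <- factor(c("low", "high", "low"))         # factor
x_df  <- data.frame(id = 1:2, score = c(90, 85)) # data frame

# The same call handles each kind appropriately.
print(x_num)
print(x_fac)
print(x_df)

# A hypothetical class of our own, with its own print method.
reading <- structure(list(value = 21.5, unit = "C"), class = "reading")

print.reading <- function(x, ...) {
  cat(sprintf("Reading: %.1f %s\n", x$value, x$unit))
  invisible(x)  # print methods conventionally return their argument invisibly
}

# print() now dispatches to print.reading() for objects of class "reading".
print(reading)
```

Nothing in the call print(reading) names the method; R selects it from the class of the argument, which is the "dynamic dispatching" mentioned above.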
Another example, taken from statistics, is the complexity surrounding the semantics of statistical hypothesis testing. Using the strict language of probability theory would obscure the practical application of some tests, so we use more colloquial language when describing each statistical test. See the introduction to for more about how hypothesis tests are presented in the recipes.
Our goal is to make the power of R available to a wide audience by writing readably, not formally. We hope that experts in their respective fields will understand if our terminology is occasionally informal.
Software and Platform Notes
The base distribution of R has frequent and planned releases, but the language definition and core implementation are stable. The recipes in this book should work with any recent release of the base distribution.
Some recipes have platform-specific considerations, and we have carefully noted them. Those recipes mostly deal with software issues, such as installation and configuration. As far as we know, all other recipes will work on all three major platforms for R: Windows, macOS, and Linux/Unix.
Other Resources
Here are a few suggestions for further reading, if you'd like to dig a little deeper:
On the web
The mother ship for all things R is the R project site. From there you can download R for your platform, add-on packages, documentation, and source code as well as many other resources.