Acknowledgments
This book has benefited greatly from the input received from many sources.
First and foremost, I must thank the technical reviewer, Hadley Wickham, of ggplot2 and plyr fame. I suggested Hadley to No Starch Press because of his experience developing these and other highly popular R packages in CRAN, the R user-contributed code repository. As expected, a number of Hadley's comments resulted in improvements to the text, especially his comments about particular coding examples, which often began, "I wonder what would happen if you wrote it this way...." In some cases, these comments led to changing an example with one or two versions of code to an example showing two, three, or sometimes even four different ways to accomplish a given coding goal. This allowed for comparisons of the advantages and disadvantages of various approaches, which I believe the reader will find instructive.
I am very grateful to Jim Porzak, cofounder of the Bay Area useR Group (BARUG, http://www.bay-r.org/), for his frequent encouragement as I was writing this book. And while on the subject of BARUG, I must thank Jim and the other cofounder, Mike Driscoll, for establishing that lively and stimulating forum. At BARUG, the speakers on wonderful applications of R have always left me feeling that writing this book was a very worthy project. BARUG has also benefited from the financial support of Revolution Analytics and countless hours, energy, and ideas from David Smith and Joe Rickert of that firm.
Jay Emerson and Mike Kane, authors of the award-winning bigmemory package in CRAN, read through an early draft of the chapter on parallel R programming and made valuable comments.
John Chambers (founder of S, the ancestor of R) and Martin Morgan provided advice concerning R internals, which was very helpful to me for the discussion of R's performance issues.
Section 7.8.4 covers a controversial topic in programming communities: the use of global variables. To get a wide range of perspectives, I bounced my ideas off several people, notably R core group member Thomas Lumley and my UC Davis computer science colleague, Sean Davis. Needless to say, there is no implication that they endorse my views in that section of the book, but their comments were quite helpful.
Early in the project, I made a very rough (and very partial) draft of the book available for public comment and received helpful feedback from Ramon Diaz-Uriarte, Barbara F. La Scala, Jason Liao, and my old friend Mike Hannon. My daughter Laura, an engineering student, read parts of the early chapters and made some good suggestions that improved the book.
My own CRAN projects and other R-related research (parts of which serve as examples in the book) have benefited from the advice, feedback, and/or encouragement of many people, especially Mark Bravington, Stephen Eglen, Dirk Eddelbuettel, Jay Emerson, Mike Kane, Gary King, Duncan Murdoch, and Joe Rickert.
R core group member Duncan Temple Lang is at my institution, the University of California, Davis. Though we are in different departments and thus haven't interacted much, this book owes something to his presence on campus. He has helped to create a very R-aware culture at UCD, which has made it easy for me to justify to my department the large amount of time I've spent writing this book.
This is my second project with No Starch Press. As soon as I decided to write this book, I naturally turned to No Starch Press because I like the informal style, high usability, and affordability of their products. Thanks go to Bill Pollock for approving the project, to editorial staff Keith Fancher and Alison Law, and to the freelance copyeditor Marilyn Smith.
Last but definitely not least, I thank two beautiful, brilliant, and funny women: my wife Gamis and the aforementioned Laura, both of whom cheerfully accepted my statement, "I'm working on the R book," whenever they asked why I was so buried in work.
Introduction
R is a scripting language for statistical data manipulation and analysis. It was inspired by, and is mostly compatible with, the statistical language S developed by AT&T. The name S, for statistics, was an allusion to another programming language with a one-letter name developed at AT&T: the famous C language. S later was sold to a small firm, which added a graphical user interface (GUI) and named the result S-Plus.
R has become more popular than S or S-Plus, both because it's free and because more people are contributing to it. R is sometimes called GNU S, to reflect its open source nature. (The GNU Project is a major collection of open source software.)
Why Use R for Your Statistical Work?
As the Cantonese say, yauh peng, yauh leng, which means "both inexpensive and beautiful." Why use anything else?
R has a number of virtues:
It is a public-domain implementation of the widely regarded S statistical language, and the R/S platform is a de facto standard among professional statisticians.
It is comparable, and often superior, in power to commercial products in most of the significant senses: variety of operations available, programmability, graphics, and so on.
It is available for the Windows, Mac, and Linux operating systems.
In addition to providing statistical operations, R is a general-purpose programming language, so you can use it to automate analyses and create new functions that extend the existing language features.
It incorporates features found in object-oriented and functional programming languages.
The system saves data sets between sessions, so you don't need to reload them each time. It saves your command history too.
Because R is open source software, it's easy to get help from the user community. Also, a lot of new functions are contributed by users, many of whom are prominent statisticians.
I should warn you at the outset that you typically submit commands to R by typing in a terminal window, rather than clicking a mouse in a GUI, and most R users do not use a GUI. This doesn't mean that R doesn't do graphics. On the contrary, it includes tools for producing graphics of great utility and beauty, but they are used for system output, such as plots, not for user input.