George Gaines, KYOS Systems Inc.
Samuel D. McQuillin, University of Houston
Front matter
preface
What is the use of a book without pictures or conversations?
Alice, Alice's Adventures in Wonderland
It's wondrous, with treasures to satiate desires both subtle and gross; but it's not for the timid.
Q, "Q Who?" Star Trek: The Next Generation
When I began writing this book, I spent quite a bit of time searching for a good quote to start things off. I ended up with two. R is a wonderfully flexible platform and language for exploring, visualizing, and understanding data. I chose the quote from Alice's Adventures in Wonderland to capture the flavor of statistical analysis today: an interactive process of exploration, visualization, and interpretation.
The second quote reflects the generally held notion that R is difficult to learn. What I hope to show you is that it doesn't have to be. R is broad and powerful, with so many analytic and graphic functions available (more than 50,000 at last count) that it easily intimidates novice and experienced users alike. But there is rhyme and reason to the apparent madness. With guidelines and instructions, you can navigate the tremendous resources available, selecting the tools you need to accomplish your work with style, elegance, efficiency, and more than a little coolness.
I first encountered R several years ago when I was applying for a new statistical consulting position. The prospective employer asked in the pre-interview material if I was conversant in R. Following the standard advice of recruiters, I immediately said yes and set off to learn it. I was an experienced statistician and researcher, had 25 years of experience as an SAS and SPSS programmer, and was fluent in a half-dozen programming languages. How hard could it be? Famous last words.
As I tried to learn the language (as fast as possible, with an interview looming), I found either tomes on the underlying structure of the language or dense treatises on specific advanced statistical methods, written by and for subject-matter experts. The online help was written in a Spartan style that was more reference than tutorial. Every time I thought I had a handle on the overall organization and capabilities of R, I found something new that made me feel ignorant and small.
To make sense of it all, I approached R as a data scientist. I thought about what it takes to successfully process, analyze, and understand data, including
Accessing the data (getting the data into the application from multiple sources)
Cleaning the data (coding missing data, fixing or deleting miscoded data, transforming variables into more useful formats)
Annotating the data (to remember what each piece represents)
Summarizing the data (getting descriptive statistics to help characterize the data)
Visualizing the data (because a picture really is worth a thousand words)
Modeling the data (uncovering relationships and testing hypotheses)
Preparing the results (creating publication-quality tables and graphs)
Then I tried to understand how I could use R to accomplish each of these tasks. Because I learn best by teaching, I eventually created a website (www.statmethods.net) to document what I had learned.
Then, about a year later, Marjan Bace, Manning's publisher, called and asked if I would like to write a book on R. I had already written 50 journal articles, 4 technical manuals, numerous book chapters, and a book on research methodology, so how hard could it be? At the risk of sounding repetitive: famous last words.
The first edition came out in 2011, and the second edition came out in 2015. I started working on the third edition two-and-a-half years ago. Describing R has always been a moving target, but the last few years have seen a revolution of sorts. It's been driven by the growth of big data, the broad adoption of tidyverse (tidyverse.org) software, the rapid development of new predictive analytic and machine learning approaches, and the development of new and more powerful data visualization technologies. I wanted the third edition to do justice to these important changes.
The book you're holding is the one that I wish I'd had so many years ago. I have tried to provide you with a guide to R that will allow you to quickly access the power of this great open source endeavor, without all the frustration and angst. I hope you enjoy it.