1. Welcome to R
1.1 What Is R?
As a marketing analyst, you have no doubt heard of R. You may have tried R and become frustrated and confused, after which you returned to other tools that are "good enough." You may know that R uses a command line and dislike that. Or you may be convinced of R's advantages for experts but worry that you don't have time to learn or use it.
We are here to help! Our goal is to present just the essentials, in the minimal necessary time, with hands-on learning so you will come up to speed as quickly as possible to be productive in R. In addition, we'll cover a few advanced topics that demonstrate the power of R and might teach advanced users some new skills.
A key thing to realize is that R is a programming language. It is not a "statistics program" like SPSS, SAS, JMP, or Minitab, and doesn't wish to be one. The official R Project describes R as "a language and environment for statistical computing and graphics." Notice that "language" comes first, and that "statistical" is coequal with "graphics." R is a great programming language for doing statistics. The inventor of the underlying language, John Chambers, received the 1998 Association for Computing Machinery (ACM) Software System Award for a system that "will forever alter the way people analyze, visualize, and manipulate data" [].
R was based on Chambers's preceding S language ("S" as in "statistics") developed in the 1970s and 1980s at Bell Laboratories, home of the UNIX operating system and the C programming language. S gained traction among analysts and academics in the 1990s as implemented in a commercial software package, S-PLUS. Robert Gentleman and Ross Ihaka wished to make the S approach more widely available and offered R as an open source project starting in 1997.
Since then, the popularity of R has grown geometrically. The real magic of R is that its users are able to contribute developments that enhance R with everything from additional core functions to highly specialized methods. And many do contribute! Today there are over 6,000 packages of add-on functionality available for R (see http://cran.r-project.org/web/packages for the latest count).
If you have experience in programming, you will appreciate some of R's key features right away. If you're new to programming, this chapter describes why R is special and Chap. introduces the fundamentals of programming in R.
1.2 Why R?
There are many reasons to learn and use R. It is the platform of choice for the largest number of statisticians who create new analytics methods, so emerging techniques are often available first in R. R is rapidly becoming the default educational platform in university statistics programs and is spreading to other disciplines such as economics and psychology.
For analysts, R offers the largest and most diverse set of analytic tools and statistical methods. It allows you to write analyses that can be reused and that extend the R system itself. It runs on most operating systems and interfaces well with data systems such as online data and SQL databases. R offers beautiful and powerful plotting functions that are able to produce graphics far more tailored and informative than typical spreadsheet charts. Putting all of those together, R can vastly improve an analyst's overall productivity. Elea knows an enterprising analyst who used R to automate the process of downloading data and producing a formatted monthly report. The automation saved him almost 40 hours of work each month, which he didn't tell his manager about for a few months!
Then there is the community. Many R users are enthusiasts who love to help others and are rewarded in turn by the simple joy of solving problems and the fact that they often learn something new. R is a dynamic system created by its users, and there is always something new to learn. Knowledge of R is a valuable skill in demand for analytics jobs at a growing number of top companies.
R code is also inspectable; you may choose to trust it, yet you are also free to verify. All of its core code and most packages that people contribute are open source. You can examine the code to see exactly how analyses work and what is happening under the hood.
Finally, R is free. It is a labor of love and professional pride for the R Core Development Team, which includes eminent statisticians and computer scientists. As with all masterpieces, the quality of their devotion is evident in the final work.
1.3 Why Not R?
What's not to love? No doubt you've observed that not everyone in the world uses R. Being R-less is unimaginable to us, yet there are reasons why some analysts might not want to use it.
One reason not to use R is this: until you've mastered the basics of the language, many simple analyses are cumbersome to do in R. If you're new to R and want a table of means, cross-tabs, or a t-test, it may be frustrating to figure out how to get them. R is about power, flexibility, control, iterative analyses, and cutting-edge methods, not point-and-click deliverables.
Another reason is if you do not like programming. If you're new to programming, R is a great place to start. But if you've tried programming before and didn't enjoy it, R will be a challenge as well. Our job is to help you as much as we can, and we will try hard to teach R to you. However, not everyone enjoys programming. On the other hand, if you're an experienced coder, R will seem simple (perhaps deceptively so), and we will help you avoid a few pitfalls.
Some companies and their information technology or legal departments are skeptical of R because it is open source. It is common for managers to ask, "If it's free, how can it be good?" There are many responses to that, including pointing out the hundreds of books on R, its citation in peer-reviewed articles, and the list of eminent contributors (in R, run the contributors() command and web search some of them). Or you might try the engineer's adage: "It can be good, fast, or cheap: pick 2." R is good and cheap, but not fast, insofar as it requires time and effort to master.
As for R being free, you should realize that contributors to R actually do derive benefit; it just happens to be non-monetary. They are compensated through respect and reputation, through the power their own work gains, and by the contributions back to the ecosystem from other users. This is a rational economic model even when the monetary price is zero.
A final concern about R is the unpredictability of its ecosystem. With packages contributed by thousands of authors, there are priceless contributions along with others that are mediocre or flawed. The downside of having access to the latest developments is that many will not stand the test of time. It is up to you to determine whether a method meets your needs, and you cannot always rely on curation or authorities to determine it for you (although you will rapidly learn which authors and which experts' recommendations to trust). If you trust your judgment, this situation is no different than with any software. Caveat emptor.
We hope to convince you that for many purposes, the benefits of R outweigh the difficulties.