Matthias Kohl
Introduction to statistical data analysis with R
1st edition
© 2015 Matthias Kohl & bookboon.com
ISBN 978-87-403-1123-5
Contents
Preface
1 Statistical Software R
1.1 R and its development history
1.2 Structure of R
1.3 Installation of R
1.4 Working with R
1.5 Exercises
2 Descriptive Statistics
2.1 Basics
2.2 Excursus: Data Import and Export with R
2.3 Import of ICU-Dataset
2.4 Categorical Variables
2.5 Metric Variables
2.6 Exercises
6 Statistical Tests
6.1 Introduction
6.2 Examples
6.3 Exercises
Software versions
Bibliography
Index
List of Figures
Figure 1.1: R GUI (64-bit) on Windows (German system).
Figure 1.2: RStudio IDE after installation on Ubuntu Linux (German system).
Figure 1.3: RStudio IDE after opening a new R script on Ubuntu Linux (German system).
Figure 2.1: Interplay between probability theory, descriptive and inferential statistics.
Figure 2.2: Types of attributes and scales of measurement.
Figure 2.3: RStudio window for import of text files.
Figure 2.4: RStudio window Environment with a data object.
Figure 2.5: View of the exact structure of a dataset in RStudio.
Figure 2.6: Interactive context based help in RStudio.
Figure 2.7: Installation of R packages in RStudio.
Figure 2.8: The values in a box-and-whisker plot.
Figure 2.9: Examples of skewness.
Figure 2.10: Examples of kurtosis.
Figure 3.1: A negative example for using colors and diagrams.
Figure 3.2: A negative example with improved colors.
Figure 3.3: From a negative to a positive example.
Figure 3.4: RStudio window Plots with an example.
Figure 3.5: RStudio window for saving a plot as image.
Figure 3.6: RStudio window for saving a plot as pdf file.
Figure 3.7: Order the categories!
Figure 3.8: Once again: Order the categories!
Figure 3.9: And once again: Order the categories!
Figure 5.1: Illustration of unbiased and efficient.
Figure 5.2: Ratio between 95% quantiles of t and standard normal distribution.
Figure 6.1: Sample size dependent on effect size.
Figure 6.2: Sample size dependent on variance.
List of Tables
Table 2.1: Overview of some basic functions for data import with R.
Table 3.1: Overview of devices supported by R.
Table 4.1: Notions from statistics and their counterparts in probability theory.
Table 6.1: Decision situation in case of statistical tests.
Table 6.2: Example of a 2 × 2 contingency table.
Preface
Statistics is everywhere today, and we are constantly confronted, knowingly or unknowingly, with the results of statistical procedures. Examples include internet search engines, targeted ads on websites, assessments of our creditworthiness, reference ranges of blood tests, weather forecasts, election forecasts, and many more. Statistical procedures are often applied inappropriately, or their results are reported improperly. Basic statistical knowledge is therefore important not only in professional life but also in everyday life, as it helps to distinguish between correct and incorrect information.
This book is based on my lecture notes from several statistics courses I have taught in recent years at Furtwangen University, Campus Villingen-Schwenningen, in various bachelor's and master's programs, as well as at Freiburg University in the international master's program in biomedical sciences (IMBS).
As the title indicates, the introduction to statistical data analysis is carried out with the statistical software R (R Core Team (2015a)), which is free software available for most operating systems. The R code used in this book is contained in the file www.stamats.de/RCodeEN.zip in the form of text files with the file extension .R. The R code of each chapter runs independently of the other chapters.