THE BOOK OF R
A First Course in Programming and Statistics
Tilman M. Davies
San Francisco
THE BOOK OF R. Copyright © 2016 by Tilman M. Davies.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
Printed in USA
First printing
20 19 18 17 16 1 2 3 4 5 6 7 8 9
ISBN-10: 1-59327-651-6
ISBN-13: 978-1-59327-651-5
Publisher: William Pollock
Production Editor: Riley Hoffman
Cover Illustration: Josh Ellingson
Interior Design: Octopod Studios
Developmental Editor: Liz Chadwick
Technical Reviewer: Debbie Leader
Copyeditor: Kim Wimpsett
Compositor: Riley Hoffman
Proofreader: Paula Fleming
Indexer: BIM Creatives, LLC
For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 415.863.9900
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: Davies, Tilman M., author.
Title: The book of R : a first course in programming and statistics / by
Tilman M. Davies.
Description: San Francisco : No Starch Press, [2016] | Includes
bibliographical references and index.
Identifiers: LCCN 2015035305| ISBN 9781593276515 | ISBN 1593276516
Subjects: LCSH: R (Computer program language) | Computer programming. |
Statistics--Data processing.
Classification: LCC QA76.73.R3 D38 2016 | DDC 519.50285--dc23
LC record available at http://lccn.loc.gov/2015035305
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
Prof. Dr. Dr. Jochen Weniger, 1925–2015
A scientist. My grandpa.
CONTENTS IN DETAIL
PART I
THE LANGUAGE
GETTING STARTED
NUMERICS, ARITHMETIC, ASSIGNMENT, AND VECTORS
MATRICES AND ARRAYS
NON-NUMERIC VALUES
LISTS AND DATA FRAMES
SPECIAL VALUES, CLASSES, AND COERCION
BASIC PLOTTING
READING AND WRITING FILES
PART II
PROGRAMMING
CALLING FUNCTIONS
CONDITIONS AND LOOPS
WRITING FUNCTIONS
EXCEPTIONS, TIMINGS, AND VISIBILITY
PART III
STATISTICS AND PROBABILITY
ELEMENTARY STATISTICS
BASIC DATA VISUALIZATION
PROBABILITY
COMMON PROBABILITY DISTRIBUTIONS
PART IV
STATISTICAL TESTING AND MODELING
SAMPLING DISTRIBUTIONS AND CONFIDENCE
HYPOTHESIS TESTING
ANALYSIS OF VARIANCE
SIMPLE LINEAR REGRESSION
MULTIPLE LINEAR REGRESSION
LINEAR MODEL SELECTION AND DIAGNOSTICS
PART V
ADVANCED GRAPHICS
ADVANCED PLOT CUSTOMIZATION
GOING FURTHER WITH THE GRAMMAR OF GRAPHICS
DEFINING COLORS AND PLOTTING IN HIGHER DIMENSIONS
INTERACTIVE 3D PLOTS
APPENDIX A: INSTALLING R AND CONTRIBUTED PACKAGES
APPENDIX B: WORKING WITH RSTUDIO
PREFACE
The aim of The Book of R: A First Course in Programming and Statistics is to provide a relatively gentle yet informative exposure to the statistical software environment R, alongside some common statistical analyses, so that readers may have a solid foundation from which to eventually become experts in their own right. Learning to use and program in a computing language is much the same as learning a new spoken language. At the beginning, it is often difficult and may even be daunting, but total immersion in and active use of the language is the best and most effective way to become fluent.
Many beginner-style texts that focus on R can generally be allocated to one of two categories: those concerned with computational aspects (that is, syntax and general programming tools) and those with statistical modeling and analysis in mind, often one particular type. In my experience, these texts are extremely well written and contain a wealth of useful information but better suit those individuals wanting to pursue fairly specific goals from the outset. This text seeks to combine the best of both worlds, by first focusing on only an appreciation and understanding of the language and its style and subsequently using these skills to fully introduce, conduct, and interpret some common statistical practices. The target audience is, quite simply, anyone who wants to gain a foothold in R as a first computing language, perhaps with the ultimate goal of completing their own statistical analyses. This includes but is certainly not limited to undergraduates, postgraduates, academic researchers, and practitioners in the applied sciences with little or no experience in programming or statistics in general. A basic understanding of elementary mathematical behavior (for example, the order of operations) and associated operators (for example, the summation symbol Σ) is desirable, however.
In view of this, The Book of R can be used purely as a programming text to learn the language or as an introductory statistical methods book with accompanying instruction in R. Though it is not intended to represent an exhaustive dictionary of the language, the aim is to provide readers with a comfortable learning tool that eliminates the kind of foreboding many have voiced to me when they have considered learning R from scratch. The fact remains that there are usually many different ways to go about any given task, something that holds true for most so-called high-level computer languages. What this text presents reflects my own way of thinking about learning and programming in R, which I approach less as a computer scientist and more as an applied data analyst.
In part, I aim to provide a precursor and supplement to the work in The Art of R Programming: A Tour of Statistical Software Design