
Roger D. Peng - Exploratory data analysis with R


Book: Exploratory data analysis with R
Author: Roger D. Peng
Publisher: Leanpub
Genre: Computer / Science
Year: 2016


This book covers some of the basics of visualizing data in R and summarizing high-dimensional data with statistical multivariate analysis techniques. There is less of an emphasis on formal statistical inference methods, as inference is typically not the focus of EDA. Rather, the goal is to show the data, summarize the evidence and identify interesting patterns while eliminating ideas that likely won't pan out. Throughout the book, we will focus on the R statistical programming language. We will cover the various plotting systems in R and how to use them effectively. We will also discuss how to implement dimension reduction techniques like clustering and the singular value decomposition. All of these techniques will help you to visualize your data and to make key decisions in any data analysis.


Exploratory Data Analysis with R

Roger D. Peng

This book is for sale at http://leanpub.com/exdata

This version was published on 2016-07-20

* * * * *

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

* * * * *

© 2015 - 2016 Roger D. Peng

Stay in Touch!

Thanks for purchasing this book. If you are interested in hearing more from me about things that I'm working on (books, data science courses, podcast, etc.), you can do two things:

First, I encourage you to join my mailing list of Leanpub Readers. On this list I send out updates of my own activities as well as occasional comments on data science current events. I'll also let you know what my co-conspirators Jeff Leek and Brian Caffo are up to because sometimes they do really cool stuff.
Second, I have a regular podcast called Not So Standard Deviations that I co-host with Dr. Hilary Parker, a Data Scientist at Stitch Fix. On this podcast, Hilary and I talk about the craft of data science and discuss common issues and problems in analyzing data. We'll also compare how data science is approached in both academia and industry contexts and discuss the latest industry trends. You can listen to recent episodes on our SoundCloud page or you can subscribe to it in iTunes or your favorite podcasting app.

For those of you who purchased a printed copy of this book, I encourage you to go to the Leanpub web site and obtain the e-book version, which is available for free. The reason is that I will occasionally update the book with new material and readers who purchase the e-book version are entitled to free updates (this is unfortunately not yet possible with printed books).

Thanks again for purchasing this book and please do stay in touch!

Preface

Exploratory data analysis is a bit difficult to describe in concrete, definitive terms, but I think most data analysts and statisticians know it when they see it. I like to think of it in terms of an analogy.

Filmmakers will shoot a lot of footage when making a movie or some film production, not all of which will be used. In addition, the footage will typically not be shot in the order that the storyline takes place, because of actors' schedules or other complicating factors. In some cases, it may also be difficult to figure out exactly how the story should be told while shooting the footage. Rather, it's sometimes easier to see how the story flows when putting the various clips together in the editing room.

In the editing room, the director and the editor can play around a bit with different versions of different scenes to see which dialogue sounds better, which jokes are funnier, or which scenes are more dramatic. Scenes that just don't work might get dropped, and scenes that are particularly powerful might get extended or re-shot. This rough cut of the film is put together quickly so that important decisions can be made about what to pursue further and where to back off. Finer details like color correction or motion graphics might not be implemented at this point. Ultimately, this rough cut will help the director and editor create the final cut, which is what the audience will view.

Exploratory data analysis is what occurs in the editing room of a research project or any data-based investigation. EDA is the process of making the rough cut for a data analysis, the purpose of which is very similar to that in the film editing room. The goals are many, but they include identifying relationships between variables that are particularly interesting or unexpected, checking to see if there is any evidence for or against a stated hypothesis, checking for problems with the collected data (such as missing data or measurement error), or identifying certain areas where more data need to be collected. At this point, finer details of presentation of the data and evidence, important for the final product, are not necessarily the focus.

Ultimately, EDA is important because it allows the investigator to make critical decisions about what is interesting to follow up on and what probably isn't worth pursuing because the data just don't provide the evidence (and might never provide the evidence, even with follow-up). These kinds of decisions are important to make if a project is to move forward and remain within its budget.

This book covers some of the basics of visualizing data in R and summarizing high-dimensional data with statistical multivariate analysis techniques. There is less of an emphasis on formal statistical inference methods, as inference is typically not the focus of EDA. Rather, the goal is to show the data, summarize the evidence and identify interesting patterns while eliminating ideas that likely won't pan out.

Throughout the book, we will focus on the R statistical programming language. We will cover the various plotting systems in R and how to use them effectively. We will also discuss how to implement dimension reduction techniques like clustering and the singular value decomposition. All of these techniques will help you to visualize your data and to make key decisions in any data analysis.

Getting Started with R

3.1 Installation

The first thing you need to do to get started with R is to install it on your computer. R works on pretty much every platform available, including the widely available Windows, Mac OS X, and Linux systems. If you want to watch a step-by-step tutorial on how to install R for Mac or Windows, you can watch these videos:

Installing R on Windows
Installing R on the Mac

There is also an integrated development environment available for R that is built by RStudio. I really like this IDE: it has a nice editor with syntax highlighting, there is an R object viewer, and there are a number of other nice features that are integrated. You can see how to install RStudio here:

Installing RStudio

The RStudio IDE is available from RStudio's website.
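
Once R is installed, a quick way to confirm that it works is to ask it which version is running. The lines below are a minimal sketch that can be typed at the R console or in the RStudio console; the "3.0.0" threshold is only an illustrative assumption, not a requirement stated in this book.

```r
# Print the version string of the running R installation.
print(R.version.string)

# Assumption: "3.0.0" is an arbitrary illustrative minimum version,
# chosen only to show how a version comparison can be written.
r_is_recent <- getRversion() >= "3.0.0"
print(r_is_recent)
```

If the first line prints a version string, the installation is working and R is ready to run code.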

3.2 Getting started with the R interface

After you install R you will need to launch it and start writing R code. Before we get to exactly how to write R code, it's useful to get a sense of how the system is organized. In these two videos I talk about where to write code and how to set your working directory, which lets R know where to find all of your files.

Writing code and setting your working directory on the Mac
Writing code and setting your working directory on Windows
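
For readers who prefer the console to the menus shown in the videos, the working directory can also be inspected and changed with the base R functions getwd() and setwd(). This is a small sketch, not the procedure from the videos; tempdir() is used here only as a stand-in for a real project folder so that the example runs on any machine.

```r
# Show the directory R currently treats as its working directory.
original_dir <- getwd()
print(original_dir)

# Assumption: tempdir() stands in for your own project folder,
# for which you would normally supply a path of your own.
project_dir <- tempdir()
setwd(project_dir)

# List the files R can now find without a full path.
print(list.files())

# Return to the directory we started in.
setwd(original_dir)
```

Files referred to by a bare name, with no directory in front, are looked up relative to whatever getwd() reports.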

Managing Data Frames with the dplyr package

Watch a video of this chapter

4.1 Data Frames

The data frame is a key data structure in statistics and in R. The basic structure of a data frame is that there is one observation per row and each column represents a variable, a measure, feature, or characteristic of that observation. R has an internal implementation of data frames that is likely the one you will use most often. However, there are packages on CRAN that implement data frames via things like relational databases that allow you to operate on very, very large data frames (but we won't discuss them here).
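
To make the row-and-column layout concrete, here is a minimal sketch using R's built-in data.frame() function. The data frame name, column names, and values are all invented for illustration and do not come from any dataset used in this book.

```r
# Assumption: every name and value below is made up for illustration.
measurements <- data.frame(
  city = c("Baltimore", "Chicago", "Denver"),
  temperature = c(21.5, 18.0, 15.2),
  rainy = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# One observation per row, one variable per column.
print(dim(measurements))          # number of rows, then number of columns
str(measurements)                 # the type of each column
print(measurements[2, ])          # the second observation (a row)
print(measurements$temperature)   # one variable across all observations
```

Each column holds values of a single type, while different columns may have different types, which is what lets one row describe an observation with mixed characteristics.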

