Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani - An Introduction to Statistical Learning

Book:
An Introduction to Statistical Learning
Author:
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
Publisher:
Springer New York, New York, NY
Springer Science+Business Media New York 2013

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning, Springer Texts in Statistics

1. Introduction

Gareth James (1), Daniela Witten (2), Trevor Hastie (3) and Robert Tibshirani (3)

(1) Department of Information and Operations Management, University of Southern California, Los Angeles, CA, USA
(2) Department of Biostatistics, University of Washington, Seattle, WA, USA
(3) Department of Statistics, Stanford University, Stanford, CA, USA

1.1 An Overview of Statistical Learning

Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data. To provide an illustration of some applications of statistical learning, we briefly discuss three real-world data sets that are considered in this book.

1.1.1 Wage Data

In this application (which we refer to as the Wage data set throughout this book), we examine a number of factors that relate to wages for a group of males from the Atlantic region of the United States. In particular, we wish to understand the association between an employee's age and education, as well as the calendar year, and his wage. Consider, for example, the left-hand panel of Figure 1.1, which shows wage as a function of age: on average, wage increases with age until about 60 years of age, at which point it begins to decline. It is also clear from the figure that there is a significant amount of variability associated with this average value, and so age alone is unlikely to provide an accurate prediction of a particular man's wage.

Fig. 1.1

Wage data, which contains income survey information for males from the central Atlantic region of the United States. Left: wage as a function of age. On average, wage increases with age until about 60 years of age, at which point it begins to decline. Center: wage as a function of year. There is a slow but steady increase of approximately $10,000 in the average wage between 2003 and 2009. Right: Boxplots displaying wage as a function of education, with 1 indicating the lowest level (no high school diploma) and 5 the highest level (an advanced graduate degree). On average, wage increases with the level of education.

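We also have information regarding each employee's education level and the year in which the wage was earned. The center and right-hand panels of Figure 1.1 show wage as a function of year and of education: the average wage rises slowly but steadily, by approximately $10,000, between 2003 and 2009, and, on average, wage increases with the level of education. The relationship between wage and age in the left-hand panel is not a straight line, so a good prediction of wage should account for that; later in the book, we discuss a class of approaches for addressing this problem.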
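Predicting wage from an input such as year is a regression problem. As a rough illustration only, the sketch below fits a straight line by ordinary least squares to a handful of made-up (year, wage) pairs; the numbers and the helper name `fit_line` are assumptions for this example and are not taken from the Wage data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x (one input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up (year, wage in thousands of dollars) pairs, NOT the Wage data.
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009]
wages = [105.0, 108.5, 107.0, 111.0, 112.5, 114.0, 116.0]

intercept, slope = fit_line(years, wages)
predicted_2010 = intercept + slope * 2010  # extrapolated prediction
print(f"estimated change per year: {slope:.2f} thousand dollars")
print(f"predicted wage for 2010: {predicted_2010:.1f} thousand dollars")
```

A straight line is the simplest choice here; it would not capture a rise-then-fall pattern such as the one between wage and age.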

1.1.2 Stock Market Data

The Wage data involves predicting a continuous or quantitative output value. This is often referred to as a regression problem. However, in certain cases we may instead wish to predict a non-numerical value, that is, a categorical or qualitative output. For example, later in the book we examine a stock market data set that contains the daily movements in the Standard & Poor's 500 (S&P) stock index over a 5-year period between 2001 and 2005. We refer to this as the Smarket data. The goal is to predict whether the index will increase or decrease on a given day using the past 5 days' percentage changes in the index. Here the statistical learning problem does not involve predicting a numerical value. Instead it involves predicting whether a given day's stock market performance will fall into the Up bucket or the Down bucket. This is known as a classification problem. A model that could accurately predict the direction in which the market will move would be very useful!
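To make the setup concrete, the following sketch turns a series of daily percentage changes into classification examples: five lagged changes as inputs and an Up or Down label as the output. The function name `make_lagged`, the sample values, and the choice to label a zero change as Down are all assumptions for illustration; none of them comes from the Smarket data.

```python
def make_lagged(changes, n_lags=5):
    """Build (features, label) pairs from a list of daily percentage changes.

    features: the n_lags previous changes, most recent first.
    label: "Up" if the current day's change is positive, otherwise "Down"
           (a zero change is labeled "Down" here, which is an assumption).
    """
    rows = []
    for t in range(n_lags, len(changes)):
        features = [changes[t - lag] for lag in range(1, n_lags + 1)]
        label = "Up" if changes[t] > 0 else "Down"
        rows.append((features, label))
    return rows


# Made-up daily percentage changes, NOT taken from the Smarket data.
daily_changes = [0.4, -0.2, 1.1, -0.7, 0.3, -0.5, 0.9]

examples = make_lagged(daily_changes)
for features, label in examples:
    print(features, "->", label)
```

Each row pairs the inputs with a qualitative output, which is what distinguishes this from the regression setting.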

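The left-hand panel of Figure 1.2 shows boxplots of the previous day's percentage change in the S&P index, one for the days on which the market increased and one for the days on which it decreased. The center and right-hand panels show the same comparison for the percentage changes 2 and 3 days previous (see also Figure 1.3).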

Fig. 1.2

Left: Boxplots of the previous day's percentage change in the S&P index for the days for which the market increased or decreased, obtained from the Smarket data. Center and Right: Same as left panel, but the percentage changes for 2 and 3 days previous are shown.

Fig. 1.3

We fit a quadratic discriminant analysis model to the subset of the Smarket data corresponding to the 2001–2004 time period, and predicted the probability of a stock market decrease using the 2005 data. On average, the predicted probability of decrease is higher for the days in which the market does decrease. Based on these results, we are able to correctly predict the direction of movement in the market 60% of the time.
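A figure such as "60% of the time" is a classification accuracy: the fraction of held-out days whose predicted direction matches the actual one. The sketch below shows that calculation under stated assumptions: the probabilities and outcomes are made up, the 0.5 threshold is an assumed decision rule, and no discriminant analysis model is fitted here.

```python
def predict_direction(prob_down, threshold=0.5):
    """Turn a predicted probability of a decrease into an Up/Down call."""
    return "Down" if prob_down > threshold else "Up"


def accuracy(predicted, actual):
    """Fraction of days on which the predicted direction was correct."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up held-out predictions and outcomes, NOT results from the book.
prob_down = [0.52, 0.48, 0.55, 0.47, 0.51]
actual = ["Down", "Up", "Up", "Up", "Down"]

predicted = [predict_direction(p) for p in prob_down]
print("predicted:", predicted)
print("accuracy:", accuracy(predicted, actual))
```

Evaluating on days that were not used to fit the model, as the caption describes with the 2005 data, is what makes such an accuracy figure meaningful.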

1.1.3 Gene Expression Data

The previous two applications illustrate data sets with both input and output variables. However, another important class of problems involves situations in which we only observe input variables, with no corresponding output. For example, in a marketing setting, we might have demographic information for a number of current or potential customers. We may wish to understand which types of customers are similar to each other by grouping individuals according to their observed characteristics. This is known as a clustering problem. Unlike in the previous examples, here we are not trying to predict an output variable.

We devote a later chapter to a discussion of statistical learning methods for problems in which no natural output variable is available. We consider the NCI60 data set, which consists of 6,830 gene expression measurements for each of 64 cancer cell lines. Instead of predicting a particular output variable, we are interested in determining whether there are groups, or clusters, among the cell lines based on their gene expression measurements. This is a difficult question to address, in part because there are thousands of gene expression measurements per cell line, making it hard to visualize the data.

The left-hand panel of Figure 1.4, which represents each cell line in a two-dimensional space, Z1 and Z2, suggests at least four groups of cell lines, which we have represented using separate colors. We can now examine the cell lines within each cluster for similarities in their types of cancer, in order to better understand the relationship between gene expression levels and cancer.
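The text here does not say which method produced the four groups. As one possible illustration, the sketch below runs k-means, an assumed stand-in rather than the authors' procedure, on made-up two-dimensional points. The function name `kmeans`, the points, and the choice of starting centers are all hypothetical.

```python
import math


def kmeans(points, centers, n_iter=20):
    """Plain k-means on 2-D points, starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centers.append(old)  # leave an empty cluster's center alone
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return labels, centers


# Made-up points forming four separated groups, NOT the NCI60 data.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (0, 10), (1, 10), (0, 11),
    (10, 10), (11, 10), (10, 11),
]
initial = [points[0], points[3], points[6], points[9]]  # one start per group

labels, centers = kmeans(points, initial)
print(labels)
```

With real data the number of clusters and the starting centers are not known in advance, and different choices can give different groupings.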

Fig. 1.4

Left: Representation of the NCI60 gene expression data set in a two-dimensional space, Z1 and Z2. Each point corresponds to one of the 64 cell lines. There appear to be four groups of cell lines, which we have represented using different colors. Right: Same as left panel except that we have represented each of the 14 different types of cancer using a different colored symbol. Cell lines corresponding to the same cancer type tend to be nearby in the two-dimensional space.

In this particular data set, it turns out that the cell lines correspond to 14 different types of cancer. (However, this information was not used to create the left-hand panel of Figure 1.4.) The right-hand panel of Figure 1.4 is identical to the left-hand panel, except that the 14 cancer types are shown using distinct colored symbols. There is clear evidence that cell lines with the same cancer type tend to be located near each other in this two-dimensional representation. In addition, even though the cancer information was not used to produce the left-hand panel, the clustering obtained does bear some resemblance to some of the actual cancer types observed in the right-hand panel. This provides some independent verification of the accuracy of our clustering analysis.
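One simple way to check how well clusters line up with known categories is to cross-tabulate the two sets of labels and compute a purity score, the share of items that belong to the majority category of their cluster. The sketch below is an illustration under assumptions: the labels are made up, the type names are placeholders, and purity is a measure chosen for this example rather than one named in the text.

```python
from collections import Counter


def cross_tabulate(cluster_labels, true_labels):
    """Count how often each (cluster, true category) pair occurs."""
    return Counter(zip(cluster_labels, true_labels))


def purity(cluster_labels, true_labels):
    """Share of items that fall in their cluster's majority category."""
    table = cross_tabulate(cluster_labels, true_labels)
    best_per_cluster = {}
    for (cluster, _category), count in table.items():
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), count)
    return sum(best_per_cluster.values()) / len(cluster_labels)


# Made-up cluster assignments and placeholder categories, NOT the NCI60 data.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
types = ["type_a", "type_a", "type_b", "type_b", "type_b",
         "type_c", "type_c", "type_c"]

for (cluster, category), count in sorted(cross_tabulate(clusters, types).items()):
    print(f"cluster {cluster}, {category}: {count}")
print("purity:", purity(clusters, types))
```

A purity near 1 means each cluster is dominated by a single category; it says nothing about whether one category is split across several clusters.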
