it-ebooks - The Art of Data Science

  • Book:
    The Art of Data Science
  • Publisher:
    iBooker it-ebooks
  • Year:
    2017
The Art of Data Science
A Guide for Anyone Who Works with Data
Roger D. Peng and Elizabeth Matsui

This book is for sale at http://leanpub.com/artofdatascience

This version was published on 2017-01-13

* * * * *

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

* * * * *

© 2015 - 2017 Skybrude Consulting, LLC

Special thanks to Maggie Matsui, who created all of the artwork for this book.

Data Analysis as Art

Data analysis is hard, and part of the problem is that few people can explain how to do it. It's not that there aren't any people doing data analysis on a regular basis. It's that the people who are really good at it have yet to enlighten us about the thought process that goes on in their heads.

Imagine you were to ask a songwriter how she writes her songs. There are many tools upon which she can draw. We have a general understanding of how a good song should be structured: how long it should be, how many verses, maybe there's a verse followed by a chorus, etc. In other words, there's an abstract framework for songs in general. Similarly, we have music theory that tells us that certain combinations of notes and chords work well together and other combinations don't sound good. As good as these tools might be, ultimately, knowledge of song structure and music theory alone doesn't make for a good song. Something else is needed.

In his legendary 1974 essay "Computer Programming as an Art," Donald Knuth talks about the difference between art and science. He was trying to get across the idea that although computer programming involved complex machines and very technical knowledge, the act of writing a computer program had an artistic component. In the essay, he says that

"Science is knowledge which we understand so well that we can teach it to a computer."

Everything else is art.

At some point, the songwriter must inject a creative spark into the process to bring all the songwriting tools together to make something that people want to listen to. This is a key part of the art of songwriting. That creative spark is difficult to describe, much less write down, but it's clearly essential to writing good songs. If it weren't, then we'd have computer programs regularly writing hit songs. For better or for worse, that hasn't happened yet.

Much like songwriting (and computer programming, for that matter), it's important to realize that data analysis is an art. It is not yet something that we can teach to a computer. Data analysts have many tools at their disposal, from linear regression to classification trees and even deep learning, and these tools have all been carefully taught to computers. But ultimately, a data analyst must find a way to assemble all of the tools and apply them to data to answer a relevant question, a question of interest to people.

Unfortunately, the process of data analysis is not one that we have been able to write down effectively. It's true that there are many statistics textbooks out there, many lining our own shelves. But in our opinion, none of these really addresses the core problems involved in conducting real-world data analyses. In 1991, Daryl Pregibon, a prominent statistician previously of AT&T Research and now of Google, said in reference to the process of data analysis that "statisticians have a process that they espouse but do not fully understand."

Describing data analysis presents a difficult conundrum. On the one hand, developing a useful framework involves characterizing the elements of a data analysis using abstract language in order to find the commonalities across different kinds of analyses. Sometimes, this language is the language of mathematics. On the other hand, it is often the very details of an analysis that make each one so difficult and yet interesting. How can one effectively generalize across many different data analyses, each of which has important unique aspects?

What we have set out to do in this book is to write down the process of data analysis. What we describe is not a specific formula for data analysis (something like "apply this method and then run that test") but rather a general process that can be applied in a variety of situations. Through our extensive experience both managing data analysts and conducting our own data analyses, we have carefully observed what produces coherent results and what fails to produce useful insights into data. Our goal is to write down what we have learned in the hopes that others may find it useful.

Epicycles of Analysis

To the uninitiated, a data analysis may appear to follow a linear, one-step-after-the-other process which, at the end, arrives at a nicely packaged and coherent result. In reality, data analysis is a highly iterative and non-linear process, better reflected by a series of epicycles (see Figure). Information is learned at each step, and that information then informs whether (and how) to refine and redo the step that was just performed, or whether (and how) to proceed to the next step.

An epicycle is a small circle whose center moves around the circumference of a larger circle. In data analysis, the iterative process that is applied to all steps of the data analysis can be conceived of as an epicycle that is repeated for each step along the circumference of the entire data analysis process. Some data analyses appear to be fixed and linear, such as algorithms embedded into various software platforms, including apps. However, these algorithms are final data analysis products that have emerged from the very non-linear work of developing and refining a data analysis so that it can be algorithmized.
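The iteration described above can be sketched in code. This is only a minimal illustration, not a method from the book: the function run_analysis, the toy steps refine_question and explore_data, and the max_rounds limit are all hypothetical names invented here. Each step is repeated until it reports that what was learned is satisfactory, and only then does the analysis move on to the next step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step signature: takes the current state and the attempt number,
# returns the updated state and whether the step can be considered done.
Step = Callable[[Dict, int], Tuple[Dict, bool]]


def run_analysis(
    steps: List[Tuple[str, Step]], state: Dict, max_rounds: int = 5
) -> Tuple[Dict, List[Tuple[str, int]]]:
    """Run each step in order, repeating a step until it is satisfied."""
    history: List[Tuple[str, int]] = []
    for name, step in steps:  # the large circle: the overall process
        for attempt in range(1, max_rounds + 1):  # the small circle: one step
            state, satisfied = step(state, attempt)
            history.append((name, attempt))
            if satisfied:
                break  # proceed to the next step
    return state, history


def refine_question(state: Dict, attempt: int) -> Tuple[Dict, bool]:
    # Toy rule (an assumption): the question is sharp enough after two passes.
    return {**state, "question_drafts": attempt}, attempt >= 2


def explore_data(state: Dict, attempt: int) -> Tuple[Dict, bool]:
    # Toy rule (an assumption): exploration succeeds on the first pass.
    return {**state, "explored": True}, True


final_state, history = run_analysis(
    [("question", refine_question), ("explore", explore_data)], {}
)
print(history)  # [('question', 1), ('question', 2), ('explore', 1)]
```

In this toy run the question step is redone once before the analysis proceeds, which is the kind of refine-and-redo loop the epicycle picture is meant to convey. A real analysis would replace the fixed rules with the analyst's own judgment about whether a step needs another pass.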

Figure: Epicycles of Analysis
2.1 Setting the Scene

Before diving into the epicycle of analysis, it's helpful to pause and consider what we mean by a data analysis. Although many of the concepts we will discuss in this book are applicable to conducting a study, the framework and concepts in this, and subsequent, chapters are tailored specifically to conducting a data analysis. While a study includes developing and executing a plan for collecting data, a data analysis presumes the data have already been collected. More specifically, a study includes the development of a hypothesis or question, the designing of the data collection process (or study protocol), the collection of the data, and the analysis and interpretation of the data. Because a data analysis presumes that the data have already been collected, it includes development and refinement of a question and the process of analyzing and interpreting the data. It is important to note that although a data analysis is often performed without conducting a study, it may also be performed as a component of a study.

2.2 Epicycle of Analysis

There are 5 core activities of data analysis:

  1. Stating and refining the question
  2. Exploring the data
  3. Building formal statistical models
  4. Interpreting the results