• Complain

Rachel Schutt - Doing Data Science: Straight Talk from the Frontline

Here you can read online Rachel Schutt - Doing Data Science: Straight Talk from the Frontline full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2013, publisher: OReilly Media, genre: Computer. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Rachel Schutt Doing Data Science: Straight Talk from the Frontline
  • Book:
    Doing Data Science: Straight Talk from the Frontline
  • Author:
  • Publisher:
    OReilly Media
  • Genre:
  • Year:
    2013
  • Rating:
    4 / 5
  • Favourites:
    Add to favourites
  • Your mark:
    • 80
    • 1
    • 2
    • 3
    • 4
    • 5

Doing Data Science: Straight Talk from the Frontline: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "Doing Data Science: Straight Talk from the Frontline" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field thats so clouded in hype? This insightful book, based on Columbia Universitys Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If youre familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

  • Statistical inference, exploratory data analysis, and the data science process
  • Algorithms
  • Spam filters, Naive Bayes, and data wrangling
  • Logistic regression
  • Financial modeling
  • Recommendation engines and causality
  • Data visualization
  • Social networks and data journalism
  • Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy ONeil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Rachel Schutt: author's other books


Who wrote Doing Data Science: Straight Talk from the Frontline? Find out the surname, the name of the author of the book and a list of all author's works by series.

Doing Data Science: Straight Talk from the Frontline — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Doing Data Science: Straight Talk from the Frontline" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

Reset

Interval:

Bookmark:

Make
Doing Data Science
Rachel Schutt
Cathy ONeil
Beijing Cambridge Farnham Kln Sebastopol Tokyo Dedication In loving memory of - photo 1

Beijing Cambridge Farnham Kln Sebastopol Tokyo

Dedication

In loving memory of Kelly Feeney.

Special Upgrade Offer

If you purchased this ebook directly from oreilly.com, you have the following benefits:

  • DRM-free ebooksuse your ebooks across devices without restrictions or limitations

  • Multiple formatsuse on your laptop, tablet, or phone

  • Lifetime access, with free updates

  • Dropbox syncingyour files, anywhere

If you purchased this ebook from another retailer, you can upgrade your ebook to take advantage of all these benefits for just $4.99. to access your ebook upgrade.

Please note that upgrade offers are not available from sample content.

Preface
Rachel Schutt

Data science is an emerging field in industry, and as yet, it is not well-defined as an academic subject. This book represents an ongoing investigation into the central question: What is data science? Its based on a class called Introduction to Data Science, which I designed and taught at Columbia University for the first time in the Fall of 2012.

In order to understand this book and its origins, it might help you to understand a little bit about me and what my motivations were for creating the class.

Motivation

In short, I created a course that I wish had existed when I was in college, but that was the 1990s, and we werent in the midst of a data explosion, so the class couldnt have existed back then. I was a math major as an undergraduate, and the track I was on was theoretical and proof-oriented. While I am glad I took this path, and feel it trained me for rigorous problem-solving, I would have also liked to have been exposed then to ways those skills could be put to use to solve real-world problems.

I took a wandering path between college and a PhD program in statistics, struggling to find my field and placea place where I could put my love of finding patterns and solving puzzles to good use. I bring this up because many students feel they need to know what they are going to do with their lives now, and when I was a student, I couldnt plan to work in data science as it wasnt even yet a field. My advice to students (and anyone else who cares to listen): you dont need to figure it all out now. Its OK to take a wandering path. Who knows what you might find? After I got my PhD, I worked at Google for a few years around the same time that data science and data scientist were becoming terms in Silicon Valley.

The world is opening up with possibilities for people who are quantitatively minded and interested in putting their brains to work to solve the worlds problems. I consider it my goal to help these students to become critical thinkers, creative solvers of problems (even those that have not yet been identified), and curious question askers. While I myself may never build a mathematical model that is a piece of the cure for cancer, or identifies the underlying mystery of autism, or that prevents terrorist attacks, I like to think that Im doing my part by teaching students who might one day do these things. And by writing this book, Im expanding my reach to an even wider audience of data scientists who I hope will be inspired by this book, or learn tools in it, to make the world better and not worse.

Building models and working with data is not value-neutral. You choose the problems you will work on, you make assumptions in those models, you choose metrics, and you design the algorithms.

The solutions to all the worlds problems may not lie in data and technologyand in fact, the mark of a good data scientist is someone who can identify problems that can be solved with data and is well-versed in the tools of modeling and code. But I do believe that interdisciplinary teams of people that include a data-savvy, quantitatively minded, coding-literate problem-solver (lets call that person a data scientist) could go a long way.

Origins of the Class

I proposed the class in March 2012. At the time, there were three primary reasons. The first will take the longest to explain.

Reason 1 : I wanted to give students an education in what its like to be a data scientist in industry and give them some of the skills data scientists have.

I was working on the Google+ data science team with an interdisciplinary team of PhDs. There was me (a statistician), a social scientist, an engineer, a physicist, and a computer scientist. We were part of a larger team that included talented data engineers who built the data pipelines, infrastructure, and dashboards, as well as built the experimental infrastructure (A/B testing). Our team had a flat structure. Together our skills were powerful, and we were able to do amazing things with massive datasets, including predictive modeling, prototyping algorithms, and unearthing patterns in the data that had huge impact on the product.

We provided leadership with insights for making data-driven decisions, while also developing new methodologies and novel ways to understand causality. Our ability to do this was dependent on top-notch engineering and infrastructure. We each brought a solid mix of skills to the team, which together included coding, software engineering, statistics, mathematics, machine learning, communication, visualization, exploratory data analysis (EDA), data sense, and intuition, as well as expertise in social networks and the social space.

To be clear, no one of us excelled at all those things, but together we did; we recognized the value of all those skills, and thats why we thrived. What we had in common was integrity and a genuine interest in solving interesting problems, always with a healthy blend of skepticism as well as a sense of excitement over scientific discovery. We cared about what we were doing and loved unearthing patterns in the data.

I live in New York and wanted to bring my experience at Google back to students at Columbia University because I believe this is stuff they need to know, and because I enjoy teaching. I wanted to teach them what I had learned on the job. And I recognized that there was an emerging data scientist community in the New York tech scene, and I wanted students to hear from them as well.

One aspect of the class was that we had guest lectures by data scientists currently working in industry and academia, each of whom had a different mix of skills. We heard a diversity of perspectives, which contributed to a holistic understanding of data science.

Reason 2 : Data science has the potential to be a deep and profound research discipline impacting all aspects of our lives. Columbia University and Mayor Bloomberg announced the Institute for Data Sciences and Engineering in July 2012. This course created an opportunity to develop the theory of data science and to formalize it as a legitimate science.

Reason 3 : I kept hearing from data scientists in industry that you cant teach data science in a classroom or university setting, and I took that on as a challenge. I thought of my classroom as an incubator of data science teams. The students I had were very impressive and are turning into top-notch data scientists. Theyve contributed a chapter to this book, in fact.

Origins of the Book

The class would not have become a book if I hadnt met Cathy ONeil, a mathematician-turned-data scientist and prominent and out-spoken blogger on

Next page
Light

Font size:

Reset

Interval:

Bookmark:

Make

Similar books «Doing Data Science: Straight Talk from the Frontline»

Look at similar books to Doing Data Science: Straight Talk from the Frontline. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.


Reviews about «Doing Data Science: Straight Talk from the Frontline»

Discussion, reviews of the book Doing Data Science: Straight Talk from the Frontline and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.