• Complain

Cathy O’Neil - Doing Data Science

Here you can read online Cathy O’Neil - Doing Data Science full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2013, publisher: O’Reilly Media, Inc., genre: Politics. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Cathy O’Neil Doing Data Science

Doing Data Science: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "Doing Data Science" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field thats so clouded in hype? This insightful book, based on Columbia Universitys Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If youre familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

  • Statistical inference, exploratory data analysis, and the data science process
  • Algorithms
  • Spam filters, Naive Bayes, and data wrangling
  • Logistic regression
  • Financial modeling
  • Recommendation engines and causality
  • Data visualization
  • Social networks and data journalism
  • Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy ONeil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Cathy O’Neil: author's other books


Who wrote Doing Data Science? Find out the surname, the name of the author of the book and a list of all author's works by series.

Doing Data Science — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Doing Data Science" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

Reset

Interval:

Bookmark:

Make
About the Authors

Cathy ONeil earned a Ph.D. in math from Harvard, was postdoc at the MIT math department, and a professor at Barnard College where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.

Rachel Schutt is the Senior Vice President for Data Science at News Corp. She earned a PhD in Statistics from Columbia University, and was a statistician at Google Research for several years. She is an adjunct professor in Columbias Department of Statistics and a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. She holds several pending patents based on her work at Google, where she helped build user-facing products by prototyping algorithms and building models to understand user behavior. She has a master's degree in mathematics from NYU, and a master's degree in Engineering-Economic Systems and Operations Research from Stanford University. Her undergraduate degree is in Honors Mathematics from the University of Michigan.

Chapter 1. Introduction: What Is Data Science?

Over the past few years, theres been a lot of hype in the media about data science and Big Data. A reasonable first reaction to all of this might be some combination of skepticism and confusion; indeed we, Cathy and Rachel, had that exact reaction.

And we let ourselves indulge in our bewilderment for a while, first separately, and then, once we met, together over many Wednesday morning breakfasts. But we couldnt get rid of a nagging feeling that there was something real there, perhaps something deep and profound representing a paradigm shift in our culture around data. Perhaps, we considered, its even a paradigm shift that plays to our strengths. Instead of ignoring it, we decided to explore it more.

But before we go into that, lets first delve into what struck us as confusing and vagueperhaps youve had similar inclinations. After that well explain what made us get past our own concerns, to the point where Rachel created a course on data science at Columbia University, Cathy blogged the course, and youre now reading a book based on it.

Big Data and Data Science Hype

Lets get this out of the way right off the bat, because many of you are likely skeptical of data science already for many of the reasons we were. We want to address this up front to let you know: were right there with you . If youre a skeptic too, it probably means you have something useful to contribute to making data science into a more legitimate field that has the power to have a positive impact on society.

So, what is eyebrow-raising about Big Data and data science? Lets count the ways:

  1. Theres a lack of definitions around the most basic terminology. What is Big Data anyway? What does data science mean? What is the relationship between Big Data and data science? Is data science the science of Big Data? Is data science only the stuff going on in companies like Google and Facebook and tech companies? Why do many people refer to Big Data as crossing disciplines (astronomy, finance, tech, etc.) and to data science as only taking place in tech? Just how big is big? Or is it just a relative term? These terms are so ambiguous, theyre well-nigh meaningless.
  2. Theres a distinct lack of respect for the researchers in academia and industry labs who have been working on this kind of stuff for years, and whose work is based on decades (in some cases, centuries) of work by statisticians, computer scientists, mathematicians, engineers, and scientists of all types. From the way the media describes it, machine learning algorithms were just invented last week and data was never big until Google came along. This is simply not the case. Many of the methods and techniques were usingand the challenges were facing noware part of the evolution of everything thats come before. This doesnt mean that theres not new and exciting stuff going on, but we think its important to show some basic respect for everything that came before.
  3. The hype is crazypeople throw around tired phrases straight out of the height of the pre-financial crisis era like Masters of the Universe to describe data scientists, and that doesnt bode well. In general, hype masks reality and increases the noise-to-signal ratio. The longer the hype goes on, the more many of us will get turned off by it, and the harder it will be to see whats good underneath it all, if anything.
  4. Statisticians already feel that they are studying and working on the Science of Data. Thats their bread and butter. Maybe you, dear reader, are not a statistican and dont care, but imagine that for the statistician, this feels a little bit like how identity theft might feel for you. Although we will make the case that data science is not just a rebranding of statistics or machine learning but rather a field unto itself, the media often describes data science in a way that makes it sound like as if its simply statistics or machine learning in the context of the tech industry.
  5. People have said to us, Anything that has to call itself a science isnt. Although there might be truth in there, that doesnt mean that the term data science itself represents nothing, but of course what it represents may not be science but more of a craft.
Getting Past the Hype

Rachels experience going from getting a PhD in statistics to working at Google is a great example to illustrate why we thought, in spite of the aforementioned reasons to be dubious, there might be some meat in the data science sandwich. In her words:

It was clear to me pretty quickly that the stuff I was working on at Google was different than anything I had learned at school when I got my PhD in statistics. This is not to say that my degree was useless; far from itwhat Id learned in school provided a framework and way of thinking that I relied on daily, and much of the actual content provided a solid theoretical and practical foundation necessary to do my work.

But there were also many skills I had to acquire on the job at Google that I hadnt learned in school. Of course, my experience is specific to me in the sense that I had a statistics background and picked up more computation, coding, and visualization skills, as well as domain expertise while at Google. Another person coming in as a computer scientist or a social scientist or a physicist would have different gaps and would fill them in accordingly. But what is important here is that, as individuals, we each had different strengths and gaps, yet we were able to solve problems by putting ourselves together into a data team well-suited to solve the data problems that came our way.

Heres a reasonable response you might have to this story. Its a general truism that, whenever you go from school to a real job, you realize theres a gap between what you learned in school and what you do on the job. In other words, you were simply facing the difference between academic statistics and industry statistics.

We have a couple replies to this:

  • Sure, theres is a difference between industry and academia. But does it really have to be that way? Why do many courses in school have to be so intrinsically out of touch with reality?
  • Even so, the gap doesnt represent simply a difference between industry statistics and academic statistics. The general experience of data scientists is that, at their job, they have access to a
Next page
Light

Font size:

Reset

Interval:

Bookmark:

Make

Similar books «Doing Data Science»

Look at similar books to Doing Data Science. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.


Reviews about «Doing Data Science»

Discussion, reviews of the book Doing Data Science and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.