Thinking with Data
Max Shron
Praise for Thinking with Data
Thinking with Data gets to the essence of the process, and guides data scientists in answering that most important question: what's the problem we're really trying to solve?
Hilary Mason, Data Scientist in Residence at Accel Partners; co-founder of the DataGotham Conference
Thinking with Data does a wonderful job of reminding data scientists to look past technical issues and to focus on making an impact on the broad business objectives of their employers and clients. It's a useful supplement to a data science curriculum that is largely focused on the technical machinery of statistics and computer science.
John Myles White, Scientist at Facebook; author of Machine Learning for Hackers and Bandit Algorithms for Website Optimization
This is a great piece of work. It will be required reading for my team.
Nick Kolegraff, Director of Data Science at Rackspace
Shron's Thinking with Data is a nice mix of academic traditions, from design to philosophy, that rescues data from mathematics and the regime of pure calculation. These are lessons that should be included in any data science course!
Mark Hansen, Director of the David and Helen Gurley Brown Institute for Media Innovation; Graduate School of Journalism at Columbia University
Preface
Working with data is about producing knowledge. Whether that knowledge is consumed by a person or acted on by a machine, our goal as professionals working with data is to use observations to learn about how the world works. We want to turn information into insights, and asking the right questions ensures that we're creating insights about the right things. The purpose of this book is to help us understand that these are our goals and that we are not alone in this pursuit.
I work as a data strategy consultant. I help people figure out what problems they are trying to solve, how to solve them, and what to do with them once the problems are solved. This book grew out of the recognition that the problem of asking good questions and knowing how to put the answers together is not a new one. This problem, the problem of turning observations into knowledge, is one that has been worked on again and again and again by experts in a variety of disciplines. We have much to learn from them.
People use data to make knowledge to accomplish a wide variety of things. There is no one goal of all data work, just as there is no one job description that encapsulates it. Consider this incomplete list of things that can be made better with data:
- Answering a factual question
- Telling a story
- Exploring a relationship
- Discovering a pattern
- Making a case for a decision
- Automating a process
- Judging an experiment
Doing each of these well in a data-driven way draws on different strengths and skills. The most obvious are what you might call the hard skills of working with data: data cleaning, mathematical modeling, visualization, model or graph interpretation, and so on.
What is missing from most conversations is how important the soft skills are for making data useful. Determining what problem one is actually trying to solve, organizing results into something useful, translating vague problems or questions into precisely answerable ones, trying to figure out what may have been left out of an analysis, combining multiple lines of argument into one useful result; the list could go on. These are the skills that separate the data scientist who can take direction from the data scientist who can give it, as much as knowledge of the latest tools or newest algorithms.
Some of this is clearly experience: experience working within an organization, experience solving problems, experience presenting the results. But these are also skills that have been taught before, by many other disciplines. We are not alone in needing them. Just as data scientists did not invent statistics or computer science, we do not need to invent techniques for how to ask good questions or organize complex results. We can draw inspiration from other fields and adapt them to the problems we face. The fields of design, argument studies, critical thinking, national intelligence, problem-solving heuristics, education theory, program evaluation, and various parts of the humanities all have insights that data science can learn from.
Data science is already a field of bricolage. Swaths of engineering, statistics, machine learning, and graphic communication are already fundamental parts of the data science canon. They are necessary, but they are not sufficient. If we look further afield and incorporate ideas from the softer intellectual disciplines, we can make data science successful and help it be more than just this decade's fad.
A focus on why rather than how already pervades the work of the best data professionals. The broader principles outlined here may not be new to them, though the specifics likely will be.
This book consists of six chapters. , to give you places to go from here.
Conventions Used in This Book
The following typographical convention is used in this book:
Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
Safari Books Online
Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world's leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O'Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/thinking-with-data.
To comment or ask technical questions about this book, send email to .
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
I would be remiss to not mention some of the fantastic people who have helped make this book possible. Juan-Pablo Velez has been invaluable in refining my ideas. Jon Bruner, Matt Wallaert, Mike Dewar, Brian Eoff, Jake Porway, Sam Rayachoti, Willow Brugh, Chris Wiggins, Claudia Perlich, and John Matthews provided me with key insights that hopefully I have incorporated well.