Praise for Samantha Kleinberg's Why
While cutting-edge computing tools make it easy to find patterns in data, the best insights come from understanding where those patterns come from, and this problem can't be solved by computers alone. Kleinberg expertly guides readers on a tour of the key concepts and methods for identifying causal relationships, with a clear and practical approach that makes Why unlike any other book on the subject. Accessible yet comprehensive, Why is essential reading for scientific novices, seasoned experts, and anyone else looking to learn more from data.
Andrew Therriault, PhD, Director of Data Science, Democratic National Committee
Philosophy, economics, statistics, and logic all try to make sense of causality; Kleinberg manages to tie together these disparate approaches in a way that's straightforward and practical. As more of our lives become data driven, clear thinking about inferring causality from observations will be needed for understanding policy, health, and the world around us.
Chris Wiggins, PhD,
Chief Data Scientist at the New York Times
and Associate Professor at Columbia University
While causality is a central feature of our lives, there is widespread debate and misunderstanding about it. Why lucidly explains causality without relying on prior knowledge or technical expertise. It is an accessible and enjoyable read, yet it gives logical rigor and depth of analysis to complex concepts.
David Lagnado, PhD,
Senior Lecturer, University College London
Why: A Guide to Finding and Using Causes
by Samantha Kleinberg
Copyright 2016 Samantha Kleinberg. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: Mike Loukides
- Editor: Marie Beaugureau
- Production Editor: Matthew Hacker
- Copyeditor: Phil Dangler
- Proofreader: Rachel Head
- Indexer: Judith McConville
- Interior Designer: David Futato
- Cover Designer: Anton Khodakovsky
- Illustrator: Samantha Kleinberg
- December 2015: First Edition
Revision History for the First Edition
See http://oreilly.com/catalog/errata.csp?isbn=9781491949641 for release details.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-4919-4964-1
[LSI]
Preface
Will drinking coffee help you live longer? Who gave you the flu? What makes a stock's price increase? Whether you're making dietary decisions, blaming someone for ruining your weekend, or choosing investments, you constantly need to understand why things happen. Causal knowledge is what helps us predict the future, explain the past, and intervene to effect change. Knowing that exposure to someone with the flu leads to illness in a particular period of time tells you when you'll experience symptoms. Understanding that highly targeted solicitations can lead to political campaign donations allows you to pinpoint these as a likely cause of improvements in fundraising. Realizing that intense exercise causes hyperglycemia helps people with diabetes manage their blood glucose.
Despite how essential this skill is, it's unlikely you ever took a class on how to infer causes. In fact, you may never have stopped to think about what makes something a cause. While there's much more to the story, causes basically increase the chances of an event occurring, are needed to produce an effect, or are strategies for making something happen. Yet, just because a medication can cause heart attacks does not mean that it must be responsible for a particular individual's heart attack, and just because reducing class sizes improved student outcomes in one place does not mean the same intervention will always work in other areas. This book focuses not just on what inferences are possible when everything goes right, but also on why seeming successes can be hard to replicate. We also examine practical questions that are often ignored in theoretical discussions.
There are many ways of thinking about causality (some complementary, some competing), and it touches on many fields (philosophy, computer science, psychology, economics, and medicine, among others). Without taking sides in these debates, I aim to present a wide range of views, making it clear where consensus exists and where it does not. Among other topics, we will explore the psychology of causality (how do people learn about causes?), how experiments to establish causality are conducted (and what are their limits?), and how to develop policies from causal knowledge (should we reduce sodium in food to prevent hypertension?).
We start with what causes are and why we are often wrong when we think we have found them (Chapters ).
Large datasets make it possible to discover causes, rather than simply testing our hypotheses, but it is important to realize that not all data are suitable for causal inference. In ). This book will give you an appreciation for why finding causality is difficult (and more nuanced and complex than news articles may lead you to believe) and why, even though it is hard, it is an important and widely applicable problem.
Though there are challenges, you will also see that it's not hopeless. You'll develop a set of tools for thinking causally: questions to ask, red flags that should arouse suspicion, and ways of supporting causal claims. In addition to identifying causes, this book will also help you use them to make decisions based on causal information, enact policies, and verify the causes through further tests.
This book does not assume any background knowledge and is written for a general audience. I assume only a curiosity about causes, and aim to make the complex landscape of causality widely accessible. To that end, we'll focus more on intuitions and how to understand causality conceptually than on mathematical details (actually, there won't be any mathematical details). If you have a PhD in computer science or statistics you may pick up some new tools and enjoy the tour of work in other fields, but you may also yearn for more methodological detail. Our focus here, though, is causality for everyone.
Chapter 1. Beginnings
Where do our concepts of causality and methods for finding it come from?
In 1999, a British solicitor named Sally Clark was convicted of murdering her two children. A few years earlier, in December 1996, her first son died suddenly at 11 weeks of age. At the time, this was ruled a death by natural causes, but just over a year after the first child's death, Clark's second son died at 8 weeks of age. In both cases the children seemed otherwise healthy, so their sudden deaths raised suspicions.