BERNOULLI'S
FALLACY
Statistical Illogic
and the Crisis of
Modern Science
AUBREY CLAYTON
COLUMBIA UNIVERSITY PRESS
NEW YORK
Columbia University Press gratefully acknowledges the generous support for this book provided by a member of the Publishers Circle.
Columbia University Press
Publishers Since 1893
New York   Chichester, West Sussex
cup.columbia.edu
Copyright © 2021 Aubrey Clayton
All rights reserved
EISBN 978-0-231-55335-3
Library of Congress Cataloging-in-Publication Data
Names: Clayton, Aubrey, author.
Title: Bernoulli's fallacy : statistical illogic and the crisis of modern
science / Aubrey Clayton.
Description: New York : Columbia University Press, 2021. | Includes
bibliographical references and index.
Identifiers: LCCN 2021004250 (print) | LCCN 2021004251 (ebook) | ISBN
9780231199940 (hardback) | ISBN 9780231553353 (ebook)
Subjects: LCSH: Bernoulli, Jakob, 1654–1705--Influence. |
Probabilities--Philosophy--19th century. |
Probabilities--Philosophy--20th century. | Mathematical
statistics--Philosophy. | Binomial distribution. | Law of large numbers.
Classification: LCC QA273.A35 C53 2021 (print) | LCC QA273.A35 (ebook) |
DDC 519.2--dc23
LC record available at https://lccn.loc.gov/2021004250
LC ebook record available at https://lccn.loc.gov/2021004251
A Columbia University Press E-book.
CUP would be pleased to hear about your reading experience with this e-book at .
Cover design: Noah Arlow
Disclaimer: The views expressed are solely those of the author and do not reflect the views of
his employer, Moody's Analytics, or its parent company, Moody's Corporation, or its affiliates.
Dedicated to Jameel Al-Aidroos, of blessed memory.
PREFACE
Since this book risks being accused of relitigating old arguments about statistics and science, let us first dispense with the idea that those arguments were ever settled. The statistics wars never ended; in some ways they have only just begun.
Science, statistics, and philosophy need each other now as much as ever, especially in the context of the still-unfolding crisis of replication. Everyone, regardless of ideology, can likely agree that something is wrong with the practice of statistics in science. Now is also the right time for a frank conversation because statistical language is increasingly a part of our daily communal lives. The COVID-19 pandemic has, sadly, forced statistical terms like test sensitivity, specificity, and positive predictive value into our collective lexicon. Meanwhile, in other recent examples, (spurious) statistical arguments were a core component of the allegations of electoral fraud in the 2020 U.S. presidential elections, and (non-spurious) statistical arguments are central to the allegations of systemic racial bias in the U.S. criminal justice system. The largest stories of our time (in public health, education, government, civil rights, the environment, business, and many other domains) are being told using the rhetorical devices of statistics. So the recognition that statistical rhetoric might lend itself to misuse makes this an urgent problem with an ethical dimension. On that we can probably also agree.
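To make those three terms concrete, the short Python sketch below works through Bayes' theorem with purely hypothetical numbers; the sensitivity, specificity, and prevalence values are illustrative, not drawn from any real test.

# Sketch: how sensitivity, specificity, and prevalence combine to give the
# positive predictive value of a test. All numbers here are hypothetical.
sensitivity = 0.90   # P(test positive | infected)
specificity = 0.95   # P(test negative | not infected)
prevalence  = 0.01   # P(infected) in the population being tested

p_true_positive  = sensitivity * prevalence
p_false_positive = (1 - specificity) * (1 - prevalence)

# Positive predictive value: P(infected | test positive), by Bayes' theorem
ppv = p_true_positive / (p_true_positive + p_false_positive)
print(f"Positive predictive value: {ppv:.2f}")  # about 0.15 with these numbers

Even a fairly accurate test, applied where the condition is rare, can yield a surprisingly low positive predictive value, which is part of why these terms deserve a place in the public vocabulary.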
What to do about it is another matter. In science, several proposed methodological changes (discussed in the following) have gained support as potential solutions to the replication crisis, but there are no clear winners yet. The reason consensus is hard to come by is that there are unresolved foundational questions of statistics lurking within these debates about methods. The discussions happening now can, in fact, be seen as a vibrant remixing of the same philosophical issues that have colored the controversies about statistics since the 1800s. In short, assessing whether a proposed change successfully fixes a problem requires one to first decide what the problems are, and such decisions reveal philosophical commitments about the process by which scientific knowledge is created. When it comes to such foundational questions, we are not all on the same page, for reasons explored in this book.
Because statistical methods are a means of accounting for the epistemic role of measurement error and uncertainty, the statistics wars (at least on the frequentist versus Bayesian front) are best described as a dispute about the nature and origins of probability: whether it comes from outside us in the form of uncontrollable random noise in observations, or inside us as our uncertainty given limited information on the state of the world. The first perspective limits the scope of probability to those kinds of chance fluctuations we can, in principle, tabulate empirically; the second one allows for probability to reflect a degree of confidence in a hypothesis, both before and after some new observations are considered. Unfortunately for the conflict-averse, there is no neutral position here.
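For readers who want the second perspective in symbols, here is a minimal sketch, with made-up numbers, of treating probability as a degree of confidence in a hypothesis H that gets updated by Bayes' theorem when a new observation E arrives.

# Sketch: probability as degree of confidence, before and after an observation.
# The prior and likelihoods below are hypothetical, chosen only for illustration.
prior = 0.50                 # P(H): confidence in hypothesis H before the data
p_obs_given_h     = 0.80     # P(E | H): chance of the observation if H is true
p_obs_given_not_h = 0.30     # P(E | not H): chance of the observation otherwise

# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
p_obs = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
posterior = p_obs_given_h * prior / p_obs
print(f"Confidence in H after seeing E: {posterior:.2f}")  # 0.73 here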
As a snapshot of the ways these philosophical commitments are now playing themselves out in practice, consider that much of the current debate about statistical and scientific methods can be organized into three categories of concerns:
Where does the hypothesis come from and when? If a particular hypothesis, representing a concrete prediction of the ways a research theory will be borne out in some measured variables, is crafted after peeking at the results or going on a fishing expedition to find a version that best suits the available data, then it may be considered a suspicious product of post hoc theorizing, also known as hypothesizing after results are known (HARKing), taking advantage of researcher degrees of freedom, the Texas sharpshooter fallacy, data dredging, the look-elsewhere effect, or p-hacking. Various proposals to combat this include the pre-registration of methods, that is, committing to a certain rigid process of interpreting the data before it has been gathered, sequestering the exploratory phase of research from the confirmatory one, or correcting for multiple possible comparisons, as in the Bonferroni correction (dividing the threshold for significance by the number of simultaneous hypotheses being considered) or others like it.
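As a concrete illustration of the last item, the Bonferroni correction, here is a minimal Python sketch; the p-values and threshold are invented for the example.

# Sketch: the Bonferroni correction for multiple comparisons.
# The significance threshold is divided by the number of simultaneous hypotheses.
alpha = 0.05                               # conventional overall threshold
p_values = [0.001, 0.04, 0.20, 0.0004]     # hypothetical p-values, one per hypothesis
num_hypotheses = len(p_values)
per_test_threshold = alpha / num_hypotheses   # 0.05 / 4 = 0.0125

survivors = [p for p in p_values if p < per_test_threshold]
print(f"Per-test threshold: {per_test_threshold}")    # 0.0125
print(f"Significant after correction: {survivors}")   # [0.001, 0.0004]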
What caused the experiment to begin and end, and how did we come to learn about it? Subcategories of this concern include the problem of publication bias, or the file drawer problem, and the problem of optional stopping. If, say, an experimenter conducting a trial is allowed to keep running the experiment and collecting data until a favorable result is obtained and only then report that result, there is apparently the potential for malfeasance. Attempts to block this kind of behavior include making publication decisions solely on the basis of the pre-registered reports (that is, based purely on the methods) to encourage publishing negative results, and requiring the stopping rule to be explicitly specified ahead of time and adhered to.
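To see why optional stopping matters, consider the following Python simulation, a sketch under simple assumptions (a fair coin, a two-sided test based on the normal approximation, a check every ten flips), none of which come from the text itself.

import math
import random

# Sketch: an experimenter flips a fair coin (so the null hypothesis is true),
# checks a two-sided p-value every 10 flips, and stops as soon as p < 0.05.

def p_value(heads, flips):
    # Two-sided p-value for "the coin is fair," normal approximation to the binomial
    z = abs(heads - flips * 0.5) / math.sqrt(flips * 0.25)
    return math.erfc(z / math.sqrt(2))

def optional_stopping_run(max_flips=1000, check_every=10):
    heads = 0
    for flip in range(1, max_flips + 1):
        heads += random.random() < 0.5
        if flip % check_every == 0 and p_value(heads, flip) < 0.05:
            return True      # stopped early with a "significant" result
    return False             # never reached significance

random.seed(0)
trials = 2000
hits = sum(optional_stopping_run() for _ in range(trials))
print(f"'Significant' findings despite a true null: {hits / trials:.0%}")
# Far more than the nominal 5 percent, which is the heart of the concern.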