Alex Reinhart is a statistics instructor and PhD student at Carnegie Mellon University. He received his BS in physics at the University of Texas at Austin and does research on locating radioactive devices using physics and statistics.
Preface
A few years ago I was an undergraduate physics major at the University of Texas at Austin. I was in a seminar course, trying to choose a topic for the 25-minute presentation all students were required to give. "Something about conspiracy theories," I told Dr. Brent Iverson, but he wasn't satisfied with that answer. It was too broad, he said, and an engaging presentation needs to be focused and detailed. I studied the sheet of suggested topics in front of me. "How about scientific fraud and abuse?" he asked, and I agreed.
In retrospect, I'm not sure how scientific fraud and abuse is a narrower subject than conspiracy theories, but it didn't matter. After several slightly obsessive hours of research, I realized that scientific fraud isn't terribly interesting, at least not compared to all the errors scientists commit unintentionally.
Woefully underqualified to discuss statistics, I nonetheless dug up several dozen research papers reporting on the numerous statistical errors routinely committed by scientists, read and outlined them, and devised a presentation that satisfied Dr. Iverson. I decided that as a future scientist (and now a self-designated statistical pundit), I should take a course in statistics.
Two years and two statistics courses later, I enrolled as a graduate student in statistics at Carnegie Mellon University. I still take obsessive pleasure in finding ways to do statistics wrong.
Statistics Done Wrong is a guide to the more egregious statistical fallacies regularly committed in the name of science. Because many scientists receive no formal statistical training (and because I do not want to limit my audience to the statistically initiated), this book assumes no formal statistical training. Some readers may easily skip through the first chapter, but I suggest at least skimming it to become familiar with my explanatory style.
My goal is not just to teach you the names of common errors and provide examples to laugh at. As much as is possible without detailed mathematics, I've explained why the statistical errors are errors, and I've included surveys showing how common most of these errors are. This makes for harder reading, but I think the depth is worth it. A firm understanding of basic statistics is essential for everyone in science.
For those who perform statistical analyses for their day jobs, there are tips at the end of most chapters to explain what statistical techniques you might use to avoid common pitfalls. But this is not a textbook, so I will not teach you how to use these techniques in any technical detail. I hope only to make you aware of the most common problems so you are able to pick the statistical technique best suited to your question.
In case I pique your curiosity about a topic, a comprehensive bibliography is included, and every statistical misconception is accompanied by references. I omitted a great deal of mathematics in this guide in favor of conceptual understanding, but if you prefer a more rigorous treatment, I encourage you to read the original papers.
I must caution you before you read this book. Whenever we understand something that few others do, it is tempting to find every opportunity to prove it. Should Statistics Done Wrong miraculously become a New York Times best seller, I expect to see what Paul Graham calls "middlebrow dismissals" in response to any science news in the popular press. Rather than taking the time to understand the interesting parts of scientific research, armchair statisticians snipe at news articles, using the vague description of the study regurgitated from some overenthusiastic university press release to criticize the statistical design of the research.
This already happens on most websites that discuss science news, and it would annoy me endlessly to see this book used to justify it. The first comments on a news article are always complaints about how "they didn't control for this variable" and "the sample size is too small," and 9 times out of 10, the commenter never read the scientific paper to notice that their complaint was addressed in the third paragraph.
This is stupid. A little knowledge of statistics is not an excuse to reject all of modern science. A research paper's statistical methods can be judged only in detail and in context with the rest of its methods: study design, measurement techniques, cost constraints, and goals. Use your statistical knowledge to better understand the strengths, limitations, and potential biases of research, not to shoot down any paper that seems to misuse a p value or contradict your personal beliefs. Also, remember that a conclusion supported by poor statistics can still be correct: statistical and logical errors do not make a conclusion wrong, but merely unsupported.
In short, please practice statistics responsibly. I hope you'll join me in a quest to improve the science we all rely on.
Acknowledgments
Thanks to James Scott, whose statistics courses started my statistical career and gave me the background necessary to write this book; to Raye Allen, who made James's homework assignments much more fun; to Matthew Watson and Moriel Schottlender, who gave invaluable feedback and suggestions on my drafts; to my parents, who gave suggestions and feedback; to Dr. Brent Iverson, whose seminar first motivated me to learn about statistical abuse; and to all the scientists and statisticians who have broken the rules and given me a reason to write.
My friends at Carnegie Mellon contributed many ideas and answered many questions, always patiently listening as I tried to explain some new statistical error. My professors, particularly Jing Lei, Valérie Ventura, and Howard Seltman, prepared me with the necessary knowledge. As technical reviewer, Howard caught several embarrassing errors; if any remain, they're my responsibility, though I will claim they're merely in keeping with the title of the book.
My editors at No Starch dramatically improved the manuscript. Greg Poulos carefully read the early chapters and wasn't satisfied until he understood every concept. Leslie Shen polished my polemic in the final chapters, and the entire team made the process surprisingly easy.
I also owe thanks to the many people who emailed me suggestions and comments when the guide became available online. In no particular order, I thank Axel Boldt, Eric Franzosa, Robert O'Shea, Uri Bram, Dean Rowan, Jesse Weinstein, Peter Hozák, Chris Thorp, David Lovell, Harvey Chapman, Nathaniel Graham, Shaun Gallagher, Sara Alspaugh, Jordan Marsh, Nathan Gouwens, Arjen Noordzij, Kevin Pinto, Elizabeth Page-Gould, and David Merfield. Without their comments, my explanations would no doubt be less complete.
Perhaps you can join this list. I've tried my best, but this guide will inevitably contain errors and omissions. If you spot an error, have a question, or know a common fallacy I've missed, email me at .
Introduction
In the final chapter of his famous book How to Lie with Statistics