BAYESIAN STATISTICS THE FUN WAY
Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks
by Will Kurt
San Francisco
BAYESIAN STATISTICS THE FUN WAY. Copyright © 2019 by Will Kurt.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN-10: 1-59327-956-6
ISBN-13: 978-1-59327-956-1
Publisher: William Pollock
Production Editor: Laurel Chun
Cover Illustration: Josh Ellingson
Interior Design: Octopod Studios
Developmental Editor: Liz Chadwick
Technical Reviewer: Chelsea Parlett-Pelleriti
Copyeditor: Rachel Monaghan
Compositor: Danielle Foster
Proofreader: James Fraleigh
Indexer: Erica Orloff
For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900
www.nostarch.com
A catalog record of this book is available from the Library of Congress.
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
To Melanie, who reawoke in me a passion for words
About the Author
Will Kurt currently works as a data scientist at Wayfair, and has been using Bayesian statistics to solve real business problems for over half a decade. He frequently blogs about probability on his website, CountBayesie.com. Kurt is the author of Get Programming with Haskell (Manning Publications) and lives in Boston, Massachusetts.
About the Technical Reviewer
Chelsea Parlett-Pelleriti is a PhD student in Computational and Data Science, and has a long-standing love of all things lighthearted and statistical. She is also a freelance statistics writer, contributing to projects including the YouTube series Crash Course Statistics and The Princeton Review's Cracking the AP Statistics Exam. She currently lives in Southern California.
ACKNOWLEDGMENTS
Writing a book is really an incredible effort that involves the hard work of many people. Even with all the names that follow, I can only touch on some of the many people who have made this book possible. I would like to start by thanking my son, Archer, for always keeping me curious and inspiring me.
The books published by No Starch have long been some of my favorite books to read, and it is a real honor to get to work with the amazing team there to produce this book. I give tremendous thanks to my editors, reviewers, and the incredible team at No Starch. Liz Chadwick originally approached me about creating this book and provided excellent editorial feedback and guidance throughout the entire process of writing it. Laurel Chun made sure the entire process of going from some messy R notebooks to a full-fledged book went incredibly smoothly. Chelsea Parlett-Pelleriti went well beyond the requirements of a technical reviewer and really helped to make this book the best it can be. Frances Saux added many insightful comments to the later chapters of the book. And of course thank you to Bill Pollock for creating such a delightful publishing company.
As an English literature major in undergrad, I never could have imagined writing a book on any mathematical subject. There are a few people who were really essential to helping me see the wonder of mathematics. I will forever be grateful to my college roommate, Greg Muller, who showed a crazy English major just how exciting and interesting the world of mathematics can be. Professor Anatoly Temkin at Boston University opened the doors to mathematical thinking for me by teaching me to always answer the question, "What does this mean?" And of course a huge thanks to Richard Kelley who, when I found myself in the desert for many years, provided an oasis of mathematical conversations and guidance. I would also like to give a shoutout to the data science team at Bombora, especially Patrick Kelley, who provided so many wonderful questions and conversations, some of which found their way into this book. I will also be forever grateful to the readers of my blog, Count Bayesie, who have always provided wonderful questions and insights. Among these readers, I would especially like to thank the commenter Nevin, who helped correct some early misunderstandings I had.
Finally, I want to give thanks to some truly great authors in Bayesian statistics whose books have done a great deal to guide my own growth in the subject. John Kruschke's Doing Bayesian Data Analysis and Bayesian Data Analysis by Andrew Gelman et al. are great books everyone should read. By far the most influential book on my own thinking is E.T. Jaynes's phenomenal Probability Theory: The Logic of Science, and I'd like to add thanks to Aubrey Clayton for making a series of lectures on this challenging book, which really helped clarify it for me.
INTRODUCTION
Virtually everything in life is, to some extent, uncertain. This may seem like a bit of an exaggeration, but to see the truth of it you can try a quick experiment. At the start of the day, write down something you think will happen in the next half-hour, hour, three hours, and six hours. Then see how many of these things happen exactly like you imagined. You'll quickly realize that your day is full of uncertainties. Even something as predictable as "I will brush my teeth" or "I'll have a cup of coffee" may not, for some reason or another, happen as you expect.
For most of the uncertainties in life, we're able to get by quite well by planning our day. For example, even though traffic might make your morning commute longer than usual, you can make a pretty good estimate about what time you need to leave home in order to get to work on time. If you have a super-important morning meeting, you might leave earlier to allow for delays. We all have an innate sense of how to deal with uncertain situations and reason about uncertainty. When you think this way, you're starting to think probabilistically.
Why Learn Statistics?
The subject of this book, Bayesian statistics, helps us get better at reasoning about uncertainty, just as studying logic in school helps us to see the errors in everyday logical thinking. Given that virtually everyone deals with uncertainty in their daily life, as we just discussed, the audience for this book is pretty wide. Data scientists and researchers already using statistics will benefit from a deeper understanding of, and intuition for, how these tools work. Engineers and programmers will learn a lot about how they can better quantify decisions they have to make (I've even used Bayesian analysis to identify causes of software bugs!). Marketers and salespeople can apply the ideas in this book when running A/B tests, trying to understand their audience, and better assessing the value of opportunities. Anyone making high-level decisions should have at least a basic sense of probability so they can make quick back-of-the-envelope estimates about the costs and benefits of uncertain decisions. I wanted this book to be something a CEO could study on a flight and develop a solid enough foundation by the time they land to better assess choices that involve probabilities and uncertainty.