Contents
- CHAPTER 1: Getting Things in Proportion: Categorical Data and Percentages
- CHAPTER 2: Summarizing and Communicating Numbers. Lots of Numbers
- CHAPTER 3: Why Are We Looking at Data Anyway? Populations and Measurement
- CHAPTER 4: What Causes What?
- CHAPTER 5: Modelling Relationships Using Regression
- CHAPTER 6: Algorithms, Analytics and Prediction
- CHAPTER 7: How Sure Can We Be About What Is Going On? Estimates and Intervals
- CHAPTER 8: Probability – the Language of Uncertainty and Variability
- CHAPTER 9: Putting Probability and Statistics Together
- CHAPTER 10: Answering Questions and Claiming Discoveries
- CHAPTER 11: Learning from Experience the Bayesian Way
- CHAPTER 12: How Things Go Wrong
- CHAPTER 13: How We Can Do Statistics Better
- CHAPTER 14: In Conclusion
About the Author
Sir David John Spiegelhalter is a British statistician and Chair of the Winton Centre for Risk and Evidence Communication in the Statistical Laboratory at the University of Cambridge. Spiegelhalter is one of the most cited and influential researchers in his field, and was elected as President of the Royal Statistical Society for 2017–18.
To statisticians everywhere, with their endearing traits of pedantry, generosity, integrity, and desire to use data in the best way possible
List of Figures
List of Tables
12.1 Questionable Interpretation and Communication Practices
Acknowledgements
Any insights gained from a long career in statistics come from listening to inspiring colleagues. These are too numerous even for a statistician to count, but a shortlist of those I have stolen most from might include Nicky Best, Sheila Bird, David Cox, Philip Dawid, Stephen Evans, Andrew Gelman, Tim Harford, Kevin McConway, Wayne Oldford, Sylvia Richardson, Hetan Shah, Adrian Smith and Chris Wild. I am grateful to them and so many others for encouraging me in a challenging subject.
This book has been a long time in development, entirely due to my chronic procrastination. So I would primarily like to thank Laura Stickney of Penguin for not only commissioning the book, but remaining calm as the months, and years, went by, even when the book was finished and we still could not agree on a title. And all credit to Jonathan Pegg for negotiating me a fine deal, Jane Birdsell for showing huge patience when editing, and all the production staff at Penguin for their meticulous work.
I am very grateful for permission to adapt illustrations, specifically to Chris Wild. UK public sector information is licensed under the Open Government Licence v3.0.
I am not a good R programmer, and Matthew Pearce and Maria Skoularidou helped me enormously in doing the analyses and graphics. I also struggle with writing, and so am indebted to numerous people who read and commented on chapters, including George Farmer, Alex Freeman, Cameron Brick, Michael Posner, Sander van der Linden and Simone Warr: in particular Julian Gilbey had a fine eye for errors and ambiguity.
Above all, I must thank Kate Bull not only for her vital comments on the text, but also for supporting me through times that have been both good (writing in a beach hut in Goa) and not so good (a wet February juggling too many commitments).
I am also profoundly grateful to David and Claudia Harding for both their financial support and their continued encouragement, which has enabled me to do such fun things over the last ten years.
Finally, much as I would like to find someone else to blame, I am afraid I must acknowledge full responsibility for the inevitable remaining inadequacies of this book.
CODE FOR EXAMPLES
R code and data for reproducing most of the analyses and Figures are available from https://github.com/dspiegel29/ArtofStatistics. I am grateful for the assistance received in preparing this material.
Introduction
The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.
Nate Silver, The Signal and the Noise
Why We Need Statistics
Harold Shipman was Britain's most prolific convicted murderer, though he does not fit the archetypal profile of a serial killer. A mild-mannered family doctor working in a suburb of Manchester, between 1975 and 1998 he injected at least 215 of his mostly elderly patients with a massive opiate overdose. He finally made the mistake of forging the will of one of his victims so as to leave him some money: her daughter was a solicitor, suspicions were aroused, and forensic analysis of his computer showed he had been retrospectively changing patient records to make his victims appear sicker than they really were. He was well known as an enthusiastic early adopter of technology, but he was not tech-savvy enough to realize that every change he made was time-stamped (incidentally, a good example of data revealing hidden meaning).
Of his patients who had not been cremated, fifteen were exhumed and lethal levels of diamorphine, the medical form of heroin, were found in their bodies. Shipman was subsequently tried for fifteen murders in 1999, but chose not to offer any defence and never uttered a word at his trial. He was found guilty and jailed for life, and a public inquiry was set up to determine what crimes he might have committed apart from those for which he had been tried, and whether he could have been caught earlier. I was one of a number of statisticians called to give evidence at the public inquiry, which concluded that he had definitely murdered 215 of his patients, and possibly 45 more.
This book will focus on using statistical science to answer the kind of questions that arise when we want to better understand the world; some of these questions will be highlighted in a box. In order to get some insight into Shipman's behaviour, a natural first question is:
What kind of people did Harold Shipman murder, and when did they die?
The public inquiry provided details of each victim's age, gender and date of death. Figure 0.1 is a fairly sophisticated visualization of this data, showing a scatter-plot of the age of victim against their date of death, with the shading of the points indicating whether the victim was male or female. Bar-charts have been superimposed on the axes showing the pattern of ages (in 5-year bands) and years.
Some conclusions can be drawn by simply taking some time to look at the figure. There are more black than white dots, and so Shipman's victims were mainly women. The bar-chart on the right of the picture shows that most of his victims were in their 70s and 80s, but looking at the scatter of points reveals that although initially they were all elderly, some younger cases crept in as the years went by. The bar-chart at the top clearly shows a gap around 1992 when there were no murders. It turned out that before that time Shipman had been working in a joint practice with other doctors but then, possibly as he felt under suspicion, he left to form a single-handed general practice. After this his activities accelerated, as demonstrated by the top bar-chart.
Figure 0.1
A scatter-plot showing the age and the year of death of Harold Shipman's 215 confirmed victims. Bar-charts have been added on the axes to reveal the pattern of ages and the pattern of years in which he committed murders.
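For readers who like to see the mechanics, the bar-charts on the axes are nothing more than counts of the same records grouped in different ways. The following is a minimal sketch in Python (the book's own analyses are in R, at the repository given above). The records are made up purely for illustration and are not the public inquiry's data, and the names age_band, marginal_counts and years_without_cases are hypothetical helpers invented for this example.

```python
from collections import Counter

# Hypothetical, made-up records for illustration only: (age, year of death, sex).
# These are NOT the public inquiry's data.
records = [
    (76, 1984, "F"),
    (82, 1989, "F"),
    (71, 1993, "M"),
    (67, 1995, "F"),
    (58, 1996, "F"),
    (84, 1997, "M"),
    (79, 1997, "F"),
]


def age_band(age, width=5):
    """Return the label of the band containing `age`, e.g. 76 -> '75-79'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def marginal_counts(rows):
    """Tabulate the counts that bar-charts on the axes would display."""
    by_band = Counter(age_band(age) for age, _, _ in rows)
    by_year = Counter(year for _, year, _ in rows)
    by_sex = Counter(sex for _, _, sex in rows)
    return by_band, by_year, by_sex


def years_without_cases(by_year):
    """List years inside the observed range that have no recorded case."""
    first, last = min(by_year), max(by_year)
    return [y for y in range(first, last + 1) if y not in by_year]


by_band, by_year, by_sex = marginal_counts(records)
gaps = years_without_cases(by_year)

print("Counts by age band:", dict(sorted(by_band.items())))
print("Counts by year:", dict(sorted(by_year.items())))
print("Counts by sex:", dict(by_sex))
print("Years with no cases:", gaps)
```

With the real data, the counts by year would correspond to the bar-chart at the top, the counts by age band to the bar-chart on the right, and a year appearing in the list of gaps is the kind of feature that prompts a further question about what changed at that time.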