Probability and Intuition
Sujith Vijay
© 2020 Sujith Vijay
All rights reserved.
Cover image: The 1821 Derby at Epsom by Théodore Géricault
CONTENTS
Chapter 1: Correlation and Causation
Chapter 2: Regression to the Mean
Chapter 3: The Battle of Inferences
Chapter 4: Skewness and Kurtosis
Chapter 5: Overfitting
Chapter 6: Precision and Recall
Chapter 7: The Kelly Criterion
Chapter 8: The St. Petersburg Paradox
Chapter 9: Quantum Mechanics
Chapter 10: The Monty Hall Problem
Chapter 1: Correlation and Causation
In 2005, the Kansas State Board of Education received an open letter from a young science graduate named Bobby Henderson. The letter urged the authorities to consider teaching other theories of intelligent design in addition to the Christian creationist view that the board had recommended as an alternative to Darwin's theory of evolution. Specifically, Henderson put forth the tenets of a new religion called Pastafarianism, which held that the universe and everything in it was created by an omnipotent entity known as the Flying Spaghetti Monster. The usual prescriptions and proscriptions that accompany every other religion were present in Pastafarianism as well. For example, it was forbidden to teach the beliefs of the religion without wearing His chosen outfit, namely full pirate regalia. Among the many memorable asides in Henderson's hilarious letter was the following observation:
You may be interested to know that global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s. For your interest, I have included a graph of the approximate number of pirates versus the average global temperature over the last 200 years. As you can see, there is a statistically significant inverse relationship between pirates and global temperature.
While the graph is omitted here, the reader can easily see the (intentional) flaw in this sort of reasoning. A direct or inverse relation between two variables does not mean that the first caused the second. It could well be that the two have no causal relation whatsoever, or that the second actually caused the first, or that the two have a common cause (in the pirates example, the second-order effects of the industrial revolution). The usual mantra that encapsulates this idea, drilled into students in any first-year college statistics course, is: "Correlation does not imply causation."
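The common-cause explanation can be illustrated with a small simulation. The sketch below is only a toy model, not anything from Henderson's letter: the variable names, the coefficients and the noise levels are all assumptions chosen for illustration. A hidden factor (labelled `industry`) pushes the pirate count down and the temperature up, while neither of the two has any effect on the other.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_common_cause(n=5000, seed=0):
    """Toy model (assumed): a hidden factor drives both observed variables."""
    rng = random.Random(seed)
    industry = [rng.gauss(0, 1) for _ in range(n)]
    # Pirates fall and temperature rises with industrialisation;
    # there is no arrow between pirates and temperature themselves.
    pirates = [-z + rng.gauss(0, 0.5) for z in industry]
    temperature = [z + rng.gauss(0, 0.5) for z in industry]
    return industry, pirates, temperature

industry, pirates, temperature = simulate_common_cause()

# Correlation as an observer unaware of the common cause would see it.
raw_r = pearson(pirates, temperature)

# Subtract the (known, in this toy model) contribution of the common cause.
pirates_resid = [p + z for p, z in zip(pirates, industry)]
temperature_resid = [t - z for t, z in zip(temperature, industry)]
adjusted_r = pearson(pirates_resid, temperature_resid)

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

Under these assumed coefficients the theoretical correlation between the two observed variables is -0.8, yet it vanishes once the common cause is accounted for. In real data the common cause is rarely known this precisely, which is what makes the problem hard.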
However, causation (or causality, to use the more common term) is notoriously difficult to establish. It is a metaphysical conundrum that has vexed great philosophers like David Hume and Immanuel Kant, and remains a feral animal to this day. Hume believed that causal inference was contingent on experience, and that any causality established from a sequence of observations was, at best, predicated on the hope that the future would continue to resemble the past. Causality, according to Hume, was imputed by the observer based on the constant conjunction of events, and was really a leap of faith. Kant, on the other hand, postulated that causality existed independent of experience, as what he called a synthetic a priori judgement. Causality, according to Kant, was an additional feature of the same cognitive apparatus that allowed an observer to infer which of two events occurred before the other. Centuries later, the jury is still out on whether Kant resolved Hume's objection, or even on just what Kant intended in the first place.
A contemporary philosopher and computer scientist who has done highly influential work on causality is Judea Pearl, director of the Cognitive Systems Laboratory at the University of California, Los Angeles. Pearl's working definition is that a causal claim should predict the consequences of intervening in a system: A causes B if and only if we can modulate B by altering A. Ideally, A and B are parametrized by continuous variables, and one can check whether changes in the purported cause lead to commensurate changes in the observed effect. Thus not only does gravity cause a ripe apple to fall to the earth's surface, we can even predict its rate of change of velocity in terms of the mass and radius of the earth. (An experimenter may not be able to do much about the mass of the earth, but the effective radius does depend on latitude.) The probabilistic analogues of such questions are far more intricate, but Pearl, building upon prior work by Peter Spirtes, Clark Glymour and Richard Scheines, has developed a causal calculus that uses combinatorial objects called directed acyclic graphs to deal with them. This is the state of the art when it comes to answering instances of the question, "Is this just correlation or can it be upgraded to causation?"
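The difference between observing A and intervening on A can be sketched in a few lines. The model below is a hypothetical three-variable system, not Pearl's causal calculus itself: a hidden variable Z influences both A and B, and the true effect of A on B is a parameter (`effect`) that we set to zero. All function names and coefficients are assumptions made for illustration.

```python
import random

def draw(rng, effect, do_a=None):
    """One sample from an assumed toy system: Z -> A, Z -> B, A -> B.

    If do_a is given, A is set by the experimenter and ignores Z.
    """
    z = rng.gauss(0, 1)  # hidden common cause
    a = z + rng.gauss(0, 1) if do_a is None else do_a
    b = effect * a + z + rng.gauss(0, 1)
    return a, b

def observational_slope(effect, n=20000, seed=0):
    """Least-squares slope of B on A when A is merely observed."""
    rng = random.Random(seed)
    pairs = [draw(rng, effect) for _ in range(n)]
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    return cov / var_a

def interventional_effect(effect, n=20000, seed=1):
    """Change in the mean of B when A is forced from 0 to 1."""
    rng = random.Random(seed)
    b_at_0 = sum(draw(rng, effect, do_a=0.0)[1] for _ in range(n)) / n
    b_at_1 = sum(draw(rng, effect, do_a=1.0)[1] for _ in range(n)) / n
    return b_at_1 - b_at_0

obs = observational_slope(effect=0.0)
causal = interventional_effect(effect=0.0)
print(f"slope seen by a passive observer: {obs:.2f}")
print(f"effect measured by intervention:  {causal:.2f}")
```

With the true effect set to zero, the passive observer still sees a slope near one half (its theoretical value in this toy model), while forcing A to different values leaves B essentially unchanged. Only the second measurement answers the question of whether altering A modulates B.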
It is now accepted scientific wisdom that smoking causes lung cancer, but this was a hard-won victory. It was bad enough that the tobacco lobby fought tooth and nail against the advancement of knowledge by funding study after study to reshape consensus. But they also found an unlikely ally in Sir Ronald Fisher, arguably the greatest statistician in history. Fisher was an unapologetic pipe smoker and steadfastly opposed the conclusions of the landmark study of Richard Doll and Austin Hill that made a highly persuasive argument for the causal connection between smoking and lung cancer. In a letter to the British Medical Journal in 1957, Fisher conceded that there was a prima facie case for further investigation, but refused to accept the question as settled. He had high standards for causation, and apparently the study had not met them.
As Fisher died a few years later, it is unclear whether he would have changed his mind as the mountain of evidence grew in the subsequent decades. In any case, Pearl's causal calculus now provides a framework to address these and other questions. The moral of the story is that one should not be too disdainful or dismissive of correlation. As the American statistician Edward Tufte observed, "Correlation is not causation but it sure is a hint."
Chapter 2: Regression to the Mean
In 1885, British statistician and anthropologist Sir Francis Galton published the results of a pioneering study that compared the heights of 898 adult children with those of their parents. The purpose of the study was to investigate the heritability of height. Indeed, taller parents did have taller children on average, and shorter parents had shorter children on average. Thus height was found to be a heritable trait, to within the confidence that the sample size allowed. But the study is remembered today for the surprising auxiliary finding that the children of taller parents were typically shorter than their parents, while the children of shorter parents were typically taller than their parents. In other words, the data did not support the thesis that the progeny of outliers would continue to be outliers; it actually indicated just the opposite. Galton called this phenomenon "regression to mediocrity", and these days it is called regression to the mean.
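Both findings can be reproduced in a simulation. The sketch below does not use Galton's data; it assumes a simple model in which a child's deviation from the population mean is a fraction `R` of the parents' deviation plus independent noise. The mean, standard deviation and the value of `R` are illustrative assumptions only.

```python
import random

# Assumed illustrative values (inches); not estimates from Galton's study.
MEAN, SD, R = 68.0, 2.5, 0.5

def simulate_families(n=20000, seed=0):
    """Return (parent_height, child_height) pairs under the assumed model."""
    rng = random.Random(seed)
    # Noise scaled so that children have the same spread as parents.
    noise_sd = SD * (1 - R ** 2) ** 0.5
    families = []
    for _ in range(n):
        parent = rng.gauss(MEAN, SD)
        child = MEAN + R * (parent - MEAN) + rng.gauss(0, noise_sd)
        families.append((parent, child))
    return families

def group_means(families, keep):
    """Average parent and child height among families selected by keep()."""
    selected = [(p, c) for p, c in families if keep(p)]
    parent_mean = sum(p for p, _ in selected) / len(selected)
    child_mean = sum(c for _, c in selected) / len(selected)
    return parent_mean, child_mean

families = simulate_families()
tall_parent, tall_child = group_means(families, lambda p: p > MEAN + SD)
short_parent, short_child = group_means(families, lambda p: p < MEAN - SD)

print(f"tall parents:  {tall_parent:.1f}, their children: {tall_child:.1f}")
print(f"short parents: {short_parent:.1f}, their children: {short_child:.1f}")
```

In this model the children of tall parents are taller than the population average (heritability) but shorter than their parents (regression), and the mirror image holds for short parents. Any value of `R` strictly between 0 and 1 produces the same qualitative picture.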
In some sense, Galton's finding is not surprising at all, as a lower bar of parental height is easier to cross than a higher bar. However, given that height is heritable, the lower bar could well be as difficult for the child of shorter parents as the higher bar is for the child of taller parents. There are two opposite effects at play here, and only a statistical study can bring out their respective strengths. (For simplicity, we are ignoring situations where one parent is relatively tall and the other relatively short, though of course Galton's data set had plenty of such cases.) It is worth mentioning that regression to the mean should not be confused with the gambler's fallacy, which erroneously presumes that a fair coin is more likely to show tails if, say, the previous three tosses showed heads.
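The gambler's fallacy is easy to test empirically. The following sketch, with a hypothetical function name and an arbitrary number of tosses, simulates a fair coin and records what happens immediately after every run of three heads.

```python
import random

def tails_rate_after_streak(n_tosses=200000, streak=3, seed=0):
    """Fraction of tails among tosses that directly follow `streak` heads."""
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True = heads
    # Collect the toss that comes right after each run of `streak` heads.
    followups = [
        tosses[i]
        for i in range(streak, n_tosses)
        if all(tosses[i - streak:i])
    ]
    tails = sum(1 for is_heads in followups if not is_heads)
    return tails / len(followups)

rate = tails_rate_after_streak()
print(f"fraction of tails after three heads: {rate:.3f}")
```

The fraction stays close to one half, because each toss is independent of the ones before it. Regression to the mean makes a different claim: it concerns correlated quantities, such as the heights of parents and children, where an extreme value in one is typically paired with a less extreme value in the other.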