Statistical manipulation - the Gambler's fallacy
1 Definition: The gambler's fallacy is the mistaken notion that the odds of an event with a fixed probability increase or decrease depending upon recent occurrences. This invalid idea is exploited in all kinds of ways by manipulators, in both gambling and investment circles, to persuade a victim to "stay at the table" because their "luck is bound to change".
The underlying notion of the gambler's fallacy works as follows: You might think that you can beat the odds by either selecting numbers that have not been chosen in recent draws, or by selecting numbers that have come up more frequently than expected in recent draws.
In either case, you are committing the gambler's fallacy. The fact of the matter is that the odds are always the same, no matter what numbers have been selected in the past and how frequently they were drawn.
This fallacy is commonly committed by gamblers who, for instance, bet on red at roulette when black has come up three times in a row. The odds of black coming up next are the same regardless of what colours have come up in previous turns.
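To make the point concrete, here is a minimal Python sketch (my own illustration, not from the source: it assumes a simplified two-colour wheel with the green zero ignored, and a streak length of three). It simulates a large number of spins and compares the frequency of red overall with the frequency of red immediately after three blacks in a row.

```python
import random

# Simulate a simplified fair red/black wheel (green zero ignored for clarity)
# and compare P(red) overall with P(red) immediately after three blacks.
random.seed(1)

def spin():
    return random.choice(["red", "black"])

spins = [spin() for _ in range(1_000_000)]

red_overall = sum(s == "red" for s in spins) / len(spins)

after_three_blacks = [
    spins[i] for i in range(3, len(spins))
    if spins[i - 3] == spins[i - 2] == spins[i - 1] == "black"
]
red_after_streak = sum(s == "red" for s in after_three_blacks) / len(after_three_blacks)

print(f"P(red) overall:        {red_overall:.3f}")
print(f"P(red) after 3 blacks: {red_after_streak:.3f}")
# Both come out at roughly 0.500 -- the wheel has no memory.
```

Both frequencies converge on the same value; a run of blacks tells us nothing about the next spin.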
The gambler's fallacy has led to some bizarre behaviour, like people blowing on the dice before throwing them or even talking to them in an attempt to influence the outcome. Obviously such rituals are based on some kind of misplaced, optimistic belief in the powers of magic, but they do not alter the rules of probability at all.
However, this basic and widespread misunderstanding means that gamblers are an easy target for the professional manipulator in both gambling and commercial circles.
The gambler's fallacy is not confined to gamblers: there is a general tendency to think that future probabilities are altered by past events, when in reality they are unchanged. For instance, an investor may argue, "I have made 10 bad investment decisions already, therefore my luck is bound to change soon".
It is a comforting idea, but it is absolute nonsense.
2 Persistence: Long. Potentially gamblers can be induced to continue playing indefinitely.
3 Accessibility: High. Most people will fall for this.
4 Conditions/Opportunity/Effectiveness: A general misunderstanding of statistical probability makes this a popular and easy-to-use vehicle for the manipulator. The fallacy is also very effective and self-reinforcing: when victims fail to win again, they simply conclude that next time they "really have to win" because their luck "just has to turn". If a victim does win, that win simply validates their belief in watching the table for a while before betting. The fallacy is so strongly felt that, regardless of the facts, everything that happens to the victim merely confirms their mistaken belief.
5 Methodology/Refinements/Sub-species: The phenomenon is also known as the "Monte Carlo fallacy" or the "fallacy of the maturity of chances". There are no known sub-species.
6 Avoidance and Counteraction: The only way to avoid this manipulation is to read and understand a couple of chapters on probability before wasting money on gambling.
All gambling operations are designed to take money from the majority of players in order to pay out to a small number of big winners and, of course, the gambling operator. Lotteries, including government-run lotteries, work on the same basis. They are really no better than state-legalised pyramid schemes, designed for the benefit of a few big winners, funded by the total "investment" of the millions of losers.
As Kevin McKenna, the conservative journalist observed: "The most common dream of every "Bullingdon Tory" is the national lottery. And what a jolly wheeze it is: get the poor to fund our biggest capital projects in exchange for a cruel fairy story. (Sic)"
If one does happen to make a small gamble and win, then take your winnings, be happy, and never gamble again.
---o0o---
Data dredging
1 Definition: Data dredging involves creating misleading relationships in a dataset. It is the equivalent of looking for an answer (any answer) before having phrased the question. It is a misuse of data-mining techniques and statistical analyses, such as regression analysis, with manipulative intent.
Relationships found by dredging data might appear valid within the test set, but they have no statistical significance in the wider population. The practice has become very popular since the advent of very large databases and the use of relational database technology.
We should note that data dredging can sometimes be a valid way of finding a possible hypothesis, but such a hypothesis must then be tested with data not in the original dredged dataset. It is misused when a hypothesis is stated as fact without further validation and is tested only against the data that originated the hypothesis in the first place.
Data dredging occurs when researchers browse data looking for relationships rather than forming a hypothesis before looking at the data. Another example is when subsets of data are deliberately chosen to create the illusion of significant patterns in deliberately narrowed down data sets.
In data dredging, large compilations of data are examined to find correlations, without any pre-defined hypothesis to be tested. Since the confidence level required to establish a relationship between two parameters is usually set at 95% (meaning a 5% chance of the observed relationship arising purely by chance when no real relationship exists), there is thus roughly a 5% chance of finding a correlation between any two sets of completely random variables.
Because data dredging exercises typically examine large datasets with many variables, it is almost certain that apparently statistically significant results will be found somewhere in the data, even though they are entirely spurious and coincidental.
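As an illustration, the Python sketch below (hypothetical data; the sample size, the number of predictors and the significance threshold are my own choices) dredges a set of completely random "predictor" variables for apparently significant correlations with an equally random "outcome", and finds roughly the proportion of false positives that chance alone predicts.

```python
import random
import statistics

# Generate one random "outcome" and 200 completely random "predictors",
# then dredge the data for predictors that appear to correlate with it.
random.seed(42)

n_samples, n_predictors = 50, 200
outcome = [random.gauss(0, 1) for _ in range(n_samples)]
predictors = [[random.gauss(0, 1) for _ in range(n_samples)]
              for _ in range(n_predictors)]

# For n = 50, a Pearson correlation of roughly |r| > 0.28 corresponds to
# p < 0.05 under the null hypothesis of no relationship.
THRESHOLD = 0.28

hits = [i for i, p in enumerate(predictors)
        if abs(statistics.correlation(p, outcome)) > THRESHOLD]  # Python 3.10+

print(f"{len(hits)} of {n_predictors} random predictors look 'significant': {hits}")
# Typically around ten purely coincidental "discoveries" appear,
# in line with the 5% false-positive rate (200 x 0.05 = 10).
```

Every one of these "findings" is spurious by construction: the predictors were generated at random and have no relationship to the outcome at all.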
This technique can be used in any field, but it is most often used in medical and other scientific research and the financial environment where interested parties fish the data for apparently interesting correlations and relationships.
For example, suppose that observers note that a particular town appears to have a cluster of cancers, but lack a firm hypothesis as to why. The researchers have access to a large amount of demographic data about the town and the surrounding area, containing measurements of hundreds of different, mostly uncorrelated variables. Even if all these variables are independent of the cancer incidence rate, it is highly likely that at least one of them correlates significantly with the cancer rate. Whilst this may suggest a hypothesis, further testing using the same variables but with data from different locations is needed to confirm it.
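As a rough, back-of-the-envelope illustration (my own calculation, not from the source): if each unrelated variable has a 5% chance of correlating "significantly" with the cancer rate by chance alone, the chance that at least one of N such variables does so is 1 - 0.95^N.

```python
# Probability that at least one of N independent, unrelated variables
# shows a spurious "significant" correlation at the 5% level.
for n in (1, 10, 50, 100, 300):
    print(f"{n:>3} variables: P(at least one spurious hit) = {1 - 0.95**n:.2f}")
# With 100 variables the probability is about 0.99 -- with hundreds of
# variables, finding some "significant" correlation is near-certain.
```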
1.1 Traditional scientific methodology: For the lay reader it is important to understand the methodology used in conventional scientific research and how this provides safeguards against manipulative interference.
Conventional scientific method calls for a researcher to formulate a hypothesis, collect relevant data, use a method of statistical analysis to establish some form of correlation, and then carry out a statistical significance test to see whether the results could be due to chance alone (the so-called null hypothesis). The results are then compared with the hypothesis to support or refute it.
A vital issue in proper statistical analysis is to test a hypothesis with data that was not used in constructing the hypothesis. This is central to the integrity of the scientific process, because every data set contains some patterns that are due entirely to chance.
If a hypothesis is not tested with a dataset different from the original study population, then it is impossible to determine whether the patterns found are chance patterns or whether they have some real significance. If we toss a coin 11 times and get heads 5 times and tails 6 times, we might form the hypothesis that the coin favours tails, with a probability of tails somewhere between 6/11 and 7/11. However, testing this theory on the same data set will only confirm it, and such confirmation has no meaning. The statistical significance of the theory needs to be tested on a completely new, fresh dataset, using a new set of coin-tossing results.
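The coin example can be sketched in a few lines of Python (an illustration only; the seed and the one-sided binomial test are my own choices). It shows both how unremarkable six tails in eleven fair tosses actually is, and how the dredged hypothesis should instead be confronted with a genuinely fresh run of tosses.

```python
from math import comb
import random

def p_at_least(k, n, p=0.5):
    # One-sided binomial probability of seeing k or more tails in n fair tosses.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Original data: 6 tails in 11 tosses suggests "the coin favours tails" --
# but the same data can never contradict the hypothesis it generated.
print(f"P(>= 6 tails in 11 fair tosses) = {p_at_least(6, 11):.2f}")  # about 0.50

# Proper test: collect a fresh, independent run of tosses and count again.
random.seed(7)
fresh = [random.choice("HT") for _ in range(11)]
tails = fresh.count("T")
print(f"Fresh run: {tails} tails, P(>= {tails} by chance) = {p_at_least(tails, 11):.2f}")
# A fair coin typically gives an unremarkable fresh result, so the dredged
# hypothesis fails to survive data it did not originate from.
```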