Despite the continued infiltration of statistics into many realms of life, one thing hasn't changed: telling people I work as a statistician is still the best way to derail a promising conversation at a party. For some reason, this seems to prompt people either to tell me how much they hated the statistics class required for their college major or to quote that old chestnut popularized by Mark Twain that there are three kinds of lies: lies, damned lies, and statistics.
Personally, I find statistics fascinating, and I love working in this field. I like teaching statistics as well, and I like to believe that I communicate this enthusiasm to others. It's often an uphill battle, however; many people seem to believe that statistics is no more than a set of tricks and manipulations whose purpose is to twist reality to mislead other people. Others take the opposite view, believing that statistics is a collection of magical procedures that will do their thinking for them.
OK, Just What Is Statistics?
Before you jump into the technical details of learning and using statistics, step back for a minute and consider what can be meant by the word statistics. Don't worry if you don't understand all the vocabulary immediately; it will become clear over the course of reading this book.
When people speak of statistics, they usually mean one or more of the following:
1. Numerical data, such as the unemployment rate, the number of persons who die annually from bee stings, or the population of New York City in 2006 as compared to 1906.
2. Numbers used to describe samples of data as opposed to parameters (numbers used to describe populations). For instance, an advertising firm might be interested in the average age of people who subscribe to Sports Illustrated. To answer this question, it could draw a random sample of subscribers, calculate the mean of that sample (a statistic), and use that as an estimate of the mean of the entire population of subscribers (a parameter).
3. Particular procedures used to analyze data, and the results of those procedures, such as the t statistic or the chi-square statistic.
4. A field of study that develops and uses mathematical procedures to describe data and make decisions regarding it.
The type of statistics referred to in definition number 1 is not the primary concern of this book. If you simply want to find the latest figures on unemployment, health, or any of the myriad other topics on which governments and other organizations regularly release statistical data, your best bet is to consult a reference librarian or subject matter expert. If, however, you want to know how to interpret those figures (to understand why the mean is often misleading as a statement of average value, for instance, or the difference between crude and standardized mortality rates), Statistics in a Nutshell can definitely help you.
The concepts included in definition number 2 will be discussed in , which introduces inferential statistics, but these concepts also permeate the entire book. It is partly a question of vocabulary (statistics are numbers that describe samples, whereas parameters are numbers that describe populations) but underscores a fundamental point about the practice of statistics. The concept of using information gained from studying a sample to make statements about a population is the basis of inferential statistics, and inferential statistics is the primary focus of this book (as it is of most books about statistics).
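To make the distinction between a statistic and a parameter concrete, here is a minimal Python sketch of the subscriber-age example. Everything in it is an assumption made for illustration: the population of ages is simulated (it is not real subscriber data), and the function names `simulate_population` and `sample_mean`, the population size, and the sample size of 500 are all hypothetical choices.

```python
import random
import statistics


def simulate_population(size, seed=0):
    """Build a made-up population of subscriber ages (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: ages are roughly normal around 40, clipped to 18-90.
    return [min(90, max(18, round(rng.gauss(40, 12)))) for _ in range(size)]


def sample_mean(population, n, seed=1):
    """Draw a simple random sample of n ages and return its mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    return statistics.mean(sample)


population = simulate_population(100_000)

# The parameter: the mean age of the entire (simulated) population.
parameter = statistics.mean(population)

# The statistic: the mean age of a random sample of 500 subscribers,
# used as an estimate of the parameter.
statistic = sample_mean(population, 500)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

In practice the population mean is usually unknown, which is why the sample mean is computed in the first place; the simulation calculates both only so that they can be compared.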
Definition number 3 is also fundamental to most chapters of this book. The process of learning statistics is to some extent the process of learning particular statistical procedures, including how to calculate and interpret them, how to choose the appropriate statistic for a given situation, and so on. In fact, many new students of statistics subscribe primarily to this definition; learning statistics to them means learning to execute a set of statistical procedures. This is not so much an invalid approach to statistics as it is incomplete; learning to execute statistical procedures is a necessary part of the practice of statistics, but it is far from being the entire story. What's more, since computer software has made it increasingly easy for anyone, regardless of mathematical background, to produce statistical analyses, the need to understand and interpret statistics has far outstripped the need to learn how to do the calculations themselves.
Definition number 4 is nearest to my heart because I chose statistics as my professional field. If you are a secondary or post-secondary student, you are probably aware of this definition of statistics because many universities and colleges today either have a separate department of statistics or include statistics as a field of specialization within the department of mathematics. Statistics is increasingly taught in high school as well, and in the United States, enrollment in advanced placement (AP) statistics classes is increasing rapidly.
Statistics is not only a specialist subject at the university level. Many university departments require students to take one or more statistics courses alongside subjects in their major. In addition, it's worth knowing that many important techniques in modern statistics have been developed by people who learned and used statistics as part of their work in another field. Stephen Raudenbush, a pioneer in the development of hierarchical linear modeling, studied Policy Analysis and Evaluation Research at Harvard, and Edward Tufte, perhaps the world's leading expert on statistical graphics, began his career as a political scientist: he wrote his PhD dissertation at Yale on the American Civil Rights movement.
Because the use of statistics in many professions and at all levels from management to line workers is increasing, acquiring a basic knowledge of statistics has become a necessity for many people who have been out of school for years. Such individuals are often ill served by textbooks aimed at introductory college courses, which are too specialized, too focused on calculation, and too expensive.
Finally, statistics cannot be left to the statisticians because it's also a necessity for taking part in modern civic life, in particular for understanding much of what you read in the newspaper and hear on the television and radio. A working knowledge of statistics is the best check against the proliferation of misleading or outright false numerical claims (whether by politicians, advertisers, or social reformers), which seem to occupy an ever-increasing portion of our daily news diet. There's a reason that Darrell Huff's 1954 classic How to Lie with Statistics