Birger Madsen Statistics for Non-Statisticians 10.1007/978-3-642-17656-2_1 Springer-Verlag Berlin Heidelberg 2011
1. Data Collection
This chapter explains some basic concepts within statistics. Also, we look at the most important ways to collect data in surveys.
Since ancient times, people have needed knowledge about population size, for example to carry out a census of the armies or to calculate expected taxes. The word statistics is derived from the word status (originally from Latin), and it was exactly the status of society that was the subject of the first statistics! Later, probability theory (in connection with games!), demographics and insurance science emerged as areas in which statistical thinking was essential.
In today's digital age it is easy to collect, process and disseminate data, and therefore statistics is used for a variety of surveys throughout society.
Descriptive statistics means describing data using tables, charts and simple statistical calculations such as averages, percentages, etc. This is what many people understand by the word statistics. It was also the kind of statistics that was produced in ancient times.
Analytical statistics is used to assess differences and relationships in data. For example, we could examine whether there is a relation between height and weight of a group of persons; or whether there is a difference between height of boys and height of girls, as well as provide an estimate of how large this difference is. Analytical statistics is a mathematical discipline, based on calculus of probability. It is a relatively new discipline that has been developed throughout the twentieth century.
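The two questions mentioned above can be illustrated with a minimal Python sketch. All numbers are invented for illustration, and the function names `pearson_r` and `mean_difference` are our own choices, not taken from this book. The sketch computes a correlation coefficient as one common measure of the relation between height and weight, and a simple estimate of the difference in average height between two groups. It does not assess whether such a difference could be due to chance.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_difference(group_a, group_b):
    """Estimated difference between the averages of two groups."""
    return mean(group_a) - mean(group_b)


# Hypothetical measurements, invented purely for illustration.
heights_cm = [150, 155, 160, 165, 170, 175]
weights_kg = [45, 50, 54, 61, 66, 72]
boys_height_cm = [158, 162, 170, 175]
girls_height_cm = [152, 157, 160, 163]

print("Correlation between height and weight:", round(pearson_r(heights_cm, weights_kg), 3))
print("Estimated height difference (cm):", mean_difference(boys_height_cm, girls_height_cm))
```

A correlation close to 1 indicates that taller persons in the data also tend to be heavier; the second number is only a point estimate of the difference.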
This book is about descriptive statistics as well as analytical statistics. In practice, you need both. Analytical statistics is a very large topic, and here we can only scratch the surface (see especially Chaps. 5, 7, and 8). If you want to know more about analytical statistics, see some of the more advanced books in the literature list.
1.1
Statistics can be defined as a collection of techniques used when planning a data collection, and when subsequently analyzing and presenting data.
1.2
Most statistical surveys can be divided into the following phases:
1. Clarification of concepts
2. Planning of data collection
3. Data collection
4. Analysis and presentation of data
5. Conclusion
Statistical methods (and statisticians!) are particularly useful in phases 2 and 4 of the survey.
1.3
There are two kinds of statistics:
Descriptive statistics
Analytical statistics
1.4 Sample Surveys
We are interested in the entire population of individuals. The advantage of investigating only a sample is that it is both faster and cheaper than investigating the whole population. In some situations, a carefully planned sample survey can even give more accurate results than a badly planned total survey!
We investigate the individuals in the sample in order to study the whole population! This means that the sample gives us an estimate (*) of the characteristics of the population (Fig. 1.1).
Fig. 1.1 Sample and population
Examples of characteristics:
Average (*) of a measurable attribute of individuals, e.g., height
Percentage of individuals who belong to a particular category (e.g., who have a specific hobby)
The larger the sample, the better the estimate of the population characteristics!
It is also important that the sample is representative of the population. In practice, this means that the individuals in the sample are selected at random, in order to cover the whole population. We deal with sampling (*) in much more detail in Chap. 6.
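The effect of sample size can be made concrete with a small simulation. The following Python sketch is an illustration under stated assumptions, not a method from this book: the population of 10,000 heights is simulated, and the function names are hypothetical. For each sample size, it repeatedly draws a random sample, estimates the population average from it, and records how far the estimate is from the true average.

```python
import random
from statistics import fmean


def estimate_mean_from_sample(population, sample_size, rng):
    """Draw a random sample without replacement and return its average."""
    sample = rng.sample(population, sample_size)
    return fmean(sample)


def average_abs_error(population, sample_size, repeats, rng):
    """Average distance between the sample estimate and the true average."""
    true_mean = fmean(population)
    errors = [
        abs(estimate_mean_from_sample(population, sample_size, rng) - true_mean)
        for _ in range(repeats)
    ]
    return fmean(errors)


# Assumption: a simulated population of 10,000 heights in cm.
rng = random.Random(1)
population = [rng.gauss(165, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(average_abs_error(population, n, 200, rng), 2))
```

With randomly selected samples, the printed error should shrink as the sample size grows. The sketch says nothing about samples that are not selected at random; a large but unrepresentative sample can still give a poor estimate.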
The sample (and the population) may consist of different types of individuals, depending on the context.
Some examples:
People
Companies
Public institutions
Families
Vouchers
Houses
Cars
Trees
Dogs
Bacterial colonies
Bottles or cans of beer
Pills
The concepts in this book can be applied to all types of samples. The examples are mainly samples consisting of people, but the principles are the same for any type of individual.
Typical applications with samples consisting of people are: analysis of attitudes, consumption, durable goods, interests and hobbies, eating and drinking habits, transportation, traffic, vacation, media (TV, radio, newspapers) and certain sensitive topics, such as alcohol consumption.
Sample surveys can provide a high degree of flexibility: the same survey can include questions on media consumption, traffic patterns and attitudes. Sample surveys are also widely used in commercial surveys, e.g., in connection with telephone interviews.
One of the essential applications of sampling is sampling inspection in the field of statistical quality control. If you are working with statistical quality control, most of this book will be relevant to you. The issues that are specific to this discipline, however, will not be dealt with here. See the literature list for books on statistical quality control.
1.4.1
In any survey (*), we collect information on the individuals of either the entire population (*) (a total survey) or a relatively small number of individuals of a sample (*) (a sample survey), in order to analyze and present data.
1.5 Fitness Club: Example of a Sample Survey
This example is a fictitious survey. It will be used as an example in the subsequent chapters.
Fitness Club has a number of sports facilities. This includes facilities for strength training, weight loss and cardiovascular workout.
Fitness Club wants to understand the needs of its young customers, kids aged 12-17 years. The club wants to know how satisfied these kids are with the sports facilities. It also wants to obtain information about their health in order to better customize the sports facilities for the various types of training.
Therefore, a sample survey is carried out among the kids using the sports facilities. We will later discuss how the survey can be organized. Moreover, we present some findings from the survey.
The population consists of kids using the sports facilities in Fitness Club. The individual is one kid. The sample consists of 30 kids.
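As a minimal sketch of how such a sample of 30 could be selected at random, the following Python code draws a simple random sample from a list of membership numbers. The population size of 400, the membership numbers and the function name `draw_simple_random_sample` are assumptions made for illustration; the example in this book does not state them, and simple random sampling is only one possible way to organize the selection.

```python
import random


def draw_simple_random_sample(population_ids, sample_size, seed=None):
    """Select sample_size distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population_ids, sample_size)


# Assumption: 400 kids use the facilities, numbered 1 to 400.
member_ids = list(range(1, 401))

# A fixed seed makes the selection reproducible; the value itself is arbitrary.
sample_ids = draw_simple_random_sample(member_ids, 30, seed=2011)
print(sorted(sample_ids))
```

Every kid on the list has the same chance of being selected, which is what makes the sample representative on average.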
Some data related to this example are found in the Appendices in Chap. 9 of this book.
1.6 Experiments
In certain situations, the information needed simply does not exist at all! In this case, one can plan (or design) an experiment (*), with the aim of providing the relevant data. In the experiment, we test the influence of one or more factors on a measured result. This approach is widely used in technical and industrial contexts.