Cover photo: Gary Carlsen, DDS
Copyright 2012 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Good, Phillip I.
Common errors in statistics (and how to avoid them) / Phillip I. Good, Statcourse.com, Huntington Beach, CA, James W. Hardin, Dept. of Epidemiology & Biostatistics, University of South Carolina, Columbia, SC. Fourth edition.
pages cm
Includes bibliographical references and index.
ISBN 978-1-118-29439-0 (pbk.)
1. Statistics. I. Hardin, James W. (James William) II. Title.
QA276.G586 2012
519.5--dc23
2012005888
Preface
ONE OF THE VERY FIRST TIMES DR. GOOD served as a statistical consultant, he was asked to analyze the occurrence rate of leukemia cases in Hiroshima, Japan following World War II. On August 6, 1945 this city was the target site of the first atomic bomb dropped by the United States. Was the high incidence of leukemia cases among survivors the result of exposure to radiation from the atomic bomb? Was there a relationship between the number of leukemia cases and the number of survivors at certain distances from the atomic bomb's epicenter?
To assist in the analysis, Dr. Good had an electric (not an electronic) calculator, reams of paper on which to write down intermediate results, and a prepublication copy of Scheffé's Analysis of Variance. The work took several months and the results were somewhat inconclusive, mainly because he could never seem to get the same answer twice, a consequence of errors in transcription rather than the absence of any actual relationship between radiation and leukemia.
Today, of course, we have high-speed computers and prepackaged statistical routines to perform the necessary calculations. Yet, statistical software will no more make one a statistician than a scalpel will turn one into a neurosurgeon. Allowing these tools to do our thinking is a sure recipe for disaster.
Pressed by management or the need for funding, too many research workers have no choice but to go forward with data analysis despite having insufficient statistical training. Alas, though a semester or two of undergraduate statistics may develop familiarity with the names of some statistical methods, it is not enough to be aware of all the circumstances under which these methods may be applicable.
The purpose of the present text is to provide a mathematically rigorous but readily understandable foundation for statistical procedures. Here are such basic concepts in statistics as null and alternative hypotheses, p-value, significance level, and power. Assisted by reprints from the statistical literature, we reexamine sample selection, linear regression, the analysis of variance, maximum likelihood, Bayes' Theorem, meta-analysis, and the bootstrap. New to this edition are sections on fraud and on the potential sources of error to be found in epidemiological and case-control studies.
Examples of good and bad statistical methodology are drawn from agronomy, astronomy, bacteriology, chemistry, criminology, data mining, epidemiology, hydrology, immunology, law, medical devices, medicine, neurology, observational studies, oncology, pricing, quality control, seismology, sociology, time series, and toxicology.
More good news: Dr. Good's articles on women's sports have appeared in the San Francisco Examiner, Sports Now, and Volleyball Monthly; 22 short stories of his are in print; and you can find his 21 novels on Amazon and zanybooks.com. So, if you can read the sports page, you'll find this text easy to read and to follow. Lest the statisticians among you believe this book is too introductory, we point out the existence of hundreds of citations in statistical literature calling for the comprehensive treatment we have provided. Regardless of past training or current specialization, this book will serve as a useful reference; you will find applications for the information contained herein whether you are a practicing statistician or a well-trained scientist who just happens to apply statistics in the pursuit of other science.
The primary objective of the opening chapter is to describe the main sources of error and provide a preliminary prescription for avoiding them. The cycle of hypothesis formulation, data gathering, and hypothesis testing and estimation is introduced, and the rationale for gathering additional data before attempting to test after-the-fact hypotheses is detailed.
A rewritten Chapter 2 places our work in the context of decision theory. We emphasize the importance of providing an interpretation of each and every potential outcome in advance of data collection.
A much expanded Chapter 3 focuses on study design and data collection, as failure at the planning stage can render all further efforts valueless. The work of Berger and his colleagues on selection bias is given particular emphasis.
Chapter 4 on data quality assessment reminds us that just as 95% of research efforts are devoted to data collection, 95% of the time remaining should be spent on ensuring that the data collected warrant analysis.
Desirable features of point and interval estimates are detailed in Chapter 5 along with procedures for deriving estimates in a variety of practical situations. This chapter also serves to debunk several myths surrounding estimation procedures.
Chapter 6 reexamines the assumptions underlying testing hypotheses and presents the correct techniques for analyzing binomial trials, counts, categorical data, continuous measurements, and time-to-event data. We review the impacts of violations of assumptions, and detail the procedures to follow when making two- and k-sample comparisons.