THE SIGNIFICANCE TEST CONTROVERSY
edited by
Denton E. Morrison and Ramon E. Henkel
Second paperback printing 2009
Copyright 1970 by Transaction Publishers, New Brunswick, N.J.
All rights reserved under International and Pan-American Copyright Conventions. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without prior permission in writing from the publisher. All inquiries should be addressed to AldineTransaction, A Division of Transaction Publishers, Rutgers, The State University, 35 Berrue Circle, Piscataway, New Jersey 08854-8042. www.transactionpub.com
This book is printed on acid-free paper that meets the American National Standard for Permanence of Paper for Printed Library Materials.
Library of Congress Catalog Number: 2006042891
ISBN: 0-202-30879-0
Printed in the United States of America
Library of Congress Cataloging-in-Publication Data
The significance test controversy : a reader / Denton E. Morrison and Ramon
E. Henkel, editors.
p. cm.
Originally published: Chicago : Aldine Pub. Co., 1970.
ISBN 0-202-30879-0 (alk. paper)
1. Statistical hypothesis testing. 2. Social sciences--Statistical methods.
I. Morrison, Denton E. II. Henkel, Ramon E., 1931-
HA33.M67 2006
519.5'6--dc22
2006042891
Preface
Our interest in compiling this volume stems from our concern with the considerable amount of indiscriminate use of significance tests in behavioral research. Even their strongest proponents agree that there is much misuse, misinterpretation, and meaningless use of the tests. More important than the question of how the tests are correctly used, however, is the question of whether the tests are useful, and why or why not. We are concerned that many users lack an understanding of the latter questions. We do not know how much of this lack is because an extensive literature critical of past and current practice in the use of significance tests is unknown to researchers and how much results from a failure to heed the criticism. But we hope that collecting a substantial portion of this literature in one volume will help make researchers more mindful of both the practical problems and philosophical pitfalls involved in using the tests.
While the essential tone of this volume is critical of the tests, our broader purpose is to document the controversy over use of the tests in behavioral research. We have used the term "controversy" to characterize the literature on significance tests, but the sense in which a "controversy" exists must be understood in a special way. "Controversy" implies dialogue over points of disagreement, and in view of the fact that such dialogue has not always occurred, the term may be an overstatement. Both in sociology and psychology critics of the tests have reacted to what they view as erroneous research practice based on misguided statistical training. Essays that respond specifically to this criticism by explicitly defending the tests have appeared only in the sociological literature, however, so that the controversy has only in part taken the form of an extended debate or dialogue. In the behavioral sciences in general the overwhelming practice by both researchers and those responsible for statistical training has been to ignore the issues raised by the critics and to continue doing things as before. Thus the preponderance of the negative side of the "debate" in this volume does not represent so much bias as redress, since the amount of behavioral science writing that implicitly supports the tests is far greater than that which is critical.
Although most of the literature critical of significance tests has been written by sociologists and psychologists, the book is addressed to all behavioral scientists and to both professionals and students. We have thus selected readings for a group that as a whole is not extensively trained in mathematics, logic, and statistics. We hope that readers who are learned in these subjects will also find something of value in the readings, though the selections are mainly nontechnical. The readings do not provide computational instruction, nor do they compare one test with another. Dozens of standard statistical texts have done, if not overdone, this. The readings necessarily require a grasp of the broad and basic technical matters dealing with the nature and meaning of the sampling assumptions, probability statements, and parameter estimates connected with use of the tests. We assume, therefore, that the reader has some knowledge of statistics, at least that obtained through an introductory course.
The general failure to incorporate a critical perspective on significance tests into research training and practice has brought a considerable reiteration of the critical points in the literature, and this reiteration is reflected in substantial repetition of the same criticism in the essays of this volume. While we realize this repetition has its negative aspects, our view is that it will be more beneficial than harmful in a book of this sort. And though we readily confess to a desire to proselytize a skeptical perspective on the tests, our rationale is not simply that the sheer force of repetition will help drive home some of the points made, for the critical points are made at varying levels of statistical and philosophy of science difficulty in the readings. Thus, we hope the repetition will both provide something of value for a considerable range of reader sophistication and encourage step-wise growth in many readers' level of understanding. Also, the criticism often appears to have been made without the knowledge that similar criticism was made elsewhere; the independent occurrence of similar critical points in the literature adds to their persuasiveness and validity. It is impossible to overestimate the extent to which use and misuse of the tests is ingrained in the behavioral sciences. Researchers will profit from knowing that writers with diverse substantive and methodological interests have offered carefully reasoned and often basically similar criticism of the tests. Consequently, we have deliberately avoided deleting repetitious material from the selections and have attempted to be relatively complete in documenting the controversy by including both the major and minor essays in the main behavioral sciences where they have occurred: sociology and psychology.
This volume is not, however, intended as a general history of either the concept of statistical significance or the broader controversy surrounding this notion. Such a history would require inclusion of an extensive literature by such writers as Fisher, Neyman, E. S. Pearson, Jeffreys, and Wald, at the very least. While such a collection would be of considerable value, we believe that, since most of this literature is quite technical, most behavioral scientists would neither read it nor gain as much from it as from the selections we have included. The essays in the first part of this reader provide an ample, if critical, general historical context on the tests for the present purpose.
We wish to be clear that the volume is not intended as a blanket condemnation of statistics. The focus of our critical concern is inferential and not descriptive statistics, both in terms of how statistical inference is done in behavioral research and, more important, whether, given basic (in contrast with applied) scientific goals, it is worth doing. In essence, our view is that the significance test as typically employed in behavioral science is bad statistical inference, and that even good statistical inference in basic research is typically only a convenient way of sidestepping rather than solving the problem of scientific inference. Although we do not pretend that the selections of this volume go far toward solving the problems of scientific inference, we are convinced that the diversion of energy away from the rituals of significance testing in basic scientific research will be a worthy first step toward this goal and will, in fact, be one difference in behavioral science that is significant. Thus, we hope that this volume will be a modest contribution not only to more sensible statistical practice but to a more adequate philosophy of social science.