SAGE Publications Ltd
1 Olivers Yard
55 City Road
London EC1Y 1SP
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044
SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483
© Martin Elff, 2021
First published 2021
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
Library of Congress Control Number: 2020938317
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
ISBN 978-1-5264-5996-1
ISBN 978-1-5264-5997-8 (pbk)
Editor: Natalie Aguilera
Production editor: Ian Antcliff
Copyeditor: QuADS Prepress Pvt Ltd
Proofreader: Neville Hankins
Marketing manager: George Kimble
Cover design: Lisa Harper-Wells
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed in the UK
At SAGE we take sustainability seriously. Most of our products are printed in the UK using responsibly sourced papers and boards. When we print overseas we ensure sustainable papers are used as measured by the PREPS grading system. We undertake an annual audit to monitor our sustainability.
About the Author
Martin Elff is Professor of Political Sociology at Zeppelin University in Friedrichshafen, Germany. He is a political scientist with research interests in the fields of political behaviour, party competition, and political methodology. He has published research articles in journals such as the British Journal of Political Science, Electoral Studies, the European Journal of Political Research, Perspectives on Politics, and Political Analysis. He is the author of several R packages, of which the memisc, mclogit, and munfold packages have been published on the Comprehensive R Archive Network (http://cran.r-project.org). Since 2006 he has been teaching courses on R at the Essex Summer School of Social Science Data Analysis at the University of Essex as well as at various other institutions.
Preface
This book is the fruit of not only a resolved misunderstanding but also a long unacknowledged desire. In 2017, I had the privilege of chairing a panel on big data at the annual conference of the European Political Science Association in Milan, Italy, even though I do not consider myself a big data expert. SAGE Publications were at that time looking for potential authors of textbooks about big data and related matters, but they were also open to suggestions from academics about other promising topics. We therefore agreed that I would submit a book proposal to find out whether there was a demand for a book about data management with R.
What motivated me to write a book about data management is my own experience that data management usually takes up as much of a research project's time as the actual statistical analysis of the data, yet relatively little has been written about it. The reason for this might be that it is possible to build an academic career by writing about research findings and data analysis, whereas the chances of earning academic laurels with work on data management appear to be rather low. Of course, in recent years data science has been on the verge of establishing itself as an academic field, if not an academic discipline. But the existing literature on data science seems better suited to the needs of business analytics than to those of social science research.
What also motivated me to write this book is that I have been enthusiastic about free and open source software for decades and that R is, in my view, the best free and open source software available for data analysis. Although I worked with commercial software for social science data analysis and management in the earlier stages of my academic career, and gained experience in data management and data analysis with SPSS (IBM, 2017), SAS (SAS Institute, 2013), and Stata (StataCorp, 2019), I nevertheless developed the ambition to do all of this in R.
Many of my colleagues are swayed by the high-quality graphics that can be produced with R and by the fact that the most advanced techniques of data analysis are implemented first, and usually very quickly, in R. Nevertheless, many of them still stick to commercial software like SPSS or Stata for data management. Part of the reason for this is that these colleagues work with data from social science surveys, and SPSS and Stata provide functionality that makes such data relatively easy to manage. R, by contrast, offers little out-of-the-box support for managing the kind of data that come from social science surveys, or rather, little support apart from packages that are so far not well known, such as memisc (Elff, 2019). One aim of this book is therefore to enable social scientists to conduct not only their data analysis but also their data management with R.
Because R allows the user to produce high-quality graphics and to use the most advanced techniques of data analysis, and also (perhaps even more importantly) because it can be used without paying any licence fees, R is increasingly used in social science education at the graduate and postgraduate levels and is also gaining ground at the undergraduate level. From my own experience as a teacher and instructor, I know that many students who use R struggle at least as much with preparing their data for analysis as they do with conducting the analysis and understanding the complex concepts involved. I hope that this book will also be helpful for these students.
To make sure that the book is useful for as wide an audience as possible, writing it took me to the margins of, and beyond, my own intellectual comfort zone of managing and analysing survey data. Writing this book was at times challenging but always stimulating. I hope that readers find the material presented in the book fun and stimulating as well. I dedicate this book to my parents, my friends, and my colleagues, and also to the brilliant minds that created and contributed to the wonderful open source software project that is R.