Information Quality
The Potential of Data and Analytics to Generate Knowledge
Ron S. Kenett
KPA, Israel and University of Turin, Italy
Galit Shmueli
National Tsing Hua University, Taiwan
This edition first published 2017
© 2017 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Names: Kenett, Ron. | Shmueli, Galit, 1971-
Title: Information quality : the potential of data and analytics to generate knowledge / Ron S. Kenett, Dr. Galit Shmueli.
Description: Chichester, West Sussex : John Wiley & Sons, Inc., 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2016022699| ISBN 9781118874448 (cloth) | ISBN 9781118890653 (epub)
Subjects: LCSH: Data mining. | Mathematical statistics.
Classification: LCC QA276 .K4427 2017 | DDC 006.3/12-dc23
LC record available at https://lccn.loc.gov/2016022699
A catalogue record for this book is available from the British Library.
To Sima; our children Dolav, Ariel, Dror, and Yoed; and their families and especially their children, Yonatan, Alma, Tomer, Yadin, Aviv, Gili, Matan, and Eden. They are my source of pride and motivation.
And to the memory of my dear friend, Roberto Corradetti, who dedicated his career to applied statistics.
RSK
To my family, mentors, colleagues, and students who've sparked and nurtured the creation of new knowledge and innovative thinking
GS
Foreword
I am often invited to assess research proposals. Included amongst the questions I have to ask myself in such assessments are: Are the goals stated sufficiently clearly? Does the study have a good chance of achieving the stated goals? Will the researchers be able to obtain sufficient quality data for the project? Are the analysis methods adequate to answer the questions? And so on. These questions are fundamental, not merely for research proposals, but for any empirical study, that is, for any study aimed at extracting useful information from evidence or data. And yet they are rarely overtly stated. They tend to lurk in the background, with the capability of springing into the foreground to bite those who failed to think them through.
These questions are precisely the sorts of questions addressed by the InfoQ (Information Quality) framework. Answering such questions allows funding bodies, corporations, national statistical institutes, and other organisations to rank proposals, balance costs against success probability, and also to identify the weaknesses and hence improve proposals and their chance of yielding useful and valuable information. In a context of increasing constraints on financial resources, it is critical that money is well spent, so that maximising the chance that studies will obtain useful information is becoming more and more important. The InfoQ framework provides a structure for maximising these chances.
A glance at the statistics shelves of any technical library will reveal that most books focus narrowly on the details of data analytic methods. The same is true of almost all statistics teaching. This is all very well; it is certainly vital that such material be covered. After all, without an understanding of the basic tools, no analysis, no knowledge extraction would be possible. But such a narrow focus typically fails to place such work in the broader context, without which its chances of success are damaged. This volume will help to rectify that oversight. It will provide readers with insight into and understanding of other key parts of empirical analysis, parts which are vital if studies are to yield valid, accurate, and useful conclusions.
But the book goes beyond merely providing a framework. It also delves into the details of these overlooked aspects of data analysis. It discusses the fact that the same data may be high quality for one purpose and low for another, and that the adequacy of an analysis depends on the data and the goal, as well as depending on other less obvious aspects, such as the accessibility, completeness, and confidentiality of the data. And it illustrates the ideas with a series of illuminating applications.
With computers increasingly taking on the mechanical burden of data analytics, the opportunities are becoming greater for us to shift our attention to the higher-order aspects of analysis: to precise formulation of the questions, to consideration of data quality to answer those questions, to choice of the best method for the aims, taking account of the entire context of the analysis. In doing so, we improve the quality of the conclusions we reach. And this, in turn, leads to improved decisions for researchers, policy makers, managers, and others. This book will provide an important tool in this process.
David J. Hand
Imperial College London
About the authors
Ron S. Kenett