DATA MINING
FOR BUSINESS ANALYTICS
Concepts, Techniques, and Applications in Python
GALIT SHMUELI
PETER C. BRUCE
PETER GEDECK
NITIN R. PATEL
This edition first published 2020
© 2020 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Galit Shmueli, Peter C. Bruce, Peter Gedeck, and Nitin R. Patel to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloging-in-Publication Data applied for
Hardback: 9781119549840
Cover Design: Wiley
Cover Image: Achim Mittler, Frankfurt am Main/Gettyimages
The beginning of wisdom is this:
Get wisdom, and whatever else you get, get insight.
Proverbs 4:7
In memory of Professor Ayala Cohen (1940–2019)
who combined wisdom, insight, enthusiasm, and care
Peter Gedeck dedicates this book to his son, Victor
Foreword by Gareth James
The field of statistics has existed in one form or another for 200 years, and by the second half of the 20th century had evolved into a well-respected and essential academic discipline. However, its prominence expanded rapidly in the 1990s with the explosion of new, and enormous, data sources. For the first part of this century, much of this attention was focused on biological applications, in particular, genetics data generated as a result of the sequencing of the human genome. However, the last decade has seen a dramatic increase in the availability of data in the business disciplines, and a corresponding interest in business-related statistical applications.
The impact has been profound. Ten years ago, when I was able to attract a full class of MBA students to my new statistical learning elective, my colleagues were astonished because our department struggled to fill most electives. Today, we offer a Masters in Business Analytics, which is the largest specialized master's program in the school and has application volume rivaling those of our MBA programs. Our department's faculty size and course offerings have increased dramatically, yet the MBA students are still complaining that the classes are all full. Google's chief economist, Hal Varian, was indeed correct in 2009 when he stated that "the sexy job in the next 10 years will be statisticians."
This demand is driven by a simple, but undeniable, fact. Business analytics solutions have produced significant and measurable improvements in business performance, on multiple dimensions and in numerous settings, and as a result, there is a tremendous demand for individuals with the requisite skill set. However, training students in these skills is challenging given that, in addition to the obvious required knowledge of statistical methods, they need to understand business-related issues, possess strong communication skills, and be comfortable dealing with multiple computational packages. Most statistics texts concentrate on abstract training in classical methods, without much emphasis on practical, let alone business, applications.
This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business-specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject. However, just as important as the list of topics is the way that they are all presented in an applied fashion using business applications. Indeed, the last chapter is entirely dedicated to 10 separate cases where business analytics approaches can be applied.
In this latest edition, the authors have added support for Python, a programming language that is rapidly gaining popularity among data scientists. The book provides detailed descriptions and code involving applications of Python in numerous business settings, ensuring that the reader will actually be able to apply their knowledge to real-life problems. I'm confident that this book will be an indispensable tool for any business analytics course using Python.
We recently introduced a business analytics course into our required MBA core curriculum and I intend to make heavy use of this book in developing the syllabus. I'm confident that it will be an indispensable tool for any such course.
GARETH JAMES
Marshall School of Business, University of Southern California, 2019
Foreword by Ravi Bapna
Data is the new gold, and mining this gold to create business value in today's context of a highly networked and digital society requires a skillset that we haven't traditionally delivered in business or statistics or engineering programs on their own. For those businesses and organizations that feel overwhelmed by today's Big Data, the phrase