DATA MINING AND
PREDICTIVE ANALYTICS FOR BUSINESS DECISIONS
LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY
By purchasing or using this book and its companion files (the Work), you agree that this license grants permission to use the contents contained herein, but does not give you the right of ownership to any of the textual content in the book or ownership to any of the information, files, or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.
MERCURY LEARNING AND INFORMATION (MLI or the Publisher) and anyone involved in the creation, writing, production, accompanying algorithms, code, or computer programs (the software), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold "as is" without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).
The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.
The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and only at the discretion of the Publisher. The use of implied warranty and certain exclusions vary from state to state, and might not apply to the purchaser of this product.
Companion files also available for downloading from the publisher by writing to .
DATA MINING AND PREDICTIVE ANALYTICS
FOR BUSINESS DECISIONS
A Case Study Approach
Andres Fortino, PhD
NYU School of Professional Studies
MERCURY LEARNING AND INFORMATION
Dulles, Virginia
Boston, Massachusetts
New Delhi
Copyright 2023 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
22841 Quicksilver Drive
Dulles, VA 20166
www.merclearning.com
1-800-232-0223
A. Fortino. Data Mining and Predictive Analytics for Business Decisions.
ISBN: 978-1-68392675-7
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2022950710
23 24 25 3 2 1 Printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).
All of our titles are available for sale in digital format at numerous digital vendors. Companion files for this title can also be downloaded by writing to . The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the book, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
To my wife, Kathleen
PREFACE
Data mining is a recent development in data analysis, having emerged within the last 20 years. With many recent advances in data science, data analysts now have many more tools and techniques for extracting information from data sets. This book aims to help data analysts move up from simple tools, such as Excel for descriptive analytics, to answering more sophisticated questions using machine learning. Data mining is a sophisticated and organized activity with a well-defined process codified in the CRISP-DM (Cross-Industry Standard Process for Data Mining) standard. In this book, we develop an understanding of the tools and techniques that serve the individual data analyst, not necessarily a data science team. The goal is to help individual analysts improve their understanding and skills so that they can answer more sophisticated questions.
Most of the exercises use R and Python, today's most common analysis tools. But rather than focus on coding algorithms with these tools, as is most often the case, we employ interactive interfaces to them to perform the analysis. That way, we can focus on each technique and its interpretation rather than on developing coding skills. We rely on the Jamovi and JASP interfaces to R and the Orange3 data mining interface to Python. Where appropriate, we introduce additional open-source tools that are easy to acquire and use, such as Voyant for text analytics. The techniques covered in this book range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics.
We follow the CRISP-DM process throughout, but only as a simple guide to the various steps, without necessarily implementing all of its procedures. We intend to focus on data analytics, not necessarily on the more sophisticated data science approaches. This book is for you if you wish to improve your analytical skills and gain practical knowledge of some machine learning approaches. If you are looking for a more profound treatment of the techniques presented here, such as their mathematical foundations or more detailed considerations in the use of the algorithms, you are best served by consulting more advanced texts. This book is not meant to explain the origins or characteristics of each method thoroughly; we leave the theoretical and explanatory understanding of the tools to other authors and other texts. Instead, at the heart of the book is a series of exercises and real-life case studies that put each technique or tool to work in different business situations. A significant contribution of this book is a curated database of business data files that should provide plenty of practice for acquiring skills in each of the techniques presented. The exercises and cases in each chapter are presented with step-by-step explanations to help you acquire those skills.