FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS
© 2015 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email
This book was set in the programming language by the author. Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Kelleher, John D., 1974-
Fundamentals of machine learning for predictive data analytics : algorithms, worked examples, and case studies / John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy.
pages cm
Includes bibliographical references and index.
ISBN 978-0-262-02944-5 (hardcover : alk. paper) 1. Machine learning. 2. Data mining. 3. Prediction theory. I. Mac Namee, Brian, 1978- II. D'Arcy, Aoife, 1978- III. Title.
Q325.5.K455 2015
006.3'1--dc23
2014046123
10 9 8 7 6 5 4 3 2 1
To my wife and family, thank you for your love, support, and patience.
John
To my family.
Brian
To Grandad D'Arcy, for the inspiration.
Aoife
In writing this book our target was to deliver an accessible, introductory text on the fundamentals of machine learning, and the ways that machine learning is used in practice to solve predictive data analytics problems in business, science, and other organizational contexts. As such, the book goes beyond the standard topics covered in machine learning books and also covers the lifecycle of a predictive analytics project, data preparation, feature design, and model deployment.
The book is intended for use on machine learning, data mining, data analytics, or artificial intelligence modules on undergraduate and post-graduate computer science, natural and social science, engineering, and business courses. Because the book provides case studies illustrating the application of machine learning within the industry context of data analytics, it is also a suitable text for practitioners looking for an introduction to the field and a suitable textbook for industry training courses in these areas.
The design of the book is informed by our many years of experience in teaching machine learning, and the approach and material in the book have been developed and road-tested in the classroom. In writing this book we have adopted the following guiding principles to make the material accessible:
- Explain the most important and popular algorithms clearly, rather than overview the full breadth of machine learning. As teachers we believe that giving a student deep knowledge of the core concepts underpinning a field provides them with a solid basis from which they can explore the field themselves. This sharper focus allows us to spend more time introducing, explaining, illustrating and contextualizing the algorithms that are fundamental to the field, and their uses.
- Informally explain what an algorithm is trying to do before presenting the technical formal description of how it does it. Providing this informal introduction to each topic gives students a solid basis from which to attack the more technical material. Our experience with teaching this material to mixed audiences of undergraduates, post-graduates and professionals has shown that these informal introductions enable students to easily access the topic.
- Provide complete worked examples. In this book we have presented complete workings for all examples, because this enables the reader to check their understanding in detail.
When teaching a technical topic, it is important to show the application of the concepts discussed to real-life problems. For this reason, we present machine learning within the context of predictive data analytics, an important and growing industry application of machine learning. The link between machine learning and data analytics runs through every chapter in the book. We explain how to design, construct, and quality check a dataset before using it to build a prediction model.
, learning by searching for solutions that minimize error. All of these chapters follow the same two-part structure:
- Part 1 gives an informal introduction to the material in the chapter, followed by a detailed explanation of the fundamental technical concepts required to understand it, and then presents a standard machine learning algorithm used in that learning approach, along with a detailed worked example.
- Part 2 of each chapter explains different ways in which the standard algorithm can be extended and describes well-known variations on the algorithm.
The motivation for structuring these technical chapters in two parts is that it provides a natural break in the chapter material. As a result, a topic can be included in a course by just covering Part 1 of a chapter (Big Idea, fundamentals, standard algorithm and worked example), and then, time permitting, the coverage of the topic can be extended to some or all of the material in Part 2. along with references to the datasets and/or papers that the examples are based on.
The link between the broader business context and machine learning is most clearly seen in the case studies presented in discusses a range of fundamental topics in machine learning and also highlights that the selection of an appropriate machine learning approach for a given task involves factors beyond model accuracy: we must also match the characteristics of the model to the needs of the business.
Through our years of teaching this material we have developed an understanding of what is a reasonable amount of material to cover in a one-semester introductory module and in two-semester, more advanced modules. To facilitate the use of the book in these different contexts, the book has been designed to be modular, with very few dependencies between chapters. As a result, a lecturer using this book can plan their course by simply selecting the sections of the book they wish to cover, without worrying about the dependencies between the sections. When presented in class, the material in normally take four to six lecture hours to cover.
In is another one-semester machine learning course. Here, however, the focus is on covering a range of machine learning approaches and, again, evaluation is covered in detail. For a longer two-semester machine learning course (M.L. long) we suggest covering data preparation (Section 3.6), all the machine learning chapters, and the evaluation chapter.
There are contexts, however, where the focus of a course is not primarily on machine learning. We also present two course paths that focus on the context of predictive data analytics. The course P.D.A. short defines a one-semester course. This course gives students an introduction to predictive data analytics, a solid understanding of how machine learning solutions should be designed to meet a business need, and insight into how prediction models work and should be evaluated, and it includes one of the case studies. P.D.A. short is also an ideal course plan for a short (one-week) professional training course. If more time is available, P.D.A. long expands on the P.D.A. short course so that students gain a deeper and broader understanding of machine learning, and it also includes the second case study.