Sebastian Maurice
Transactional Machine Learning with Data Streams and AutoML
Build Frictionless and Elastic Machine Learning Solutions with Apache Kafka in the Cloud Using Python
1st ed.
Sebastian Maurice
Toronto, ON, Canada
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/978-1-4842-7022-6. For more detailed information, please visit http://www.apress.com/source-code.
ISBN 978-1-4842-7022-6 e-ISBN 978-1-4842-7023-3
https://doi.org/10.1007/978-1-4842-7023-3
© Sebastian Maurice 2021
Apress Standard
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
For my daughter Matea, my wife Ellen, and my mom and dad, Frank and Mary.
Introduction
Fast data requires fast machine learning for fast decision-making. Understand how to apply auto machine learning to data streams with Apache Kafka in the cloud using Python, and create transactional machine learning (TML) solutions that are frictionless (they require minimal to no human intervention) and elastic (they can scale up or down by controlling the number of data streams, algorithms, and users of the insights). This book will strengthen your knowledge of the inner workings of scalable TML solutions that combine data streams and auto machine learning with Apache Kafka. You will be at the forefront of an exciting area of machine learning, focused on the speed of data and algorithm creation, scale, and automation, that will drive business value in almost every industry.
By the end of the book, you will have a solid understanding of the technical and business aspects of TML. You will know how to build TML solutions, with all the necessary details and freely downloadable software at your fingertips. You will be at the technical and business forefront of a knowledge economy in which data creation speeds are increasing, requiring fast machine learning solutions that are frictionless and elastic and that support fast decision-making capable of creating enormous business value on a large scale!
Acknowledgments
I would like to thank my wife Ellen and daughter Matea for their support in the writing of this book. Their ongoing support was invaluable.
I would like to thank Michael Scappaticci for his editing and review of the chapters and discussions on some of the concepts presented in this book. I would also like to thank Phoenix Unnayan Majumder for his input, especially on Apache Kafka. I would also like to thank Tim Raiswell; his insights and thoughts were important in shaping some of the ideas in this book.
Lastly, I want to thank my parents, Frank and Mary, for their continued support and encouragement in pursuing all of my dreams and goals in life.
About the Author
Sebastian Maurice
is the founder and CTO of OTICS Advanced Analytics Inc. and has over 25 years of experience in AI and machine learning. Previously, Sebastian served as Associate Director within Gartner Consulting, focusing on artificial intelligence and machine learning. He was instrumental in developing and growing Gartner's AI consulting business. He has led global teams to solve critical business problems with machine learning in oil and gas, retail, utilities, manufacturing, finance, and insurance. Dr. Maurice also brings deep experience in oil and gas (upstream) and was one of the first in Canada to apply machine learning to oil production optimization, which resulted in a Canadian patent: #2864265.
Sebastian is also a published author with seven publications in international peer-reviewed journals and books. One of his publications (International Journal of Engineering Education, 2004) was cited as landmark work in the area of online testing technology. He also developed the world's first Apache Kafka connector for transactional machine learning: MAADS-VIPER.
Dr. Maurice received his PhD in electrical and computer engineering from the University of Calgary and holds a master's in electrical engineering and a master's in agricultural economics, as well as a bachelor's in pure mathematics and a bachelor's (hon) in economics.
Dr. Maurice also teaches a course on data science and actively helps to develop AI course content at the University of Toronto. He is also active in the AI community and an avid blogger and speaker. He also sits on the AI advisory board at McMaster University.
About the Technical Reviewer
Tim Raiswell
is a principal in advanced analytics with Loftus Labs, a consulting firm that specializes in agribusiness data science. His current areas of research include machine learning for business decision support and the importance of culture and attitude in the adoption of analytic decision-making. Tim lives in the state of Maryland with his family.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
S. Maurice Transactional Machine Learning with Data Streams and AutoML https://doi.org/10.1007/978-1-4842-7023-3_1
1. Introduction: Big Data, Auto Machine Learning, and Data Streams
Data streams are a class of data that is continuously updated and captured, grows in volume, and is largely unbounded [Aggarwal, 2007; Wrench et al., 2016]. Consider how our everyday lives contribute to data streams. Every time we purchase something with a credit card, information about the purchasing event (our name, the purchase amount, the product purchased, the time and date of purchase, the location where it was purchased, the quantity, the product code, and so on) is captured in real time and stored in a data storage platform capable of storing large amounts of data. Browsing the Web also results in enormous amounts of data flowing through IP networks, and this data is captured by our Internet service providers (ISPs). Even the cars we drive are becoming more connected to the Internet, and car manufacturers are capturing and storing all of the telemetry and GPS data.
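To make the idea of an unbounded stream concrete, the credit card example can be sketched in Python as a generator that never terminates. This is only an illustrative sketch, not code from the book's software: the PurchaseEvent fields, the purchase_stream and running_mean names, and the randomly generated values are all hypothetical. Because the stream has no end, the consumer cannot load it into memory and must update its statistics one event at a time.

```python
import itertools
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, Tuple


@dataclass
class PurchaseEvent:
    # Hypothetical fields, loosely following the attributes listed in the text.
    customer: str
    amount: float
    product_code: str
    quantity: int
    location: str
    timestamp: datetime


def purchase_stream(seed: int = 0) -> Iterator[PurchaseEvent]:
    """Yield simulated purchase events forever; the stream is unbounded."""
    rng = random.Random(seed)
    now = datetime(2021, 1, 1)
    while True:
        # Each new event arrives 1 to 30 seconds after the previous one.
        now += timedelta(seconds=rng.randint(1, 30))
        yield PurchaseEvent(
            customer=rng.choice(["alice", "bob", "carol"]),
            amount=round(rng.uniform(1.0, 200.0), 2),
            product_code=f"P{rng.randint(100, 999)}",
            quantity=rng.randint(1, 5),
            location=rng.choice(["Toronto", "Calgary"]),
            timestamp=now,
        )


def running_mean(stream: Iterator[PurchaseEvent], limit: int) -> Tuple[int, float]:
    """Update the mean purchase amount incrementally, one event at a time."""
    count, mean = 0, 0.0
    # islice takes a finite window from the otherwise endless stream.
    for event in itertools.islice(stream, limit):
        count += 1
        mean += (event.amount - mean) / count
    return count, mean


count, mean = running_mean(purchase_stream(), 1000)
print(f"events seen: {count}, running mean amount: {mean:.2f}")
```

The incremental update keeps only two numbers in memory regardless of how many events have been seen, which is the property that matters once the data never stops arriving.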