Machine Learning for Time-Series with Python
Forecast, predict, and detect anomalies with state-of-the-art machine learning methods
Ben Auffarth
BIRMINGHAM - MUMBAI
Machine Learning for Time-Series with Python
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Dr. Shailesh Jain
Acquisition Editor - Peer Reviews: Saby Dsilva
Project Editor: Namrata Katare
Content Development Editor: Alex Patterson
Copy Editor: Safis Editing
Technical Editor: Aditya Sawant
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Presentation Designer: Pranit Padwal
First published: October 2021
Production reference: 1281021
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80181-962-6
www.packt.com
Contributors
About the author
Ben Auffarth is the author of Artificial Intelligence with Python Cookbook, and he co-founded and is the former president of Data Science Speakers, London. With a Ph.D. in computer science, Ben Auffarth has analyzed experiments with terabytes of data, run brain models on up to 64k cores, built systems processing hundreds of thousands of transactions per day, and trained neural networks on millions of text documents. He often encounters time-series problems in his work.
My partner worked hard over the weekends so I could concentrate, and my two-and-a-half-year-old son would often tell me to get to work ("work, papa"). I'm reading lots of stories to him to make up for this time. I'd like to thank the technical reviewer for his fantastic suggestions and for spotting many errors (any remaining ones are on me).
About the reviewers
Kevin Sheppard is an academic economist who specializes in the application of statistical methodology to measuring economic phenomena. His research focuses on developing statistical methodology for measuring, modeling, and forecasting measures of risk. Kevin's research is widely used in portfolio management and risk measurement. He is the maintainer of the arch and linearmodels Python packages. He is also a core contributor to statsmodels and a committer to pandas and PyData.
In 2019, his contributions to NumPy were recognized by an award from NumFOCUS. He has worked at the University of Oxford for the past 15 years. During this period, he has also worked for the Office of Financial Research in the U.S. Department of the Treasury and as a consultant to other governments and to the finance industry. Prior to joining Oxford, Kevin completed his PhD at the University of California, San Diego.
Dr Andrey Kostenko recently assumed the role of lead data scientist at the Hydroinformatics Institute (H2i.sg), a specialized consultancy and solution services provider for all aspects of water management. Prior to joining H2i, Andrey had worked as a senior data scientist at IAG InsurTech Innovation Hub for over 3 years. Before moving to Singapore in 2018, he worked as a data scientist at TrafficGuard.ai, an Australian AdTech start-up developing novel data-driven algorithms for mobile ad fraud detection. In 2013, Andrey received his doctorate degree in mathematics and statistics from Monash University, Australia, after earning an MBA degree from the UK and his first university degree from Russia.
Andrey is an enthusiastic, self-motivated, and results-oriented data science and machine learning professional, with extensive experience across a variety of disciplines and industries, including hands-on coding in R and Python to build, train, and serve time-series models for forecasting and other applications. He believes that lifelong learning and open source software are both critical for innovation in advanced analytics and artificial intelligence. Andrey is very passionate about data science in general and sequential data in particular, so one of his current focuses is on applications of deep learning to spatiotemporal data in the context of weather-related decision making.
In his spare time, Andrey is often found engaged in competitive data science projects, learning new tools across the R and Python ecosystems, exploring the latest trends in web development, solving chess puzzles, or reading about the history of science and mathematics.
Preface
Time-series are ubiquitous in industry and research. Examples can be found in healthcare, energy, finance, user behavior, and website metrics, to name just a few. Because they are so prevalent, modeling and forecasting them accurately is of great economic importance.
While traditional, well-established approaches have dominated econometrics research and, until recently, industry, machine learning for time-series is a relatively new research field that has only recently come out of its infancy.
In the last few years, a lot of progress has been made in machine learning on time-series; however, little of it has been made available in book form for a technical audience. Many books focus on traditional techniques but hardly deal with recent machine learning techniques. This book aims to fill this gap and covers much of the latest progress, as evidenced by results from competitions such as M4 and by the current state of the art in time-series classification.
In this book, you'll learn about established as well as cutting-edge techniques and tools in Python for machine learning with time-series. Each chapter covers a different topic, such as anomaly detection, probabilistic models, drift detection and adaptive online learning, deep learning models, and reinforcement learning. Each topic comes with a review of the latest research and an introduction to popular libraries with examples.
Who this book is for
If you want to build models that are reactive to the latest trends, seasonality, and business cycles, this is the book for you. This book is for data scientists, analysts, or programmers who want to learn more about time-series, and want to catch up on different techniques in machine learning.
What this book covers
Chapter 1, Introduction to Time-Series with Python, is a general introduction to the topic. You'll learn what time-series are, why they are important, and many of the conventions associated with them, and you'll see an overview of applications and techniques that are explained in more detail in dedicated chapters.