Weather, stock markets, and heartbeats. They all form time series. If you're interested in diverse data and forecasting the future, you're interested in time series analysis.
Time series data spans a wide range of disciplines and use cases. It can be anything from customer purchase histories to conductance measurements of a nano-electronic system to digital recordings of human language. One point we discuss throughout the book is that time series analysis applies to a surprisingly diverse set of data. Any data that has an ordered axis can be analyzed with time series methods, even if that ordered axis is not time per se. Traditional time series data, such as stock data and weather patterns, can be analyzed with time series methods, but so can quirky data sets such as spectrographs of wine, where the time axis is actually an axis of frequency. Time series are everywhere.
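To make the "any ordered axis" point concrete, here is a minimal sketch. It is not code from this book. It uses a hypothetical, synthetic "spectrum" whose ordered axis is frequency rather than time, and applies two standard time series tools to it: a moving average for smoothing and a lag-1 autocorrelation. The names `moving_average` and `lag1_autocorrelation` are illustrative helpers defined here. They are not library APIs. Neither function cares what the axis measures, only that the values are sorted along it.

```python
import numpy as np

def moving_average(values: np.ndarray, window: int) -> np.ndarray:
    """Mean over a sliding window; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def lag1_autocorrelation(values: np.ndarray) -> float:
    """Pearson correlation between each point and its successor along the ordered axis."""
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])

# Hypothetical, synthetic data: the ordered axis is frequency, not time.
frequencies = np.linspace(1000.0, 2000.0, 201)
rng = np.random.default_rng(seed=0)
absorbance = (
    np.exp(-((frequencies - 1500.0) / 80.0) ** 2)     # a single smooth peak
    + 0.05 * rng.standard_normal(frequencies.size)    # measurement noise
)

# The only requirement these methods impose is an ordering, so sort by the axis first.
order = np.argsort(frequencies)
ordered_signal = absorbance[order]

smoothed = moving_average(ordered_signal, window=5)
autocorr = lag1_autocorrelation(ordered_signal)

print(len(smoothed))
print(round(autocorr, 2))
```

Neighboring points along the frequency axis are strongly correlated, just as consecutive readings in a conventional time series usually are. That shared structure is what lets the same methods carry over.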
Who Should Read This Book
There are two kinds of intended readers for this book. The first and larger category of reader is that of a data scientist who has rarely worked with time series data. This person could be an industry veteran or a junior analyst. The more experienced data analyst can skim the introductory conceptual areas of each chapter but will still benefit from this book's discussions about best practices as well as pitfalls of working with time series data. A newer data analyst might consider working through the book in its entirety, although I have tried to keep each topic as self-contained as possible.
The second category of reader is someone supervising analytics at an organization with an extensive in-house data collection. If you are a member of this group, you will still need some technical background, but it's not necessary that you be currently coding in your professional life. For such a reader, this book highlights opportunities for your organization to use time series analysis even if it is not currently practiced in-house. It will point you to new kinds of questions and analyses your organization can address with your existing data resources.
Expected Background
With respect to coding, you should have some familiarity with R and Python, especially with certain fundamental packages (in Python: NumPy, Pandas, and scikit-learn; and in R: data.table). The code samples should be readable even without all the background, but in that case you may need to take a short detour to familiarize yourself with these packages. This is most likely the case with respect to R's data.table, an underused but highly performant data frame package that has fantastic time functionality.
In all cases, I have provided brief overviews of the related packages, some example code, and descriptions of what the code does. I also point the reader toward more complete overviews of the most used packages.
With respect to statistics and machine learning, you should have some familiarity with:
Introductory statistics: ideas such as variance, correlation, and probability distributions
Machine learning: clustering and decision trees
Neural networks: what they are and how they are trained
For each of these areas, I provide a brief overview of the relevant concepts within the text, but readers new to them should read more deeply before continuing with some chapters. For most topics, I provide links to recommended free online resources for brief tutorials on the fundamentals of a given topic or technique.
Why I Wrote This Book
I wrote this book for three reasons.
First, time series analysis is an important part of data analysis, yet it is not found in the standard data science toolkit. This is unfortunate both because time series data is increasingly available and because it answers questions that cross-sectional data cannot. An analyst who does not know fundamental time series analysis is not making the most of their data. I hoped that this book could fill an existing and important void.
Second, when I started writing this book, I knew of no centralized overview of the most important aspects of time series analysis from a modern data science perspective. There are many excellent resources available for traditional time series analysis, most notably in the form of classic textbooks on statistical time series analysis. There are also many excellent individual blog posts on both traditional statistical methods and on machine learning or neural network approaches to time series. However, I could not identify a single centralized resource to outline all these topics and relate them to one another. The goal of this book is to provide that resource: a broad, modern, and practical overview of time series analysis covering the full pipeline for time series data and modeling. Again, I hoped that this book could fill an existing and important void.