Sridhar Alla and Suman Kalyan Adari
Beginning Anomaly Detection Using Python-Based Deep Learning
With Keras and PyTorch
Sridhar Alla
New Jersey, NJ, USA
Suman Kalyan Adari
Tampa, FL, USA
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/978-1-4842-5176-8. For more detailed information, please visit www.apress.com/source-code.
ISBN 978-1-4842-5176-8 e-ISBN 978-1-4842-5177-5
https://doi.org/10.1007/978-1-4842-5177-5
© Sridhar Alla, Suman Kalyan Adari 2019
Apress Standard
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
Introduction
Congratulations on your decision to explore deep learning and the exciting world of anomaly detection.
Anomaly detection is the task of finding patterns that do not adhere to what is considered normal or expected behavior. Businesses can lose millions of dollars due to abnormal events, and so can consumers. In fact, there are many situations every day where people's lives and property are at risk. If your bank account gets cleaned out, that's a problem. If your water line breaks, flooding your basement, that's a problem. If all flights at an airport are held up, causing long delays, that's a problem. You might have been misdiagnosed, or not diagnosed at all, with a health issue, which is a very big problem that directly impacts your well-being.
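To make the idea of "not adhering to normal behavior" concrete, here is a minimal sketch using a simple z-score rule. This is only an illustration, not one of the methods developed in this book: the function name `find_anomalies`, the threshold of 2.0, and the withdrawal amounts are all hypothetical choices made for this example.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Hypothetical helper for illustration only: a point counts as anomalous
    when it lies more than `threshold` standard deviations from the mean.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # All values are identical, so nothing deviates from "normal".
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Made-up daily withdrawal amounts; the last one empties the account.
daily_withdrawals = [40, 55, 60, 45, 50, 52, 48, 58, 47, 5000]
anomalies = find_anomalies(daily_withdrawals)
print(anomalies)
```

With this made-up data, only the final withdrawal is flagged. A fixed statistical rule like this tends to break down on high-dimensional or temporal data, which is part of the motivation for the deep learning approaches covered later.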
In this book, you will learn how anomaly detection can be used to solve business problems. You will explore how anomaly detection techniques can be applied to practical use cases and real-life problems in the business landscape. Every business and use case is different, so while we cannot simply copy and paste code to build a successful model that detects anomalies in any dataset, this book covers many use cases with hands-on coding exercises to give you an idea of the possibilities and the thought process behind them.
We chose Python because it is widely used for data science and offers a plethora of packages, including scikit-learn and the major deep learning libraries.
We will start by introducing anomaly detection, and then we will look at legacy methods of detecting anomalies that have been used for decades. Next, we will take a first look at deep learning.
After that, we will explore autoencoders and variational autoencoders, which are paving the way for the next generation of generative models.
We will explore restricted Boltzmann machines (RBMs) as a way to detect anomalies. Then we'll look at long short-term memory (LSTM) models to see how temporal data can be processed.
We will cover temporal convolutional networks (TCNs), which are the best in class for temporal data anomaly detection. Finally, we will look at several examples of anomaly detection in various business use cases.
In addition, we will cover Keras and PyTorch, two of the most popular deep learning frameworks, in detail in the appendix chapters.
You will combine all this knowledge with hands-on coding in Jupyter notebook-based exercises, so you can experience it firsthand and see where you can use these algorithms and frameworks.
Best of luck and welcome to the world of deep learning!
Acknowledgments
Sridhar Alla
I would like to thank my wonderful, loving wife, Rosie Sarkaria, and my beautiful, loving daughter, Evelyn, for all their love and patience during the many months I spent writing this book. I would also like to thank my parents, Ravi and Lakshmi Alla, for their blessings and all the support and encouragement they continue to bestow upon me.
Suman Kalyan Adari
I would like to thank my parents, Krishna and Jyothi, and my loving dog, Pinky, for supporting me throughout the entire process of writing my first book. I would especially like to thank my sister, Niha, for helping me with graph creation, proof-reading, editing, and testing the code samples.
About the Authors
Sridhar Alla
is the co-founder and CTO of Bluewhale, which helps organizations big and small build AI-driven big data solutions and analytics. He is a published author of books and an avid presenter at numerous Strata, Hadoop World, Spark Summit, and other conferences. He also has several patents filed with the USPTO on large-scale computing and distributed systems. He has extensive hands-on experience in several technologies including Spark, Flink, Hadoop, AWS, Azure, TensorFlow, Cassandra, and others. He spoke on anomaly detection using deep learning at Strata SFO in March 2019 and will also present at Strata London in October 2019.
Sridhar was born in Hyderabad, India and now lives in New Jersey with his wife, Rosie, and daughter, Evelyn. When he is not busy writing code he loves to spend time with his family; he also loves training, coaching, and organizing meetups.
He can be reached via email at sid@bluewhale.one or via LinkedIn at www.linkedin.com/in/sridhar-a-1619b42/. Please visit www.bluewhale.one for more details on how he could help your organization.
Suman Kalyan Adari
is an undergraduate student pursuing a B.S. in Computer Science at the University of Florida. He has been conducting deep learning research in the field of cybersecurity since his freshman year and presented at the IEEE Dependable Systems and Networks workshop on Dependable and Secure Machine Learning, held in Portland, Oregon, USA, in June 2019.