Practical Data Analysis Using Jupyter Notebook
Learn how to speak the language of data by extracting useful and actionable insights using Python
Marc Wintjen
BIRMINGHAM - MUMBAI
Practical Data Analysis Using Jupyter Notebook
Copyright 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Devika Battike
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Shankar Kalbhor
First published: June 2020
Production reference: 1180620
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83882-603-1
www.packt.com
Foreword
It has been stated in many ways that society is producing more data than ever before and that we have only scratched the surface of figuring out how to make use of it. Whether data is truly the oil, or gold, of the 21st century, only time will tell, but it is clear that being able to read it, mold it, and tell a story with it are skills that will be in demand for the foreseeable future. These skills turn data, facts, and figures into information, knowledge, and insights.
With the media latching onto buzzwords in the last few years, one may be led to believe that working with data is a fairly new field. However, techniques for gleaning insight from the digital world have been introduced and perfected over decades. In a sense, it is only in the past decade that we have been able to apply these techniques at scale. These old but proven best practices are almost technology agnostic and will help with whatever language or solution you are implementing. As you read on, you will get to learn not only the exciting technology used to gain intuition from data but also the knowledge of someone who has applied these lessons in a business setting. Only by asking the right questions of the data does one get valuable information.
Some of the concepts that get lost in practice become the most important part of any data role. One learns quickly that asking where the data comes from can be key to understanding whether one's visualization makes sense. Missing or malformed data can lead to wrong conclusions and mistrust from the audience of any analysis. Applying the tools that you learn will help you be confident that any artifact from your analysis is sound and truthful.
Marc is a true advocate for data literacy and transparency. He knows the tricks to get the most out of the data and the questions to ask even before you get hands-on. I am confident that he will lead you through the journey well, and I wish you luck on your adventure.
Andrew Vlahutin
Data Scientist and Machine Learning Engineer
About the author
Marc Wintjen is a risk analytics architect at Bloomberg L.P. with over 20 years of professional experience. An evangelist for data literacy, he's known as the data mensch for helping others make data-driven decisions. His passion for all things data has evolved from SQL and data warehousing to big data analytics and data visualization.
I want to thank the many people who have supported me throughout my career. My inspiration comes from my colleagues, with whom I have shared long hours spent solving complex problems and gaining some wisdom along the way (hopefully). Also, this would not have been possible without my family's love, support, and sacrifice. To my wife, Debra, my daughter, Rebecca, and my son, Seth, thank you! To my parents and extended family, thank you!
About the reviewers
Juan Jose Morales Urrutia has a Master's degree in computer science from Georgia Tech. He is currently a senior software engineer at Bloomberg L.P. Juan Jose is passionate about all things related to data. He enjoys designing data warehouses on the cloud, implementing scalable data pipelines, and building advanced analytics solutions using machine learning. Needless to say, he thinks this book will give you a fantastic introduction to the wonderful world of data.
Khaled Tannir has a Master of Research and a Master of Computer Science degree. He has more than 25 years of technical experience as a big data architect. He leads IT projects in multiple industries, such as banking, finance, and insurance. Creative and forward-thinking, and author of two books, he has focused for 10 years on big data, data mining, and machine learning. With significant experience in big data technologies, Khaled has implemented many Proofs of Concept that need different skills. Khaled is a big data trainer and mentor. He is the instructor of the Data at Scale