Mastering Exploratory Analysis with pandas
Build an end-to-end data analysis workflow with Python
Harish Garg
BIRMINGHAM - MUMBAI
Copyright 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pavan Ramchandani
Acquisition Editor: Nelson Morris
Content Development Editor: Karan Thakkar
Technical Editor: Suwarna Patil
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Jisha Chirayil
Production Coordinator: Arvindkumar Gupta
First published: September 2018
Production reference: 1290918
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78961-963-8
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Packt.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the author
Harish Garg is a data analyst, author, and software developer who is really passionate about data science and Python. He is a graduate of Udacity's Data Analyst Nanodegree program. He has 17 years of industry experience in data analysis using Python, developing and testing enterprise and consumer software, managing projects and software teams, and creating training material and tutorials. He also worked for 11 years for Intel Security (previously McAfee, Inc.). He regularly contributes articles and tutorials on data analysis and Python. He is also active in the open data community and is a contributing member of the Data4Democracy open data initiative. He has written data analysis pieces for the Takshashila think tank.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
In this book, you will learn in depth about pandas, a Python library for manipulating, transforming, and analyzing data. It is a popular framework for exploratory data visualization, a method for analyzing datasets and data pipelines based on their properties.
This book will be your practical guide to exploring datasets using pandas. You will start by setting up Python, pandas, and Jupyter Notebooks. You will learn how to use Jupyter Notebooks to run Python code. We will then show you how to get data into pandas and perform some exploratory analysis. You will learn how to manipulate and reshape data using pandas methods. You will also learn how to deal with missing data from your datasets, how to draw charts and plots using pandas and Matplotlib, and how to create some effective visualizations for your audience. Finally, we will wrap up your newly gained pandas knowledge by teaching you how to get data out of pandas and into a number of popular file formats.
Who this book is for
This book is for the budding data scientist looking to learn about the popular pandas library, or the Python developer looking to step into the world of data analysis. If you fall into either of those categories, this book is the ideal resource to get you started.
What this book covers
Working with Different Kinds of Datasets teaches you about using advanced options when reading data from CSV files and Excel files.
Data Selection looks at how to use the pandas series data structure to select data. You will also learn how to sort and filter data from pandas DataFrames and how to change datatypes in pandas series.
Manipulating, Transforming, and Reshaping Data explores how to modify pandas DataFrames. You will also learn how to use the GroupBy method, how to handle missing values, and how to use indexing methods in pandas DataFrames. This chapter will also teach you how to work with dates and time data and how to apply functions to pandas series or DataFrames.
Visualizing Data Like a Pro will show you how to control plot aesthetics, including how to choose colors for plots. You will also learn how to plot categorical data and get to grips with plotting with data-aware grids.
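As a rough illustration of the kinds of operations these chapters cover, here is a minimal sketch that reads delimited data with non-default options, handles a missing value, changes a datatype, filters, sorts, groups, and works with dates. The sample data, column names, and values are hypothetical and are not taken from the book's example files; plotting is left out to keep the sketch short.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """date;city;sales
2018-01-05;Pune;120
2018-01-06;Mumbai;
2018-02-01;Pune;80
2018-02-03;Mumbai;150
"""

# Reading with non-default options: a custom separator and date parsing.
df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["date"])

# Handling missing values: fill the empty sales figure with 0.
df["sales"] = df["sales"].fillna(0)

# Changing a datatype: the column was read as float because of the gap.
df["sales"] = df["sales"].astype(int)

# Filtering rows and sorting the result.
pune = df[df["city"] == "Pune"].sort_values("sales", ascending=False)

# Grouping and aggregating.
totals = df.groupby("city")["sales"].sum()

# Working with dates and applying a function to a series.
df["month"] = df["date"].dt.month
df["sales_label"] = df["sales"].apply(lambda s: "high" if s >= 100 else "low")

# Getting data back out of pandas as CSV text.
csv_out = df.to_csv(index=False)
```

Each step maps loosely onto one of the topics listed above; the chapters themselves go into the available options in far more detail.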
To get the most out of this book
Some programming experience in Python will help you get the most out of this book.
Download the example code files
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register at www.packt.com.