A case study approach to gaining valuable insights from real data with machine learning
Data Science Projects with Python
second edition
Copyright 2021 Packt Publishing
All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Author: Stephen Klosterman
Reviewers: Ashish Jain and Deepti Miyan Gupta
Managing Editor: Mahesh Dhyani
Acquisitions Editors: Sneha Shinde and Anindya Sil
Production Editor: Shantanu Zagade
Editorial Board: Megan Carlisle, Mahesh Dhyani, Heather Gopsill, Manasa Kumar, Alex Mazonowicz, Monesh Mirpuri, Bridget Neale, Abhishek Rane, Brendan Rodrigues, Ankita Thakur, Nitesh Thakur, and Jonathan Wray
First published: April 2019
Second edition: July 2021
Production reference: 1280721
ISBN: 978-1-80056-448-0
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK
Preface
About the Book
If data is the new oil, then machine learning is the drill. As companies gain access to ever-increasing quantities of raw data, the ability to deliver state-of-the-art predictive models that support business decision-making becomes more and more valuable.
In this book, you'll work on an end-to-end project based around a realistic data set and split up into bite-sized practical exercises. This creates a case-study approach that simulates the working conditions you'll experience in real-world data science projects.
You'll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning algorithms such as regularized logistic regression and random forest.
Now in its second edition, this book will take you through the end-to-end process of exploring data and delivering machine learning models. Updated for 2021, this edition includes brand new content on XGBoost, SHAP values, algorithmic fairness, and the ethical concerns of deploying a model in the real world.
By the end of this data science book, you'll have the skills, understanding, and confidence to build your own machine learning models and gain insights from real data.
About the Author
Stephen Klosterman is a Machine Learning Data Scientist with a background in math, environmental science, and ecology. His education includes a Ph.D. in Biology from Harvard University, where he was an assistant teacher of the Data Science course. His professional experience includes work in the environmental, health care, and financial sectors. At work, he likes to research and develop machine learning solutions that create value, and that stakeholders understand. In his spare time, he enjoys running, biking, paddleboarding, and music.
Objectives
- Load, explore, and process data using the pandas Python package
- Use Matplotlib to create effective data visualizations
- Implement predictive machine learning models with scikit-learn and XGBoost
- Use lasso and ridge regression to reduce model overfitting
- Build ensemble models of decision trees, using random forest and gradient boosting
- Evaluate model performance and interpret model predictions
- Deliver valuable insights by making clear business recommendations
Audience
Data Science Projects with Python Second Edition is for anyone who wants to get started with data science and machine learning. If you're keen to advance your career by using data analysis and predictive modeling to generate business insights, then this book is the perfect place to begin. To quickly grasp the concepts covered, it is recommended that you have basic experience with programming in Python or another similar language (R, Matlab, C, etc). Additionally, knowledge of statistics that would be covered in a basic course, including topics such as probability and linear regression, or a willingness to learn about these on your own while reading this book would be useful.