Dive Into Data Science
Use Python to Tackle Your Toughest Business Challenges
by Bradford Tuckfield
DIVE INTO DATA SCIENCE. Copyright © 2023 by Bradford Tuckfield.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN-13: 978-1-7185-0288-8 (print)
ISBN-13: 978-1-7185-0289-5 (ebook)
Publisher: William Pollock
Managing Editor: Jill Franklin
Production Manager: Sabrina Plomitallo-González
Production Editor: Jennifer Kepler
Developmental Editor: Alex Freed
Cover Illustrator: Gina Redman
Interior Design: Octopod Studios
Technical Reviewer: Christian Ritter
Copyeditor: Sharon Wilkey
Compositor: Ashley McKevitt, Happenstance Type-O-Rama
Proofreader: Paula L. Fleming
Indexer: Emma Tuckfield
For information on distribution, bulk sales, corporate sales, or translations, please contact No Starch Press, Inc. directly at info@nostarch.com or:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: Tuckfield, Bradford, author.
Title: Dive into data science : use Python to tackle your toughest business
challenges / by Bradford Tuckfield.
Description: San Francisco, CA : No Starch Press, [2023] | Includes index.
Identifiers: LCCN 2022051900 (print) | LCCN 2022051901 (ebook) | ISBN
9781718502888 (paperback) | ISBN 9781718502895 (ebook)
Subjects: LCSH: Business--Computer programs. | Python (Computer program
language)
Classification: LCC HF5548.5.P98 T83 2023 (print) | LCC HF5548.5.P98
(ebook) | DDC 650.0285--dc23/eng/20221102
LC record available at https://lccn.loc.gov/2022051900
LC ebook record available at https://lccn.loc.gov/2022051901
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
For Leah
About the Author
Bradford Tuckfield is a data scientist, consultant, and writer. He received a PhD in operations and information management from the Wharton School of the University of Pennsylvania and a BS in mathematics from Brigham Young University. He is the author of Dive Into Algorithms (No Starch Press, 2021) and co-author of Applied Unsupervised Learning with R (Packt, 2019). In addition to working as a data scientist and tech manager for top finance firms and startups, he has published his research in academic journals spanning math, business management, and medicine.
About the Technical Reviewer
As lead data scientist at Statistics Canada, Christian Ritter provided critical support to build the agency's data science division from the ground up, including the development of its data analytics platform. He has led projects leveraging natural language processing, computer vision, and recommender systems to serve a variety of clients. Christian is currently leading the agency's integration of MLOps. He is also the founder of OptimizeAI Consulting and works part-time as an independent data science consultant. When not taking on data science projects, he mentors students as part of postgraduate data science programs. Christian holds a PhD in computational astrophysics.
Acknowledgments
Many people made valuable contributions to this book. Professional mentors and colleagues helped me learn Python, data science, and business, and how all three can be put together. These mentors include Seshu Edala and Dr. Sundaram Narayanan. Friends, including Sheng Lee, Ben Brown, Ee Chien Chua, and Drew Durtschi, have given valuable advice and encouragement that helped me during the writing process. Alex Freed at No Starch Press was unbelievably helpful throughout the process. Christian Ritter provided excellent suggestions and corrections as the technical reviewer. Emma Tuckfield provided excellent help in the editing process. Jayesh Thorat helped prepare much of the code and data in Chapter 8. My dear grandma, Dr. Virgie Day, provided lifelong encouragement to my intellectual development, and also inspiration for some of the ideas in Chapter 4; that chapter is dedicated to her. This book is dedicated to Leah, who has been my most important source of support and motivation.
Introduction
Some years ago, Hal Varian, the chief economist at Google, confidently claimed that "the sexy job in the next 10 years will be statisticians." In the years since he made that claim, two things have happened: we've started calling statisticians "data scientists," and the profession has seen enormous growth, both in demand for skilled practitioners and in salaries.
The supply of skilled data scientists has not kept up with the demand. Part of the aim of this book is to help solve that problem by introducing you to all of the main data science techniques being used at today's top firms. Explanations come with working, thoroughly explained code for every example, and we also provide ideas about how various data science methods are applied and how to find creative solutions to challenges. The book is meant to give anyone who reads it the skills to become a data scientist and take on the toughest and most exciting challenges being faced by businesses today.
But data science is more than just a career opportunity. It's a broad field combining elements of statistics, software development, math, economics, and computer science. It allows you to analyze data, detect differences between groups, investigate causes of mysterious phenomena, classify species, and perform experiments: in other words, to do science. Anyone who gets excited about discovering the truth about something hard to understand, or who wants to understand the world better, should feel excited about this aspect of data science.
In short, data science can offer something to nearly everyone. It can help you solve business problems and make your business more successful. It can make you more of a scientist, better able to observe and clearly understand the world around you. It can sharpen your analytical abilities and your coding skills. What's more, it can be fun. Becoming a data scientist means joining a field that's constantly growing and expanding, which means you will need to expand your own knowledge and skills every day. If you feel up to the challenge of learning a wide range of difficult new skills that will help you work better, think better, and get a "sexy" job, read on.