Hands-On Gradient Boosting with XGBoost and scikit-learn
Perform accessible machine learning and extreme gradient boosting with Python
Corey Wade
BIRMINGHAM - MUMBAI
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Ali Abidi
Senior Editor: David Sugarman
Content Development Editor: Tazeen Shaikh
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Nilesh Mohite
First published: October 2020
Production reference: 1151020
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83921-835-4
www.packt.com
To my sister, Anne. Thanks for recommending the bootcamp.
Corey
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the author
Corey Wade, M.S. Mathematics, M.F.A. Writing and Consciousness, is the founder and director of Berkeley Coding Academy, where he teaches machine learning and AI to teens from all over the world. Additionally, Corey chairs the Math Department at the Independent Study Program of Berkeley High School, where he teaches programming and advanced math. His additional experience includes teaching natural language processing with Hello World, developing data science curricula with Pathstream, and publishing original statistics (3NG) and machine learning articles with Towards Data Science, Springboard, and Medium. Corey is co-author of The Python Workshop, also published by Packt.
I want to thank the Packt team and my family, in particular Jetta and Josephine, for giving me the space and time to complete this book when life moved in unexpected directions, as it so often does.
Foreword
Over the last decade, data science has become a household term: data is the new oil, and machine learning is the new electricity. Virtually every industry has grown by leaps and bounds as the information age has transitioned into the data age. Academic departments all over the globe have sprung into action, applying and developing the techniques and discoveries for and from the data science playbook. In light of all of this development, there is a growing need for books (and authors) like this one.
More than just a moneymaker, machine learning shows great promise as a problem solver and a crucial tool in managing global crises. 2020 has been a year full of challenges, imploring machine learning to come to the aid of humanity. In California alone, over 4 million acres have burned from wildfires this year, not to mention the COVID-19 pandemic, which to date has resulted in over 36 million cases and 1 million deaths worldwide (WorldMeter.info).
This book provides readers with practical training in one of the most exciting developments in machine learning: gradient boosting. Gradient boosting was the elegant answer to the foibles of the already powerful Random Forest algorithm and has proven to be a formidable asset in the predictive analytics toolbox. Moreover, Wade has chosen to focus on XGBoost, an extremely flexible and successful implementation of gradient boosting. In fact, in addition to having a serious presence in both industry and academia, XGBoost has consistently ranked as a top (quite possibly THE top) performing algorithm in data competitions based on structured tabular data containing numerical and categorical features.
As Hands-On Gradient Boosting with XGBoost and scikit-learn goes to print, author Corey Wade and his family are standing at ground zero, challenged by the acrid, smoky breeze in the San Francisco Bay Area while practicing social distancing to avoid the novel coronavirus, COVID-19. This may be the perfect setting, albeit morbidly so, for motivating Wade to guide the next wave of problem solvers. He has put his heart and soul, as well as his intellect and grit, into researching and presenting what is quite likely the most complete source of information regarding the XGBoost implementation of gradient boosting.
Readers should know that they are benefitting not only from a great analyst and data scientist but also from an experienced and genuine teacher in Corey Wade. He has the bug, as we say in education: a passion to give, to help, and to disseminate critical knowledge to thirsting intellects.
Kevin Glynn
Data Scientist & Educator
About the reviewers
Andrew Greenwald holds an MSc in computer science from Drexel University and a BSc in electrical engineering with a minor in mathematics from Villanova University. He started his career designing solid-state circuits to test electronic components. For the past 25 years, he has been developing software for IT infrastructure, financial markets, and defense applications. He is currently applying machine learning to cybersecurity, developing models to detect zero-day malware. Andrew lives in Austin, Texas, with his wife and three sons.
Michael Bironneau is a mathematician and software engineer with a Ph.D. in mathematics from Loughborough University. He has been creating commercial and scientific software since the age of 11, when he first used the TI-BASIC programming language on his TI-82 graphing calculator to automate his math homework.