Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
A practical guide to implementing supervised and unsupervised machine learning algorithms in Python
Tarek Amr
BIRMINGHAM - MUMBAI
Copyright 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Reshma Raman
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Nilesh Mohite
First published: July 2020
Production reference: 1230720
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83882-604-8
www.packt.com
Contributors
About the author
Tarek Amr has 8 years of experience in data science and machine learning. After finishing his postgraduate degree at the University of East Anglia, he worked in a number of start-ups and scale-up companies in Egypt and the Netherlands. This is his second data-related book. His previous book covered data visualization using D3.js. He enjoys giving talks and writing about different computer science and business concepts and explaining them to a wider audience. He can be reached on Twitter at @gr33ndata. He is happy to respond to all questions related to this book. Feel free to get in touch with him if any parts of the book need clarification or if you would like to discuss any of the concepts here in more detail.
I am grateful to a number of individuals who helped me build my technical knowledge and bridge the gap between the technical and the business sides of the equation. This list of individuals includes Khaled Fouad Elsayed, Amr Saad Ayad, Beatriz De La Iglesia, Dan Smith, Stephen Cox, Gilad Lotan, Karim Ratib, Peter Tegelaar, Adam Powell, Noel Kippers, and Mark Jager.
About the reviewers
Jamshaid Sohail is passionate about data science, machine learning, computer vision, and natural language processing and has over 2 years of experience in the industry. Currently, he is working as a data scientist at Systems Limited. He previously worked at a Silicon Valley-based start-up named FunnelBeam as a data scientist, working with the founders of the company from Stanford University. He has completed over 66 online courses on different platforms. He is an author of the course Data Wrangling with Python 3.X from Packt and has reviewed multiple books and courses. He is also developing a comprehensive course on data science at Educative and is in the process of writing books for multiple publishers.
Prayson Wilfred Daniel bends Python, Bash, SQL, Cypher, JavaScript, Scala, Git, Docker, MLflow, and Airflow to make raw data tell their past, present, and future stories. He is passionate about building business-driven, innovative solutions with a strong focus on microservice architectures and DevOps. Prayson holds an MSc in Information Technology and Persuasive Design from Aalborg University and seeks to help companies gain a competitive advantage from artificial intelligence, particularly machine learning.
Eugene Y. Chen is a machine learning engineer/researcher who wants to make the world a better place with smart software. When he is not building software, he enjoys thinking about and researching machine learning. He has published many peer-reviewed academic works, most recently at the KDD Workshop on Mining and Learning from Time Series on the topic of ensemble learning. He is a contributor to the scikit-learn project.
Preface
You have already seen Harvard Business Review describe data science as the sexiest job of the 21st century. You have been watching terms such as machine learning and artificial intelligence pop up around you in the news all the time. You aspire to join this league of machine learning data scientists soon. Or maybe you are already in the field but want to take your career to the next level. You want to learn more about the underlying statistical and mathematical theory, and apply this new knowledge using the most commonly used tool among practitioners, scikit-learn.
This book is here for you. It begins with an explanation of machine learning concepts and fundamentals and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms and shows you how to use them to solve real-life problems. You'll also learn various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it to real-life problems.