Feature Engineering Made Easy
Identify unique features from your dataset in order to build powerful machine learning systems
Sinan Ozdemir
Divya Susarla
BIRMINGHAM - MUMBAI
Copyright 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Varsha Shetty
Content Development Editor: Tejas Limkar
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Datta
Production Coordinator: Shantanu Zagade
First published: January 2018
Production reference: 1190118
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78728-760-0
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the authors
Sinan Ozdemir is a data scientist, start-up founder, and educator living in the San Francisco Bay Area. He studied pure mathematics at Johns Hopkins University, where he then spent several years lecturing on data science before founding his own start-up, Kylie.ai, which uses artificial intelligence to clone brand personalities and automate customer service communications.
Sinan is also the author of Principles of Data Science, available through Packt.
I would like to thank my parents and sister for supporting me throughout my life, and also my partner, Elizabeth Beutel. I also would like to thank my co-author, Divya Susarla, and Packt Publishing for all of their support.
Divya Susarla is an experienced leader in data methods, implementing and applying tactics across a range of industries and fields, such as investment management, social enterprise consulting, and wine marketing. She studied business economics and political science at the University of California, Irvine, USA.
Divya is currently focused on natural language processing and generation techniques at Kylie.ai, a start-up helping clients automate their customer support conversations.
I would like to thank my parents for their unwavering support and guidance, and also my partner, Neil Trivedi, for his patience and encouragement. Also, a shoutout to DSI-SF2; this book wouldn't be a reality without you all. Thanks to my co-author, Sinan Ozdemir, and to Packt Publishing for making this book possible.
About the reviewer
Michael Smith uses big data and machine learning to learn about how people behave. His experience includes IBM Watson and consulting for the US government. Michael actively publishes at and attends several prominent conferences as he engineers systems using text data and AI. He enjoys discussing technology and learning new ways to tackle problems.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
This book covers feature engineering. A major part of the data science and machine learning pipeline, feature engineering is the practice of identifying, cleaning, constructing, and discovering new characteristics of data for the purposes of interpretation and predictive analysis.
In this book, we will cover the entire process of feature engineering, from inspection to visualization, transformation, and beyond. We will use both basic and advanced mathematical measures to transform our data into a form that is much more digestible by machines and machine learning pipelines.
By discovering and transforming features, we as data scientists gain a whole new perspective on our data, enhancing not only our algorithms but also our insights.
Who this book is for
This book is for people who want to understand and use the practices of feature engineering for machine learning and data exploration.
The reader should be fairly well acquainted with machine learning and coding in Python to feel comfortable diving into new topics with a step-by-step explanation of the basics.
What this book covers
Introduction to Feature Engineering introduces the basic terminology of feature engineering and takes a quick look at the types of problems we will solve throughout this book.
Feature Understanding: What's in My Dataset? looks at the types of data we will encounter in the wild and how to deal with each type, separately or together.