For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
Manning Publications Co.
© 2022 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
front matter
preface
Like many data scientists and machine learning engineers out there, most of my professional training and education came from real-world experiences, rather than classical education. I got all my degrees from Johns Hopkins in theoretical mathematics and never once learned about regressions and classification models. Once I received my master's degree, I decided to make the switch from pursuing my PhD to going into startups in Silicon Valley and teaching myself the basics of ML and AI.
I used free online resources and read reference books to begin my data science education and started a company focusing on creating enterprise AIs for large corporations. Nearly all of the material I picked up focused on the types of models and algorithms used to model data and make predictions. I used books to learn the theory and read online posts on sites like Medium to see how people would apply that theory to real-life applications.
It wasn't until a few years later that I started to realize that I could only go so far learning about topics like models, training, and parameter tuning. I was working with raw text data at the time, building enterprise-grade chatbots, and I noticed a big difference in the tone of the books and articles about natural language processing (NLP). They focused a lot on the classification and regression models I could use, but they focused equally, if not even more, on how to process the raw text for the models to use. They talked about tuning parameters for the data more than tuning parameters for the models themselves.
I wondered why this wasn't the case for other branches of ML and AI. Why weren't people transforming tabular data with the same rigor as text data? It couldn't be that it wasn't necessary or helpful, because pretty much every survey asking about time spent in the data science process revealed that people spent a majority of time getting and cleaning data. I decided to take this gap and turn it into a book.
Funny enough, that wasn't this book. I wrote another book on feature engineering a few years prior to this one. My first book on feature engineering focused on the basics of feature engineering with an emphasis on explaining the tools and algorithms over showcasing how to use them day to day. This book takes a more practical approach. Every chapter in this book is dedicated to a use case in a particular field with a dataset that invites different feature engineering techniques to be used.
I tried to outline my own thinking process when it came to feature engineering in an easy-to-follow and concise format. I've made a career out of data science and machine learning, and feature engineering has been a huge part of that. I hope that this book will open your eyes and your conversations with colleagues about working with data and give you the tools and tricks to know which feature engineering techniques to apply and when.
acknowledgments
This book required a lot of work, but I believe that all the time and effort resulted in a great book. I sure hope that you think so as well! There are many people I'd like to thank for encouraging me and helping me along the way.
First and foremost, I want to thank my partner, Elizabeth. You've supported me, listened to me as I paced around our kitchen trying to figure out the best analogy for a complex topic, and walked the dog when it was my turn, but I was so engrossed in my writing that it totally slipped my mind. I love you more than anything.
Next, I'd like to acknowledge everyone at Manning who made this text possible. I know it took a while, but your constant support and belief in the topic kept me going when things were rough. Your commitment to the quality of this book has made it better for everyone who will read it.
I'd also like to thank all the reviewers, who took the time to read my manuscript at various stages during its development. To Aleksei Agarkov, Alexander Klyanchin, Amaresh Rajasekharan, Bhagvan Kommadi, Bob Quintus, Harveen Singh, Igor Dudchenko, Jim Amrhein, Jiri Pik, John Williams, Joshua A. McAdams, Krzysztof Jędrzejewski, Krzysztof Kamyczek, Lavanya Mysuru Krishnamurthy, Lokesh Kumar, Maria Ana, Maxim Volgin, Mikael Dautrey, Oliver Korten, Prashant Nair, Richard Vaughan, Sadhana Ganapathiraju, Satej Kumar Sahu, Seongjin Kim, Sergio Govoni, Shaksham Kapoor, Shweta Mohan Joshi, Subhash Talluri, Swapna Yeleswarapu, and Vishwesh Ravi Shrimaland: your suggestions helped make this a better book.
Finally, a special thank you goes to my technical proofreaders, who made sure that I crossed my t's, dotted my i's, and commented on my code!
All in all, many people made this book possible. Thank you all so much!