All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Published by Packt Publishing Ltd.
Birmingham B3 2PB, UK.
Credits
Authors
Luca Massaron
Alberto Boschetti
Reviewers
Giuliano Janson
Zacharias Voulgaris
Commissioning Editor
Kunal Parikh
Acquisition Editor
Sonali Vernekar
Content Development Editor
Siddhesh Salvi
Technical Editor
Shivani Kiran Mistry
Copy Editor
Stephen Copestake
Project Coordinator
Nidhi Joshi
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Authors
Luca Massaron is a data scientist and marketing research director specializing in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of Web audience analysis in Italy to achieving the rank of a top-ten Kaggler, he has always been very passionate about everything regarding data and its analysis, and about demonstrating the potential of data-driven knowledge discovery to both experts and non-experts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science just by doing the essentials.
I would like to thank Yukiko and Amelia for their support, help, and loving patience.
Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces daily challenges that span from natural language processing (NLP) and machine learning to distributed processing. He is very passionate about his job and always tries to stay up to date with the latest developments in data science technologies by attending meet-ups, conferences, and other events.
I would like to thank my family, my friends and my colleagues. Also, big thanks to the Open Source community.
About the Reviewers
Giuliano Janson's professional experiences have centered on advanced analytics and applied machine learning in healthcare. His work is primarily focused on extracting information from large, dirty, and noisy data utilizing machine learning, stats, Monte Carlo simulation, and data visualization to identify business opportunities and help leadership make data-driven decisions through actionable analytics.
I'd like to thank my wife, Magda, and my two beautiful children, Alex and Emily, for all the love they share.
Zacharias Voulgaris is a data scientist and technical author specializing in data science books. He has an engineering and management background, with post-graduate studies in information systems and machine learning. Zacharias has worked as a research fellow at Georgia Tech, investigating and applying machine learning technologies to real-world problems; as an SEO manager at an e-marketing company in Europe; as a program manager at Microsoft; and as a data scientist at US Bank and G2 Web Services.
Dr. Voulgaris has also authored technical books, the most notable of which is Data Scientist: The Definitive Guide to Becoming a Data Scientist, Technics Publications, and is currently working on Julia for Data Science, Manning Publications. He has also written a number of data science-related articles on blogs and participates in various data science/machine learning meet-up groups. Finally, he has provided technical editorial aid on the book Python Data Science Essentials, Packt Publishing, by the same authors as this book.
I would like to express my gratitude to the authors of the book for giving me the opportunity to contribute to this project. Also, I'd like to thank Bastiaan Sjardin for introducing me to them and to the world of technical editing.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser
Preface
"Frustra fit per plura, quod potest fieri per pauciora. (It is pointless to do with more what can be done with fewer)"
-- William of Ockham (1285-1347)
Scholars and practitioners have known and studied linear models for a long time. Before linear models were adopted into data science, placed into the syllabi of numerous boot camps, and covered in the early chapters of many practical how-to books, they had long been a prominent and relevant part of the body of knowledge of statistics, economics, and many other respected quantitative fields of study.
Consequently, there is a vast number of monographs, book chapters, and papers about linear regression, logistic regression (its classification variant), and the different types of generalized linear models: models in which the formulation of the original linear regression paradigm is adapted in order to solve more complex problems.
Yet, in spite of such an embarrassment of riches, we have never encountered a book that really explains the speed and ease of implementation of linear models when, as a developer or a data scientist, you have to quickly create an application or API whose response cannot be defined programmatically but instead has to be learned from data.