Statistics for
Machine Learning
Implement Statistical Methods used
in Machine Learning using Python
Himanshu Singh
www.bpbonline.com
FIRST EDITION 2021
Copyright BPB Publications, India
ISBN: 978-93-88511-97-1
All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but they cannot be reproduced by means of publication, photocopy, recording, or by any electronic and mechanical means.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author's and publisher's knowledge. The author has made every effort to ensure the accuracy of this publication, but the publisher cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners but BPB Publications cannot guarantee the accuracy of this information.
Distributors:
BPB PUBLICATIONS
20, Ansari Road, Darya Ganj
New Delhi-110002
Ph: 23254990/23254991
MICRO MEDIA
Shop No. 5, Mahendra Chambers,
DN Rd. Next to Capital Cinema,
V.T. (C.S.T.) Station, MUMBAI-400
Ph: 22078296/22078297
DECCAN AGENCIES
4-3-329, Bank Street,
Hyderabad-500195
Ph: 24756967/24756400
BPB BOOK CENTRE
Old Lajpat Rai Market,
Delhi-110006
Ph: 23861747
Published by Manish Jain for BPB Publications, Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
Dedicated to
My Dad
Who never stopped believing in me,
even though he never expressed it.
About the Author
Himanshu Singh is currently an AI technology lead and senior NLP developer at Legato Health Technologies (an Anthem company). Himanshu has years of experience, mostly in the domain of Natural Language Processing. He has written five books in the machine learning domain and is a guest faculty member for machine learning and data science. Himanshu is an avid blogger and loves to read and write short fiction in his free time.
About the Reviewer
Aravind Kota is currently working as a data scientist. He has 3+ years of experience in the field of data science, with a specialization in image and text analytics and statistical operations with Python coding. He shares his knowledge in this field through blogs, and it's important for readers to understand these concepts for further experiments.
Acknowledgements
First and foremost, I would like to thank my team. It is because of them that I got the opportunities to explore different problem statements, which enabled me to write this book. I would especially like to thank Aravind, Bhavani, and Yunis sir.
I would also like to thank my students. Through them, I came across the doubts they faced while learning statistics. This, in turn, gave me ideas for approaching this book in a way that clears the doubts of its readers.
Last but not least, I would like to thank my wife, Shikha. She has been a constant source of motivation for me, and without her, I would have never been able to finish the book.
Preface
This book can be considered a prerequisite for starting the machine learning journey in detail. One must understand that machine learning, in itself, depends on the concepts of statistics and mathematics. Statistical concepts are used in various areas of machine learning, such as data exploration, finding the efficiency and efficacy of variables as well as models, and making visualizations. This book is designed so that a reader can go through all the required concepts of statistics and then move on to understanding machine learning algorithms.
This book has three sections. The first section starts with the basics of statistics. It covers preliminary concepts such as mean, median, and mode, and moves on to concepts related to probability, random variables, and the like. The second section covers the complex parts of statistics, including advanced concepts like statistical tests, parametric and non-parametric tests, and their applications in Python. Finally, the last section talks more about how to use various data science packages in Python and introduces readers to machine learning and some of its algorithms.
Downloading the coloured images:
Please follow the link to download the coloured images of the book:
https://rebrand.ly/vqukb
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To help us maintain the quality and reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at:
errata@bpbonline.com
Your support, suggestions and feedback are highly appreciated by the BPB Publications Family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at business@bpbonline.com for more details.
At you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
BPB is searching for authors like you
If you're interested in becoming an author for BPB, please visit www.bpbonline.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
The code bundle for the book is also hosted on GitHub at In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at Check them out!