Confident Data Skills
Master the fundamentals of working with data and supercharge your career
Kirill Eremenko
Publisher's note
Every possible effort has been made to ensure that the information contained in this book is accurate at the time of going to press, and the publishers and authors cannot accept responsibility for any errors or omissions, however caused. No responsibility for loss or damage occasioned to any person acting, or refraining from action, as a result of the material in this publication can be accepted by the editor, the publisher or the author.
First published in Great Britain and the United States in 2018 by Kogan Page Limited
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licences issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned addresses:
2nd Floor, 45 Gee Street
London
EC1V 3RS
United Kingdom
c/o Martin P Hill Consulting
122 W 27th Street
New York, NY 10001
USA
4737/23 Ansari Road
Daryaganj
New Delhi 110002
India
© Kirill Eremenko 2018
The right of Kirill Eremenko to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
ISBN 978 0 7494 8154 4
E-ISBN 978 0 7494 8155 1
Typeset by Integra Software Services, Pondicherry
Print production managed by Jellyfish
Printed and bound in Great Britain by CPI Group (UK) Ltd, Croydon CR0 4YY
To my parents, Alexander and Elena Eremenko,
who taught me the most important thing in life: how to be a good person
Thank you for picking up this book. You've taken a huge step in your journey into data science.
Please accept complimentary access to my Data Science A–Z course.
Just go to www.superdatascience.com/bookbonus and use the password datarockstar.
Happy analysing!
I guess you've always wanted to be a data scientist since you were little?
I find it sweet that people ask me this. Yes, I love my job. I take great pleasure in teaching students the fundamentals of data science. And it's great that people seem to think this enthusiasm for the subject can only have been instilled in me at a young age. But this is absolutely not what happened. Let's be honest: no kid thinks about becoming a data scientist. Children want to be astronauts. Dancers. Doctors. Firefighters. And when you're busy dreaming about saving lives or shooting off into outer space, you can't be expected to have your feet on the ground.
When people ask me whether I had always wanted a career in data science, I'm taken back to my childhood as a little Russian boy growing up in Zimbabwe. The scent of smouldering embers, the brassy calls of African red toads, the unforgettable softness of a winter evening, fingertips rubbing page against page of a collection of children's stories: these fragments of memories come from so many wonderful evenings spent listening to Russian tales read by my mother.
My mother wanted me and my siblings to love Zimbabwe, but she was equally concerned to ensure that we knew about our cultural background. She had considered how best to pass this knowledge on to us, and decided that the most powerful way to do it was through stories. These nuggets of information about Russia, woven into compelling tales, meant that when I eventually moved back to Moscow, a country I barely remembered, I felt that I was going home.
That is the power of storytelling. And for all the many tales I heard, I wanted to break each one down into its components. I needed to see the big picture, but I wanted to see it through the prism of all its little details. I was fascinated by the nuts and bolts responsible for creating something so beautiful. I knew intuitively that, in order to tell a good story myself, I first needed to collect these little units of information. That is how I feel about data.
In today's Digital Age, data is used to shape the tales of who we are, how we present ourselves, what we enjoy and when we want things, creating a trail of unique virtual footprints. As we shall discover in this book, machines now know more about us than we do ourselves because of all the data available to them. They read our personal data like a storybook about us. And the wonderful thing about data science is that every discipline these days records data, which means that, as data scientists, we can still be the astronauts and dancers and doctors we had always dreamed of becoming.
Few people know that being a data scientist ultimately means being a storyteller of information. Just as stories have structural components, data science projects are also arranged logically. Confident Data Skills addresses this through five clear stages, which I call the Data Science Process. This is not the only approach we can take to data science projects, but it is a method that ensures our project keeps building step by step and moves towards a logical conclusion. It has the clear, satisfying structure I so adored as a child.
This is how I learned to tell the story of data.
But I'm a complete rookie
Data science is actually one of those areas that benefits from experience in a different field. I expect that many readers will be professionals who are already relatively advanced in their careers. That's fine. You haven't lost anything by coming to data science from another field. In fact, well done for getting a grounding in something else first: this is the kind of foundation you will need to become a good data scientist.
I am speaking from experience. When I started out at the multinational professional services firm Deloitte, I didn't know every single one of the algorithms that we will be looking at in this book. And it wasn't expected of me, either. Very few people begin their careers in data science with that kind of knowledge. As you read this book, you will find that a number of successful people in the industry did not even begin to think about the discipline until their careers were well underway. So stash those fears of digital illiteracy away: by picking up this book, you have taken the first step on your data science journey.
Hey, where's the code?
If you're a book flipper like me, you may have noticed that there's not a single line of code in this book. 'But this is a book about data science,' I hear you say, 'so what's going on?' Data science is an extremely broad subject. Confident Data Skills will immerse you in it and inspire you to consider how the discipline can be incorporated into your current or future business practice. In these pages, you'll learn its methods rather than its ingredients (the code), because those ingredients are easy to source online. To take the cooking analogy further, this is less a simple book of recipes and more a study of the basic techniques used in data science. Learn these thoroughly and you'll start to intuitively understand