Hands-On Data Science with Anaconda
Utilize the right mix of tools to create high-performance data science applications
Dr. Yuxing Yan
James Yan
BIRMINGHAM - MUMBAI
Copyright 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pravin Dhandre
Acquisition Editor: Divya Poojari
Content Development Editor: Dattatraya More
Technical Editor: Nirbhaya Shaji
Copy Editor: Safis Editing
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade
First published: May 2018
Production reference: 1300518
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78883-119-2
www.packtpub.com
To my bosses: Mark Zaporowski (Canisius College), K.G. Viswanathan (Hofstra University), Lisa Fairchild (Loyola University), John Doe (Wharton School), David Ding (Nanyang Technological University), and Ben Amoako-Adu (Wilfrid Laurier University).
Yuxing Yan
To my dad, mom, and sister.
James Yan
Contributors
About the authors
Dr. Yuxing Yan graduated from McGill University with a PhD in finance. He has taught various finance courses at eight universities in Canada, Singapore, and the U.S. He has published 23 research and teaching-related papers and is the author of six books. Two of his recent publications are Python for Finance and Financial Modelling using R. He is well-versed in R, Python, SAS, MATLAB, Octave, and C. In addition, he is an expert in financial data analytics.
I thank Ben Amoako-Adu, Brian Smith, Jin-Chun Duan, Jerome Detemple, Lawrence Kryzanowski, Chris Schull, Mark Keintz, Dong Xu, Eric Zhu, Paul Ratnaraj, Premal Vora, Shuguang Zhang, Mireia Gine, Shaojun Zhang, Qian Sun, Shaobo Ji, Xing Zhang, Changwen Miao, Karyl Leggio, K. G. Viswanathan, Mark Lennon, Qiyu Zhang, Xiaoning (my wife), Jing (my daughter), and James (my son) for their help and support.
James Yan is an undergraduate student at the University of Toronto (UofT), currently double-majoring in computer science and statistics. He has hands-on knowledge of Python, R, Java, MATLAB, and SQL. During his studies at UofT, he has taken many related courses, such as Methods of Data Analysis I and II, Methods of Applied Statistics, Introduction to Databases, Introduction to Artificial Intelligence, and Numerical Methods, as well as a capstone course on AI in clinical medicine.
About the reviewer
Justin (Byung Uk) Lee completed his BA and master's degree in computer science at KAIST. He developed Korean Windows CE 1.0 and 2.0 at Microsoft while working for LG Electronics. Later, he ran his own business for more than seven years, proposing custom-tailored financial portfolios derived from data analysis. He then worked for several life and non-life insurers, including Samsung Life, where he served as CMO and CSMO and conducted CRM-based marketing. Currently, he is intensively researching machine-learning-based big data analysis in finance and financial applications of blockchain.
Preface
Anaconda is an open source data science platform that brings the best tools for data science together. It is a data science stack that includes more than 100 popular packages based on Python, Scala, and R. With the help of its package manager, conda, users can work with hundreds of packages in different languages and perform data preprocessing, modeling, clustering, classification, and validation with ease.
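As a quick way to see which of these packages the active environment actually provides, here is a minimal sketch that checks a few of them from Python. It assumes Python 3.8 or later (for `importlib.metadata`). The package selection, the import-name-to-distribution-name mapping, and the `check_packages` helper are hypothetical illustrations, not part of Anaconda or conda; adjust them for your own environment.

```python
# A minimal sketch: report which common data science packages the active
# Python environment (for example, an Anaconda environment) can import,
# together with their installed versions. Requires Python 3.8+.
import importlib.util
from importlib import metadata

# Hypothetical selection: import name -> distribution (package) name.
# The two can differ, e.g. "sklearn" is distributed as "scikit-learn".
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
}


def check_packages(packages):
    """Return {import_name: version string, "unknown", or None if missing}."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under dist_name.
            report[import_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "not installed"
        print(f"{name:12s} {status}")
```

Run inside an activated environment, the script prints one line per package, and any package the interpreter cannot import is reported as not installed. Note that this only reflects what the current Python interpreter can see; conda itself is a separate command-line tool that manages the environment.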
This book will get you started with Anaconda and show you how to use it to perform data science operations in the real world. You will start by setting up the environment for the Anaconda platform and Jupyter and installing the relevant packages. You will then cover the basics of data science and the linear algebra needed to perform data science tasks. Once you are ready to go, you will start with data science operations such as data cleaning, sorting, and classification. You will then learn how to perform tasks such as clustering, regression, and prediction, and how to build machine learning models and optimize them. You will also learn how to visualize data and share your projects.