Also by Kevin Clark
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming
Excel :The Ultimate Comprehensive Step-by-Step Guide to Strategies in Excel Programming (Formulas, Shortcuts and Spreadsheets)
Standalone
Excel : The Complete Ultimate Comprehensive Step-By-Step Guide To Learn Excel Programming
Hacking : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Ethical Hacking
Python For Data Science
Copyright 2019 by Kevin Clark. All rights reserved.
This document is geared towards providing exact and reliable information with regard to the topic and issue covered. The publication is sold with the idea that the publisher is not required to render accounting, legal, or otherwise qualified services. If legal or professional advice is necessary, a practiced individual in the profession should be consulted.
From a Declaration of Principles which was accepted and approved equally by a Committee of the American Bar Association and a Committee of Publishers and Associations.
In no way is it legal to reproduce, duplicate, or transmit any part of this document in either electronic means or in printed format. Recording of this publication is strictly prohibited, and any storage of this document is not allowed unless with written permission from the publisher. All rights reserved.
The information provided herein is stated to be truthful and consistent, in that any liability, in terms of inattention or otherwise, by any usage or abuse of any policies, processes, or directions contained within is the solitary and utter responsibility of the recipient reader. Under no circumstances will any legal responsibility or blame be held against the publisher for any reparation, damages, or monetary loss due to the information herein, either directly or indirectly.
Respective authors own all copyrights not held by the publisher.
The information herein is offered for informational purposes solely and is universal as so. The presentation of the information is without a contract or any type of guarantee assurance.
The trademarks that are used are without any consent, and the publication of the trademark is without permission or backing by the trademark owner. All trademarks and brands within this book are for clarifying purposes only and are owned by the owners themselves, not affiliated with this document.
Data science is the discipline that combines ideas from Statistics and Computer Science to solve the problem of knowledge discovery in databases. In this partnership, Statistics has the role of providing the tools to describe, analyze, summarize, interpret, and make inferences about the data. In turn, Computer Science is concerned with providing efficient technologies for the storage, access, integration, and transformation of data.
That is, the role of Computer Science is to make feasible the analysis of databases, often complex and voluminous, through statistical processes. Among the different technologies used for scientific computing, Python is undoubtedly one of the most prominent. It is a free programming language, extremely versatile and powerful, which has been widely adopted in projects related to data science, both by industry and the academic community.
This book presents the fundamental concepts and techniques for those who wish to start working with Python for data science. The book covers the computational aspects of data science, which means that its main focus is to teach the reader how to develop programs capable of processing databases of different sizes, formats, and degrees of complexity.
The work is intended for all types of professionals involved with data science: biologists, mathematicians, engineers, chemists, administrators, physicists, statisticians, economists, etc. In short, it is for anyone who wants to learn how to develop their own Python scripts to explore databases related to problems in their area of expertise.
To reiterate: the book is intended not only for people with a background in computing but for anyone interested in Python for data science. It is also important to make clear that the book does not focus on teaching statistics, machine learning, or data mining. What we intend is to teach the reader to program in the Python language, enabling them, in the future, to develop any type of script in this language, including programs that analyze large databases through statistical methods or with algorithms for machine learning and data mining. In short, what we want is to make the reader a top-notch Pythonista!
No prerequisites are required for reading the book, although knowledge of some programming language - such as R, MATLAB, C, or even Excel's macro programming - certainly helps speed up the learning process. The book is divided into five very broad chapters. The first three cover the language's "rice and beans," that is, the least you need to know to start developing any kind of Python application. The remaining two chapters deal with themes that are more directly related to data science.
- Chapter 1 - Pleasure to meet you, Python Language. It aims to present the Python environment and teach the reader how to create their first programs.
- Chapter 2 - Creating and Using Functions. Data analysis is always facilitated with the use of functions. In this chapter, you will discover how to use the basic mathematical and statistical functions of Python and also learn how to create your own reusable functions and modules.
- Chapter 3 - Native Data Structures. Data structures are used by programming languages to organize related data sets in memory to make their manipulation simpler and more efficient. This chapter presents the four native data structures of Python: lists, tuples, sets, and dictionaries.
- Chapter 4 - Strings and Databases in Text Format. Before data can be analyzed with statistical techniques, it needs to be loaded into the Python environment. This chapter presents the basic techniques for importing text databases structured in different ways: CSV, JSON, column-separated files, etc. In addition, the chapter presents the numerous text-processing tools offered by Python, from simple string functions to regular expressions.
- Chapter 5 - SQL Database and Language. The main purpose of this chapter is to teach you how to query, combine, and explore tables stored in relational databases using the SQL language. Although SQL is more than 35 years old, it remains very relevant and is currently considered one of the key technologies in the area of data science.
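To give a taste of what the last three chapters cover, here is a minimal sketch that touches the native data structures, text-format loading, and an SQL query. It is only an illustration under stated assumptions: the sample names and ages are invented, and the use of the standard library's sqlite3 module with an in-memory database is our choice for this preview, not necessarily the tool used later in the book.

```python
import csv
import io
import json
import sqlite3

# Chapter 3 preview: the four native data structures (sample values are made up).
ages = [34, 28, 45]                    # list: ordered and mutable
point = (2.5, 7.0)                     # tuple: ordered and immutable
species = {"cat", "dog", "cat"}        # set: duplicates are discarded
person = {"name": "Ana", "age": 34}    # dictionary: maps keys to values

# Chapter 4 preview: reading CSV and JSON text (hypothetical in-line data
# stands in for real files so the example is self-contained).
csv_text = "name,age\nAna,34\nBob,28\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
record = json.loads('{"name": "Ana", "age": 34}')

# Chapter 5 preview: storing the rows in a relational table and querying
# them with SQL (assumption: an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(row["name"], int(row["age"])) for row in rows],
)
average_age = conn.execute("SELECT AVG(age) FROM people").fetchone()[0]
conn.close()

print(len(species), rows[0]["name"], record["age"], average_age)
```

Each of these ideas is developed step by step in its own chapter, so there is no need to understand every line yet.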