Data Science from Scratch with Python
A Crash Course for Beginners to Learn Data Analysis, Programming and Machine Learning with Python
Steve Geddis
Introduction
Congratulations on purchasing Data Science from Scratch with Python, and thank you for doing so.
Data comes in many forms, but broadly it falls into three major categories: structured, semi-structured, and unstructured. Data scientists are experts responsible for gathering, analyzing, and interpreting large amounts of data to help businesses and organizations. Throughout the chapters of this book, you will learn what the best data scientists know about data analytics, machine learning, big data, data mining, and statistics. Since Data Science is a multidisciplinary field, this book covers the critical concepts you need to know to become a professional Data Scientist.
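To make the three categories concrete, here is a small Python sketch that uses only the standard library. The sample values are hypothetical and chosen for illustration: a CSV snippet stands in for structured data, a JSON record for semi-structured data, and a free-text review for unstructured data.

```python
import csv
import io
import json

# Hypothetical samples of each category (illustrative only).
structured = "customer_id,age,income\n1,34,52000\n2,28,41000\n"
semi_structured = '{"customer_id": 1, "tags": ["sports", "books"], "address": {"city": "Leeds"}}'
unstructured = "Loved the delivery speed, but the packaging was damaged."

# Structured: fixed columns, so every row maps cleanly onto the same field names.
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["income"])  # 52000

# Semi-structured: self-describing keys, but nesting and fields can vary per record.
record = json.loads(semi_structured)
print(record["address"]["city"])  # Leeds

# Unstructured: no schema at all; even a simple word count needs cleaning first.
words = unstructured.lower().replace(",", "").replace(".", "").split()
print(len(words))  # 9
```

The structured and semi-structured samples can be read by a parser that already knows their format, while the unstructured text has to be cleaned and tokenized before it yields even a simple statistic.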
The machine learning in this book is built on the basics of the Python programming language. Python is easy to learn and is dynamically typed. Python programs read naturally and are easy to understand, partly because Python marks code blocks with indentation rather than braces and does not require semicolons at the end of statements. Python runs on many platforms, including Linux, Windows, Solaris, and macOS. Its simplicity is what makes it accessible and a strong choice for programmers.
The following points highlight its characteristics:
It is a highly readable programming language
It has a clean visual layout
It has fewer syntactic exceptions
It is well suited to scripting and rapid application development
It supports dynamic typing
It is interpreted, so code runs without an explicit compilation step
It is compatible with numerous platforms
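The short sketch below illustrates two of the points above: dynamic typing and indentation-based blocks. The function name sum_of_evens is hypothetical, chosen only for this example.

```python
# Dynamic typing: a name can be rebound to a value of a different type at runtime.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str

# Indentation, not braces, marks the body of each block; no semicolons are needed.
def sum_of_evens(numbers):
    """Return the sum of the even numbers in the list."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n
    return total

print(sum_of_evens([1, 2, 3, 4]))  # 6 (2 + 4)
```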
This book explores the field of data science through data and its structure. It also describes the high-level processes used to turn data into value.
Data Science is a process, but that does not mean it lacks creativity. As you move through the stages of processing data, from analyzing data sources to machine learning and finally data visualization, you will see that working with raw data involves many complex steps.
Every effort was made to ensure this book is full of as much useful information as possible. Please enjoy it!
Chapter 1: Basics of Data Science
The arrival of big data greatly increased the demand for storage space. As a result, storage became the biggest hurdle for most enterprises, and organizations needed to build frameworks and develop solutions to store their data. Hadoop and other frameworks were developed to solve this problem. Once storage was addressed, the focus shifted to how the data could be processed. When it comes to data processing, it is hard not to talk about Data Science. That is why it is essential to understand what Data Science is and how it can add value to a business. This chapter covers the definition of Data Science and the role it plays in extracting essential insights from complex data.
Why Data Science is Crucial
Traditionally, data was structured and small in volume, so analyzing it posed no real problem: simple business intelligence (BI) tools were enough. Much of today's data, however, is unstructured and differs from traditional data, so it requires more advanced methods of analysis. The image below indicated a projection that by the year 2020, more than eighty percent of data would be unstructured.
This data comes from many sources, such as text files, financial logs, sensors, multimedia files, and instruments. Simple BI tools cannot process it because of its sheer volume. For this reason, complex analytical tools and advanced processing algorithms are required. These tools help data scientists analyze the data and draw essential insights from it.
There are other reasons why Data Science has become increasingly popular. Let's look at how Data Science is applied in different domains.
Have you ever wished you could understand exactly what your customers need from existing data such as their purchase history, browsing history, income, and age? That is now possible. You can use these types of data to train models that recommend relevant products to customers accurately and effectively.
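One simple way to turn purchase history into recommendations is a neighborhood-style heuristic related to collaborative filtering: suggest items bought by customers whose purchases overlap with yours. The sketch below is an illustrative assumption, not necessarily the method the author has in mind. The customers, products, and the recommend function are hypothetical, and the sketch ignores the income and age features mentioned above.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of products bought.
purchases = {
    "alice": {"laptop", "mouse", "laptop bag"},
    "bob": {"laptop", "mouse"},
    "carol": {"laptop", "laptop bag", "monitor"},
    "dave": {"phone", "phone case"},
}

def recommend(customer, purchases, top_n=2):
    """Suggest products bought by customers whose histories overlap with this one."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # number of shared purchases acts as a similarity score
        if overlap == 0:
            continue                  # customers with nothing in common contribute nothing
        for item in items - owned:    # only suggest products the customer does not own yet
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("bob", purchases))  # ['laptop bag', 'monitor']
```

Weighting each suggestion by the size of the overlap means that customers who look more like Bob have more influence on what he is shown.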
Let's use a different example to demonstrate the role of Data Science in decision-making. What if your car were intelligent enough to drive you home? That is exactly how self-driving cars are designed to work.
These cars gather live data from sensors to build a map of their surroundings. Based on this data, the vehicle makes decisions such as when to slow down, when to overtake, and when to turn. The cars rely on complex machine learning algorithms that analyze the collected data to produce meaningful results.
Data Science is also applied in predictive analytics, for example in weather forecasting based on data from radars and satellites. Models have been created that not only forecast the weather but also predict natural calamities. This helps people take the right measures beforehand and can save many lives. The infographic presented below shows domains where Data Science is having a significant impact.
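The core idea of predictive analytics is to fit a model to past observations and use it to estimate future values. The sketch below fits a straight line by ordinary least squares to a handful of made-up daily temperatures and extrapolates one day ahead. Real weather forecasting relies on far richer data and models; the readings and the fit_line helper are hypothetical.

```python
# Hypothetical daily temperature readings (degrees C) for days 1 to 5; not real data.
days = [1, 2, 3, 4, 5]
temps = [20.0, 21.0, 23.0, 24.0, 26.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(days, temps)
forecast_day_6 = slope * 6 + intercept  # extrapolate the fitted trend one day ahead
print(round(forecast_day_6, 2))  # 27.3
```

Here the fitted trend rises by about 1.5 degrees per day, so the model predicts roughly 27.3 degrees for day 6. Extrapolating a straight line quickly becomes unreliable further into the future, which is one reason real forecasting models are much more sophisticated.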
Definition of Data Science
The term Data Science is common nowadays, but what does it mean? What skills does a person need to have to be called a Data Scientist? How are predictions and decisions made in Data Science? Is there a difference between Data Science and Business Intelligence? These are some of the questions that you are going to find answers to in a short while.
First, let's define Data Science.
Data Science is a combination of tools, machine learning principles, and algorithms whose purpose is to discover hidden patterns in raw data. One might wonder how it differs from statistics. The figure below summarizes the difference.
The figure above shows that a Data Analyst explains what is happening by processing the history of the data. A Data Scientist, on the other hand, not only extracts insights from the data but also uses advanced machine learning algorithms to predict the occurrence of specific events in the future. A Data Scientist looks at the data from many perspectives and angles.
Data Science therefore helps people make predictions and decisions by drawing on prescriptive analytics, machine learning, and predictive causal analytics.
Prescriptive analytics - If you need a model that has the intelligence to make its own decisions, prescriptive analytics is the best choice.
This relatively new field provides advice: it does not just predict outcomes, but also recommends a range of prescribed actions and their associated outcomes. The best example is the Google self-driving car. The data the vehicle collects is used to train its models. You can further mine this data with algorithms to extract useful intelligence, which allows the vehicle to make decisions such as when to turn, which path to take, and when to speed up or slow down.
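The difference between predicting and prescribing can be sketched in a few lines: a predictive model estimates an outcome for each possible action, and the prescriptive step recommends the action with the best predicted outcome. The routes, coefficients, and function names below are hypothetical and greatly simplified; this is not how Google's car actually works.

```python
# Hypothetical predictive model: estimated travel time (minutes) for a route,
# given traffic density per route (0.0 = empty road, 1.0 = gridlock).
def predict_travel_time(route, traffic):
    base_minutes = {"highway": 20, "city": 30, "scenic": 40}
    traffic_sensitivity = {"highway": 40, "city": 15, "scenic": 5}
    return base_minutes[route] + traffic_sensitivity[route] * traffic[route]

def prescribe_route(traffic):
    """Prescriptive step: predict every option, then recommend the best one."""
    predictions = {route: predict_travel_time(route, traffic) for route in traffic}
    best = min(predictions, key=predictions.get)  # shortest predicted travel time
    return best, predictions

light = {"highway": 0.1, "city": 0.5, "scenic": 0.2}
heavy = {"highway": 0.9, "city": 0.6, "scenic": 0.3}
print(prescribe_route(light)[0])  # highway
print(prescribe_route(heavy)[0])  # city
```

The predictive part alone would only report the estimated travel times; the prescriptive part turns those estimates into a recommended action, and the recommendation changes as conditions change.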
Machine learning for pattern discovery - Suppose you do not have the parameters you need to make predictions. In that case, you must first find the hidden patterns in the data set before you can predict meaningfully. The most popular technique for pattern discovery is clustering. For example, assume you work for a telephone company and want to build a network by installing towers across a region. You could use clustering to find tower locations that give all users the strongest possible signal.
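A common clustering algorithm for this kind of problem is k-means. The sketch below implements plain k-means with the standard library and treats each resulting centroid as a candidate tower site. It assumes that signal strength falls off with distance, so keeping users close to their nearest tower serves as a stand-in for maximizing signal strength. The user coordinates and the k_means function are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    """Plain k-means on 2-D points; returns k centroids and each point's cluster index."""
    # Farthest-first initialization: start at the first point, then repeatedly add
    # the point farthest from all centroids chosen so far. This simple deterministic
    # heuristic avoids starting with two centroids inside the same group.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid (candidate tower).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean position of its assigned users.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical user locations: two neighborhoods on a small grid.
users = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 10), (10, 9), (10, 10)]
centroids, labels = k_means(users, k=2)
print(centroids)  # [(1.5, 1.5), (9.5, 9.5)]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

K-means only guarantees a local optimum, and a poor starting point can leave two centroids sharing one neighborhood; that is why this sketch uses farthest-first initialization rather than picking starting points at random. In practice you would also choose k (the number of towers) from coverage and cost constraints.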