Preface & Table of Contents
0.1 Preface
Data Science is an emerging field. A large number of organizations are using it for research and business improvement. Glassdoor ranked data science as one of the best careers. There is an ever-growing demand for data scientists.
0.2 About this book
This book is for beginners and domain experts who want to start their data science journey. It aims to be precise and complete, and to offer a fast way to learn data science. It covers data science, graphing, and machine learning. As this is a beginner-level book, no prior knowledge is required, though some familiarity with mathematics, statistics, and programming would be helpful. With this book, your data science journey will be like a hot knife through butter.
0.3 Book Links
Book Website - dswithr.ml (* .ml not .com)
Please review the book at Goodreads and
0.4 About the Author
Narayana Nemani
Narayana Nemani is a Lead Data Scientist. He has worked with Microsoft technologies (.NET, RDLC) and UI technologies (Angular) before settling in the data science field. He is involved in the teaching and research of data science.
Twitter Account
Email - dswithrml@gmail.com
Copyright
Published by Narayana Nemani
2021 Hyderabad
All rights reserved. No part of this book may be reproduced or modified in any form, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
The scanning, uploading, and distribution of this book via the internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions, and do not participate in or encourage piracy of copyrighted materials.
Getting Started
1.1 What is Data Science?
Data science is the study and use of data. Its primary objectives are analyzing data and making predictions from it. It is an emerging field that overlaps heavily with artificial intelligence.
Data science is used in almost every field. Popular applications are self-driving cars, gaming AI, search engines (Google, DuckDuckGo), and virtual assistants (Siri, Alexa).
Figure 1.1: Applications of Data Science
For example, internet apps generate a large amount of data. By understanding this data, we can interpret global trends and user-specific choices. Supermarkets can estimate sales and stock their inventory accordingly. E-commerce applications recommend new products based on previous purchases.
Data science needs knowledge from various fields. Statistics, domain knowledge, and programming are the pillars of data science.
Figure 1.2: Pillars of Data Science
Data is the core of data science. An organization's internal storage, government data, private websites, and surveys are all sources of data. For example, news channels conduct opinion polls before elections. Data science applies to both small and large datasets.
Data science jobs are among the highest-paid occupations. Both programmers and domain experts fill these positions.
Models
Models are created to understand and predict data. A model uses existing data as input and predicts outcomes for new data. The steps for creating a model are -
Understanding the problem statement
Transforming data (Data Wrangling)
Analyzing data (Exploratory Data Analysis)
Applying algorithms (Machine Learning or Deep Learning)
Creating models is an iterative process. After finding a new insight in a step, make relevant changes in other steps.
Figure 1.3: Steps of model creation
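The steps above can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions, not a recommended workflow: it assumes the built-in mtcars dataset, a made-up problem statement (predict fuel efficiency, mpg, from weight, wt), and linear regression with lm() as the algorithm. The variable names are illustrative.

```r
# Step 1: Understand the problem statement (assumed for illustration):
# predict a car's fuel efficiency (mpg) from its weight (wt).

# Step 2: Transform data (data wrangling).
# Keep only the columns needed and drop rows with missing values.
cars <- na.omit(mtcars[, c("mpg", "wt")])

# Step 3: Analyze data (exploratory data analysis).
summary(cars)
correlation <- cor(cars$wt, cars$mpg)  # strength and direction of the relationship

# Step 4: Apply an algorithm (here, linear regression).
model <- lm(mpg ~ wt, data = cars)

# Use the model to predict the outcome for new data.
new_cars <- data.frame(wt = c(2.5, 3.5))  # weights in units of 1000 lbs
predicted_mpg <- predict(model, newdata = new_cars)
print(predicted_mpg)
```

In practice the process is iterative: an insight from the analysis step (for example, another column that looks relevant) would lead back to the wrangling step and a revised model.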
1.2 Editors
RStudio
RStudio is the recommended IDE for the R language. This ebook itself is written in RStudio. It can be installed locally by downloading the installer or accessed from the cloud on
The graphical interface has four areas. Each area has single or multiple panes.
Figure 1.4: RStudio
Source Code Editor pane - Write the actual code in the editor pane. The file extension of R script files is .R. To create a new script file in RStudio, select the File > New File > R Script option. Save the code file to reuse it later.
Console pane - Run code commands and view the results of code execution at the console. It is a command-line interface. Use the console for installing and loading libraries.
In the example below, the date function is run from the editor pane, and the result is displayed in the console.
Figure 1.5: Editor Pane and Console Pane
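As a text version of this kind of example, the lines below can be typed in the editor pane and run with Ctrl + Enter, or typed directly at the console. date() and library() are base R functions; the variable name current_date and the package name ggplot2 are only illustrative choices.

```r
# Return the current system date and time as a character string.
date()

# Store the result in a variable so it can be reused later.
current_date <- date()
print(current_date)

# Installing a package downloads it from CRAN (needs internet access), so it
# is shown commented out here; "ggplot2" is only an example package name.
# install.packages("ggplot2")

# Loading a package makes its functions available in the current session.
# The stats package ships with R, so this works without installing anything.
library(stats)
```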
Environment and History panes - The environment pane displays the variables currently loaded in memory and their values. The history pane shows the previously executed code commands.
Figure 1.6: Environment and History Panes
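The contents of the environment pane can also be inspected from code. The short sketch below uses the base R functions ls(), rm(), and exists(); the variable names radius and area are made up for illustration.

```r
# Create two variables; both appear in the environment pane.
radius <- 2
area <- pi * radius^2

# List the names of the variables currently in the environment.
ls()

# Remove a variable; it disappears from the environment pane.
rm(radius)

# Check whether each variable is still defined.
exists("radius")
exists("area")
```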
Files, Plots, Packages, Help panes - The files pane displays the file system. Use it for viewing and opening code files. The plots pane displays graphs. The packages and help panes are discussed in later chapters.
Frequently used keyboard shortcuts of the RStudio IDE are -
Ctrl + Enter - Run current line or selected code
Ctrl + Alt + R - Run entire document
Ctrl + Shift + C - Comment/Uncomment current line or selected code
Ctrl + L - Clear Console
Esc - Interrupt currently executing command
JupyterLab
JupyterLab is another popular IDE for data science projects. It supports many programming languages, including R and Python. Code in Jupyter is organized into blocks known as cells.
Frequently used keyboard shortcuts of JupyterLab -
Ctrl + Enter - Run current cell or selected cells
A - Insert cell above the current cell
B - Insert cell below the current cell
D, D - Delete current cell
Figure 1.7: JupyterLab
Google Colab
Google Colab is another popular option for R and Python. It is a hosted notebook environment based on Jupyter. It supports GPU and TPU acceleration.
Figure 1.8: Google Colab