Python Data Analysis for Newbies
Joshua K. Cage
1. Introduction
Thank you for picking up this book, a beginner's introduction to data analysis using Python programming. It is written for the following readers.
1) Interested in machine learning and deep learning.
2) Interested in programming with Python.
3) Interested in data analysis.
4) Interested in using NumPy/Pandas/Matplotlib/scikit-learn.
5) Not interested in building machine learning environments.
6) Not interested in spending a lot of money for learning.
7) Vaguely worried about the novel coronavirus epidemic and the future.
Many of my friends and acquaintances have thrown themselves into data analysis, only to feel satisfied once they have finished the day-long process of setting up an environment. Then, after working through the MNIST (a data set of handwritten digit images) and iris classification tutorials, they get busy with their day jobs and set it aside for a while.
This book runs its tested source code in the free Python execution environment provided by Google, so you can learn by programming without spending any time setting up your own environment.
This book focuses on the bare minimum of knowledge a beginner needs to get into serious data analysis in Python. By the end of the book, readers should have reached the following five goals.
1) To build and train deep learning and machine learning models on arbitrary data, and to make predictions with them, using a deep learning library (Keras) and a machine learning library (scikit-learn).
2) To use Pandas instead of Excel for large-scale data processing.
3) To manipulate multidimensional arrays using NumPy.
4) To draw graphs freely using Matplotlib.
5) To perform simple data analysis on the spread of the novel coronavirus.
With the novel coronavirus spreading around the world and conflicting reports circulating in these uncertain times, many of you may not know what to believe or how to deal with the situation.
One thing is certain: individuals' skill sets will differ noticeably depending on how they use the free time created by telecommuting. We are entering an era of clear winners and losers, and that difference will greatly affect each person's value in their company and in the labor market.
I believe it is vital that we do not cling to vague fears in a state of anxiety, but instead transform each anxiety, one by one, into a solvable problem through data analysis, so that each person can choose a course of action.
2. Disclaimer
The information contained in this book is for informational purposes only. Therefore, the use of this book is always at the reader's own risk and discretion. Use of Google Colaboratory as described in this book is at the reader's own risk, after reviewing Google's Terms of Service and Privacy Policy.
In no event shall the author be liable for any consequential, incidental, or lost profits or other indirect damages, whether foreseen or foreseeable, arising out of or in connection with the use of the source code accompanying this book or the Google Colaboratory service.
You must accept the above precautions before using this book. Please be aware that the author will not be able to respond to inquiries from readers who have not read and accepted these notes.
3. Trademarks and registered trademarks
All product names appearing in this book are generally registered trademarks or trademarks of their respective companies. Trademark symbols such as ™ and ® may be omitted from the text.
4. Feedback
While the utmost care has been taken in writing this book, you may notice errors, inaccuracies, misleading or confusing language, or simple typographical mistakes. In such cases, we would appreciate your feedback so that we can improve future editions. Suggestions for future revisions are also welcome. The contact information is below.
Joshua K. Cage
joshua.k.cage@gmail.com
5. Jupyter Notebook
A Jupyter Notebook containing the code described in this book is available on Google Colaboratory. You can open it from the following link, so please refer to it as you read this book (Chrome is recommended).
https://drive.google.com/file/d/1G7_YFCGMqV2bfTmR82pSwqLkSxMfhTDh/view?usp=sharing
6. GPU environment Google Colaboratory
Years ago, writing a Python program for data analysis required setting up a UNIX environment and compiling individual libraries, which was very time-consuming. Nowadays, however, Continuum Analytics (now Anaconda, Inc.) offers Anaconda, a Python distribution for scientific computing that bundles a set of libraries and can be installed easily with an installer. If you want to set up a Python environment on a local PC, such as a Windows PC or a Mac, you can easily create a stand-alone Python environment using Anaconda.
However, Anaconda does have its problems. You need to have your own "local PC", and few novice users have a PC at home with sufficient specs to run real-world machine learning or deep learning workloads. Running them on an underpowered machine wastes a lot of time. This book therefore recommends using GPUs in the Google Colaboratory (Colab) environment.
Colab uses a tool called Jupyter Notebook, which is also included with Anaconda, to run Python in the cloud from a web browser (and it's free!). Colab ships as standard with Pandas, NumPy, Matplotlib, and Keras, all of which are used in this book. It is a great service that lets you work on your machine learning projects anytime, anywhere, as long as you have an internet connection, even on a low-powered PC or tablet. There is zero risk in getting started and zero cost to set up. Now that you have picked up a copy of this book, run your Python programs in Colab, and you will be amazed at how easy they are to write and how quickly deep learning programs run on GPUs.
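Once you have a notebook open, you may want to confirm for yourself which of these libraries are available. The following is a minimal sketch, not part of the book's own notebook: the function names check_environment and has_nvidia_tool are hypothetical, and looking for NVIDIA's nvidia-smi command is assumed here to be only a rough hint that a GPU runtime is attached.

```python
import importlib.util
import shutil

# Import names of the libraries mentioned in the text.
# "sklearn" is the import name of scikit-learn.
LIBRARIES = ["numpy", "pandas", "matplotlib", "sklearn", "keras"]


def check_environment(names=LIBRARIES):
    """Return {import_name: bool} showing which packages can be found.

    find_spec only locates a package; it does not import it,
    so the check is fast even for large libraries.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


def has_nvidia_tool():
    """Return True if the nvidia-smi command is on the PATH.

    Assumption: this is only a rough hint that a GPU runtime is attached.
    """
    return shutil.which("nvidia-smi") is not None


for name, found in check_environment().items():
    print(f"{name:12s} {'available' if found else 'missing'}")
print("nvidia-smi found:", has_nvidia_tool())
```

If a library is reported as missing, or nvidia-smi is not found, check the runtime settings of your notebook before going further.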
If you don't have a Gmail account, you will need to create one by clicking on the link here. The following explanation assumes that you already have a Gmail account.
How to Setup Colab
(1) Open Gmail. In the upper right corner of the screen you will see a bento menu icon made up of nine squares. Click it, then click the "Drive" icon.
(2) Press the "+ New" button at the bottom of the Drive and select "More >" from the menu. Then click "Google Colaboratory" if it exists; otherwise choose "Connect more apps".
(3) When "G Suite Marketplace" is displayed, click the magnifying glass icon and type "Colaboratory" in the text box to search for the app. Click the "+" button at the bottom right of the logo, and then click the "Install" button on the screen that appears.