Preface
The world generates data at an increasing pace. Consumers, sensors, and scientific experiments emit data points every day. In finance, business, administration, and the natural or social sciences, working with data can make up a significant part of the job. Being able to work efficiently with small or large datasets has become a valuable skill. Python started as a general-purpose language. Around ten years ago, in 2006, the first version of NumPy was released, which made Python a first-class language for numerical computing and laid the foundation for what we today call the PyData ecosystem: a growing set of high-performance libraries for use in the sciences, finance, business, or anywhere else you want to work efficiently with datasets. Python is not only about data analysis. The list of industrial-strength libraries for many general computing tasks is long, which makes working with data in Python even more compelling.
Social media and the Internet of Things have resulted in an avalanche of data. The data is powerful but not in its raw form; it needs to be processed and modeled and Python is one of the most robust tools we have out there to do so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age. This course is your guide to get started with Predictive Analytics using Python as the tool.
Data visualization is intended to present information clearly and help the viewer understand it qualitatively. The well-known expression that a picture is worth a thousand words may be rephrased as "a picture tells a story as well as a large collection of words." Visualization is, therefore, a very precious tool that helps the viewer understand a concept quickly. We are currently faced with a plethora of data containing many insights that hold the key to success in the modern day. It is important to find the data, clean it, and use the right tool to visualize it. This course explains several different ways to visualize data using Python packages, along with useful examples from many different areas such as numerical computing, financial models, statistical and machine learning, and genetics and networks.
What this learning path covers
Getting Started with Python Data Analysis starts with an introduction to data analysis and its process, followed by an overview of the libraries and their uses. You'll then dive right into the core of the PyData ecosystem with the NumPy package for high-performance computing. We will also deal with a prominent and popular data analysis library for Python called Pandas and understand the data through graphical representation. Moving further, you will see how to work with time-oriented data in Pandas. You will then learn to interact with three main categories of data sources (text formats, binary formats, and databases) and work on some application examples. In the end, you will see how different scikit-learn modules work.
Learning Predictive Analytics with Python talks about the aspects, scope, and applications of predictive modeling. Data cleaning takes about 80% of the modeling time, and hence we will look at its importance and methods. You will see how to subset, aggregate, sample, merge, append, and concatenate a dataset. Further, you will get acquainted with the basic statistics needed to make sense of the model parameters resulting from the predictive models. You will also understand the mathematics behind linear and logistic regression, along with clustering, and deal with decision trees and related classification algorithms. In the end, you will learn about the best practices adopted in the field of predictive modeling to get the optimum results.
Mastering Python Data Visualization expounds that data visualization should actually be referred to as the visualization of information for knowledge inference. You will see how to use Anaconda from Continuum Analytics and learn interactive plotting methods. You will deal with stock quotes, regression analysis, the Monte Carlo algorithm, and simulation methods with examples. Further, you will get acquainted with statistical methods such as linear and nonlinear regression, and with clustering and classification methods, using numpy, scipy, matplotlib, and scikit-learn. You will use specific libraries such as graph-tool, NetworkX, matplotlib, scipy, and numpy. In the end, we will see simulation methods and examples of signal processing to show several visualization methods.
What you need for this learning path
You will need a Python programming environment installed on your system. The first module uses a recent Python 2, but many examples will work with Python 3 as well. The versions of the libraries used in the first module are: NumPy 1.9.2, Pandas 0.16.2, matplotlib 1.4.3, tables 3.2.2, pymongo 3.0.3, redis 2.10.3, and scikit-learn 0.16.1. As these packages are all hosted on PyPI, the Python Package Index, they can be easily installed with pip. To install NumPy, you would write:
$ pip install numpy
If you are not using them already, we suggest you take a look at virtual environments for managing isolated Python environments on your computer. For Python 2, there are two packages of interest: virtualenv and virtualenvwrapper. Since Python 3.3, there is a tool in the standard library called pyvenv (https://docs.python.org/3/library/venv.html), which serves the same purpose. Most libraries have an attribute for the version, so if you already have a library installed, you can quickly check its version:
>>> import redis
>>> redis.__version__
'2.10.3'
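To check several libraries at once, a short script along the following lines may help. This is only a sketch: the helper name installed_version and the LIBRARIES list are our own choices for illustration, and the script assumes that each library exposes a __version__ attribute, which most but not all libraries do. Note that scikit-learn is imported under the name sklearn and PyTables under the name tables.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib


def installed_version(name):
    """Return the version string of the module `name`.

    Returns None if the module cannot be imported, and "unknown" if it
    can be imported but has no __version__ attribute.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Import names (not PyPI names) of the libraries used in the first module.
LIBRARIES = ["numpy", "pandas", "matplotlib", "tables",
             "pymongo", "redis", "sklearn"]

for name in LIBRARIES:
    version = installed_version(name)
    print("{:<12} {}".format(name, version or "not installed"))
```

Comparing the printed versions with the ones listed above is a quick way to spot a library that is missing or differs significantly from the version the examples were written against.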
All the examples in the second module can be run interactively in a Python shell. We used IPython 4.0.0 with Python 2.7.10.
For the third module, you need Python 2.7.6 or a later version installed on your operating system. For the examples in this module, the default Python version of Mac OS X 10.10.5 (2.7.6) has been used. If possible, install one of the prepackaged scientific Python distributions, such as Anaconda from Continuum or the Enthought Python Distribution.