
Dan Toomey - Jupyter for Data Science


  • Book:
    Jupyter for Data Science
  • Author:
    Dan Toomey
  • Publisher:
    Packt Publishing
  • Genre:
    Computer
  • Year:
    2017

Jupyter for Data Science: summary, description and annotation


Your one-stop guide to building an efficient data science pipeline using Jupyter

About This Book

  • Get the most out of your Jupyter notebook to complete the trickiest of tasks in Data Science
  • Learn all the tasks in the data science pipeline, from data acquisition to visualization, and implement them using Jupyter
  • Get ahead of the curve by mastering all the applications of Jupyter for data science with this unique and intuitive guide

Who This Book Is For

This book targets students and professionals who wish to master the use of Jupyter to perform a variety of data science tasks. Some programming experience with R or Python, and some basic understanding of Jupyter, is all you need to get started with this book.

What You Will Learn

  • Understand why Jupyter notebooks are a perfect fit for your data science tasks
  • Perform scientific computing and data analysis tasks with Jupyter
  • Interpret and explore different kinds of data visually with charts, histograms, and more
  • Extend SQL's capabilities with Jupyter notebooks
  • Combine the power of R and Python 3 with Jupyter to create dynamic notebooks
  • Create interactive dashboards and dynamic presentations
  • Master the best coding practices and deploy your Jupyter notebooks efficiently

In Detail

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter notebook.

If you are familiar with Jupyter notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of implementing an effective data science pipeline using Jupyter. You will also see how you can use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks.

By the end of this book, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully.

Style and approach

This book is a perfect blend of concepts and practical examples, written in a way that is very easy to understand and implement. It follows a logical flow where you will be able to build on your understanding of the different Jupyter features with every chapter.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Jupyter for Data Science
Exploratory analysis, statistical modeling, machine learning, and data visualization with Jupyter
Dan Toomey

BIRMINGHAM - MUMBAI

Decision trees in Python

We can perform the same analysis in Python. Start by loading the imports that will be used:

import pandas as pd
import numpy as np
from os import system
import graphviz  # pip install graphviz
# train_test_split lives in sklearn.model_selection; the older
# sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

Read in the mpg data file:

carmpg = pd.read_csv("car-mpg.csv")
carmpg.head(5)


Break up the data into factors and results:

columns = carmpg.columns
mask = np.ones(columns.shape, dtype=bool)
i = 0  # the specified column that you don't want to show
mask[i] = 0
mask[7] = 0  # maker is a string
X = carmpg[columns[mask]]
Y = carmpg["mpg"]
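
The boolean mask above can be tried on a small stand-in frame. This is only a sketch: the column names and values below are hypothetical and merely mimic the layout the text implies (mpg in position 0, the string maker column in position 7). It also shows that, under those assumptions, `DataFrame.drop` gives the same result by name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for car-mpg.csv; only the column positions matter here.
carmpg = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0],
    "cylinders": [8, 8, 4],
    "displacement": [307.0, 350.0, 113.0],
    "horsepower": [130, 165, 95],
    "weight": [3504, 3693, 2372],
    "acceleration": [12.0, 11.5, 15.0],
    "modelyear": [70, 70, 70],
    "maker": ["chevrolet", "buick", "toyota"],
})

columns = carmpg.columns
mask = np.ones(columns.shape, dtype=bool)  # start with every column selected
mask[0] = False   # drop the result column (mpg)
mask[7] = False   # drop the string column (maker)

X = carmpg[columns[mask]]   # factors
Y = carmpg["mpg"]           # results

# Equivalent selection by name instead of by position
X_by_name = carmpg.drop(columns=["mpg", "maker"])

print(list(X.columns))
```

Dropping by name is less fragile than positional indices if the column order of the file ever changes.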

Split up the data between training and testing sets:

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

Create a decision tree model:

clf_gini = tree.DecisionTreeClassifier(criterion = "gini", random_state = 100, max_depth=3, min_samples_leaf=5)

Calculate the model fit:

clf_gini.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
            max_features=None, max_leaf_nodes=None, min_impurity_split=1e-07,
            min_samples_leaf=5, min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=100, splitter='best')

Graph out the tree:

# I could not get this to work on a Windows machine
# dot_data = tree.export_graphviz(clf_gini, out_file=None,
#                                 filled=True, rounded=True,
#                                 special_characters=True)
# graph = graphviz.Source(dot_data)
# graph
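
The steps above (split, fit, inspect) can be sketched end to end without the data file or Graphviz. This is not the book's analysis: the data is synthetic (an assumption), the feature names `x0` and `x1` are made up, and `export_text` is used here as a plain-text alternative to the Graphviz rendering. It also puts the imported `accuracy_score` to use.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Synthetic two-feature data (assumption): label is 1 when x0 + x1 > 10
rng = np.random.RandomState(100)
X = rng.uniform(0, 10, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 10).astype(int)

# Same split proportions and seed as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)

# Same model settings as in the text
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)

# Score the held-out set
y_pred = clf_gini.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy)

# Text rendering of the fitted tree; needs no Graphviz install
tree_text = export_text(clf_gini, feature_names=["x0", "x1"])
print(tree_text)
```
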
Using SciPy linear algebra in Jupyter

There is a complete set of linear algebra functions available. For example, we can solve a linear system with steps such as the following:

import numpy as np
from scipy import linalg

A = np.array([[1, 1], [2, 3]])
print ("A array")
print (A)

b = np.array([[1], [2]])
print ("b array")
print (b)

solution = linalg.solve(A, b)
print ("solution ")
print (solution)

# validate results
print ("validation of solution (should be a 0 matrix)")
print (A.dot(solution) - b)

Here, the output under Jupyter looks like the following:

We validate the results with the final 0 matrix.
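
The validation step can also be done programmatically rather than by eye. The following is a small sketch using the same A and b as above; the `residual` variable name and the `np.allclose` check are additions for illustration.

```python
import numpy as np
from scipy import linalg

# Same system as in the text: x + y = 1, 2x + 3y = 2
A = np.array([[1.0, 1.0], [2.0, 3.0]])
b = np.array([[1.0], [2.0]])

solution = linalg.solve(A, b)

# The residual A.x - b should be (numerically) a zero matrix
residual = A.dot(solution) - b
print(solution)
print(np.allclose(residual, 0))
```

Comparing with `np.allclose` avoids relying on exact floating-point equality.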
Using pandas in Jupyter

pandas is an open source library of high-performance data analysis tools available in Python. Of particular interest are the functions to:

  • Read text files
  • Read Excel files
  • Read from SQL databases
  • Operate on data frames
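
A few of these functions can be sketched in one short, self-contained cell. The data, the table name `items`, and the column names are hypothetical; an in-memory string stands in for a text file and an in-memory SQLite database stands in for a SQL server. Reading Excel with `pd.read_excel` works similarly but needs an optional engine such as openpyxl, so it is left out here.

```python
import io
import sqlite3
import pandas as pd

# Read "text file" content (an in-memory CSV stands in for a real file)
csv_text = "name,qty\nwidget,3\ngadget,5\n"
df_text = pd.read_csv(io.StringIO(csv_text))

# Read from a SQL database (in-memory SQLite, populated from the frame above)
conn = sqlite3.connect(":memory:")
df_text.to_sql("items", conn, index=False)
df_sql = pd.read_sql("SELECT name, qty FROM items WHERE qty > 3", conn)
conn.close()

# Operate on data frames: derive a column and aggregate
df_text["double_qty"] = df_text["qty"] * 2
total = df_text["qty"].sum()

print(df_sql)
print(total)
```
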
Combining datasets

We have seen how to move a data frame into Spark for analysis. A Spark data frame behaves much like a SQL table. In SQL, it is standard practice not to duplicate items across tables. For example, a product table might hold the price, and an order table would reference the product by its identifier rather than copy the price. The complementary practice is to join the tables to assemble the full set of information needed. Keeping with the order analogy, we combine all of the tables involved because each holds part of the data needed for the order to be complete.

How difficult would it be to create a set of tables and join them using Spark? We will use example tables of Product, Order, and ProductOrder:

Table           Columns
Product         Product ID, Description, Price
Order           Order ID, Order Date
ProductOrder    Order ID, Product ID, Quantity

So, an Order has an associated list of Product/Quantity values.
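
The relationship between the three tables can be sketched with small pandas frames, using `merge` in place of a SQL JOIN. The rows, the lowercase column names other than `orderid`, and the `linetotal` column are all hypothetical and only illustrate how the pieces fit together.

```python
import pandas as pd

# Hypothetical rows for the three tables described above
product = pd.DataFrame({
    "productid": [1, 2],
    "description": ["widget", "gadget"],
    "price": [2.50, 4.00],
})
order = pd.DataFrame({
    "orderid": [10, 11],
    "orderdate": ["2017-01-05", "2017-01-06"],
})
orderproduct = pd.DataFrame({
    "orderid": [10, 10, 11],     # order 10 has two products
    "productid": [1, 2, 1],
    "quantity": [2, 1, 4],
})

# Join the link table to both sides, as a SQL JOIN would
joined = orderproduct.merge(order, on="orderid").merge(product, on="productid")

# The price is stored once in product, yet is available per order line
joined["linetotal"] = joined["quantity"] * joined["price"]
order_totals = joined.groupby("orderid")["linetotal"].sum()

print(joined)
print(order_totals)
```

Note that `merge` keeps a single `orderid` column, whereas a SQL `SELECT *` over joined tables returns the key column from each table.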

We can populate the data frames and move them into Spark:

from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# load product set
productDF = spark.read.format("csv") \
    .option("header", "true") \
    .load("product.csv")
productDF.show()
productDF.createOrReplaceTempView("product")

# load order set
orderDF = spark.read.format("csv") \
    .option("header", "true") \
    .load("order.csv")
orderDF.show()
orderDF.createOrReplaceTempView("order")

# load order/product set
orderproductDF = spark.read.format("csv") \
    .option("header", "true") \
    .load("orderproduct.csv")
orderproductDF.show()
orderproductDF.createOrReplaceTempView("orderproduct")

Now, we can attempt to perform an SQL-like JOIN operation among them:

# join the tables
joinedDF = spark.sql("SELECT * " \
    "FROM orderproduct " \
    "JOIN order ON order.orderid = orderproduct.orderid " \
    "ORDER BY order.orderid")
joinedDF.show()

Doing all of this in Jupyter results in the display as follows:


Our standard imports obtain a SparkContext and initialize a SparkSession. Note the call to getOrCreate on the SparkContext. If you ran this code outside of Jupyter, there would be no existing context and a new one would be created. Under Jupyter, the Spark startup initializes a context that is shared by all scripts. We can use that context with any Spark script rather than creating one ourselves.

Load our product table:


Load the order table:


Load the orderproduct table. Note that at least one of the orders has multiple products:

We have the orderid column from order and orderproduct in the result set.