
  • Book:
    Docker for Data Science: Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server
  • Author:
    Joshua Cook
  • Publisher:
    Apress
  • Genre:
    Computer
  • Year:
    2017

Docker for Data Science: Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server: summary, description and annotation


Learn Docker infrastructure as code technology to define a system for performing standard but non-trivial data tasks on medium- to large-scale data sets, using Jupyter as the master controller.
Real-world data sets are often difficult to manage: they may not fit into available memory, or they may require prohibitively long processing times. These challenges confront even skilled software engineers, and they can render the standard Jupyter system unusable.
As a solution to this problem, Docker for Data Science proposes using Docker. You will learn how to use existing pre-compiled public images created by the major open-source technologies (Python, Jupyter, Postgres), as well as how to use the Dockerfile to extend these images to suit your specific purposes. The Docker-Compose technology is examined, and you will learn how it can be used to build a linked system with Python churning data behind the scenes and Jupyter managing these background tasks. Best practices in using existing images are explored, as well as developing your own images to deploy state-of-the-art machine learning and optimization algorithms.
What You'll Learn
  • Master interactive development using the Jupyter platform
  • Run and build Docker containers from scratch and from publicly available open-source images
  • Write infrastructure as code using the docker-compose tool and its docker-compose.yml file type
  • Deploy a multi-service data science application across a cloud-based system

Who This Book Is For
Data scientists, machine learning engineers, artificial intelligence researchers, Kagglers, and software developers


Joshua Cook 2017
Joshua Cook Docker for Data Science
10. Interactive Software Development
Joshua Cook, Santa Monica, California, USA
Developing software as a data scientist is different from traditional software engineering and far less well understood. For the traditional software developer, frameworks built around reuse, extensibility, and stability exist for any language. The most famous of these might be the Rails framework for the Ruby language. Rails is written from the ground up around its adopted paradigm, the Model-View-Controller design pattern, a pattern heavily favored in the implementation of user-facing software. Listing 10-1 shows the creation of and the default file structure for a new Rails application. Note that the new application has clear directories created for it based upon its usage pattern.
$ rails new myapp
create
...
$ tree -L 1 myapp/app
myapp/app/
├── assets
├── channels
├── controllers
├── helpers
├── jobs
├── mailers
├── models
└── views
Listing 10-1.
A Default Rails Application
Data science-specific software development has no such design pattern around which a similar framework might be built. In an earlier chapter, I introduced the idea of interactive computing as an alternative to conventional programming. In this chapter, I propose that the idea of interactive computing itself be adopted as the cornerstone of a potential framework. You'll develop a project framework with infrastructure defined by a docker-compose.yml, built around Jupyter as your interactive computing driver. The goals of this framework are aligned with those of an interactive computing project. This framework should facilitate ease in
  • Iteration
  • Scaling and distribution of hardware
  • Sharing and documentation of work
A Quick Guide to Organizing Computational Biology Projects
For inspiration for this framework, let's look at the work of William Noble of the University of Washington. Noble's work describes one good strategy for carrying out computational experiments, focusing on relatively mundane issues such as organizing files and directories and documenting progress.
Noble focuses on a few key principles to structuring a project:
  • File and directory organization
  • Documenting work
  • Executing work
  • Version control
Figure 10-1 shows Noble's diagram of file and directory organization for a sample project called msms.
Figure 10-1.
Noble's sample project, msms
A Project Framework for Interactive Development
You'll draw directly upon this work to develop your framework. You'll use Jupyter Notebooks, numbered in sequence, as a method for both documenting and executing your work. These notebooks become a detailed record of activity as well as the means by which you drive this activity. Furthermore, you'll present a directory hierarchy designed around the use of the Jupyter Notebook as the driver of your work. Figure 10-2 shows a directory hierarchy built for interactive development.
Figure 10-2.
Directory hierarchy built for interactive development
You'll build the directory hierarchy of your project using the following directories:
  • data
    • Contains raw data files
  • docker
    • Contains a subdirectory for each image to be defined using a build
    • Each subdirectory will become the build context for the respective image
  • ipynb
    • Contains all Jupyter Notebook files
    • Replaces the bin, doc, and results directories
    • Notebooks are drivers, scripts, documentation, and presentation
    • Notebooks are named with date and activity to sort them in place
  • lib
    • Contains project-specific code modules, defined in the course of project development
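The naming convention above can be made concrete with a short sketch. The specific `YYYYMMDD-activity` format and the filenames shown here are illustrative assumptions; the chapter specifies only that notebooks carry a date and an activity so that they sort in place:

```python
# Hypothetical notebook filenames using a YYYYMMDD-activity convention.
notebooks = [
    "20170316-train-model.ipynb",
    "20170301-gather-data.ipynb",
    "20170308-clean-data.ipynb",
]

# Because the date prefix is zero-padded, a plain lexicographic sort
# is also a chronological sort, so a directory listing doubles as a
# project timeline.
for name in sorted(notebooks):
    print(name)
```
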
Project Root Design Pattern
In an earlier chapter, I proposed that
Jupyter doesn't replace vim, Sublime Text, or PyCharm. Jupyter replaces if __name__ == "__main__":.
The if __name__ == "__main__": design pattern provides a launch hook for running a Python program. The project framework I propose here is not built around running code in this way, and as such does not require such a launch hook. Rather, you are building this framework around the Jupyter Notebook as a driver. What you require is a pattern for importing modules into your notebooks.
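For reference, the launch hook being set aside looks like the following minimal sketch (the function name and return value are illustrative, not from the book):

```python
def main():
    # Entry point when the file is run as a script.
    return "running as a script"

if __name__ == "__main__":
    # This hook fires only when the file is executed directly,
    # not when it is imported as a module. The notebook-driven
    # framework proposed here has no need for it.
    print(main())
```
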
Maintaining a clean project directory structure requires you to keep your notebooks and your Python modules in separate directories. Furthermore, I hold that it is less aesthetic to nest one inside of the other. This causes a problem at import time. Given a directory structure as shown in Listing 10-2, you will want to import code (such as the module shown in Listing 10-3) from lib/ directly into a Jupyter Notebook in ipynb/.
$ tree
.
├── ipynb
│   └── some_notebook.ipynb
└── lib
    ├── __init__.py
    └── some_module.py
Listing 10-2.
Sample Project Structure
#!/bin/python
def say_hello():
    print("Hello!")
Listing 10-3.
A Demo Python Module, some_module.py
Let's solve this problem by using what I will refer to as the project root design pattern (Listing 10-4).
In [1]: from os import chdir
        chdir('/home/jovyan')
Listing 10-4.
The Project Root Design Pattern
The project root design pattern changes the current working directory of the Python kernel to be the root of the project. This is guaranteed by the configuration of the mounted volume in your docker-compose.yml file.
In [2]: from lib.some_module import say_hello
        say_hello()
Hello!
Listing 10-5.
Import from lib.some_module
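A variant of the project root design pattern, sketched here as an assumption rather than as the book's approach, prepends the project root to sys.path instead of changing the working directory. This makes lib importable while leaving relative file paths (such as data/raw.csv) unaffected:

```python
import sys

# Assumed project root: in the setup described here, this is the
# mount point of the project volume inside the Jupyter container.
PROJECT_ROOT = "/home/jovyan"

# Prepend the project root so that imports like
# `from lib.some_module import say_hello` resolve without a chdir.
# Unlike chdir, this does not change how relative paths are resolved.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

print(sys.path[0])
```

The trade-off between the two variants is that chdir fixes both imports and relative data paths at once, while the sys.path variant touches only imports.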
Initialize Project
In an earlier chapter, you used a docker-compose.yml file to design an application consisting of a Jupyter Notebook Server and a PostgreSQL database. You used the docker-compose build tool and the design of the postgres image to gather your data and seed your database. Here, you do the same again, collecting your data from the UCI Machine Learning Repository. In this chapter, however, you formalize the process of gathering the data, documenting the process using a Jupyter Notebook.
In Listing 10-6, you initialize the project, creating the docker, ipynb, and lib directories and an empty lib/__init__.py file using the touch command. This has the effect of making the lib/ directory into a Python module. Finally, you initialize the project repository as a git repository using git init.
$ mkdir ch10_adult
$ cd ch10_adult/
$ mkdir docker ipynb lib
$ touch lib/__init__.py
$ git init
Initialized empty Git repository in /home/ubuntu/ch10_adult/.git/
Listing 10-6.
Initialize the ch10_adult Project
Next, you create a docker-compose.yml file, which will define the infrastructure of your project. Note that you start simple. At this phase, you have only a single service, a Jupyter Notebook Server.
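The listing itself falls on the next page and is not reproduced here. As a hedged sketch only, a single-service docker-compose.yml along these lines would match the setup described; the service name, image tag, and published port are assumptions, patterned on the Jupyter images used elsewhere in the book and on the /home/jovyan mount point assumed by the project root design pattern:

```yaml
version: '3'
services:
  jupyter:                             # service name is illustrative
    image: jupyter/scipy-notebook      # assumed image; any Jupyter image works
    ports:
      - "8888:8888"                    # Jupyter's default port
    volumes:
      - .:/home/jovyan                 # mount the project root into the container
```

Mounting the project directory at /home/jovyan is what makes the chdir('/home/jovyan') call in the project root design pattern land at the root of the project.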