
  • Book:
    Docker for Data Science: Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server
  • Author:
    Joshua Cook
  • Publisher:
    Apress
  • Genre:
    Computer
  • Year:
    2017

Docker for Data Science: Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server: summary, description and annotation


Learn Docker infrastructure as code technology to define a system for performing standard but non-trivial data tasks on medium- to large-scale data sets, using Jupyter as the master controller.
Real-world data sets are often difficult to manage: they may not fit into available memory, or they may require prohibitively long processing times. These challenges confront even skilled software engineers, and they can render the standard Jupyter system unusable.
As a solution to this problem, Docker for Data Science proposes using Docker. You will learn how to use existing pre-compiled public images created by the major open-source technologies (Python, Jupyter, Postgres), as well as how to use the Dockerfile to extend these images to suit your specific purposes. The Docker-Compose technology is examined, and you will learn how it can be used to build a linked system with Python churning data behind the scenes and Jupyter managing these background tasks. Best practices in using existing images are explored, as well as developing your own images to deploy state-of-the-art machine learning and optimization algorithms.
What You'll Learn
  • Master interactive development using the Jupyter platform
  • Run and build Docker containers from scratch and from publicly available open-source images
  • Write infrastructure as code using the docker-compose tool and its docker-compose.yml file type
  • Deploy a multi-service data science application across a cloud-based system

Who This Book Is For
Data scientists, machine learning engineers, artificial intelligence researchers, Kagglers, and software developers


Joshua Cook 2017
Joshua Cook Docker for Data Science
10. Interactive Software Development
Joshua Cook, Santa Monica, California, USA
Developing software as a data scientist is different from traditional software engineering and far less well understood. For the traditional software developer, frameworks built around reuse, extensibility, and stability exist for any language. The most famous of these might be the Rails framework for the Ruby language. Rails is written from the ground up around its adopted paradigm, the Model-View-Controller design pattern, a pattern heavily favored in the implementation of user-facing software. Listing 10-1 shows the creation of and the default file structure for a new Rails application. Note that the new application has clear directories created for it based upon its usage pattern.
$ rails new myapp
create
...
$ tree -L 1 myapp/app
myapp/app/
├── assets
├── channels
├── controllers
├── helpers
├── jobs
├── mailers
├── models
└── views
Listing 10-1.
A Default Rails Application
Data science-specific software development has no such design pattern around which a similar framework might be built. In an earlier chapter, I introduced the idea of interactive computing as an alternative to conventional programming. In this chapter, I propose that the idea of interactive computing itself be adopted as the cornerstone of a potential framework. You'll develop a project framework with infrastructure defined by a docker-compose.yml, built around Jupyter as your interactive computing driver. The goals of this framework are aligned with those of an interactive computing project. This framework should facilitate ease in
  • Iteration
  • Scaling and distribution of hardware
  • Sharing and documentation of work
A Quick Guide to Organizing Computational Biology Projects
For inspiration for this framework, let's look at the work of William Noble of the University of Washington. Noble's work describes one good strategy for carrying out computational experiments, focusing on relatively mundane issues such as organizing files and directories and documenting progress.
Noble focuses on a few key principles to structuring a project:
  • File and directory organization
  • Documenting work
  • Executing work
  • Version control
Figure 10-1 shows Noble's diagram of file and directory organization for a sample project called msms.
Figure 10-1.
Noble's sample project, msms
A Project Framework for Interactive Development
You'll draw directly upon this work to develop your framework. You'll use Jupyter Notebooks, numbered in sequence, as a method for both documenting and executing your work. These notebooks become a detailed record of activity as well as the means by which you drive this activity. Furthermore, you'll present a directory hierarchy designed around the use of the Jupyter Notebook as the driver of your work. Figure 10-2 shows a directory hierarchy built for interactive development.
Figure 10-2.
Directory hierarchy built for interactive development
You'll build the directory hierarchy of your project using the following directories:
  • data
    • Contains raw data files
  • docker
    • Contains a subdirectory for each image to be defined using a build
    • Each subdirectory will become the build context for the respective image
  • ipynb
    • Contains all Jupyter Notebook files
    • Replaces the bin, doc, and results directories
    • Notebooks are drivers, scripts, documentation, and presentation
    • Notebooks are named with date and activity to sort them in place
  • lib
    • Contains project-specific code modules, defined in the course of project development
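The naming convention above can be made concrete with a short sketch. The specific `YYYYMMDD-activity` format and the filenames shown here are illustrative assumptions; the chapter specifies only that notebooks carry a date and an activity so that they sort in place:

```python
# Hypothetical notebook filenames using a YYYYMMDD-activity convention.
notebooks = [
    "20170316-train-model.ipynb",
    "20170301-gather-data.ipynb",
    "20170308-clean-data.ipynb",
]

# Because the date prefix is zero-padded, a plain lexicographic sort
# is also a chronological sort, so a directory listing doubles as a
# project timeline.
for name in sorted(notebooks):
    print(name)
```
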
Project Root Design Pattern
In an earlier chapter, I proposed that
Jupyter doesn't replace vim, Sublime Text, or PyCharm. Jupyter replaces if __name__ == "__main__":.
The if __name__ == "__main__": design pattern provides a launch hook for running a Python program. The project framework I propose here is not built around running code in this way, and as such does not require such a launch hook. Rather, you are building this framework around the Jupyter Notebook as a driver. What you require is a pattern for importing modules into your notebooks.
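For reference, the launch hook being set aside looks like the following minimal sketch (the function name and return value are illustrative, not from the book):

```python
def main():
    # Entry point when the file is run as a script.
    return "running as a script"

if __name__ == "__main__":
    # This hook fires only when the file is executed directly,
    # not when it is imported as a module. The notebook-driven
    # framework proposed here has no need for it.
    print(main())
```
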
Maintaining a clean project directory structure requires you to keep your notebooks and your Python modules in separate directories. Furthermore, I hold that it is less aesthetic to nest one inside of the other. This causes a problem at import time. Given a directory structure as shown in Listing 10-2, you will want to import code (such as the module shown in Listing 10-3) from lib/ directly into a Jupyter Notebook in ipynb/.
$ tree
.
├── ipynb
│   └── some_notebook.ipynb
└── lib
    ├── __init__.py
    └── some_module.py
Listing 10-2.
Sample Project Structure
#!/bin/python
def say_hello():
    print("Hello!")
Listing 10-3.
A Demo Python Module, some_module.py
Let's solve this problem by using what I will refer to as the project root design pattern (Listing 10-4).
In [1]: from os import chdir
        chdir('/home/jovyan')
Listing 10-4.
The Project Root Design Pattern
The project root design pattern changes the current working directory of the Python kernel to be the root of the project. This is guaranteed by the configuration of the mounted volume in your docker-compose.yml file.
In [2]: from lib.some_module import say_hello
        say_hello()
Hello!
Listing 10-5.
Import from lib.some_module
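A variant of the project root design pattern, sketched here as an assumption rather than as the book's approach, prepends the project root to sys.path instead of changing the working directory. This makes lib importable while leaving relative file paths (such as data/raw.csv) unaffected:

```python
import sys

# Assumed project root: in the setup described here, this is the
# mount point of the project volume inside the Jupyter container.
PROJECT_ROOT = "/home/jovyan"

# Prepend the project root so that imports like
# `from lib.some_module import say_hello` resolve without a chdir.
# Unlike chdir, this does not change how relative paths are resolved.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

print(sys.path[0])
```

The trade-off between the two variants is that chdir fixes both imports and relative data paths at once, while the sys.path variant touches only imports.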
Initialize Project
In an earlier chapter, you used a docker-compose.yml file to design an application consisting of a Jupyter Notebook Server and a PostgreSQL database. You used the docker-compose build tool and the design of the postgres image to gather your data and seed your database. Here, you do the same again, collecting your data from the UCI Machine Learning Repository. In this chapter, however, you formalize the process of gathering the data, documenting the process using a Jupyter Notebook.
In Listing 10-6, you initialize the project, creating the docker, ipynb, and lib directories and an empty lib/__init__.py file using the touch command. This has the effect of making the lib/ directory into a Python module. Finally, you initialize the project repository as a git repository using git init.
$ mkdir ch10_adult
$ cd ch10_adult/
$ mkdir docker ipynb lib
$ touch lib/__init__.py
$ git init
Initialized empty Git repository in /home/ubuntu/ch10_adult/.git/
Listing 10-6.
Initialize the ch10_adult Project
Next, you create a docker-compose.yml file, which will define the infrastructure of your project. Note that you start simple. At this phase, you have only a single service, a Jupyter Notebook Server.
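The listing itself falls on the next page and is not reproduced here. As a hedged sketch only, a single-service docker-compose.yml along these lines would match the setup described; the service name, image tag, and published port are assumptions, patterned on the Jupyter images used elsewhere in the book and on the /home/jovyan mount point assumed by the project root design pattern:

```yaml
version: '3'
services:
  jupyter:                             # service name is illustrative
    image: jupyter/scipy-notebook      # assumed image; any Jupyter image works
    ports:
      - "8888:8888"                    # Jupyter's default port
    volumes:
      - .:/home/jovyan                 # mount the project root into the container
```

Mounting the project directory at /home/jovyan is what makes the chdir('/home/jovyan') call in the project root design pattern land at the root of the project.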