Russell Jurney - Agile Data Science 2.0

Agile Data Science 2.0: Summary and Description

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

  • Build value from your data in a series of agile sprints, using the data-value pyramid
  • Extract features for statistical models from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future via classification and regression
  • Translate predictions into actions
  • Get feedback from users after each sprint to keep your project on track

Appendix A. Manual Installation

In this appendix, we cover the details of installing the tools for the stack used in this book.

Installing Hadoop

You can download the latest version of Hadoop from the Apache Hadoop downloads page. At the time of writing, the latest Hadoop was 2.7.3, but this will probably have changed by the time you're reading this.

A recipe for a headless install of Hadoop is available in manual_install.sh. In addition to downloading and unpacking Hadoop, we also need to set up our Hadoop environment variables (HADOOP_HOME, HADOOP_CLASSPATH, and HADOOP_CONF_DIR), and we need to put Hadoop's executables in our PATH. First, set up a PROJECT_HOME variable to help find the right paths. You will need to set this yourself by editing your .bash_profile file:

export PROJECT_HOME=/Users/rjurney/Software/Agile_Data_Code_2

Now we can set up our environment directly. Here is the relevant section of manual_install.sh:

# May need to update this link... see http://hadoop.apache.org/releases.html
curl -Lko /tmp/hadoop-2.7.3.tar.gz \
  http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
mkdir hadoop
tar -xvf /tmp/hadoop-2.7.3.tar.gz -C hadoop --strip-components=1

echo '# Hadoop environment setup' >> ~/.bash_profile
export HADOOP_HOME=$PROJECT_HOME/hadoop
echo 'export HADOOP_HOME=$PROJECT_HOME/hadoop' >> ~/.bash_profile
export PATH=$PATH:$HADOOP_HOME/bin
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bash_profile
export HADOOP_CLASSPATH=$(hadoop classpath)
echo 'export HADOOP_CLASSPATH=$(hadoop classpath)' >> ~/.bash_profile
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> ~/.bash_profile
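Once these lines are in place, reloading your profile and printing the Hadoop version is a quick way to confirm the binaries resolve from your PATH:

# Reload the environment and confirm Hadoop is on the PATH
source ~/.bash_profile
hadoop version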
Installing Spark

At the time of writing, the current version of Spark is 2.1.0. To install Spark on your local machine, follow the directions in the docs. Alternatively, manual_install.sh performs a headless Spark install:

# May need to update this link... see http://spark.apache.org/downloads.html
curl -Lko /tmp/spark-2.1.0-bin-without-hadoop.tgz \
  http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-without-hadoop.tgz
mkdir spark
tar -xvf /tmp/spark-2.1.0-bin-without-hadoop.tgz -C spark --strip-components=1

echo "" >> ~/.bash_profile
echo "# Spark environment setup" >> ~/.bash_profile
export SPARK_HOME=$PROJECT_HOME/spark
echo 'export SPARK_HOME=$PROJECT_HOME/spark' >> ~/.bash_profile
export HADOOP_CONF_DIR=$PROJECT_HOME/hadoop/etc/hadoop/
echo 'export HADOOP_CONF_DIR=$PROJECT_HOME/hadoop/etc/hadoop/' >> ~/.bash_profile
export SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
echo 'export SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`' >> \
  ~/.bash_profile
export PATH=$PATH:$SPARK_HOME/bin
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bash_profile

# Have to set spark.io.compression.codec in Spark local mode
cp spark/conf/spark-defaults.conf.template spark/conf/spark-defaults.conf
echo 'spark.io.compression.codec org.apache.spark.io.SnappyCompressionCodec' \
  >> spark/conf/spark-defaults.conf

# Give Spark 8 GB of RAM
echo "spark.driver.memory 8g" >> $SPARK_HOME/conf/spark-defaults.conf
echo "PYSPARK_PYTHON=python3" >> $SPARK_HOME/conf/spark-env.sh
echo "PYSPARK_DRIVER_PYTHON=python3" >> $SPARK_HOME/conf/spark-env.sh

# Set up log4j config to reduce logging output
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
sed -i .bak 's/INFO/ERROR/g' $SPARK_HOME/conf/log4j.properties

Note that this download URL may change; you can get the current URL for a console install from the Spark downloads page.
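As with Hadoop, a quick version check confirms Spark is on the PATH before moving on:

# Confirm Spark resolves from the PATH and prints its build info
source ~/.bash_profile
spark-submit --version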

Installing MongoDB

Instructions for installing MongoDB are available on the website, as is an excellent tutorial. I recommend consulting each of these before moving on.

Download the latest version of MongoDB for your operating system from the download center, then install it using the following commands:

curl -Lko /tmp/$MONGO_FILENAME $MONGO_DOWNLOAD_URL
mkdir mongodb
tar -xvf /tmp/$MONGO_FILENAME -C mongodb --strip-components=1
export PATH=$PATH:$PROJECT_HOME/mongodb/bin
echo 'export PATH=$PATH:$PROJECT_HOME/mongodb/bin' >> ~/.bash_profile
mkdir -p mongodb/data/db
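Before starting the server, you can confirm the MongoDB binaries are reachable:

# Confirm the MongoDB server binary is on the PATH
mongod --version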

Now start the MongoDB server:

mongodb/bin/mongod --dbpath mongodb/data/db &

You'll need to rerun this command if you shut down your computer. Now open the Mongo shell and get help:

mongodb/bin/mongo --eval help

Finally, create a collection by inserting a record, and then retrieve it:

> db.test_collection.insert({'name': 'Russell Jurney', 'email': 'russell.jurney@gmail.com'})
WriteResult({ "nInserted" : 1 })
> db.test_collection.findOne({'name': 'Russell Jurney'})
{
  "_id" : ObjectId("56f20fa811a5b44cf943313c"),
  "name" : "Russell Jurney",
  "email" : "russell.jurney@gmail.com"
}
>

We're cooking with Mongo!

Installing the MongoDB Java Driver

You'll also need to install the MongoDB Java Driver. At the time of writing, version 3.4.2 is the latest stable build. You can install it with curl as follows:

curl -Lko lib/mongo-java-driver-3.4.2.jar \
  http://central.maven.org/maven2/org/mongodb/mongo-java-driver/3.4.2/mongo-java-driver-3.4.2.jar
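If you have a JDK on your PATH, listing the JAR's contents is a quick sanity check; a truncated download or an HTML error page will fail here:

# List the first few entries in the driver JAR to confirm it downloaded intact
jar tf lib/mongo-java-driver-3.4.2.jar | head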
Installing mongo-hadoop

The mongo-hadoop project connects Hadoop and Spark with MongoDB. You can download it from the releases page.

Building mongo-hadoop

You will need to build the project, using the included gradlew command, and then copy the JARs into lib/:

# Install the mongo-hadoop project in the mongo-hadoop directory
# in the root of our project
curl -Lko /tmp/r1.5.2.tar.gz \
  https://github.com/mongodb/mongo-hadoop/archive/r1.5.2.tar.gz
mkdir mongo-hadoop
tar -xvzf /tmp/r1.5.2.tar.gz -C mongo-hadoop --strip-components=1

# Now build the mongo-hadoop-spark jars
cd mongo-hadoop
./gradlew jar
cd ..
cp mongo-hadoop/spark/build/libs/mongo-hadoop-spark-*.jar lib/
cp mongo-hadoop/build/libs/mongo-hadoop-*.jar lib/
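If the build succeeded, the copied JARs should now be visible in lib/:

# Confirm the mongo-hadoop JARs landed in lib/
ls -la lib/ | grep mongo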
Installing pymongo_spark

Next, we need to install the pymongo_spark package, which makes storing to Mongo a one-liner from PySpark. pymongo_spark is contained within the mongo-hadoop project:

# Now build the pymongo_spark package
cd mongo-hadoop/spark/src/main/python
python setup.py install
cd $PROJECT_HOME
cp mongo-hadoop/spark/src/main/python/pymongo_spark.py lib/
export PYTHONPATH=$PYTHONPATH:$PROJECT_HOME/lib
echo 'export PYTHONPATH=$PYTHONPATH:$PROJECT_HOME/lib' >> ~/.bash_profile
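When you later launch PySpark, the mongo-hadoop Spark JAR needs to be on the classpath for pymongo_spark to work. One way to do that is with the --jars option; the glob below is an assumption, so adjust it to the filename your gradlew build actually produced:

# Launch PySpark with the mongo-hadoop Spark JAR on the classpath
pyspark --jars lib/mongo-hadoop-spark-*.jar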
Installing Elasticsearch

Excellent tutorials on Elasticsearch are available on the website. Grab it from the downloads page, then install it with commands along the same lines as the other tools in this appendix.
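A minimal sketch of such a headless install, assuming an Elasticsearch 5.x tarball; the version number and download URL below are assumptions, so confirm them against the downloads page:

# Version and URL are assumptions; check the Elasticsearch downloads page
curl -Lko /tmp/elasticsearch-5.2.1.tar.gz \
  https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.2.1.tar.gz
mkdir elasticsearch
tar -xvf /tmp/elasticsearch-5.2.1.tar.gz -C elasticsearch --strip-components=1

# Start the Elasticsearch server in the background
elasticsearch/bin/elasticsearch &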
