
Sandy Ryza - Advanced Analytics with Spark


Book:
Advanced Analytics with Spark
Author:
Sandy Ryza
Publisher:
O'Reilly Media
Genre:
Books / Computer
Year:
2017

Advanced Analytics with Spark: summary, description and annotation


In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.

You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance.

If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find the book's patterns useful for working on your own data applications.

With this book, you will:

Familiarize yourself with the Spark...


Advanced Analytics with Spark

by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Marie Beaugureau
Production Editor: Melanie Yarbrough
Copyeditor: Gillian McGarvey
Proofreader: Christina Edwards
Indexer: WordCo Indexing Services
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

June 2017: Second Edition

Revision History for the Second Edition

2017-06-09: First Release

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Advanced Analytics with Spark, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-97295-3

[LSI]

Foreword

Ever since we started the Spark project at Berkeley, I've been excited about not just building fast parallel systems, but helping more and more people make use of large-scale computing. This is why I'm very happy to see this book, written by four experts in data science, on advanced analytics with Spark. Sandy, Uri, Sean, and Josh have been working with Spark for a while, and have put together a great collection of content with equal parts explanations and examples.

The thing I like most about this book is its focus on examples, which are all drawn from real applications on real-world data sets. It's hard to find one, let alone 10, examples that cover big data and that you can run on your laptop, but the authors have managed to create such a collection and set everything up so you can run them in Spark. Moreover, the authors cover not just the core algorithms, but the intricacies of data preparation and model tuning that are needed to really get good results. You should be able to take the concepts in these examples and directly apply them to your own problems.

Big data processing is undoubtedly one of the most exciting areas in computing today, and remains an area of fast evolution and introduction of new ideas. I hope that this book helps you get started in this exciting new field.

Matei Zaharia, CTO at Databricks and Vice President, Apache Spark

Preface

Sandy Ryza

I don't like to think I have many regrets, but it's hard to believe anything good came out of a particular lazy moment in 2011 when I was looking into how to best distribute tough discrete optimization problems over clusters of computers. My advisor explained this newfangled Apache Spark thing he had heard of, and I basically wrote off the concept as too good to be true and promptly got back to writing my undergrad thesis in MapReduce. Since then, Spark and I have both matured a bit, but only one of us has seen a meteoric rise that's nearly impossible to avoid making "ignite" puns about. Cut to a few years later, and it has become crystal clear that Spark is something worth paying attention to.

Spark's long lineage of predecessors, from MPI to MapReduce, makes it possible to write programs that take advantage of massive resources while abstracting away the nitty-gritty details of distributed systems. As much as data processing needs have motivated the development of these frameworks, in a way the field of big data has become so related to these frameworks that its scope is defined by what these frameworks can handle. Spark's promise is to take this a little further: to make writing distributed programs feel like writing regular programs.

Spark is great at giving ETL pipelines huge boosts in performance and easing some of the pain that feeds the MapReduce programmer's daily chant of despair ("why? whyyyyy?") to the Apache Hadoop gods. But the exciting thing for me about it has always been what it opens up for complex analytics. With a paradigm that supports iterative algorithms and interactive exploration, Spark is finally an open source framework that allows a data scientist to be productive with large data sets.

I think the best way to teach data science is by example. To that end, my colleagues and I have put together a book of applications, trying to touch on the interactions between the most common algorithms, data sets, and design patterns in large-scale analytics. This book isn't meant to be read cover to cover. Page to a chapter that looks like something you're trying to accomplish, or that simply ignites your interest.

What's in This Book

The first chapter will place Spark within the wider context of data science and big data analytics. After that, each chapter will comprise a self-contained analysis using Spark. The second chapter will introduce the basics of data processing in Spark and Scala through a use case in data cleansing. The next few chapters will delve into the meat and potatoes of machine learning with Spark, applying some of the most common algorithms in canonical applications. The remaining chapters are a bit more of a grab bag and apply Spark in slightly more exotic applications; for example, querying Wikipedia through latent semantic relationships in the text or analyzing genomics data.

The Second Edition

Since the first edition, Spark has experienced a major version upgrade that instated an entirely new core API and sweeping changes in subcomponents like MLlib and Spark SQL. In the second edition, we've made major renovations to the example code and brought the materials up to date with Spark's new best practices.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/sryza/aas.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Advanced Analytics with Spark by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills (O'Reilly). Copyright 2015 Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills, 978-1-491-91276-8."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .


Similar books «Advanced Analytics with Spark»


Jeffrey Aven

Data Analytics with Spark Using Python

Akash Tandon

Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark

Mahmoud Parsian

Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark

Pentreath

Machine learning with Spark: create scalable machine learning applications to power a modern data-driven business using Spark

Laserson Uri

Advanced Analytics with Spark

Jules S. Damji

Learning Spark: Lightning-Fast Data Analytics

Romeo Kienzler

Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

Javier Luraschi

Luraschi, J: Mastering Spark with R

Hien Luu

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Srinivas Duvvuri

Spark for Data Science

Rajanarayanan Thottuvaikkatumana

Apache Spark 2 for Beginners

Siamak Amirghodsi

Apache Spark 2.x Machine Learning Cookbook
