
Rishi Yadav - Spark Cookbook

Spark Cookbook by Rishi Yadav. Year: 2015. Publisher: Packt Publishing.


Rishi Yadav Spark Cookbook
  • Book:
    Spark Cookbook
  • Author:
    Rishi Yadav
  • Publisher:
    Packt Publishing
  • Year:
    2015

Spark Cookbook: summary, description and annotation


Over 60 recipes on Spark, covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries

About This Book
  • Become an expert at graph processing using GraphX
  • Use Apache Spark as your single big data compute platform and master its libraries
  • Learn with recipes that can be run on a single machine as well as on a production cluster of thousands of machines
Who This Book Is For

If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.

What You Will Learn
  • Install and configure Apache Spark with various cluster managers
  • Set up development environments
  • Perform interactive queries using Spark SQL
  • Get to grips with real-time streaming analytics using Spark Streaming
  • Master supervised learning and unsupervised learning using MLlib
  • Build a recommendation engine using MLlib
  • Develop a set of common applications or project types, and solutions that solve complex big data problems
  • Use Apache Spark as your single big data compute platform and master its libraries
In Detail

By introducing in-memory persistent storage, Apache Spark eliminates the need to store intermediate data in filesystems, thereby increasing processing speed by up to 100 times.

This book will focus on how to analyze large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will cover setting up development environments. You will then cover various recipes to perform interactive queries using Spark SQL and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will then focus on machine learning, including supervised learning, unsupervised learning, and recommendation engine algorithms. After mastering graph processing using GraphX, you will cover various recipes for cluster optimization and troubleshooting.

Spark Cookbook


Copyright 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2015

Production reference: 1160715

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-706-1

www.packtpub.com

Cover image by: InfoObjects design team

Credits

Author

Rishi Yadav

Reviewers

Thomas W. Dinsmore

Cheng Lian

Amir Sedighi

Commissioning Editor

Kunal Parikh

Acquisition Editors

Shaon Basu

Neha Nagwekar

Content Development Editor

Ritika Singh

Technical Editor

Ankita Thakur

Copy Editors

Ameesha Smith-Green

Swati Priya

Project Coordinator

Milton Dsouza

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Sheetal Aute

Production Coordinator

Nilesh R. Mohite

Cover Work

Nilesh R. Mohite

About the Author

Rishi Yadav has 17 years of experience in designing and developing enterprise applications. He is an open source software expert and advises American companies on big data trends. Rishi was honored as one of Silicon Valley's 40 under 40 in 2014. He finished his bachelor's degree at the prestigious Indian Institute of Technology (IIT) Delhi in 1998.

About 10 years ago, Rishi started InfoObjects, a company that helps data-driven businesses gain new insights into data.

InfoObjects combines the power of open source and big data to solve business challenges for its clients and has a special focus on Apache Spark. The company has been on the Inc. 5000 list of the fastest growing companies for 4 years in a row. InfoObjects was also named the #1 best place to work in the Bay Area in 2014 and 2015.

Rishi is an open source contributor and active blogger.

My special thanks go to my better half, Anjali, for putting up with the long, arduous hours that were added to my already swamped schedule; our 8-year-old son, Vedant, who tracked my progress on a daily basis; InfoObjects' CTO and my business partner, Sudhir Jangir, for leading the big data effort in the company; Helma Zargarian, Yogesh Chandani, Animesh Chauhan, and Katie Nelson for running operations skillfully so that I could focus on this book; and our internal review team, especially Arivoli Tirouvingadame, Lalit Shravage, and Sanjay Shroff, for helping with the review. I could not have written this book without your support. I would also like to thank Marcel Izumi for putting together amazing graphics.

About the Reviewers

Thomas W. Dinsmore is an independent consultant, offering product advisory services to analytic software vendors. To this role, he brings 30 years of experience, delivering analytics solutions to enterprises around the world. He uniquely combines hands-on analytics experience with the ability to lead analytic projects and interpret results.

Thomas' previous services include roles with SAS, IBM, The Boston Consulting Group, PricewaterhouseCoopers, and Oliver Wyman.

Thomas coauthored Modern Analytics Methodologies and Advanced Analytics Methodologies, published in 2014 by Pearson FT Press, and is under contract for a forthcoming book on business analytics from Apress. He publishes The Big Analytics Blog at www.thomaswdinsmore.com.

I would like to thank the entire editorial and production team at Packt Publishing, who work tirelessly to bring out quality books to the public.

Cheng Lian is a Chinese software engineer and Apache Spark committer from Databricks. His major technical interests include big data analytics, distributed systems, and functional programming languages.

Cheng is also the translator of the Chinese edition of Erlang and OTP in Action and Concurrent Programming in Erlang (Part I).

I would like to thank Yi Tian from AsiaInfo for helping me review some parts of the chapter Getting Started with Machine Learning Using MLlib.

Amir Sedighi is an experienced software engineer, a keen learner, and a creative problem solver. His experience spans a wide range of software development areas, including cross-platform development, big data processing and data streaming, information retrieval, and machine learning. He is a big data lecturer and expert, working in Iran. He holds a bachelor's and master's degree in software engineering. Amir is currently the CEO of Rayanesh Dadegan Ekbatan, the company he cofounded in 2013 after several years of designing and implementing distributed big data and data streaming solutions for private sector companies.

I would like to thank the entire team at Packt Publishing, who work hard to bring awesomeness to the books and the readers' professional life.

www.PacktPub.com
Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com. Contact Packt for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.


https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?
  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser
Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

The success of Hadoop as a big data platform raised user expectations, both for solving different analytics challenges and for reducing latency. Various tools evolved over time, but when Apache Spark arrived, it provided a single runtime to address all these challenges. It eliminated the need to combine multiple tools, each with its own challenges and learning curve. By using memory for persistent storage as well as for compute, Apache Spark eliminates the need to store intermediate data on disk and increases processing speed by up to 100 times. Through its libraries, this single runtime also addresses various analytics needs such as machine learning and real-time streaming.
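To make the idea of keeping intermediate data in memory concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: the `ToyDataset` class, the `expensive_source` function, and the `reads` counter are hypothetical names invented for this illustration. The sketch only assumes the general behaviour described above: a lazily evaluated dataset recomputes its whole pipeline on every action unless its result is held in memory.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """Hypothetical stand-in for a lazily evaluated dataset (not Spark's API)."""

    def __init__(self, compute: Callable[[], Iterable[int]]):
        self._compute = compute                    # recipe that produces the data
        self._cached: Optional[List[int]] = None   # in-memory copy, if any
        self._cache_requested = False

    def map(self, fn: Callable[[int], int]) -> "ToyDataset":
        # Lazy: nothing runs until an action such as collect() is called.
        return ToyDataset(lambda: (fn(x) for x in self.collect()))

    def cache(self) -> "ToyDataset":
        # Ask for the result to be kept in memory after the first action.
        self._cache_requested = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return self._cached                    # served from memory
        result = list(self._compute())             # recompute the full pipeline
        if self._cache_requested:
            self._cached = result
        return result


reads = {"count": 0}


def expensive_source() -> Iterable[int]:
    # Stands in for a slow read from a filesystem; counts how often it runs.
    reads["count"] += 1
    return range(5)


# Without caching, each action re-reads the source.
squares = ToyDataset(expensive_source).map(lambda x: x * x)
squares.collect()
squares.collect()
uncached_reads = reads["count"]

# With caching, the second action is answered from memory.
reads["count"] = 0
cached = ToyDataset(expensive_source).map(lambda x: x * x).cache()
first = cached.collect()
second = cached.collect()
cached_reads = reads["count"]

print(uncached_reads, cached_reads, first)
```

In this toy model the uncached pipeline reads its source once per action, while the cached one reads it only once in total. A real engine adds partitioning, fault tolerance, and memory management on top of this basic trade-off.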
