Frampton Mike - Mastering Apache Spark: gain expertise in processing and storing data by using advanced techniques with Apache Spark

Mastering Apache Spark by Mike Frampton. City: Birmingham, UK; year: 2015; publisher: Packt Publishing; genre: Computer.


Book:
Mastering Apache Spark: gain expertise in processing and storing data by using advanced techniques with Apache Spark
Author:
Frampton Mike / Szymanski Andrew
Publisher:
Packt Publishing
Genre:
Books / Computer
Year:
2015
City:
Birmingham, UK

Mastering Apache Spark: gain expertise in processing and storing data by using advanced techniques with Apache Spark: summary, description and annotation

Gain expertise in processing and storing data by using advanced techniques with Apache Spark

About This Book

Explore the integration of Apache Spark with third-party applications such as H2O, Databricks, and Titan

Evaluate how Cassandra and HBase can be used for storage

An advanced guide with a combination of instructions and practical examples to extend the most up-to-date Spark functionalities

Who This Book Is For

If you are a developer with some experience with Spark and want to strengthen your knowledge of how to get around in the world of Spark, then this book is ideal for you. Basic knowledge of Linux, Hadoop, and Spark is assumed. Reasonable knowledge of Scala is expected.

What You Will Learn

Extend the tools available for processing and storage

Examine clustering and classification using MLlib

Discover Spark stream processing via Flume and HDFS

Create a schema in Spark SQL, and learn how a Spark schema can be populated...

Mastering Apache Spark

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2015

Production reference: 1280915

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-714-6

www.packtpub.com

Credits

Author

Mike Frampton

Reviewers

Andrea Mostosi

Toni Verbeiren

Lijie Xu

Commissioning Editor

Kunal Parikh

Acquisition Editor

Nadeem Bagban

Content Development Editor

Riddhi Tuljapurkar

Technical Editor

Rupali R. Shrawane

Copy Editor

Yesha Gangani

Project Coordinator

Kinjal Bari

Proofreader

Safis Editing

Indexer

Rekha Nair

Graphics

Jason Monteiro

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

Foreword

Big data is getting bigger and bigger day by day. And I don't mean the tera, peta, exa, zetta, and yotta bytes of data collected all over the world every day. I refer to the complexity and number of components utilized in any decent and respectable big data ecosystem. Never mind the technical nitty-gritty: just keeping up with the terminology, new buzzwords, and hype popping up all the time can be a real challenge in itself. By the time you have mastered them all, and put your hard-earned knowledge into practice, you will discover that half of them are old and inefficient, and nobody uses them anymore. Spark is not one of those "here today, gone tomorrow" fads. Spark is here to stay with us for the foreseeable future, and it is well worth getting your teeth into it in order to get some value out of your data NOW, rather than in some, errr, unforeseeable future. Spark and the technologies built on top of it are the next crucial step in the big data evolution. They offer processing speeds 100x faster in memory, and 10x faster on disk, in comparison to traditional Hadoop jobs.

There's no better way of getting to know Spark than by reading this book, written by Mike Frampton, a colleague of mine, whom I first met many, many years ago and have kept in touch with ever since. Mike's main professional interest has always been data, and in pre-big data days he worked on data warehousing, processing, and analysis projects for major corporations. He experienced first hand the inefficiencies, poor value, and frustrations that the traditional methodologies of crunching data offer. So understanding big data, what it offers, where it is coming from, and where it is heading is intrinsically intuitive to him. Mike wholeheartedly embraced big data the moment it arrived, and has been devoted to it ever since. He practices what he preaches, and is not in it for the money. He is very active in the big data community, writes books, produces presentations on SlideShare and YouTube, and is always first to test-drive new, emerging products.

Mike's passion for big data, as you will find out, is highly infectious, and he is always one step ahead, exploring new and innovative ways in which big data is used. No wonder that in this book, he will teach you how to use Spark in conjunction with the very latest technologies; some of them, such as machine learning and neural networks, are still in the development stage. But fear not, Mike will carefully guide you step by step, ensuring that you will have a direct, personal experience of the power and usefulness of these technologies, and are able to put them into practice immediately.

Andrew Szymanski

Cloudera Certified Hadoop Administrator/Big Data Specialist

About the Author

Mike Frampton is an IT contractor, blogger, and IT author with a keen interest in new technology and big data. He has worked in the IT industry since 1990 in a range of roles (tester, developer, support, and author). He has also worked in many other sectors (energy, banking, telecoms, and insurance). He now lives by the beach in Paraparaumu, New Zealand, with his wife and teenage son. Being married to a Thai national, he divides his time between Paraparaumu and their house in Roi Et, Thailand, and between writing and IT consulting. He is always keen to hear about new ideas and technologies in the areas of big data, AI, IT, and hardware, so look him up on LinkedIn (http://linkedin.com/profile/view?id=73219349) or his website (http://www.semtech-solutions.co.nz/#!/pageHome) to ask questions or just to say hi.

I would like to acknowledge the efforts of the open source development community who offer their time, expertise, and services in order to help develop projects like Apache Spark. I have been continuously impressed by the speed with which this development takes place, which seems to exceed that of commercial projects. I would also like to mention the communities that grow around open source products, and the people who answer technical questions and make books like this possible.

Too many people have helped me technically with this book to name them all, but I would like to mention a few. I would like to thank Michal Malohlava at http://h2o.ai/ for helping me with H2O, and Arsalan Tavakoli-Shiraji at https://databricks.com/ for answering my many questions. I would also like to thank Kenny Bastani for allowing me to use his Mazerunner product.

Riddhi Tuljapurkar, at Packt, and the book reviewers have put in a sterling effort to help push this book along. Finally, I would like to thank my family, who have allowed me the time to develop this book through the months of 2015.

About the Reviewers

Andrea Mostosi is a technology enthusiast. An innovation lover since he was a child, he started his professional career in the early 2000s, and has worked on several projects, playing almost every role in the computer science environment. He is currently the CTO at The Fool, a company that tries to make sense of data. During his free time, he likes travelling, running, cooking, biking, reading, observing the sky, and coding.

I would like to thank my wonderful girlfriend Khadija, who lovingly supports me in everything I do. I would also like to thank my geek friends: Simone M, Daniele V, Luca T, Luigi P, Michele N, Luca O, Luca B, Diego C, and Fabio B. They are the smartest people I know, and comparing myself with them has always pushed me to be better.

Toni Verbeiren received his PhD in theoretical physics in 2003. He has worked on models of artificial neural networks, entailing mathematics, statistics, simulations, (lots of) data, and numerical computations. Since then, he has been active in this industry in a range of domains and roles: infrastructure management and deployment, service and IT management, and ICT/business alignment and enterprise architecture. Around 2010, he started picking up his earlier passion, which is now called Data Science. The combination of data and common sense can be a very powerful basis for making decisions and analyzing risk.

Similar books «Mastering Apache Spark: gain expertise in processing and storing data by using advanced techniques with Apache Spark»

Shrey Mehrotra

Apache Spark Quick Start Guide: Quickly learn the art of writing efficient big data applications with Apache Spark

Anirudh Kala

Optimizing Databricks Workloads: Harness the power of Apache Spark in Azure and maximize the performance of modern big data workloads

Ed Elliott

Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

Hien Luu

Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library

Vishwanathan Narayanan

Big Data Hadoop Interview Guide: Get answers to the most frequently asked questions in a Hadoop interview (English Edition)

Thottuvaikkatumana

Apache Spark 2 for Beginners: Develop large-scale distributed data processing applications using Spark 2 in Scala and Python

Romeo Kienzler

Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

Robert Ilijason

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

Javier Luraschi

Mastering Spark with R

Hien Luu

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Gerard Maas

Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

Rajanarayanan Thottuvaikkatumana

Apache Spark 2 for Beginners
