
Lai Rudy - Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs

Full text of Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs by Rudy Lai, in English. Year: 2019. Publisher: Packt Publishing. Genre: Home and family.


Book: Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
Author: Rudy Lai, Bartłomiej Potaczek
Publisher: Packt Publishing
Genre: Books / Home and family
Year: 2019
Rating: 5 / 5

Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs: summary, description and annotation


Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs

Key Features

Work with large amounts of agile data using distributed datasets and in-memory caching
Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3
Employ the easy-to-use PySpark API to deploy big data analytics for production

Book Description

Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for testing, immunizing, and parallelizing Spark jobs.
You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and...


Hands-On Big Data Analytics with PySpark

Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs

Rudy Lai

Bartłomiej Potaczek

BIRMINGHAM - MUMBAI

Hands-On Big Data Analytics with PySpark

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Joshua Nadar
Content Development Editor: Pratik Andrade
Technical Editors: Sneha Hanchate, Jovita Alva, Snehal Dalmet
Copy Editor: Safis Editing
Project Coordinator: Namrata Swetta
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Shraddha Falebhai

First published: March 2019

Production reference: 1280319

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-83864-413-0

www.packtpub.com


Contributors

About the authors

Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and cloud computing. Over the past few years, they have worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to better make sense of their data, and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.

Rudy Lai is the founder of QuantCopy, a sales acceleration start-up using AI to write sales emails to prospective customers. Prior to founding QuantCopy, Rudy ran HighDimension.IO, a machine learning consultancy, where he experienced firsthand the frustrations of outbound sales and prospecting. Rudy has also spent more than 5 years in quantitative trading at leading investment banks such as Morgan Stanley. This valuable experience allowed him to witness the power of data, but also the pitfalls of automation using data science and machine learning. He holds a computer science degree from Imperial College London, where he was on the Dean's List, and received awards including the Deutsche Bank Artificial Intelligence prize.

Bartłomiej Potaczek is a software engineer working for Schibsted Tech Polska and programming mostly in JavaScript. He is a big fan of everything related to the React world, functional programming, and data visualization. He founded and created InitLearn, a portal that allows users to learn to program in a pair-programming fashion. He was also involved in the InitLearn frontend, which is built on React-Redux technologies. Besides programming, he enjoys football and CrossFit. Currently, he is working on rewriting the frontend for tv.nu, Sweden's most complete TV guide, with over 200 channels. He has also recently worked on technologies including React, React Router, and Redux.


Preface

Apache Spark is an open source, parallel processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers.

This book will help you implement some practical and proven techniques to improve aspects of programming and administration in Apache Spark. You will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques to test, immunize, and parallelize Spark jobs.

This book covers the installation and setup of PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3, and deal with large datasets with PySpark to gain practical big data experience. This book will also help you to work on prototypes on local machines and subsequently go on to handle messy data in production and on a large scale.

Who this book is for

This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with creating your company's business intelligence function, creating great data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this book is for you.


Similar books «Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs»


Jeffrey Aven

Data Analytics with Spark Using Python

Akash Tandon

Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark

Anirudh Kala

Optimizing Databricks Workloads: Harness the power of Apache Spark in Azure and maximize the performance of modern big data workloads

Holden Karau

Learning Spark

Drabas

PYSPARK COOKBOOK: over 60 recipes for implementing big data processing and analytics using apache ... spark and python

Jules S. Damji

Learning Spark: Lightning-Fast Data Analytics

Chambers William Andrew

Spark: the definitive guide: big data processing made simple

Romeo Kienzler

Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

Robert Ilijason

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

Tomasz Drabas

PySpark Cookbook: Over 60 Recipes for Implementing Big Data Processing and Analytics Using Apache Spark and Python

Hien Luu

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Jenny Kim

Interactive Spark using PySpark
