
Lai Rudy - Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs


  • Book:
    Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
  • Author:
    Rudy Lai, Bartłomiej Potaczek
  • Publisher:
    Packt Publishing
  • Year:
    2019


Use PySpark to easily crush messy data at-scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs

Key Features

  • Work with large amounts of agile data using distributed datasets and in-memory caching
  • Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3
  • Employ the easy-to-use PySpark API to deploy big data analytics for production

Book Description

Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for testing, immunizing, and parallelizing Spark jobs.

You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and...

    Hands-On Big Data Analytics with PySpark
    Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
    Rudy Lai
    Bartłomiej Potaczek

    BIRMINGHAM - MUMBAI
    Hands-On Big Data Analytics with PySpark

    Copyright © 2019 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Commissioning Editor: Mrinmayee Kawalkar
    Acquisition Editor: Joshua Nadar
    Content Development Editor: Pratik Andrade
    Technical Editor: Sneha Hanchate, Jovita Alva, Snehal Dalmet
    Copy Editor: Safis Editing
    Project Coordinator: Namrata Swetta
    Proofreader: Safis Editing
    Indexer: Tejal Daruwale Soni
    Graphics: Jisha Chirayil
    Production Coordinator: Shraddha Falebhai

    First published: March 2019

    Production reference: 1280319

    Published by Packt Publishing Ltd.
    Livery Place
    35 Livery Street
    Birmingham
    B3 2PB, UK.

    ISBN 978-1-83864-413-0

    www.packtpub.com

    mapt.io

    Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

    Why subscribe?
    • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

    • Improve your learning with Skill Plans built especially for you

    • Get a free eBook or video every month

    • Mapt is fully searchable

    • Copy and paste, print, and bookmark content

    Packt.com

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

    At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

    Contributors
    About the authors

    Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and cloud computing. Over the past few years, they have worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to better make sense of their data, and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.


    Rudy Lai is the founder of QuantCopy, a sales acceleration start-up using AI to write sales emails to prospective customers. Prior to founding QuantCopy, Rudy ran HighDimension.IO, a machine learning consultancy, where he experienced firsthand the frustrations of outbound sales and prospecting. Rudy has also spent more than 5 years in quantitative trading at leading investment banks such as Morgan Stanley. This valuable experience allowed him to witness the power of data, but also the pitfalls of automation using data science and machine learning. He holds a computer science degree from Imperial College London, where he was on the Dean's List, and received awards including the Deutsche Bank Artificial Intelligence prize.


    Bartłomiej Potaczek is a software engineer working for Schibsted Tech Polska and programming mostly in JavaScript. He is a big fan of everything related to the React world, functional programming, and data visualization. He founded and created InitLearn, a portal that allows users to learn to program in a pair-programming fashion. He was also involved in the InitLearn frontend, which is built on React and Redux. Besides programming, he enjoys football and CrossFit. Currently, he is working on rewriting the frontend for tv.nu, Sweden's most complete TV guide, with over 200 channels. He has also recently worked on technologies including React, React Router, and Redux.

    Packt is searching for authors like you

    If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

    Preface

    Apache Spark is an open source, parallel processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers.

    This book will help you implement some practical and proven techniques to improve aspects of programming and administration in Apache Spark. You will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques to test, immunize, and parallelize Spark jobs.

    This book covers the installation and setup of PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3, and deal with large datasets with PySpark to gain practical big data experience. This book will also help you to work on prototypes on local machines and subsequently go on to handle messy data in production and on a large scale.

    Who this book is for

    This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with creating your company's business intelligence function, or creating great data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this book is for you.
