Hien Luu - Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library

  • Author:
    Hien Luu
  • Publisher:
    Apress
  • Year:
    2021

Description

Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise in the powerful and efficient distributed data processing engine inside Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities for building machine learning applications.

Beginning Apache Spark 3 begins by explaining different ways of interacting with Apache Spark, along with Spark concepts and architecture and the Spark unified stack. Next, it offers an overview of Spark SQL before moving on to its advanced features. It then covers tips and techniques for dealing with performance issues, followed by an overview of the Structured Streaming processing engine. It concludes with a demonstration of how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. This book is packed with practical examples and code snippets to help you master concepts and features as soon as they are covered in each section.

After reading this book, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications.

What You Will Learn

  • Master the Spark unified data analytics engine and its various components, which work in tandem to provide a scalable, fault-tolerant, and performant data processing engine
  • Leverage the user-friendly and flexible programming model to perform simple to complex data analytics using DataFrames and Spark SQL
  • Develop machine learning applications using Spark MLlib
  • Manage the machine learning development lifecycle using MLflow

Who This Book Is For

Data scientists, data engineers and software developers.

Book cover of Beginning Apache Spark 3
Hien Luu
Beginning Apache Spark 3
With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library
2nd ed.
Logo of the publisher
Hien Luu
SAN JOSE, CA, USA
ISBN 978-1-4842-7382-1 e-ISBN 978-1-4842-7383-8
https://doi.org/10.1007/978-1-4842-7383-8
© Hien Luu 2021
Apress Standard
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

I dedicate this book to my wife, Jessica, and my three boys: Kevin, Steven, and Eric.

Introduction

According to Andrew Ng, AI is the new electricity, powered by big data. It is evident that the intersection between big data and AI will grow bigger and stronger as time goes on. Apache Spark was born before the AI revolution. However, it has evolved into an invaluable piece of big data technology that helps companies around the world transform their businesses with big data and machine learning.

Apache Spark version 3.0 was released in 2020, the same year as Spark's tenth anniversary. Release 3.0 includes many improvements and advancements across the Spark stack. Notable features include a 2x performance improvement with adaptive query execution, significant performance and usability improvements in the pandas APIs, and a new UI for Structured Streaming that provides insight into streaming queries and helps debug performance-related issues.

There is no better time to learn and gain Apache Spark skills.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484273821. For more detailed information, please visit http://www.apress.com/source-code.

Acknowledgments

Writing and completing this book was a team effort that involved many people, and each person played a specific role in helping push this project over the finish line.

First and foremost, I would like to thank my wife for supporting me and giving me space and time to write this book and being OK with skipping some of the house chores.

Second, I would like to thank the technical reviewers, Pramod and Akshay. Their diligence and feedback made this book more useful for readers.

Finally, I would like to thank the ace coordinating editor, Divya Modi, for nudging me and keeping me honest in completing each chapter by the deadline I promised.

About the Author
Hien Luu has extensive experience in designing and building big data applications and machine learning infrastructure. He is particularly passionate about the intersection between big data and machine learning. Hien enjoys working with open source software and has contributed to Apache Pig and Azkaban. Teaching is also one of his passions, and he serves as an instructor at the UCSC Silicon Valley Extension school, teaching Apache Spark. He has given presentations at various conferences such as Data+AI Summit, MLOps World, QCon SF, QCon London, Hadoop Summit, and JavaOne.
About the Technical Reviewers
Pramod Singh is a data science manager at Bain & Company. He previously served as a senior machine learning engineer at Walmart Labs and a data science manager at Publicis Sapient in India. He has spent over 11 years in machine learning, deep learning, data engineering, algorithm design, and application development. Pramod has authored four books: Machine Learning with PySpark (Apress, 2018), Learn PySpark (Apress, 2019), Learn TensorFlow 2.0 (Apress, 2020), and Deploy Machine Learning Models to Production (Apress, 2020). He is also a regular speaker at major conferences such as O'Reilly's Strata Data, GIDS, and other AI conferences. Pramod is an active mentor in the data science community and at various educational institutes. He lives in Gurgaon, India, with his wife and five-year-old son. In his spare time, he enjoys playing guitar, coding, reading, and watching football.
Akshay R. Kulkarni is a renowned AI and machine learning evangelist and thought leader. He has consulted with several Fortune 500 and global enterprises on driving data science-led and AI-led strategic transformation. Akshay has rich experience in building and scaling AI and machine learning businesses and creating a significant impact. He is currently a manager for Publicis Sapient's core data science and AI team, where he takes part in AI-driven strategy and transformation initiatives. He manages high-priority growth initiatives around data science and works on various machine learning and AI engagements by applying state-of-the-art techniques.

Akshay is a Google Developers Expert in machine learning, a published author, and a regular speaker at major AI and data science conferences, including Strata, O'Reilly AI Conference, and GIDS. He is also a visiting faculty member at some of the top graduate institutes in India.

In 2019, he was featured as one of India's top 40 under 40 data scientists.

In his spare time, Akshay enjoys reading, writing, coding, and helping aspiring data scientists. He lives in Bangalore, India, with his family.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
H. Luu Beginning Apache Spark 3 https://doi.org/10.1007/978-1-4842-7383-8_1
1. Introduction to Apache Spark
Hien Luu
SAN JOSE, CA, USA

There is no better time to learn Apache Spark than now. It has become one of the critical components in the big data stack due to its ease of use, speed, and flexibility. Over the years, it has established itself as the unified engine for multiple workload types, such as big data processing, data analytics, data science, and machine learning. This scalable data processing system is widely adopted by companies in many industries, including Facebook, Microsoft, Netflix, and LinkedIn. Moreover, it has steadily improved with each major release.
