
Ed Elliott - Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets


Book:
Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets
Author:
Ed Elliott
Publisher:
Apress
Genre:
Books / Computer
Year:
2021

Intermediate-Advanced user level


Ed Elliott

Introducing .NET for Apache Spark

Distributed Processing for Massive Datasets

1st ed.


Ed Elliott

Sussex, UK

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484269916. For more detailed information, please visit http://www.apress.com/source-code.

ISBN 978-1-4842-6991-6 e-ISBN 978-1-4842-6992-3

https://doi.org/10.1007/978-1-4842-6992-3

© Ed Elliott 2021

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media LLC, 1 New York Plaza, Suite 4600, New York, NY 10004. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Sarah, Sammy, and Lucy.

Introduction

From the first time I ever used Apache Spark, I was captivated by how powerful it was and what it could enable. Coming from a traditional Microsoft developer background (C, C++, C#, F#, Microsoft SQL Server, and SSIS, and then on to Azure, Azure Data Factory, and SQL Data Warehouse), I found it fascinating to be able to use Apache Spark to process large files quickly. The less fascinating part was having to decide whether to learn Scala or Python to work with Apache Spark. I studied both languages but missed developing in C# and F#, so when the .NET for Apache Spark project launched, I was particularly excited. I have been closely following the project and have submitted several successful pull requests, which are now included in the bindings.

This book introduces both Apache Spark and the .NET for Apache Spark bindings. Whether you are new to Apache Spark but have C# or F# experience, or you already know Apache Spark and will use the .NET for Apache Spark bindings, this book will help you get started.

In this book, we will start with a step-by-step guide to getting an instance of Apache Spark running on your development machine, whether on Windows, Linux, or macOS, including a discussion of crucial points such as the Java version. We will cover in detail how the .NET bindings work. We will then get you up and running with your first .NET for Apache Spark application before explaining the different APIs and taking a look at some example programs.

We will walk through example batch-mode, streaming, and machine learning applications that you can follow along with. We will also look at how to troubleshoot Apache Spark and how you can make your own changes to the .NET for Apache Spark project, including submitting pull requests back to the core product if you would like to. Finally, we will look at support for the Delta format, which brings ACID properties to files in a data lake. Exciting times!

Table of Contents

Part I: Getting Started

Part II: The APIs

Part III: Examples

About the Author

Ed Elliott is a data engineer who has been working in IT for 20 years and has focused on data for the last 15 years. He uses Apache Spark at work and has been contributing to the Microsoft .NET for Apache Spark open source project since it was released in 2019. Ed has been blogging and writing since 2014, on his own blog as well as for SQL Server Central and Redgate. He has spoken at a number of events such as SQLBits, SQL Saturday, and the GroupBy conference.

About the Technical Reviewer

Gerald Versluis is a software engineer at Microsoft from the Netherlands. With years of experience working with Azure, ASP.NET, Xamarin, and lots of other .NET technologies, he has been involved in numerous projects, big or small.

Not only does he like to code, but he is also passionate about spreading his knowledge as well as gaining some in the bargain. Gerald involves himself in speaking, providing training sessions, writing blogs ( https://blog.verslu.is ) or articles, recording videos on YouTube ( https://youtube.com/GeraldVersluis ), and contributing to open source projects in his spare time. Twitter: @jfversluis | All handles: https://jfversluis.dev

Part I: Getting Started

© Ed Elliott 2021

E. Elliott, Introducing .NET for Apache Spark, https://doi.org/10.1007/978-1-4842-6992-3_1

1. Understanding Apache Spark

Ed Elliott


Sussex, UK

Apache Spark is a data analytics platform that has made big data accessible and brought large-scale data processing within reach of every developer. With Apache Spark, it is as easy to read from a single CSV file on your local machine as it is to read from a million CSV files in a data lake.
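As a minimal sketch of that claim, assuming the Microsoft.Spark NuGet package and a working Spark instance (the program is launched through spark-submit), the C# below hands the same path string to `DataFrameReader.Csv` whether it names one file or a wildcard over a whole folder. The default path, the `status` column, the `"shipped"` value, and the `ResolvePath` helper are hypothetical placeholders, not taken from the book.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace CsvCountSketch
{
    public class Program
    {
        // Hypothetical default: a wildcard matching every CSV file in one folder.
        public const string DefaultPath = "data/sales/*.csv";

        // Hypothetical helper: use the first argument if given, else the default.
        // The argument may be a single file or a wildcard; Spark reads both the same way.
        public static string ResolvePath(string[] args) =>
            args.Length > 0 ? args[0] : DefaultPath;

        public static void Main(string[] args)
        {
            string path = ResolvePath(args);

            SparkSession spark = SparkSession
                .Builder()
                .AppName("csv-count-sketch")
                .GetOrCreate();

            // Read the CSV file(s), treating the first line of each file as a header row.
            DataFrame df = spark.Read()
                .Option("header", "true")
                .Csv(path);

            // Count rows whose (hypothetical) "status" column equals "shipped".
            long matches = df.Filter(Col("status").EqualTo("shipped")).Count();

            Console.WriteLine($"Rows matching: {matches}");

            spark.Stop();
        }
    }
}
```

Only the path changes between the two cases, for example `data/sales/2021-01.csv` versus `data/sales/*.csv`; Spark expands the wildcard and distributes the resulting files across partitions, so the filtering and counting code stays the same.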

An Example

Let us look at an example. The code in Listings (the F# version) reads from a set of CSV files and counts how many records match a specific condition. Because the code reads every CSV file in a specific path, the number of files it can read from is practically limitless.

Although the examples in this chapter are fully functioning samples, they require a working Apache Spark instance, either locally or on a cluster. We cover setting up Apache Spark in Chapter .

using System;
using System.Linq;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace Introduction_CSharp
{
    class Program
    {
        static void Main(string[] args)
        {


Similar books «Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets»


Shrey Mehrotra

Apache Spark Quick Start Guide: Quickly learn the art of writing efficient big data applications with Apache Spark

Hien Luu

Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library

Rudy Lai

Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs

Thottuvaikkatumana

Apache Spark 2 for Beginners: Develop Large-Scale Distributed Data Processing Applications Using Spark 2 in Scala and Python

Holden Karau

Learning Spark

Mike Frampton

Mastering Apache Spark: Gain Expertise in Processing and Storing Data by Using Advanced Techniques with Apache Spark

William Andrew Chambers

Spark: The Definitive Guide: Big Data Processing Made Simple

Romeo Kienzler

Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

Hien Luu

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Gerard Maas

Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

Rishi Yadav

Spark Cookbook

Krishna Sankar

Fast Data Processing with Spark
