
Robert Ilijason - Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud


Book:
Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud
Author:
Robert Ilijason
Publisher:
Apress
Genre:
Books / Home and family
Year:
2020

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud: summary, description and annotation


Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power at low cost when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can put all those CPUs to work on data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn

Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or for free

Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.


Robert Ilijason

Beginning Apache Spark Using Azure Databricks

Unleashing Large Cluster Analytics in the Cloud

Robert Ilijason

Viken, Sweden

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484257807 . For more detailed information, please visit http://www.apress.com/source-code .

ISBN 978-1-4842-5780-7 e-ISBN 978-1-4842-5781-4

https://doi.org/10.1007/978-1-4842-5781-4

© Robert Ilijason 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

Select support from family;

SUPPORT
----------
Malin
Max
Mom

Introduction

So you wanna be a data analyst, data scientist, or data engineer? Good choice. The world needs more of us. Also, it is not only fun and rewarding work but also easy! At least if you're willing to put in the effort.

The bar for getting into heavy-duty data analytics has never been lower. You don't need servers, advanced Linux skills, or a ton of money to get started anymore. Graphical tools like Tableau and Power BI made small-scale data work available for the masses. Now Databricks does the same for huge datasets. Millions, billions, and trillions of rows are not a problem.

This book will ease you into both the tool and the field. We'll start by looking at the data analytics field in general: why it's hot, what has changed, and where Apache Spark and Databricks fit into the overall picture.

Then it's time for you to learn about how Databricks works. We'll spend a few chapters on how it works internally and how you actually use it: everything from getting around the user interface to spinning up clusters and importing data.

Once you know how to use the tool, it's time to start coding. You'll familiarize yourself with Structured Query Language (SQL) and Python, the two main languages for data analysis work. It doesn't stop there; we'll follow it up by digging deeper into advanced data wrangling techniques, where we'll see a lot of the ifs and buts you'll come across when you work with real-world data.

Finally, I'll drag you through a few short chapters with more advanced topics. Coming out of them, you'll have an understanding of how to run machine learning algorithms, manage delta loads, and run Databricks through the application programming interface (API).

And with that, you'll be ready to get started for real, solving small and large problems in the real world. This is an introductory book, but once you're through it, you'll have the tools you need to start exploring huge datasets around you or in your business.

Looking forward to seeing you around in the world of data experts.


About the Author

Robert Ilijason


is a 20-year veteran in the business intelligence segment. He has worked as a contractor for some of Europe's biggest companies and has conducted large-scale analytics projects within the areas of retail, telecom, banking, government, and more. He has seen his share of analytic trends come and go over the years, but he strongly believes that, unlike most of them, Apache Spark in the cloud, especially with Databricks, is a game changer.

About the Technical Reviewer

Michela Fumagalli

graduated from the Polytechnic University of Milan with an MSc in Electronics and TLC Engineering. She also got a master's degree in Big Data and Analytics, along with a Databricks Apache Spark certification.

She has studied and developed several machine learning models to help put data at the heart of businesses for data-driven decisions. Her other interests lie in the fields of reinforcement learning and deep reinforcement learning. After gaining experience at various international companies, she now works for IKEA.

© Robert Ilijason 2020

R. Ilijason, Beginning Apache Spark Using Azure Databricks, https://doi.org/10.1007/978-1-4842-5781-4_1

1. Introduction to Large-Scale Data Analytics

Robert Ilijason

(1) Viken, Sweden

Let's start at the very top. This book is about large-scale data analytics. It'll teach you to take a dataset, load it into a database, scrub the data if necessary, analyze it, run algorithms on it, and finally present the discoveries you make.

We'll use a fairly new tool called Databricks for all of this, simply because it's the most ambitious tool on the market right now. While other tools can provide you with more or less the same capabilities, no one does it like Databricks.

Databricks hands you the massive analysis power of Apache Spark in a very easy-to-use way. Ease of use is important. You should be spending time looking at data, not figuring out the nitty-gritty of configuration files, virtual machines, and network setups. It's far too easy to get stuck in tech. With Databricks, you won't.

But before we start, let's spend a few pages talking about what this data analysis thing is all about, what's happened over the last few years, and what makes large-scale analytics different from the stuff you might have run in Excel or SQL Server Analysis Services.

Analytics, the hype

If you haven't been living under a rock for the last few years, you've seen advanced analytics being referenced pretty much everywhere. Smart algorithms are winning elections, driving cars, and about to get us to Mars. They'll soon solve all our problems, possibly including the singularity. The future is here! At least if you are to believe the press.


Similar books «Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud»

Look at similar books to Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.

Ron L'Esteve

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

Anirudh Kala

Optimizing Databricks Workloads: Harness the power of Apache Spark in Azure and maximize the performance of modern big data workloads

Ahmad Osama

Azure Data Engineering Cookbook: Design and implement batch and streaming analytics using Azure Cloud Services

Lai Rudy

Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs

Tejada

Mastering Azure Analytics architecting in the cloud with Azure Data Lake, HDInsight, and Spark

Frampton Mike

Mastering Apache Spark: gain expertise in processing and storing data by using advanced techniques with Apache Spark

Romeo Kienzler

Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

Tomasz Drabas

PySpark Cookbook: Over 60 Recipes for Implementing Big Data Processing and Analytics Using Apache Spark and Python

Hien Luu

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Gerard Maas

Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

Venkat Ankam

Big Data Analytics

Rishi Yadav

Spark Cookbook
