
Mark Kromer - Mapping Data Flows in Azure Data Factory : Building Scalable ETL Projects in the Microsoft Cloud


Mark Kromer Mapping Data Flows in Azure Data Factory : Building Scalable ETL Projects in the Microsoft Cloud
  • Book:
    Mapping Data Flows in Azure Data Factory : Building Scalable ETL Projects in the Microsoft Cloud
  • Author:
  • Publisher:
    Apress


Mark Kromer
Mapping Data Flows in Azure Data Factory
Building Scalable ETL Projects in the Microsoft Cloud

Mark Kromer
SNOHOMISH, WA, USA
ISBN 978-1-4842-8611-1 e-ISBN 978-1-4842-8612-8
https://doi.org/10.1007/978-1-4842-8612-8
© Mark Kromer 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC, part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

This book is dedicated to my loving wife Stacy and our boys Ethan and Jude. Thank you for putting up with my late hours working on data analytics and writing this book!

Introduction

The ETL (extract, transform, load) process has been a cornerstone of data warehouses, data marts, and business intelligence for decades. ETL is how data engineers have traditionally refined raw data into business analytics that guide the business to make better decisions. These projects have allowed engineers to build up libraries of common ETL processes and practices from traditional on-premises data warehouses over the years, very commonly with data coming from Oracle, Microsoft, IBM, or Sybase databases or business ERP/CRM applications like Salesforce, SAP, Dynamics, etc. However, over the past decade, our industry has seen these analytical workloads migrate to the cloud at a very rapid pace.

To keep up with these changes, we've had to adjust ETL techniques to account for more varied and larger data. The big data revolution and cloud migrations have forced us to rethink many of our proven ETL patterns to meet modern data transformation challenges and demands. Today, the vast majority of data that we process exists primarily in the cloud. And that data may not always be governed and curated by rigid business processes in the way that our previous ETL processes could rely on.

The common scenarios of processing well-known hardened schemas from SAP and CSV exports will now have a new look and challenge. The data sources will likely vary in shape, size, and scope from day to day. We need to account for schema drift, data drift, and other possible obstructions to refining data in a way that turns the data into refined business analytics.
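To make the idea of schema drift concrete, here is a minimal sketch in plain Python. It is not ADF or Mapping Data Flows code, and the column names, the `EXPECTED_SCHEMA` dictionary, and the `detect_schema_drift` helper are all hypothetical names chosen for illustration. It simply compares the columns that arrive today against the columns we expected and reports what was added, what went missing, and what changed type.

```python
from typing import Dict, List

# Hypothetical expected schema for an illustrative feed: column name -> type name.
EXPECTED_SCHEMA: Dict[str, str] = {"order_id": "int", "amount": "float", "region": "str"}


def detect_schema_drift(expected: Dict[str, str], row: Dict[str, object]) -> Dict[str, List[str]]:
    """Compare one incoming row against the expected schema and report drift."""
    # Derive the observed schema from the Python type of each incoming value.
    observed = {name: type(value).__name__ for name, value in row.items()}
    return {
        # Columns that arrived but were never part of the expected schema.
        "added": sorted(set(observed) - set(expected)),
        # Columns we expected that did not arrive.
        "missing": sorted(set(expected) - set(observed)),
        # Columns present on both sides whose type no longer matches.
        "retyped": sorted(
            name for name in set(expected) & set(observed)
            if expected[name] != observed[name]
        ),
    }


# Illustrative row: "region" is gone, "channel" is new, "amount" arrives as text.
todays_row = {"order_id": 17, "amount": "12.50", "channel": "web"}
drift = detect_schema_drift(EXPECTED_SCHEMA, todays_row)
print(drift)
```

A pipeline that only understands the hardened schema would fail or silently drop data on a row like this one, which is why drift has to be treated as a normal condition rather than an exception.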

Cloud-First ETL with Mapping Data Flows

Welcome to Mapping Data Flows in Azure Data Factory! In this book, I'm going to introduce you to Microsoft Azure Data Factory and the Mapping Data Flows feature in ADF as the key ETL toolset to tackle these modern data analytics challenges. As you make your way through the book, you'll learn key concepts, and through the use of examples, you'll begin to build your first cloud-based ETL projects that can help you to unlock the potential of scaled-out big data ETL processing in the cloud. I'll demonstrate how to tackle the particularly difficult and challenging aspects of big data analytics and how to prepare data for business decision makers in the cloud.

To get the most value from this book, you should have a firm understanding of building data warehouses and business intelligence projects. It is not necessary to have many hours of experience building cloud-first big data analytics projects already. However, having some experience in cloud computing will provide valuable context that will help you as you work through some of these new approaches.

The examples and scenarios used in this book will be patterns and practices that are based on common ETL scenarios, so having data engineering experience and background will also be very helpful. I'll help guide you along as you migrate from traditional on-premises data engineering to the world of Azure Data Factory.

Overview of Azure Data Factory

To become familiar with the data engineering process in Microsoft Azure, we'll need to begin with an overview of Azure Data Factory (ADF), which is the Azure service for building data pipelines. The first chapter will focus on conceptual discussions of how to build a process to transform massive amounts of data with many quality issues in the cloud. Essentially, we need to redefine ETL for cloud-based big data, where data volumes and veracity can change daily, and we'll compare and contrast the Azure mechanism for the modern data engineer with traditional ETL. That's where we'll begin the process of building ETL pipelines that will serve as the basis for your big data analytics projects. I'm going to present a series of common use cases that will demonstrate how to apply the concepts discussed in the earlier chapters to practical ETL projects. From there, the focus will turn to a deep dive on Mapping Data Flows and how to build ETL frameworks in ADF by using the visual design-time interface to build code-free data flows. Mapping Data Flows is primarily a code-free visual design experience, so we'll walk through techniques and best practices for managing the software development life cycle of a data flow in ADF. Data Factory provides many different means to process and transform data that include coding and calling external compute processes. However, in this book, the focus will be on building ETL pipelines in a code-free style in Mapping Data Flows.

As you work your way through the early chapters in this book, you should begin to develop an understanding of how to apply data engineering principles in ADF and Mapping Data Flows. That's where we'll begin to implement mechanisms to help organize your work and design-time environment, preparing for eventual operationalization at runtime. We'll set up a Git repo for our work, as you should in real-life scenarios. We'll design interactive data transformation graphs using serverless compute that can scale out as needed. You won't need to manage physical servers and clusters with ADF, but I will explain how things work behind the scenes to provide this serverless compute power for your pipelines. Behind the scenes, ADF will leverage the Azure platform-as-a-service workflow engine Logic Apps for pipeline execution and scheduling. The transformation engine for Mapping Data Flows is Apache Spark. But you won't have to learn anything about those underlying dependent services. The Azure Integration Runtimes will provide that compute for you dynamically in a serverless manner.
