Vishal Pathak - Serverless ETL and Analytics with AWS Glue: Your comprehensive reference guide to learning about AWS Glue and its features

Year: 2022. Publisher: Packt Publishing.

Build efficient data lakes that can scale to virtually unlimited size using AWS Glue

Key Features
  • Learn to work with AWS Glue to overcome typical implementation challenges in data lakes
  • Create and manage serverless ETL pipelines that can scale to manage big data
  • Written by AWS Glue community members, this practical guide shows you how to implement AWS Glue in no time
Book Description

Organizations these days have gravitated toward services such as AWS Glue that undertake undifferentiated heavy lifting and provide serverless Spark, enabling you to create and manage data lakes in a serverless fashion. This guide shows you how AWS Glue can be used to solve real-world problems along with helping you learn about data processing, data integration, and building data lakes.

Beginning with AWS Glue basics, this book teaches you how to perform various aspects of data analysis such as ad hoc queries, data visualization, and real-time analysis using this service. It also provides a walk-through of CI/CD for AWS Glue and how to shift left on quality using automated regression tests. You'll find out how data security aspects such as access control, encryption, auditing, and networking are implemented, and get to grips with useful techniques such as picking the right file format, compression, partitioning, and bucketing. As you advance, you'll discover AWS Glue features such as crawlers, Lake Formation, governed tables, lineage, DataBrew, Glue Studio, and custom connectors. The concluding chapters help you to understand various performance tuning, troubleshooting, and monitoring options.

By the end of this AWS book, you'll be able to create, manage, troubleshoot, and deploy ETL pipelines using AWS Glue.

What you will learn
  • Apply various AWS Glue features to manage and create data lakes
  • Use Glue DataBrew and Glue Studio for data preparation
  • Optimize data layout in cloud storage to accelerate analytics workloads
  • Manage metadata including database, table, and schema definitions
  • Secure your data through access control, encryption, auditing, and networking
  • Monitor AWS Glue jobs to detect delays and loss of data
  • Integrate Spark ML and SageMaker with AWS Glue to create machine learning models
Who this book is for

This book is for ETL developers, data engineers, and data analysts who want to understand how AWS Glue can help them solve their business problems. Basic knowledge of AWS data services is assumed.

Table of Contents
  1. Data Management Introduction and Concepts
  2. Introduction to Important AWS Glue Features
  3. Data Ingestion
  4. Data Preparation
  5. Designing Data Layouts
  6. Data Management
  7. Metadata Management
  8. Data Security
  9. Data Sharing
  10. Data Pipeline Management
  11. Monitoring
  12. Tuning, Debugging, and Troubleshooting
  13. Data Analysis
  14. Machine Learning Integration
  15. Architecting Data Lakes for Real-World Scenarios and Edge Cases

Serverless ETL and Analytics with AWS Glue

Your comprehensive reference guide to learning about AWS Glue and its features

Vishal Pathak

Subramanya Vajiraya

Noritaka Sekiyama

Tomohiro Tanaka

Albert Quiroga

Ishan Gaur

BIRMINGHAM - MUMBAI

Serverless ETL and Analytics with AWS Glue

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Reshma Raman

Senior Editor: Tazeen Shaikh

Content Development Editor: Sean Lobo

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Jyoti Chauhan

Marketing Coordinator: Nivedita Singh

First published: August 2022

Production reference: 1220722

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80056-498-5

www.packt.com

Contributors
About the authors

Vishal Pathak is a Data Lab Solutions Architect at AWS. Vishal works with customers on their use cases, architects solutions to solve their business problems, and helps them build scalable prototypes. Prior to his journey in AWS, Vishal helped customers implement business intelligence, data warehouse, and data lake projects in the US and Australia.

Subramanya Vajiraya is a Big Data Cloud Engineer at AWS Sydney specializing in AWS Glue. He obtained his Bachelor of Engineering degree, specializing in Information Science & Engineering, from NMAM Institute of Technology, Nitte, KA, India (Visvesvaraya Technological University, Belgaum) in 2015 and his Master of Information Technology degree, specializing in Internetworking, from the University of New South Wales, Sydney, Australia in 2017. He is passionate about helping customers solve challenging technical issues related to their ETL workloads and implementing scalable data integration and analytics pipelines on AWS.

Noritaka Sekiyama is a Senior Big Data Architect on the AWS Glue and AWS Lake Formation team. He has 11 years of experience working in the software industry. Based in Tokyo, Japan, he is responsible for implementing software artifacts, building libraries, troubleshooting complex issues and helping guide customer architectures.

Tomohiro Tanaka is a senior cloud support engineer at AWS. He works to help customers solve their issues and build data lakes across AWS Glue, AWS IoT, and big data technologies such as Apache Spark, Hadoop, and Iceberg.

Albert Quiroga works as a senior solutions architect at Amazon, where he is helping to design and architect one of the largest data lakes in the world. Prior to that, he spent four years working at AWS, where he specialized in big data technologies such as EMR and Athena, and where he became an expert on AWS Glue. Albert has worked with several Fortune 500 companies on some of the largest data lakes in the world and has helped to launch and develop features for several AWS services.

Ishan Gaur has more than 13 years of IT experience in software development and data engineering, building distributed systems and highly scalable ETL pipelines using Apache Spark, Scala, and various ETL tools such as Ab Initio and Datastage. He currently works at AWS as a senior big data cloud engineer and is an SME of AWS Glue. He is responsible for helping customers to build out large, scalable distributed systems and implement them in AWS cloud environments using various big data services, including EMR, Glue, and Athena, as well as other technologies, such as Apache Spark, Hadoop, and Hive.

About the reviewers

Akira Ajisaka is an open source developer who has over 10 years of engineering experience in big data. He contributes to the open source community and is an Apache Software Foundation member and Apache Hadoop PMC member. He has worked for the AWS Glue ETL team since 2022 and is learning a lot about Apache Spark.

Keerthi Chadalavada is a senior software engineer with AWS Glue. She is passionate about building cloud-based, data-intensive applications at scale. Her recent work includes enabling data engineers to build event-driven ETL pipelines that respond in near real time to data events and provide the latest insights to business users. In addition, her work on Glue Blueprints enabled data engineers to build templates for repeatable ETL pipelines and enabled non-data engineers without technical expertise to use these templates to gain faster insights from their data. Keerthi holds a master's degree in computer science from Ohio State University and a bachelor's degree in computer science from BITS Pilani, India.
