97 Things Every Data Engineer Should Know
by Tobias Macey
Copyright 2021 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Jessica Haberman and Jill Leonard
- Production Editor: Beth Kelly
- Copyeditor: FILL IN COPYEDITOR
- Proofreader: FILL IN PROOFREADER
- Indexer: FILL IN INDEXER
- Interior Designer: Monica Kamsvaag
- Cover Designer: FILL IN COVER DESIGNER
- Illustrator: Kate Dullea
Revision History for the First Edition
- YYYY-MM-DD: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492062417 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. 97 Things Every Data Engineer Should Know, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author(s), and do not represent the publisher's views. While the publisher and the author(s) have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-06241-7
[FILL IN]
Preface
Data engineering as a distinct role is relatively new, but the responsibilities have existed for decades. Broadly speaking, a data engineer makes data available for use in analytics, machine learning, business intelligence, etc. The introduction of big data technologies, data science, distributed computing, and the cloud have all contributed to making the work of the data engineer more necessary, more complex, and (paradoxically) more possible. No single book can encompass everything that you will need to know to be effective as a data engineer, but there are still a number of core principles that will help you in your journey.
This book is a collection of advice from a wide range of individuals who have learned valuable lessons about working with data the hard way. To save you the work of making their same mistakes, we have collected their advice to give you a set of building blocks that can be used to lay your own foundation for a successful career in data engineering.
In these pages you will find career tips for working in data teams, engineering advice for how to think about your tools, and fundamental principles of distributed systems. There are many paths into data engineering, and no two people will use the same set of tools, but we hope that you will find the inspiration that will guide you on your journey. So whether this is your first step on the road, or you have been walking it for years, we wish you the best of luck in your adventures.
O'Reilly Online Learning
For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us
Please address comments and questions about this book to the publisher:
- O'Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-998-9938 (in the United States or Canada)
- 707-829-0515 (international or local)
- 707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/97-things-data-eng.
Email us to comment or ask technical questions about this book.
Visit http://oreilly.com for news and information about our books and courses.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://youtube.com/oreillymedia
Acknowledgments
I would like to thank my wife for her help and support while I worked on this book, the numerous contributors for sharing their time and expertise, and the O'Reilly team for all of their hard work to make this book a reality.
Chapter 1. Three Distributed Programming Concepts to Be Aware of When Choosing an Open Source Framework
Adi Polak
Many data engineers create pipelines for extract, transform, and load (ETL) or extract, load, and transform (ELT) operations. During a transform (T) task, you might be working with data that fits in one machine's memory. Often, however, the data will require you to use frameworks or solutions that leverage distributed parallel computation to achieve the desired goal. To support that, many researchers have developed models of distributed programming and computation embodied in well-known frameworks such as Apache Spark, Apache Cassandra, Apache Kafka, TensorFlow, and many more. Let's look at the three most used distributed programming models for data analytics and distributed machine learning.
MapReduce Algorithm
MapReduce is a distributed computation algorithm developed by Google in 2004. As developers, we specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. This approach is an extension of the split-apply-combine strategy for data analysis.
In practice, every task is split into multiple map and reduce functions. Data is distributed over multiple nodes/machines, and each chunk of data is processed on a node by applying the map function to it. The shuffle mechanism then redistributes the intermediate data across nodes based on the keys emitted by the map function, so that the reduce operation can combine all values associated with the same key.
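The map/shuffle/reduce phases described above can be sketched in miniature with the classic word-count example. This is a hypothetical single-process illustration of the pattern, not how a real framework implements it; in Hadoop MapReduce or Spark, each phase runs in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Emit an intermediate (key, value) pair for each word."""
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    """Group intermediate values by key, as the shuffle mechanism does
    when redistributing data across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Merge all intermediate values associated with the same key."""
    return (key, sum(values))

# Each document stands in for a chunk of data processed on one node.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Because each map call touches only its own chunk and each reduce call touches only one key's values, both phases parallelize naturally; only the shuffle requires moving data between nodes.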
Later we can apply more logic to the combined data or go for another round of split-apply-combine if necessary. Open source solutions implementing these concepts include Apache Spark, Hadoop MapReduce, and Apache Flink, among others.