Streaming Data Mesh
by Hubert Dulay and Stephen Mooney
Copyright 2023 Hubert Dulay and Stephen Mooney. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: Andy Kwan
- Development Editor: Jeff Bleiel
- Production Editor: Beth Kelly
- Copyeditor: Sonia Saruba
- Proofreader:
- Indexer:
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
Revision History for the First Edition
- 2023-05-11: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098130725 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Streaming Data Mesh, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O'Reilly and Confluent. See our statement of editorial independence.
978-1-098-13072-5
Preface
Welcome to the first edition of Streaming Data Mesh, your guide to understanding and building a streaming data mesh that meets all of the pillars of a data mesh.
Data mesh is one of the most popular architectures for data platforms being explored today. This book will help you gain a full understanding of this self-service data platform in a streaming context. Today, batch processing dominates the ETL processes at most businesses. This book offers a different perspective on data pipelines, applying the concepts you already understand from batch ETL to streaming ETL in the context of a data mesh.
This book is designed to help you understand the essentials of streaming data mesh: the concepts, architectures, and technologies at its core. It covers the topics related to streaming data mesh, from the basics of data architecture, to the use of big data tools for data warehousing, to business-oriented approaches for streaming data mesh architectures. Additionally, we will look at the stack of services involved in a successful streaming data mesh project.
This book does not require prior knowledge of the pillars that make up a data mesh. We will briefly introduce the pillars at a very high level but define them specifically with streaming in mind. If you feel you need to understand data mesh in more detail, please refer to Zhamak Dehghani's book Data Mesh: Delivering Data-Driven Value at Scale, 1st Edition.
Who Should Read This Book
This book is written for anyone who is interested in learning more about streaming data mesh, which combines the exciting work done in data mesh with real-time streaming for data transformation, data product definition, and data governance. It is also useful for data engineers, data analysts, data scientists, software architects, and product owners who want to implement a streaming data architecture for their projects, and for those who wish to become familiar with streaming data technologies and best practices for integrating them, at scale, into their projects.
Why We Wrote This Book
We wrote this book because we believe streaming data mesh has the potential to revolutionize the way companies manage and process their data. Streaming data mesh provides a unified platform that brings messaging, storage, and processing capabilities together into one comprehensive solution. By increasing data reliability and coverage while reducing costs, this platform enables companies to significantly accelerate their digital transformation and become data-driven organizations. With this book, we want to make sure our readers understand the key principles, the latest approaches, and the dos and don'ts of streaming data mesh. We also want to provide step-by-step guidance for setting up and operating a streaming data mesh, taking best practices into account.
Navigating This Book
This book is organized roughly as follows:
- Chapters provide an introduction to Data Mesh concepts and extend these into a Streaming context.
- goes into detail about domain ownership and the approaches used to identify domains, domain-driven design, the roles associated with a data domain, tools to consider, as well as an approach to domain-centric chargebacks.
- explores the creation of streaming data products, including data product identification, ingestion, transformation, and publication of data products.
- examines Federated Computational Data Governance within a streaming data mesh.
- discusses the self-service infrastructure as it relates to streaming data mesh.
- dives into the architecture of a streaming data mesh and its components, including infrastructure and cloud architecture.
- discusses the structure, alignment, and roles associated with building a decentralized team.
- discusses the application of streaming data mesh for creating feature stores to empower data science model training and inference.
- provides a concrete example of creating a streaming data mesh.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
Tip
This element signifies a tip or suggestion.
Note
This element signifies a general note.
Warning
This element indicates a warning or caution.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/oreillymedia/title_title