Data Mesh
by Zhamak Dehghani
Copyright 2022 Zhamak Dehghani. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: Melissa Duffield
- Development Editor: Gary O'Brien
- Production Editor: Beth Kelly
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
- April 2022: First Edition
Revision History for the Early Release
- 2021-06-18: First Release
- 2021-07-28: Second Release
- 2021-09-07: Third Release
- 2021-10-07: Fourth Release
- 2021-12-21: Fifth Release
- 2022-02-10: Sixth Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492092391 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Data Mesh, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O'Reilly and Starburst Data. See our statement of editorial independence.
978-1-492-09232-2
[FILL IN]
Foreword
I've been involved in developing software for large corporations for several decades, and managing data has always been a major architectural issue. In the early days of my career, there was a lot of enthusiasm for a single enterprise-wide data model, often stored in a single enterprise-wide database. But we soon learned that having a plethora of applications accessing a shared data store was a disaster of ad-hoc coupling. Even without that, deeper problems existed. Core ideas to an enterprise, such as a customer, required different data models in different business units. Corporate acquisitions further muddied the waters.
As a response, wiser enterprises have decentralized their data, pushing data storage, models, and management into different business units. That way, the people who best understand the data in their domain are responsible for managing that data. They collaborate with other domains through well-defined APIs. Since these APIs can contain behavior, we have more flexibility for how that data is shared and, more importantly, how we evolve data management over time.
While this has been increasingly the way to go for day-to-day operations, data analytics has remained a more centralized activity. Data warehouses aimed to provide an enterprise repository of curated critical information. But such a centralized group struggled with the work and its conflicting customers, particularly since they didn't have a good understanding of the data or the needs of its consumers. A data lake helped by popularizing access to raw data, allowing analysts to get closer to the original source, but too easily became a data swamp of poor understanding and provenance.
Data mesh seeks to apply the same lessons we learned with operational data to the world of analytical data. Business unit domains become responsible for publishing analytical data through APIs the same way they do for operational data. By treating their data as a first-class product, they communicate the meaning and provenance of the data, and they collaborate with their consumers. To make the work involved in this achievable, the enterprise needs to provide a platform for building and publishing these data products, together with a federated governance structure to keep it all coherent. Pervading all of this is a recognition of the importance of technical excellence so that the platforms and products can evolve swiftly as business needs change.
Data mesh is thus at heart a rather simple, perhaps obvious, application of a well-established data management principle to the world of analytical data. In practice, however, there's a great deal involved in making this work, particularly since so much vendor investment has focused on centralized models, exacerbated by not supporting the practices (such as testing, abstraction building, and refactoring) that developers of operational systems know are essential for healthy software.
Zhamak has been at the sharp end of this, advising our clients on the path forward, learning from their setbacks and triumphs, and nudging vendors into producing the tools to make it easier to build these platforms. This book collects her and her colleagues' knowledge in this early but important stage of the adoption of data meshes worldwide. I've learned a lot about these pragmatic difficulties while reviewing this book, and I'm convinced that anyone who wants their organization to best utilize their data resources will find this book charts out the best we understand of the path forward.
Martin Fowler
Chief Scientist, Thoughtworks
Prologue: Imagine Data Mesh
Imagination will often carry us to worlds that never were. But without it we go nowhere.
Carl Sagan
Behind every successful company stand three, failed and forgotten. This is a ratio by which the failures outnumber the survivors. In the age of AI, it is no curious coincidence that the ones standing and leading have cracked the code of complexity, embedded data-driven experimentation in every aspect of their business, embraced continuous change in response to rapid learnings, and partnered with machine intelligence to understand reality beyond human logic and reasoning.
Daff is an example of such a company. Daff has successfully delivered on its mission: Connect artists and listeners across the globe, in an immersive artistic experience, at every moment of life. Behind Daff's mission stands the company's great expectations from data, analytics, and machine intelligence, delivered through an approach known as data mesh. Data mesh is the backbone of Daff's data strategy, architecture, and operating model that has given them scale and speed to experiment, learn, and adapt using data and machine learning (ML).
What I want to share with you is Daff's story after they have implemented data mesh. Through the story of Daff you will learn the essence of data mesh. You will see data mesh principles applied, its benefits demonstrated, its architecture in action, and its organizational structure up and running.
I find the best way to introduce a complex phenomenon such as data mesh is with an example. However, it's too early in the life of data mesh to describe an example of a company with a mature data mesh, as we are currently in the process of building the first data meshes. Therefore, I'm describing a fictional organization that exhibits the characteristics I would expect to see in a few years' time. While we don't expect that reality will conform to our imagination, our vision of what we're working towards is a vital part of understanding what we are trying to achieve. To best convey this picture, I'm writing about this fictional company as I would imagine it being featured in the business press.