Copyright 2012 New Hamilton
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means without the express written permission of the publisher.
ISBN-13: 978-0615723082
ISBN-10: 061572308X
www.NewHamilton.com
Modeling the
Agile Data Warehouse
with Data Vault
Hans Hultgren
Foreword
Art. Design. Creativity.
Interesting three words to begin a book about a data modeling approach. These words speak to a creative process as though targeting artists or designers. And in fact this is true. The process of designing a data warehouse or creating a data model is more art than science. Those of us working with the data vault today are designers, architects and modelers tasked with creating new and innovative models and patterns.
Agility. Clarity. Harmony.
These words speak to current-day business objectives. They encapsulate a state we wish to attain as an organization: a state that requires improved business processes and a common understanding of the organization's integrated data. These are the goals of our enterprise data warehouse: agility to absorb new data sources and respond to changes in the business, clarity concerning the origin and transformation of data, and harmony concerning the meaning and context of the corporate data asset.
Integration. Flexibility. Alignment.
The data warehouse should provide for the integration of data from disparate sources over time. At the same time it should provide the flexibility to adapt to new sources and new downstream demands quickly and easily. In addition, it should be aligned with core business concepts as defined and governed by the organization. Where business concepts do not fully align, the data warehouse needs to present the anomalies and reconcile the differences. The Data Vault modeling approach is today an important component for meeting these goals.
The preceding points represent a high level theme for this book. We seek to address what we do, why we do it and then consider how we should do it. This book will carry forward these concepts while presenting data vault modeling principles applied to the modern data warehouse.
The modern data warehouse is as much a theme in this book as data modeling. Today's enterprise data warehouse is defined by several factors that have helped us to build better data warehouse solutions for the organization. Data vault modeling represents one of these factors. Note that data vault modeling is by no means a revolutionary concept. It is a paradigm shift in many ways, but the concepts are definitely evolutionary. Our thoughts about modeling the data warehouse have evolved as part of the evolution of data warehousing in general. We also expect that data warehousing will continue to evolve into the foreseeable future. Where some of these changes are already on the horizon, this book will discuss them. Several of these are covered in the book including, but not limited to, the introduction of Unified Decomposition, Ensemble Modeling, and Concept Constellations.
The audience for this book includes anyone in the organization concerned with leveraging corporate data. As such, this book targets a broad audience that includes both business and technical professionals. Specifically included are business unit managers, business analysts, business intelligence professionals, data warehouse designers, information modelers, data architects, data modelers, and data integration professionals.
About the Author
Hans Hultgren is an entrepreneur, educator, advisor and independent analyst by career; a designer, modeler and architect by profession. He has started several companies, worked in academia for the better part of 20 years, served as an advisor to several dozen organizations concerning business development and data warehousing matters, and is an active analyst in the business intelligence space.
For the half decade preceding this book he has been focused on building an optimal training model for blended online and classroom education. Today Hans is President of Genesee Academy, a global leader in hybrid training programs primarily within the data warehousing and business intelligence space. Genesee Academy provides the training and certification for data vault modelers around the world and also provides training and certification for other data modeling, data warehousing and business intelligence areas.
Hans was born in Sweden, raised in New Jersey, and currently lives with his family in Golden, Colorado and in Stockholm.
For current information and book updates, visit hanshultgren.wordpress.com
Acknowledgements
There are several people to recognize concerning this particular book. First and foremost, my family. I am convinced that writing a book without the support of family is really impossible. And in my case I had more than support: I had cheerleaders, taskmasters, and friends. Thank you.
The author acknowledges the founder of the data vault principles, Dan Linstedt. Since the writing of his first set of white papers on this topic the industry has never been the same. And Bill Inmon, the father of data warehousing, has been a strong supporter and good friend throughout my decades of submersion in this field.
The author would like to recognize people who have had a significant impact on the continuing development of data vault principles. Specifically Ronald Damhof, Tom Breur, Martijn Evers, Lars Boström and Niklas Hultgren, to name a few. Today data vault is the data vault community. And the full list of contributors to its ongoing development is a very long list.
Special thanks go to friends and industry colleagues Tjaart Riekert, Erik Fransen, Shawn Rogers, Remco Broekmans, Steve Hitchman, Krish Krishnan, Stephen Brobst, Patrik Lager, Al Messerli, Jill Dyche and Claudia Imhoff, along with the entire team at the BBBT. Thanks also to the many teams bringing data vault to the industry every day, especially Centennium BI Expertisehuis and Top of Minds AB.
Finally, the author recognizes the following people who graciously served as reviewers and editors of this book: Ronald Damhof, Tom Breur, Patrik Lager, Niklas Hultgren, Steve Hitchman, Remco Broekmans, Patrik Ekström and Tjaart Riekert.
Contents
Section I
Data Vault Ensemble
CHAPTER 1
Data Vault Defined
1.1 Data Vault is a Data Modeling Approach
Data Vault is a data modeling approach typically used to design a data warehouse. More specifically, it is an approach used to design the tables and relationships for the underlying database of the data warehouse. Other common data modeling approaches have also been applied for this purpose. However, where 3rd Normal Form (3NF) works well for operational systems and Star Schema (Dimensional modeling) works well for data marts, Data Vault is especially useful for modeling the Enterprise Data Warehouse (EDW).
Data Vault modeling is very effective for modeling a data warehouse because it is optimized for integration, historization, and agility requirements. This is primarily due to the decomposition of traditional table structures into a set of flexible component parts. These parts are simple structures with clearly defined roles, and it is important to commit to these structures and their roles. The benefits of data vault modeling are fully realized when the modeling patterns defined as part of the data vault modeling approach are applied consistently. It is this pattern-based feature that makes data vault modeling both repeatable and maintainable.
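As a rough illustration of what decomposition into component parts can look like, the sketch below splits a traditional customer table into two simple structures using Python's built-in sqlite3 module. This section does not yet name the parts; in data vault modeling they are commonly called hubs (business keys), satellites (descriptive context over time) and links (relationships). Every table name, column name and data value here is a hypothetical chosen for the example, and the layout is a minimal sketch under those assumptions rather than a prescribed design.

```python
import sqlite3

# Hypothetical schema: one hub (business key only) and one satellite
# (descriptive attributes, one row per load date). All names are illustrative.
DDL = """
CREATE TABLE hub_customer (
    customer_key   INTEGER PRIMARY KEY,
    customer_id    TEXT NOT NULL UNIQUE,   -- business key
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_key   INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_date      TEXT NOT NULL,
    name           TEXT,
    city           TEXT,
    PRIMARY KEY (customer_key, load_date)
);
"""


def build_demo():
    """Create an in-memory database holding one customer with two versions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute(
        "INSERT INTO hub_customer VALUES (1, 'C-100', '2012-01-01', 'crm')"
    )
    # A change of city is stored as a new satellite row, not an update.
    conn.executemany(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)",
        [
            (1, "2012-01-01", "Ann Lee", "Denver"),
            (1, "2012-06-01", "Ann Lee", "Golden"),
        ],
    )
    return conn


def current_view(conn):
    """Return the latest satellite row for each business key."""
    return conn.execute(
        """
        SELECT h.customer_id, s.name, s.city
        FROM hub_customer h
        JOIN sat_customer_details s ON s.customer_key = h.customer_key
        WHERE s.load_date = (
            SELECT MAX(load_date)
            FROM sat_customer_details
            WHERE customer_key = h.customer_key
        )
        ORDER BY h.customer_id
        """
    ).fetchall()


def history(conn, customer_id):
    """Return every stored version for one business key, oldest first."""
    return conn.execute(
        """
        SELECT s.load_date, s.city
        FROM hub_customer h
        JOIN sat_customer_details s ON s.customer_key = h.customer_key
        WHERE h.customer_id = ?
        ORDER BY s.load_date
        """,
        (customer_id,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo()
    print(current_view(demo))
    print(history(demo, "C-100"))
```

In this sketch the hub carries only the business key, so a second source system could attach its own satellite to the same key without altering existing tables, and history accumulates as new satellite rows. That is one way to picture the integration, historization and agility properties described above.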