Praise for Data Quality Engineering in Financial Services
This book is essential reading not only for data management specialists but for anyone who works with and relies on data. Brian Buzzelli harnesses his many years of practical, been-there-done-that, have-the-scars-to-prove-it experience to teach the reader how to apply manufacturing quality control principles to find a needle in a haystack: that one erroneous attribute that will have an outsized impact.
Julia Bardmesser, SVP, Head of Data, Architecture and Salesforce Development, Voya Financial
This is the perfect playbook that, if implemented, will allow any financial services company to put their data on an offensive footing to drive alpha and insights without sacrificing quality, governance, or compliance.
Michael McCarthy, Principal Investment Data Architect, Investment Data Management Office, MFS
The approach to data quality expressed in this book is based on the original idea of applying quality and standardization principles from manufacturing. It provides insights into a pragmatic and tested data quality framework that will be useful to any data practitioner.
Predrag Dizdarevic, Partner, Element22
This book clearly explains how to apply a manufacturing approach to data quality, provides an easy framework to capture data quality requirements, and offers high-impact data quality metrics and visualizations.
Alag Solaiappan, VP, Data Engineering, Acadian Asset Management
This book is a must for any data professional, regardless of industry. Brian has provided a definitive guide on how to best ensure that data processes, from sourcing and ingestion to firmwide utilization, are properly monitored, measured, and controlled. The insights that he illustrates are born out of a long history of working with content and enabling financial professionals to perform their jobs. The principles presented herein are applicable to any organization that needs to build proper and efficient data governance and data management. Finally, here is a tool that can help everyone from chief data officers to data engineers in the performance of their roles.
Barry S. Raskin, Head of Data Practice, Relevate Data Monetization Corp.
Brian Buzzelli presents a clear how-to guide for the finance professional to motivate, design, and implement a comprehensive data quality framework. Even in its early stages, the data quality program will improve efficiency, reduce risk, and build trust with clients and across functions. Brian demonstrates the connection between data integrity and fiduciary obligation with relevant examples. Borrowing unabashedly from concepts in high-precision manufacturing, Brian provides a step-by-step plan to engineer an enterprise-level data quality program with solutions designed for specific functions. The code examples are especially applicable, providing the reader with a set of practical tools. I believe these concepts are an important contribution to the field.
Matthew Lyberg, CFA, Quantitative Researcher, NDVR Inc.
Data Quality Engineering in Financial Services
by Brian Buzzelli
Copyright © 2023 Brian Buzzelli. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: Michelle Smith
- Development Editor: Corbin Collins
- Production Editor: Beth Kelly
- Copyeditor: Nicole Tach
- Proofreader: Shannon Turlington
- Indexer: Potomac Indexing, LLC
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
- October 2022: First Edition
Revision History for the First Edition
- 2022-10-19: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098136932 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Data Quality Engineering in Financial Services, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-13687-1
[LSI]
Preface
Most people would say we live in a world where we trust the manufacturing discipline and quality standards used to provide the food we eat, the water we drink, the medications we take, and the sophisticated technology products we use in our daily lives. We can appreciate the years of evolution in science, refinement in manufacturing techniques, and codification of product specifications that form the basis of the trust we enjoy in consuming and using physical products today. Given these monumental achievements in science, technology, and manufacturing, what is so different about the data used in the financial industry that data and information must be constantly checked, rechecked, and reconciled to ensure accuracy and quality?
Data is the fundamental raw material used in the financial industry to manage your retirement and your family's wealth, provide operating and growth capital to companies, and drive the global financial system as the lifeblood of the global economy. Unlike in the manufacturing industry, data flows in the financial industry have evolved from being based on open outcry, telephone, paper trails, and ticker tapes to being grounded in sophisticated and complex computational, artificial intelligence, and machine learning applications. We capture, store, and pass along data through complex applications, and we use data in business processes with a general assumption that the data is reliable and suitable for use.
However, data has no physical form and has the capacity to be infinitely malleable. By contrast, the raw materials in manufacturing have physical form. Their physical properties can be measured and assessed for suitability against a specification of physical properties and tolerances for which the raw material is certified compliant for use. This is one of the key concepts of this book: we will apply a similar manufacturing framework to data and define the properties of data that can be measured against a specification. Examples will present data as if it had mass and physical form, but in the context of measurable data dimensions: completeness, timeliness, accuracy, precision, conformity, congruence, collection, and cohesion.
The premise of this book is that data has shape and measurable dimensions, and that it can be inspected and measured against data quality specifications and tolerances to yield data quality metrics. This mirrors the quality processes in manufacturing, where specific measurements of physical materials are evaluated relative to control specifications and the results are analyzed to determine whether the material's quality measurements and metrics fall within design specifications and acceptable tolerances.
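To make that idea concrete, here is a minimal, hypothetical sketch in Python, not drawn from the book's own examples, of measuring one dimension (completeness) against a specification and tolerance to produce a data quality metric. The names QualitySpec, completeness, and within_spec are illustrative assumptions, not established terminology; the chapters that follow develop the full framework.

from dataclasses import dataclass

@dataclass
class QualitySpec:
    """A hypothetical specification for one data quality dimension."""
    dimension: str     # e.g., "completeness"
    threshold: float   # minimum acceptable score, e.g., 0.99
    tolerance: float   # shortfall allowed below the threshold

def completeness(records, required_fields):
    """Fraction of required field values that are populated."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    populated = sum(
        1 for row in records for f in required_fields
        if row.get(f) not in (None, "")
    )
    return populated / total

def within_spec(score, spec):
    """Compare a measured metric against its specification and tolerance."""
    return score >= spec.threshold - spec.tolerance

prices = [
    {"ticker": "ABC", "close": 101.25, "currency": "USD"},
    {"ticker": "XYZ", "close": None, "currency": "USD"},   # missing close price
]
spec = QualitySpec(dimension="completeness", threshold=0.99, tolerance=0.0)
score = completeness(prices, ["ticker", "close", "currency"])
print(f"completeness = {score:.3f}, within spec: {within_spec(score, spec)}")

Run against the two sample records, the measured completeness is roughly 0.83, which falls outside the 0.99 specification: a measured dimension, a specification, and a tolerance combine into a simple pass/fail quality signal.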