Practical Data Privacy
by Katharine Jarmul
Copyright 2023 Kjamistan Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Andy Kwan and Rita Fernando
- Production Editor: Katherine Tozer
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
Revision History for the Early Release
- 2022-07-20: First Release
- 2022-09-01: Second Release
- 2022-11-21: Third Release
- 2022-12-09: Fourth Release
- 2023-01-27: Fifth Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098129460 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practical Data Privacy, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-12817-3
Preface
Welcome to the wonderful world of data privacy! You might have some preconceived notions around privacy: that it is a nuisance, that it is administrative and therefore boring, or that it's a topic that only interests lawyers. What this book will show you is just how technically challenging and interesting data privacy problems are and will continue to be for years to come. If you entered the field of data science because you liked challenging mathematical and statistical problems, you will love exploring data privacy in data science. The topics you'll learn in this book will expand your understanding of probability theory, modeling and even cryptography, and will help you recognize when you need assistance from legal professionals.
Learning how to solve data privacy problems is increasingly critical for data science practitioners today. You'll be able to solve real-world problems in fields like cybersecurity, health care and finance, and to advance your career in a patchwork world of privacy regulations, policies and frameworks. Since 2018, when the General Data Protection Regulation (GDPR) went into effect in Europe, the global landscape has become more complicated, and that complexity will increase as regulatory agencies and lawmakers continue to change the rules about how, where, why and when you store data. Building up your data privacy and data security skill set now is an investment in your future career.
Additionally, taking the time to learn new privacy skills means you are contributing to our field in terms of trust, accountability, understanding and social responsibility. Currently, there is fear of and backlash against the use of machine learning to solve real-world problems. This response is rooted in real issues and actual deployments, where data, models and systems were not used in a trustworthy manner and where justice and fairness come into question. For example, Clearview AI scrapes faces from social media sites and sells the facial recognition model built from those faces to law enforcement, raising questions regarding data ownership, privacy and fairness. To help counter this reputational damage and to create pathways for responsible and trustworthy data use, the industry needs data scientists and machine learning engineers who understand the tasks at hand and the risks involved, and who can competently address these issues when designing systems. Privacy can help guide you to fairer, more ethical and more responsible systems, where the user has power and input and is at the center of your design. Use this book as you navigate these challenges, finding ways forward with practical, hands-on guidance.
I hope this book can contribute to new data science by expanding familiarity with how to appropriately implement privacy for sensitive data. Worldwide, apprehension around digitizing personal data, even for responsible government use, is so prevalent that it obstructs the use of data to provide assistance with social problems like climate change, financial auditing and global health crises. Building privacy into data science creates new pathways for data use in critical decisions for our societies and world.
What is Data Privacy?
In a simple sense, data privacy aims to protect data by enabling and guaranteeing more privacy for that data via access, use, processing and storage controls. Usually this data is person-related, but it can apply to many types of processing. This definition, however, doesn't fully cover the world of data privacy.
Data privacy is a complex concept, with aspects from many different areas of our world: legal, technical, social and individual. Let's explore these aspects and how they overlap, so you get an idea of the vast implications of the topics and practices you will learn in this book.
Figure P-1. Privacy Definitions
In Figure P-1, you can see the different definitions of privacy, and I've tried to represent their respective size in the figure. Let's walk through them, starting with legal definitions.
The social and cultural aspects of privacy are best explained by danah boyd's work in data privacy. She studied teenage girls and their interaction with social media to understand how technology impacted their understanding of concepts like privacy. Her definition is as follows:
It's about a collective understanding of a social situation's boundaries and knowing how to operate within them. In other words, it's about having control over a situation. It's about understanding the audience and knowing how far information will flow. It's about trusting the people, the situation, and the context.
danah boyd, "Privacy and Publicity in the Context of Big Data", 2010
The scientific or technical definitions of privacy and their implementations in your daily work are the focus of this book. You will learn these definitions, how to deploy scientific privacy technologies at scale and how to make technical decisions on privacy. With the tools in this book, you will learn state-of-the-art best practices that might not yet be well known at your organization, as they have only recently become available in production systems. Staying up to date on these practices will be a part of your job, should you decide that this area is your focus. As a technical expert on the topic, you will be asked to support business and legal decisions on privacy and translate them into working software and systems. This is a significant role, as many of the other stakeholders will not have a technical, up-to-date understanding of privacy.