Practical Data Privacy
by Katharine Jarmul
Copyright 2023 Kjamistan Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Andy Kwan and Rita Fernando
- Production Editor: Kristen Brown
- Copyeditor:
- Proofreader:
- Indexer:
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
- April 2023: First Edition
Revision History for the Early Release
- 2022-07-20: First Release
- 2022-09-01: Second Release
- 2022-11-21: Third Release
- 2022-12-09: Fourth Release
- 2023-01-27: Fifth Release
- 2023-03-02: Sixth Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098129460 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practical Data Privacy, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-12946-0
[LSI]
Foreword
Given the multitude of benefits that come with digital connectivity, it is not always apparent that waves of futuristic tech have also brought an undertow of drawbacks. Instant messaging, biometric scanning, real-time motion-tracking, digital payments, and more were, after all, the stuff of sci-fi fantasies. For those of us who work in technology (or just consume it), the cool factor of integrating digital tools into our daily routines is difficult to deny.
But the other side of digitally-connected living is the right to unplug. To hear some first-generation tech millionaires tell it, keeping their kids away from screen time at home and at school is highly desirable. That may sound strange if you're used to hearing about the digital divide as a chasm between people with multiple Apple products vs. have-nots who lack 24/7 high-speed internet. With so many of our everyday interactions having gone digital, it's a challenge for most of us to function without unlimited online access.
Using digital tools and accessing online spaces is sold to us today the same way it was at the dawn of the internet: as a drop-in experience that's completely voluntary and fun. But nothing is fun about an internet experience that feels like a stay at the Hotel California: you can check out any time you want, but you can never leave. Nothing is fair about an online world that restricts your offline life: what you can see and do, and how you might be treated. The idea that we are choosing to drop in on the internet world for a casual set of interactions is no longer true: if anything, we're often obliged to navigate a highway jam-packed with data about ourselves and others.
Many of us incorrectly assume that our data is uninteresting to anyone else. But that's when we don't see the full picture of how today's apps and algorithms hoard our data to connect where we live, what we earn, who we date, and whether we've had mental health problems or a sexually transmitted infection. That's when we don't realize that the predictive function of algorithms is usually used to profile us using data that we've willingly and also unknowingly provided, so as to sell us (or prevent us from accessing) financial products, insurance coverage, jobs, homes, or potential romantic partners.
Digital connectivity is supposed to be fun, not reminiscent of being criminally tracked. But near-criminal tracking is the shopping experience that I've had in the real world since I was a child in New York City: it was typically anything but pleasant to shop or find a taxi as a visible minority then. I know very well the feeling of being scanned, surveilled, and singled out from a group. This is what one tech exposé after another shows us: that having our private, personal, and permanent data hoovered up into profiles and passed to data brokers, governments, and law enforcement destroys our privacy. Just as it does for convicted criminals.
For those who haven't considered it much, privacy is like access to good credit or a good lawyer: something better to have and not need than to need and not have. It should not take a biometric data shakedown while boarding an airplane (which I had to protest recently in San Francisco) to recognize that our personal data is too often collected without our consent or understanding. It should not require a person from a racial minority group to flag a data-driven health or financial algorithm as being discriminatory. For those of us working in tech, it shouldn't require lawsuits, corporate fines, and government regulation to see that systems that all but forcibly extract our data leave us without privacy or choice. And as for adopting Neo-Luddite measures to protect one's privacy by staying offline? Much like having good credit and a good lawyer, preserving personal data privacy has become the new privilege of the wealthy.
That divide may be the most glaring problem of our digitally-connected lives. If we ever want to return to a digital world that we can choose to drop in on, we will need to limit the degree to which digital systems extend their tentacles to us offline. Giving people back the right to browse anonymously or to announce themselves online means reining in the data-collection mechanisms that currently drive most digital systems. With Practical Data Privacy, Ms. Jarmul offers tested techniques for building an online world very different from what we have today. Her real-life examples prove that you do not need to be a privacy engineer in order to meaningfully engineer privacy.
I hope that everyone who worries about algorithmic discrimination and ethical technology will read this book. Moreover, I encourage anyone who designs, engineers, or tests digital systems to decide for themselves if privacy is the component that separates the online experiences that we have from the ones we want and need.
Dr. Nakeema Damali Stefflbauer
CEO, FrauenLoop and Global AI Ethics lecturer, Stanford University
Preface
Welcome to the wonderful world of data privacy! You might have some preconceived notions around privacy: that it is a nuisance, that it is administrative and therefore boring, or that it's a topic that only interests lawyers. What this book will show you is just how technically challenging and interesting data privacy problems are and will continue to be for years to come. If you entered the field of data science because you liked challenging mathematical and statistical problems, you will love exploring data privacy in data science. The topics you'll learn in this book will expand your understanding of probability theory, modeling, and even cryptography, as well as when you need assistance from legal professionals.