The Ethical Algorithm
Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.
© Michael Kearns and Aaron Roth 2020
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.
You must not circulate this work in any other form and you must impose this same condition on any acquirer.
Library of Congress Cataloging-in-Publication Data
Names: Kearns, Michael, 1971- author. | Roth, Aaron (Writer on technology), author.
Title: The ethical algorithm : the science of socially aware algorithm design / Michael Kearns and Aaron Roth.
Description: New York : Oxford University Press, 2019. |
Includes bibliographical references and index. |
Identifiers: LCCN 2019025725 |
ISBN 9780190948207 (hardback) | ISBN 9780190948221 (epub)
Subjects: LCSH: Information technology--Economic aspects. |
Technological innovations--Moral and ethical aspects.
Classification: LCC HC79.I55 K43 2019 | DDC 174/.90051--dc23
LC record available at https://lccn.loc.gov/2019025725
1 3 5 7 9 8 6 4 2
Printed by Sheridan Books, Inc., United States of America
Dedicated to our families
MK: Kim, Kate, and Gray
AR: Cathy, Eli, and Zelda
Algorithmic Anxiety
We are allegedly living in a golden age of data. For practically any question about people or society that you might be curious about, there are colossal datasets that can be mined and analyzed to provide answers with statistical certainty. How do the behaviors of your friends influence what you watch on TV, or how you vote? These questions can be answered with Facebook data, which records the social network activity of billions of people worldwide. Are people who exercise frequently less likely to habitually check their email? For anyone who uses an Apple Watch, or an Android phone together with the Google Fit app, the data can tell us. And if you are a retailer who wants to better target your products to your customers by knowing where and how they spend their days and nights, there are dozens of companies vying to sell you this data.
Which all brings us to a conundrum. The insights we can get from this unprecedented access to data can be a great thing: we can get new understanding about how our society works, and improve public health, municipal services, and consumer products. But as individuals, we aren't just the recipients of the fruits of this data analysis: we are the data, and it is being used to make decisions about us, sometimes very consequential decisions.
In December 2018, the New York Times obtained a commercial dataset containing location information collected from phone apps whose nominal purpose is to provide mundane things like weather reports and restaurant recommendations. Such datasets contain precise locations for hundreds of millions of individuals, each updated hundreds (or even thousands) of times a day. Commercial buyers of such data will generally be interested in aggregate information; for example, a hedge fund might be interested in tracking the number of people who shop at a particular chain of retail outlets in order to predict quarterly revenues. But the data is recorded by individual phones. It is superficially anonymous, without names attached, but there is only so much anonymity you can promise when recording a person's every move.
For example, from this data the New York Times was able to identify a forty-six-year-old math teacher named Lisa Magrin. She was the only person who made the daily commute from her home in upstate New York to the middle school where she works, fourteen miles away. And once someone's identity is uncovered in this way, it's possible to learn a lot more about them. The Times followed Lisa's data trail to Weight Watchers, to a dermatologist's office, and to her ex-boyfriend's home. She found this disturbing and told the Times why: "It's the thought of people finding out those intimate details that you don't want people to know." Just a couple of decades ago, this level of intrusive surveillance would have required a private investigator or a government agency; now it is simply the by-product of widely available commercial datasets.
Clearly, we have entered a brave new world.
And it's not only privacy that has become a concern as data gathering and analysis proliferate. Because algorithms (those little bits of machine code that increasingly mediate our behavior via our phones and the Internet) aren't simply analyzing the data that we generate with our every move. They are also being used to actively make decisions that affect our lives. When you apply for a credit card, your application may never be examined by a human being. Instead, an algorithm pulling in data about you (and perhaps also about people like you) from many different sources might automatically approve or deny your request. Though there are benefits to knowing instantaneously whether your request is approved, rather than waiting five to ten business days, this should give us at least a moment's pause. In many states, algorithms based on what is called machine learning are also used to inform bail, parole, and criminal sentencing decisions. Algorithms are used to deploy police officers across cities. They are being used to make decisions in all sorts of domains that have direct and real impact on people's lives. All this raises questions not only of privacy but also of fairness, as well as a variety of other basic social values, including safety, transparency, accountability, and even morality.
So if we are going to continue to generate and use huge datasets to automate important decisions (a trend whose reversal seems about as plausible as our returning to an agrarian society), we have to think seriously about some weighty topics. These include limits on the use of data and algorithms, and the corresponding laws, regulations, and organizations that would determine and enforce those limits. But we must also think seriously about addressing the concerns scientifically: about what it might mean to encode ethical principles directly into the design of the algorithms that are increasingly woven into our daily lives. This book is about the emerging science of ethical algorithm design, which tries to do exactly that.
Sorting through Algorithms
But first, what is an algorithm anyway? At its most fundamental level, an algorithm is nothing more than a very precisely specified series of instructions for performing some concrete task. The simplest algorithms, the ones we teach to our first-year computer science students, do very basic but often important things, such as sorting a list of numbers from smallest to largest. Imagine you are confronted with a row of a billion notecards, each of which has an arbitrary number written on it. Your goal is to rearrange the notecards so that the numbers are in ascending order; or, more precisely, to specify an algorithm for doing so. This means that each step of the process you describe must be unambiguous, and that the process must always terminate with the notecards arranged in ascending order, regardless of the numbers and their initial arrangement.
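To make "a very precisely specified series of instructions" concrete, here is one way such a recipe might be written down as code rather than as notecard-shuffling steps. It is a minimal sketch in Python of insertion sort; the choice of algorithm, the function name, and the tiny example list are ours for illustration, and for a billion cards a faster method such as merge sort would be the practical choice.

    # Insertion sort: keep a growing, already-sorted pile and place each
    # new card into its correct spot in that pile.
    def insertion_sort(cards):
        sorted_cards = []
        for card in cards:
            # Walk past every value no larger than the new card, then
            # insert the card at that position.
            position = 0
            while position < len(sorted_cards) and sorted_cards[position] <= card:
                position += 1
            sorted_cards.insert(position, card)
        return sorted_cards

    print(insertion_sort([42, 7, 19, 3, 88]))  # prints [3, 7, 19, 42, 88]

Every step here is unambiguous, and the procedure always finishes with the numbers in ascending order, whatever the starting arrangement, which is exactly what the definition demands.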