ALLEN LANE
an imprint of Penguin Canada, a division of Penguin Random House Canada Limited
Canada USA UK Ireland Australia New Zealand India South Africa China
Published in Allen Lane hardcover by Penguin Canada, 2018
Simultaneously published in the United States by Penguin Press, an imprint of Penguin Random House LLC, 375 Hudson Street, New York, New York 10014
Copyright © 2018 by Christopher Clearfield and András Tilcsik
Illustrations by Anton Ioukhnovets. Copyright © 2018 by Christopher Clearfield and András Tilcsik.
All rights reserved. Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the above publisher of this book.
www.penguinrandomhouse.ca
LIBRARY AND ARCHIVES CANADA CATALOGUING IN PUBLICATION
Clearfield, Christopher, author
Meltdown : why our systems fail
and what we can do about it / Christopher Clearfield and András Tilcsik.
Issued in print and electronic formats.
ISBN 9780735233324 (hardcover). ISBN 9780735233331 (electronic)
1. Disasters--Case studies. 2. Failure (Psychology)--Case studies.
3. Risk management--Case studies. I. Tilcsik, Andras, author. II. Title.
D24.C45 2018 303.485 C2017-904431-1
C2017-904432-X
Book design by Nicole Laroche (adapted for ebook)
Cover image: Jonathan Kantor/Getty Images
To whistleblowers, strangers, and leaders who listen. We need more of you.
To Linna, Torvald, and Soren
CHRIS CLEARFIELD
To my parents and Marvin
ANDRÁS TILCSIK
meltdown / ˈmɛltˌdaʊn / noun
1: an accident in a nuclear reactor in which the fuel overheats and melts the reactor core; may be caused by earthquakes, tsunamis, reckless testing, mundane mistakes, or even just a stuck valve
2: collapse or breakdown of a system
Prologue
A DAY LIKE ANY OTHER
It was the quotation marks around "empty" that got me.
I.
It was a warm Monday in late June, just before rush hour. Ann and David Wherley boarded the first car of Metro Train 112, bound for Washington, DC, on their way home from an orientation for hospital volunteers. A young woman gave up her seat near the front of the car, and the Wherleys sat together, inseparable as they had been since high school. David, sixty-two, had retired recently, and the couple was looking forward to their fortieth wedding anniversary and a trip to Europe.
David had been a decorated fighter pilot and Air Force officer. In fact, during the 9/11 attacks, he was the general who scrambled fighter jets over Washington and ordered pilots to use their discretion to shoot down any passenger plane that threatened the city. But even as a commanding general, he refused to be chauffeured around. He loved taking the Metro.
At 4:58 p.m., a screech interrupted the rhythmic click-clack of the wheels as the driver slammed on the emergency brake. Then came a cacophony of broken glass, bending metal, and screams as Train 112 slammed into something: a train inexplicably stopped on the tracks. The impact drove a thirteen-foot-thick wall of debris (a mass of crushed seats, ceiling panels, and metal posts) into Train 112 and killed David, Ann, and seven others.
Such a collision should have been impossible. The entire Washington Metro system, made up of over one hundred miles of track, was wired to detect and control trains. When trains got too close to each other, they would automatically slow down. But that day, as Train 112 rounded a curve, another train sat stopped on the tracks ahead, present in the real world but somehow invisible to the track sensors. Train 112 automatically accelerated; after all, the sensors showed that the track was clear. By the time the driver saw the stopped train and hit the emergency brake, the collision was inevitable.
As rescue workers pulled injured riders from the wreckage, Metro engineers got to work. They needed to make sure that other passengers weren't at risk. And to do that, they had to solve a mystery: How does a train twice the length of a football field just disappear?
II.
Alarming failures like the crash of Train 112 happen all the time. Take a look at this list of headlines, all from a single week:
CATASTROPHIC MINING DISASTER IN BRAZIL
ANOTHER DAY, ANOTHER HACK: CREDIT CARD STEALING MALWARE HITS HOTEL CHAIN
HYUNDAI CARS ARE RECALLED OVER FAULTY BRAKE SWITCH
STORY OF FLINT WATER CRISIS, FAILURE OF GOVERNMENT, UNFOLDS IN WASHINGTON
MASSIVE INTELLIGENCE FAILURE LED TO THE PARIS TERROR ATTACKS
VANCOUVER SETTLES LAWSUIT WITH MAN WRONGFULLY IMPRISONED FOR NEARLY THREE DECADES
EBOLA RESPONSE: SCIENTISTS BLAST DANGEROUSLY FRAGILE GLOBAL SYSTEM
INQUEST INTO MURDER OF SEVEN-YEAR-OLD HAS BECOME SAGA OF THE SYSTEM'S FAILURE TO PROTECT HER
FIRES TO CLEAR LAND SPARK VAST WILDFIRES AND CAUSE ECOLOGICAL DISASTER IN INDONESIA
FDA INVESTIGATES E. COLI OUTBREAK AT CHIPOTLE RESTAURANTS IN WASHINGTON AND OREGON
It might sound like an exceptionally bad week, but there was nothing special about it. Hardly a week goes by without a handful of meltdowns. One week it's an industrial accident, another it's a bankruptcy, and another it's an awful medical error. Even small issues can wreak great havoc. In recent years, for example, several airlines have grounded their entire fleets of planes because of glitches in their technology systems, stranding passengers for days. These problems may make us angry, but they don't surprise us anymore. To be alive in the twenty-first century is to rely on countless complex systems that profoundly affect our lives, from the electrical grid and water treatment plants to transportation systems and communication networks to healthcare and the law. But sometimes our systems fail us.
These failures (and even large-scale meltdowns like BP's oil spill in the Gulf of Mexico, the Fukushima nuclear disaster, and the global financial crisis) seem to stem from very different problems. But their underlying causes turn out to be surprisingly similar. These events have a shared DNA, one that researchers are just beginning to understand. That shared DNA means that failures in one industry can provide lessons for people in other fields: dentists can learn from pilots, and marketing teams from SWAT teams. Understanding the deep causes of failure in high-stakes, exotic domains like deepwater drilling and high-altitude mountaineering can teach us lessons about failure in our more ordinary systems, too. It turns out that everyday meltdowns (failed projects, bad hiring decisions, and even disastrous dinner parties) have a lot in common with oil spills and mountaineering accidents. Fortunately, over the past few decades, researchers around the world have found solutions that can transform how we make decisions, build our teams, design our systems, and prevent the kinds of meltdowns that have become all too common.
This book has two parts. The first explores why our systems fail. It reveals that the same reasons lie behind what appear to be very different events: a social media disaster at Starbucks, the Three Mile Island nuclear accident, a meltdown on Wall Street, and a strange scandal in small-town post offices in the United Kingdom. Part One also explores the paradox of progress: as our systems have become more capable, they have also become more complex and less forgiving, creating an environment where small mistakes can turn into massive failures. Systems that were once innocuous can now accidentally kill people, bankrupt companies, and jail the innocent. And Part One shows that the changes that made our systems vulnerable to accidental failures also provide fertile ground for intentional wrongdoing, like hacking and fraud.