Copyright © 2023 by Matthew Connelly
All rights reserved.
Published in the United States by Pantheon Books, a division of Penguin Random House LLC, New York, and distributed in Canada by Penguin Random House Canada Limited, Toronto.
Pantheon Books and colophon are registered trademarks of Penguin Random House LLC.
Library of Congress Cataloging-in-Publication Data
Name: Connelly, Matthew James, author.
Title: The declassification engine : what history reveals about America's top secrets / Matthew Connelly.
Description: First Edition. New York : Pantheon Books, 2023. Includes bibliographical references and index.
Identifiers: LCCN 2022019182 (print). LCCN 2022019183 (ebook). ISBN 9781101871577 (hardcover). ISBN 9781101871584 (ebook).
Subjects: LCSH: Transparency in government--United States. Government information--Access control--United States. Public administration--United States. United States--Politics and government.
Classification: LCC JK468.S4 C656 2023 (print) | LCC JK468.S4 (ebook) | DDC 352.3/790973--dc23/eng/20220803
LC record available at https://lccn.loc.gov/2022019182
LC ebook record available at https://lccn.loc.gov/2022019183
Ebook ISBN 9781101871584
www.pantheonbooks.com
Cover design by Tyler Comrie
For Sarah,
who sees right through me,
keeps my secrets,
and solves my mysteries.
PREFACE
Should This Book Be Legal?
There I was, sitting at a massive conference table inside a multibillion-dollar foundation, staring at the wood-paneled walls. I was facing a battery of high-powered attorneys, including the former general counsel to the National Security Agency, and another who had been chief of the Major Crimes Unit at the U.S. Attorney's Office in the Southern District of New York. The foundation was paying each of them about a thousand dollars an hour to determine whether I could be prosecuted under the Espionage Act.
I am a history professor, and my only offense had been to apply for a research grant. I proposed to team up with data scientists at Columbia University to investigate the exponential growth in government secrecy. Earlier that year, in 2013, officials reported that they had classified information more than ninety-five million times over the preceding twelve months, or three times every second. Every time one of these officials decided that some transcript, or e-mail, or PowerPoint presentation was confidential, secret, or top secret, it became subject to elaborate protocols to ensure safe handling. No one without a security clearance would see these records until, decades from now, other government officials decided disclosure no longer endangered national security. The cost of keeping all these secrets was growing year by year, covering everything from retinal scanners to barbed-wire fencing to personnel training programs, and already totaled well over eleven billion dollars. But so, too, were the number and size of data breaches and leaks. At the same time, archivists were overwhelmed by the challenge of managing just the first generation of classified electronic records, dating to the 1970s. Charged with identifying and preserving the subset of public records with enduring historical significance but with no increase in staff or any new technology, they were recommending the deletion of hundreds of thousands of State Department cables, memoranda, and reports, sight unseen. The costs in terms of democratic accountability were incalculable and included the loss of public confidence in political institutions, the proliferation of conspiracy theories, and the increasing difficulty historians would have in reconstructing what our leaders do under the cloak of secrecy.
We wanted to assemble a database of declassified documents and use algorithms to reveal patterns and anomalies in the way bureaucrats decide what information must be kept secret and what information can be released. To what extent were these decisions balanced and rule-based, as official spokesmen have long claimed? Were they consistent with federal laws and executive orders requiring the preservation of public records, and prompt disclosure when possible? Were the exceptions so numerous as to prove the existence of unwritten rules that really served the interests of a deep state? Or was the whole system so dysfunctional as to be random and inexplicable, as other critics insist?
We were trying to determine whether we could reverse-engineer these processes, and develop technology that could help identify truly sensitive information. If we assembled millions of documents in databases, and harnessed the power of high-performance computing clusters, it might be possible to train algorithms to look for sensitive records requiring the closest scrutiny and accelerate the release of everything else. The promise was to make the crucial but dysfunctional declassification process more equitable and far more efficient. We had begun to call it a declassification engine, and if someone did not start building and testing prototypes, the exponential increase in government secrets (more and more of them consisting of data rather than paper documents) might make it impossible for public officials to meet their own legal responsibilities to maximize transparency. Even if we failed to get the government to adopt this kind of technology, testing these tools and techniques would reveal gaps and distortions in the public record, whether from official secrecy or archival destruction.
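To make the idea concrete, here is a minimal sketch, in Python, of the triage step described above: train a model on documents whose review outcomes are already known, then rank an unreviewed backlog by predicted sensitivity so human reviewers see the riskiest records first. The corpus, labels, and model choice are illustrative assumptions, not the project's actual system.

    # Illustrative sketch only: a simple text classifier that ranks documents
    # by how likely they are to need close human review before release.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: text of previously reviewed documents,
    # labeled 1 if reviewers withheld or heavily redacted them, 0 if released.
    reviewed_texts = ["... text of a withheld cable ...", "... text of a released memo ..."]
    withheld_labels = [1, 0]

    vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
    features = vectorizer.fit_transform(reviewed_texts)

    model = LogisticRegression(max_iter=1000)
    model.fit(features, withheld_labels)

    # Score an unreviewed backlog: high scores go to human reviewers first,
    # low scores become candidates for accelerated release.
    backlog = ["... text of an unreviewed document ..."]
    scores = model.predict_proba(vectorizer.transform(backlog))[:, 1]
    ranked = sorted(zip(scores, backlog), reverse=True)

In any real system a model like this would only prioritize review, not replace it; the point of the sketch is the ranking step, not the particular classifier.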
The lawyers in front of me started to discuss the worst-case scenarios, and the officers of the foundation grew visibly uncomfortable. What if my team was able to reveal the identity of covert operatives? What if we uncovered information that would help someone build a nuclear weapon? If the foundation gave us the money, their lawyers warned that the foundation staff might be prosecuted for aiding and abetting a criminal conspiracy. Why, the most senior program officer asked, should they help us build a tool that is purpose-built to break the law?
The only one who did not seem nervous was the former ACLU lawyer whom Columbia had hired to represent us. He had argued cases before the Supreme Court. He had defended people who published schematics of nuclear weapons, and won. He had shown how any successful prosecution required proving that someone had possession of actual classified information. How could the government go after scholars doing research on declassified documents?
The ex-government lawyers pointed out that we were not just academics making educated guesses about state secrets, not when we were using high-performance computers and sophisticated algorithms. True, no journalist, no historian, can absorb hundreds of thousands of documents, analyze all of the words in them, instantly recall every one, and rank each according to one or multiple criteria. But scientists and engineers can turn millions of documents into billions of data points and use machine learning (teaching a computer to teach itself) to detect patterns and make predictions. We agree with these predictions every time we watch a movie Netflix recommends, or buy a book that Amazon suggests. If we threw enough data at the problem of parsing redacted documents (the ones in which government officials have covered up the parts they do not want us to see), couldn't these techniques recommend the words most likely to be hiding behind the black boxes, which presumably were hidden for good reason?
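As a simple illustration of that last idea (a sketch under assumptions, not the team's actual method), an off-the-shelf masked language model can already propose the words most likely to fill a gap in a sentence, which is the same kind of guess a redaction invites. The sentence below is made up, and the model is a stock BERT checkpoint loaded through the Hugging Face transformers library.

    # Illustrative sketch only: ask a pretrained masked language model to guess
    # a hidden word from its surrounding context, as one might with a redaction.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # A made-up sentence with [MASK] standing in for a redacted word.
    redacted_sentence = "The ambassador met secretly with the [MASK] minister in Geneva."

    for guess in fill_mask(redacted_sentence, top_k=5):
        print(f"{guess['token_str']:>12}  score={guess['score']:.3f}")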