Humanities computing is redefining basic principles about research and publication. An influx of new, vibrant, and diverse communities of practitioners recognizes that computer applications are subject to continual innovation and reappraisal. This series publishes books that demonstrate the new questions, methods, and results arising in the digital humanities.
Additional material referred to in this book, including confusion matrices, an expanded stop-words list, and additional graphs and color images, can be found at http://www.matthewjockers.net/macroanalysisbook/.
ACKNOWLEDGMENTS
Without initial support from John Bender, Lois Brooks, Mike Keller, Makoto Tsuchitani, and Ramón Saldívar, this work would not have been started. Without ongoing support from the Stanford University Department of English and the Stanford University Library, this work would not have continued. For years Glen Worthey has been my text pusher, my sounding board, my trusted digital humanities colleague, and my friend. His contributions to the completion of this project are innumerable. From Franco Moretti I have learned and been given much. Our partnership over the past seven years has been the most rewarding collaboration of my career. A special thank-you goes to Susan Schreibman, who convinced me to put these ideas into a book and then read drafts and offered kind and honest feedback. I thank Alan Liu for being such a generous and thoughtful reader of an early draft of the first few chapters; we should all be such gracious scholars. Stéfan Sinclair provided a careful and expert review of the entire manuscript, and I could not have asked for a better reader of the work. For , Elijah Meeks was similarly tolerant and gave his time, energy, and expertise to help me understand and leverage the power of Gephi. For patient advice with R in particular and statistics more generally, I acknowledge indirect contributions from Vijoy Abraham, Claudia Engel, Ken Romeo, and Daniela Witten. For their direct and essential contributions to the final manuscript, I thank my copy editor, Annette Wenda, along with the top-notch University of Illinois Press team, especially Bill Regier and Tad Ringo, who fostered this project through to completion. A special and personal thank-you to my wife, Angela, who has been a patient reader, thoughtful editor, and tolerant friend for more than twenty years.
Importantly, enthusiastically, and with deep appreciation, I acknowledge my debt to the students who enrolled in my first course in macroanalysis at Stanford. I doubt that a future group of students could ever capture the enthusiasm and excitement of that year. Their energy and earnestness were the catalyst behind the founding of the Stanford Literary Lab, and they have been in my mind most often in the writing of this book: Richard Alvarez, Cameron Blevins, Ryan Heuser, Nadeen Kharputly, Rachel Kraus, Alison Law, Long Le-Khac, Rhiannon Lewis, Madeline Paymer, Moritz Sudhof, Amir Tevel, Ellen Truxaw, Kathryn VanArendonk, and Connie Zhu, you are all, quite simply, the best a teacher could hope for.
PART I FOUNDATION
REVOLUTION
The digital revolution is far more significant than the invention of writing or even of printing.
Douglas Carl Engelbart
An article in the June 23, 2008, issue of Wired declared in its headline "Data Deluge Makes the Scientific Method Obsolete" (Anderson 2008). By 2008 computers, with their capacity for number crunching and processing large-scale data sets, had revolutionized the way that scientific research gets done, so much so that the same article declared an end to theorizing in science. With so much data, we could just run the numbers and reach a conclusion. Now slowly and surely, the same elements that have had such an impact on the sciences are revolutionizing the way that research in the humanities gets done. This emerging field we have come to call digital humanities, which was for a good many decades not emerging at all but known as humanities computing, has a rich history dating back at least to Father Roberto Busa's concordance work in the 1940s, if not before. Technology has certainly changed some things about the way literary scholars go about their work, but until recently change has been mostly at the level of simple, even anecdotal, search. The humanities computing/digital humanities revolution has now begun, and big data have been a major catalyst. The questions we may now ask were previously inconceivable, and to answer these questions requires a new methodology, a new way of thinking about our object of study.
For whatever reasons, be they practical or theoretical, humanists have tended to resist or avoid computational approaches to the study of literature. And who could blame them? Until recently, the amount of knowledge that might be gained from a computer-based analysis of a text was generally overwhelmed by the dizzying amount of work involved in preparing (digitizing) and then processing that digital text. Even as digital texts became more readily available, the computational methods for analyzing them remained quite primitive. Word-frequency lists, concordances, and keyword-in-context (KWIC) lists are useful for certain types of analysis, but these staples of the digital humanist's diet hardly satiate the appetite for more. These tools only scratch the surface in terms of the infinite ways we might read, access, and make meaning of text. Revolutions take time; this one is only just beginning, and it is the existence of digital libraries, of large electronic text collections, that is fomenting the revolution. This was a moment that Rosanne Potter predicted back in the digital dark ages of 1988. In an article titled "Literary Criticism and Literary Computing," Potter wrote that "until everything has been encoded, or until encoding is a trivial part of the work, the everyday critic will probably not consider computer treatments of texts" (93). Though not everything has been digitized, we have reached a tipping point, an event horizon where enough text and literature have been encoded to both allow and, indeed, force us to ask an entirely new set of questions about literature and the literary record.
Roberto Busa, a Jesuit priest and scholar, is considered by many to be the founding father of humanities computing. He is the author of the Index Thomisticus, a lemmatized index of the works of Thomas Aquinas.
I suspect that at least a few humanists have been turned off by one or more of the very public failures of computing in the humanities: for example, the Donald Foster Shakespeare kerfuffle.
EVIDENCE
Scientists scoff at each other's theories but agree in basing them on the assumption that evidence, properly observed and measured, is true.