Robert Arp, Barry Smith, and Andrew D. Spear
To Susan, Sandra, and Maria Teresa
In recent decades we have seen the gradual expansion of the use of computers and computing technology in all areas of human life. In the sciences the promise of computers to store, manage, and integrate tremendous amounts of data and information has given rise to new disciplines focused on data and information, and to new interdisciplinary fields such as biomedical informatics, materials informatics, geospatial informatics, and many more.
One increasingly dominant strategy for the organization of scientific information about the world in computer-friendly form is associated with the term "ontology" (or sometimes "ontological engineering" or "ontology technology" or "applied ontology"), understood as meaning (roughly) a controlled vocabulary for representing the types of entities in a given domain.
This strategy has been most conspicuously successful in the field of biology and biomedicine, where its proponents have come to view the task of organizing scientific information as requiring an unusually broad collaboration involving not only computer and information scientists, biologists, and clinicians, but also linguists, logicians, and, occasionally, philosophers interested in the study of the basic categories of reality. This book is an introduction to the field of applied ontology thus conceived. It explains the needs which ontologies have been designed to meet, explains what an ontology itself is, and outlines in detail principles of best practice for approaching the task of ontology design. The book also outlines a specific formal or top-level ontology, the Basic Formal Ontology, and provides illustrations of its use.
All three coauthors of this work were trained as philosophers, though all have become involved, in different ways, in applied ontology projects in biomedicine and related fields. All share the belief that philosophical ideas and theories can play an important role in advancing the quality of work in ontological applications, and the chapters that follow are very much a product of this belief. We have used philosophical ideas throughout, though our philosophical colleagues will say that we have sometimes done so incautiously, and certainly with what is for their purposes insufficient detail. What follows is not, however, intended as a contribution to philosophy. It is intended, rather, to form part of what we conceive as the rich, new technical discipline of ontology.
This book has been a long time in the making and has benefited from the collaboration and critical comments of numerous individuals. Andrew Spear wrote a first rough draft of the manuscript under the direction of Barry Smith at the Institute for Formal Ontology and Biomedical Information Science (IFOMIS) in Saarbrücken, Germany, in 2006. In 2007 Robert Arp became involved in the project under the auspices of the National Center for Biomedical Ontology, leading to substantial revision and expansion of the manuscript in a collaborative effort of over eight years. The result is the current book, which bears equally the stamp of each of us, and of our limitations.
After so many years of discussion and input, it would be impossible to recognize here everyone who has contributed to this manuscript and to the development of Basic Formal Ontology (BFO), on which it is based. However, we would like to acknowledge the comments and critical advice of Mauricio Almeida, Jonathan Bona, Mathias Brochhausen, Roberto Casati, Werner Ceusters, Melanie Courtot, Lindsay Cowell, Randall Dipert, William Duncan, Bastian Fischer, Albert Goldfain, Pierre Grenon, Janna Hastings, Boris Hennig, William Hogan, Leonard Jacuzzo, Ingvar Johansson, Waclaw Kusnierczyk, Kristl Laux, Richard Lee, Tatiana Malyuta, William Mandrick, Kevin Mulligan, Chris Mungall, Darren Natale, Fabian Neuhaus, Snezana Nikolic, Chris Partridge, Bjoern Peters, Anthony Petosa, Mark Ressler, Robert Rovetto, Ronald Rudnicki, Alan Ruttenberg, Emilio Sanfilippo, Richard Scheuermann, James Schoening, Yonatan Schreiber, Stefan Schulz, Ulf Schwarz, Selja Seppala, Shane Sicienski, Peter Simons, Holger Stenzhorn, Kerry Trentelman, Achille Varzi, and Jie Zheng. Naturally, they bear no responsibility for the many shortcomings that remain.
Chapters 5, 6, and parts of 7 are based on the draft specification of BFO 2.0, which also contains formal definitions of the terms introduced in these chapters, as well as associated axioms and theorems and considerable further explanatory material. Many of the persons mentioned above provided invaluable assistance in creating this specification, but we wish to mention especially Werner Ceusters and Alan Ruttenberg.
We are also particularly grateful to Ingvar Johansson, Waclaw Kusnierczyk, and Snezana Nikolic, who read and commented on early drafts of the whole document, and to Mark Musen, Director of the National Center for Biomedical Ontology, which supported our work on the preparation of these early drafts. We are grateful also to the National Institute for Human Genome Research, the Alexander von Humboldt Foundation, the Volkswagen Foundation, and the European Union, which provided funding for this work. None of these organizations is responsible in any way for the content of what follows.
Overwhelmed with Information
Today, more than ever before in history, we live in an age of information-driven science. In all areas of the life sciences, in particular, well-organized and well-funded research groups are carrying out sustained and systematic research into areas of fundamental biological concern, yielding ever-larger quantities of information that is accessible only with the aid of computers. Vast amounts of information are being produced daily as a result of new types of high-throughput technology in areas such as next-generation sequencing, molecular screening, and 2-, 3-, and 4-D imaging at multiple scales, from molecules through cells and cell populations up to whole brains. At the same time, the contents of scientific journals are increasingly being made available in forms that make them accessible to automated search and processing.
Already, the sheer quantity of available scientific information is becoming overwhelming, and this new information can be used effectively only if there is some strategy for ensuring its progressive integration with the information already existing, and for making it readily available in formats understandable to both computers and human beings. The progress of science requires that the results being achieved in Pittsburgh or Berkeley should be able to build on the results already achieved in Peking or Bangalore. For these reasons, scientific information needs to be stored, standardized, processed, and made available in a way that overcomes the idiosyncrasies of particular research groups and technologies.