As you walk up Walton Street from the centre of Oxford, the road bears slightly to the left and a large 19th-century building comes into view. It is not an Oxford college but the headquarters of the Oxford University Press. OUP is the largest university press in the world, and can date its origins back to around 1480. In 1983 I arrived at this building carrying a Texas Instruments Silent 700 terminal. This used thermal printer technology and had two rubber ears on the top into which a telephone handset could be inserted to link the terminal into the BT public telephone network through an acoustic coupler. A decade earlier I had used the same technology to access the first computer-based search services developed by the Lockheed Corporation and System Development Corporation.
I was heading up early attempts by Reed Publishing to develop electronically published products and services, notably airline flight timetables. Reed owned International Computaprint Corporation (ICC), based in Fort Washington, PA, which specialized in keyboarding and printing telephone directories and airline timetables. Reed had been working with IBM and the University of Waterloo, Canada on the New Oxford English Dictionary (NOED) project, which was to create a digital version of the Oxford English Dictionary. The OED seeks not only to provide a definitive definition of a word but also to record when the word was first used, with examples of subsequent use that may have modified the definition. All these examples were contained on around 4 million slips of paper.
The proof of concept was to digitize one of the Supplements to the First Edition, starting at the letter S. The digitization and indexing had now been completed and I, together with Hans Nickel, the founder and CEO of ICC, was about to demonstrate what we had achieved to the NOED project team led by Tim Benbow and Edmund Weiner. Many of the lexicographers were skeptical of the value of the project, and there was a mixture of expectation and indifference around the table.
With the terminal we set up a connection (at 300 baud!) to the computer in Fort Washington. I can still remember the first question, which came from one of the more skeptical lexicographers, who wanted to know how many words in the OED originated in the Times newspaper. Because all the text had been marked up in Standard Generalized Markup Language (SGML), a forerunner of XML, we could identify the source, and not only provide a count but print out (albeit very slowly) all the examples. There was a short period of silence and then these distinguished scholars suddenly realized the potential of information retrieval. They also recognized that it was not going to put them out of a job but would enable them to improve the value of the product. Many more queries were undertaken and the session only came to an end when we ran out of thermal paper.
The NOED project was a great success, not only for the OUP but also for Dr Gaston Gonnet and his team at the University of Waterloo. This team became the nucleus of Open Text Corporation. IBM used the knowledge gained from the project in the development of its search technology, as the OED files provided a rich source of syntax information to help with query development.
For me it was a day of discovery about the power of search to reveal new relationships between items of information. I learned three important lessons from this project. The first of these was the value of metadata structure in searching. Because of the way that the individual elements of the entries had been marked up in SGML, it was easy to search for words that had first been used by Charles Dickens after his return from his first visit to the United States in 1842. The second lesson was gained in listening to the members of the project team from IBM and the University of Waterloo as they talked about the importance of computers being able to understand the structure of sentences, work that would lead to the development of semantic search technologies. The third lesson was in understanding the impact that search could have on organizational processes and outputs.
Almost three decades on from that visit to Oxford I am still fascinated and frustrated by the technology of search and the process of searching. In many respects we have not come all that far from the technology I was using in 1974. Google's PageRank is not far removed from Dr. Gene Garfield's development of citation indexes in 1960, and the concepts of recall and precision emerged from research carried out by Cyril Cleverdon at the Cranfield Institute of Technology, UK, in the mid-1960s. The mathematics of vector-space indexing was developed by Dr. Gerard Salton at Cornell University in 1975, and Dr. Michael Lynch founded Autonomy Ltd. in 1996.
Enterprise search is now moving from a nice-to-have to a need-to-have application as organizations struggle to find the information they need to make good business decisions. Not only is more information being created but nothing is being thrown away. Search technology is a mixture of the mathematical management of probability and computational linguistics, but this book is not about technology. It is about meeting the expectations of users by investing in the skills and experience needed to manage the technology. Whether you are a business manager, IT manager or information professional, I hope that when you finish this book you will set up a meeting with your HR Manager and start the process of staffing up your search support team before any further investment in technology.
As you read this book I hope you find what you are looking for.
How to Use This Book
This book has been written to help business managers, and the IT teams supporting them, understand why effective enterprise-wide search is essential in any organization, and how to go about the process of meeting user requirements. This could be by improving the existing search application(s) or by specifying and implementing a new search application. Search technology is not easy to understand without a good background in applied mathematics or information science. This book has just two chapters out of twelve on search technology, with the objective of providing just enough detail to understand the possibilities offered by enterprise search and the software available on commercial and open-source terms.
A good place to start might be the chapter on critical success factors. If you are not able to meet at least eight of the twelve success factors then you really do need to read this book.
One chapter considers the elements of an enterprise search strategy, highlighting the importance of allocating an adequate level of staffing to the support of search. An organization with more than 1000 employees probably needs a search support team of two people, and above around 10,000 employees this will double.
If the result of the user research and business planning is that a new search application is required, then the relevant chapters cover the process of defining the business and search requirements, the evaluation of commercial and open-source software, and the management of the installation and implementation.
If you only have time to read one chapter please read [...]. Another chapter gives an overview of some of the current directions in search development.