First Printing, 2016
The Accidental Taxonomist, Second Edition
Copyright 2016 by Heather Hedden
All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the publisher, except by a reviewer, who may quote brief passages in a review. Published by Information Today, Inc., 143 Old Marlton Pike, Medford, New Jersey 08055.
Publisher's Note: The author and publisher have taken care in preparation of this book but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book and Information Today, Inc., was aware of a trademark claim, the designations have been printed with initial capital letters.
Library of Congress Cataloging-in-Publication Data
Names: Hedden, Heather.
Title: The accidental taxonomist / Heather Hedden.
Description: Second edition. | Medford, New Jersey : Information Today, Inc., [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2016002968 | ISBN 9781573876964
Subjects: LCSH: Information organization. | Classification. | Indexing. | Subject headings. | Cross references (Information retrieval) | Thesauri.
Classification: LCC Z666.5 .H43 2016 | DDC 025--dc23
LC record available at http://lccn.loc.gov/2016002968
Printed and bound in the United States of America
President and CEO: Thomas H. Hogan, Sr.
Editor-in-Chief and Publisher: John B. Bryans
Associate Editor: Beverly M. Michaels
Production Manager: Tiffany Chamenko
Marketing Coordinator: Rob Colding
Indexer: Kathleen Rocheleau
Cover Designer: Ashlee Caruolo
Composition by Amnet Systems
www.infotoday.com
Contents
Organizing electronic content using metadata fields with controlled vocabularies has at least a 50-year history. It's the story of how we got from expensive, rarely used time-shared databases to the almost ubiquitous web where anyone can look it up anywhere, anytime. The work of tagging content has always been done by an army of indexers, more geeks than librarians, working in more of a cottage industry than a factory. All were accidental information scientists with backgrounds in business, medicine, law, the humanities, and maybe sometimes library but rarely computer science.
Some people may think that the content in Heather Hedden's practical compendium is old wine in a new bottle, but somebody had to write this stuff down. True, librarians have been doing cataloging, classification, and subject indexing for a long time, long before electronic content became a format to manage. But meaningfully adapting appropriate practices from library science and communicating them in a form that can be effectively used by a broad interdisciplinary audience is the major accomplishment of this book.
Taxonomies to support content indexing and finding could be tied to the history of database systems that included processable text information. At first these databases were electronic versions of abstracting and indexing services offered as very expensive, time-share online services (e.g., Dialog), later as subscription CD-ROM databases, and most recently as various types of web-mediated services. In the early days, two disciplines dominated the online services: medicine and law. Medical informatics was heavily subsidized by governments (especially in the United States) after World War II, and legal information (e.g., LexisNexis) was valuable enough to be paid for by large corporations who were the clients of large law firms. Medical Subject Headings (MeSH) was introduced by the National Library of Medicine in 1960. Its precursor was the subject headings of Index Medicus, which date from 1940. Medical subjects are one of the taxonomy gold standards. They include taxonomies for the human body, taxonomies for conditions and treatments, taxonomies for medical practice settings, etc.
The iterations of digital environments over the past 50 years have had major impacts on what would be considered effective and efficient information organization strategies. In the era of expensive, time-share online services, taxonomies needed to enable especially precise retrieval because every minute and every citation to an information source had a significant cost associated with it. End users, such as business managers, were typically not allowed to execute their own searches. This was an era of intermediated searching. The online searcher (often a librarian) was a highly trained gatekeeper and often a subject matter expert him- or herself.
With CD-ROMs the costs of online access were eliminated. But the content organization schemes had to be changed to work on these self-contained platforms. The web changed this again, at first replacing content organization with the power of web search engines (Google, Yahoo!, Altavista, etc.); global taxonomies, such as the DMOZ Open Directory Project; and very importantly, online shopping. Search engines transformed us into a "look it up" culture. Shopping online has taught everyone how to do Boolean searching, these days referred to as "search refining."
The current era of the semantic web is proving to be a further watershed, because its underpinnings are the identification of named entities (people, organizations, locations, events, products, topics, and the like) when they occur in the content on the web. The first-generation web enabled the observation and boosting of content relevance based simply on its access and use. The semantic web is enabling the identification of relationships among all types of named entities and the presentation of information based on these relationships. Simply put, the semantic web is based on the organizing power of faceted taxonomy.
Inside the organization, the relatively new expectation is that information should be as findable and linkable as on the public web. Enterprise applications are more and more becoming web services that happen to be within the organizational firewall. Employees expect there to be
a single place for internal information delivery
a view of information across different business silos
easy access to others across different business groups to foster collaboration
a trusted location for conducting day-to-day activities
As taxonomy becomes a ubiquitous part of the organizational information ecosystem, there is more and more demand from organizations for people who have the skills to integrate taxonomies into enterprise applications. But what exactly does creating and maintaining taxonomies entail, and where are you going to find the appropriate expertise to competently undertake these tasks? While this is a great time to be a taxonomy consultant, one measure of the success of one of our engagements is whether a taxonomy editor has been identified or hired to be the central point of contact for taxonomy maintenance. Hence, you may find yourself becoming an accidental taxonomist.
This book is an excellent primer for the novice who finds him- or herself assigned (or volunteering for) the task of creating and maintaining a taxonomy. The book should also serve as a bible for the expert (I have a copy on my shelf). It answers these key questions I am frequently asked: