This edition first published 2013
© 2013 Markus Dickinson, Chris Brew, Detmar Meurers
Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell's publishing program has been merged with Wiley's global Scientific, Technical, and Medical business to form Wiley-Blackwell.
Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Offices
350 Main Street, Malden, MA 02148-5020, USA
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.
The right of Markus Dickinson, Chris Brew, Detmar Meurers to be identified as the authors of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Dickinson, Markus.
Language and computers / Markus Dickinson, Chris Brew, Detmar Meurers.
p. cm.
Includes index.
ISBN 978-1-4051-8306-2 (cloth) ISBN 978-1-4051-8305-5 (pbk.) 1. Computational linguistics.
2. Natural language processing (Computer science) I. Brew, Chris. II. Meurers, Detmar. III. Language and computers
P98.D495 2013
410.285–dc23
2012010324
A catalogue record for this book is available from the British Library.
Cover design by www.cyandesign.co.uk
What This Book Is About
The computer has become the medium of choice through which much of our language use is channeled. Modern computer systems therefore spend a good part of their time working on human language. This is a positive development: not only does it give everyone on the internet access to a world of information well beyond the scope of even the best research libraries of the 1960s and 1970s, it also opens up new possibilities for creating, exploiting, and managing information. These include tools that support nonfiction, creative writing, blogs and diaries, citizen journalism and social interactions, web search and online booking systems, smart library catalogs, knowledge discovery, spoken language dialogs, and foreign language learning.
This book takes you on a tour of different real-world tasks and applications where computers deal with language. During this tour, you will encounter essential concepts relating to language, representation, and processing, so that by the end of the book you will have a good grasp of key concepts in the field of computational linguistics. The only background you need to read this book is some curiosity about language and some everyday experience with computers.
This is indeed why the book is organized around real-world tasks and applications. We assume that most of you will be familiar with many of the applications and may wonder how they work or why they don't work. What you may not realize is how similar the underlying processing is. For example, there is a great deal in common between how grammar checkers and automatic speech-recognition systems work. We hope that demonstrating how these concepts recur (in this case, in something called n-grams) will reinforce the importance of applying general techniques to new applications.
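As a rough illustration, not taken from the chapters themselves: an n-gram is simply a sequence of n adjacent words. The minimal Python sketch below, built around a hypothetical helper called `ngrams`, lists the word pairs (bigrams) in a sentence and counts how often each pair occurs. Counts of this kind are the sort of statistic that both a grammar checker and a speech recognizer might consult.

```python
from collections import Counter

def ngrams(tokens, n):
    """Hypothetical helper: return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Split a sentence into words on whitespace (a deliberately naive tokenizer).
tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
print(bigrams)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]

# Count how often each bigram occurs in a slightly longer text.
counts = Counter(ngrams("the cat sat on the mat the cat ran".split(), 2))
print(counts[("the", "cat")])  # 2
```

A real system would use a proper tokenizer and far more text, but the idea of sliding a window of n words across a sentence stays the same.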
The book is designed to make you aware of how technology works and how language works. We focus on a few applications of language technology (LT), computational linguistics (CL), and natural language processing (NLP). LT, CL, and NLP are essentially names for the same thing, seen from the perspectives of industry, linguistics, and computer science, respectively. The tasks and applications were chosen because: (i) they are representative of techniques used throughout the field; (ii) they represent a significant body of work in and of themselves; (iii) they connect directly to linguistic modeling; and (iv) they are the ones the authors know best. We hope that you will be able to use these examples as an introduction to general concepts that you can apply to learning about other applications and areas of inquiry.
How to use the book
There are a number of features in this textbook that allow you to structure what you learn, explore more about the topics, and reinforce what you are learning. As a start, the relevant concepts being covered are typeset in bold and shown in the margins of each page. You can also look those up in the Concept Index at the end of the book.
The Under the Hood sections included in many of the chapters are intended to give you more detail on selected advanced topics. For those interested in learning more about language and computers, we hope that you find these sections enjoyable and enlightening, though the gist of each chapter can be understood without reading them.
At the end of each chapter there is a Checklist indicating what you should have learned. The Exercises, also found at the end of each chapter, review the material and give you opportunities to go beyond it. Our hope is that the checklist and exercises help you to get a good grasp of each of the topics and concepts involved. We recognize, however, that students from different backgrounds have different skills, so we have marked each question with an indication of who the question is for. There are four designations: most questions are appropriate for all students and thus are marked with ALL; LING questions assume some background and interest in linguistics; CS questions are appropriate for those with a background in computer science; and MATH questions are for those wanting to tackle more mathematical challenges. Of course, you should not feel limited by these markers, as a strong enough desire will generally allow you to tackle most questions.
If you enjoy the topic of a particular chapter, we encourage you to make use of the Further reading recommendations. In the References at the end of the book, the page numbers listed under each entry also point you to the places where that work is discussed in the book.
Finally, on the book's companion website http://purl.org/lang-and-comp we have collected resources and links to other materials that could be of interest to you when exploring topics around language and computers.