The MIT Press Essential Knowledge Series
Auctions, Timothy P. Hubbard and Harry J. Paarsch
Cloud Computing, Nayan Ruparelia
Computing: A Concise History, Paul E. Ceruzzi
The Conscious Mind, Zoltan L. Torey
Crowdsourcing, Daren C. Brabham
Free Will, Mark Balaguer
Information and Society, Michael Buckland
Information and the Modern Corporation, James W. Cortada
Intellectual Property Strategy, John Palfrey
The Internet of Things, Samuel Greengard
Machine Learning: The New AI, Ethem Alpaydin
Machine Translation, Thierry Poibeau
Memes in Digital Culture, Limor Shifman
Metadata, Jeffrey Pomerantz
The Mind–Body Problem, Jonathan Westphal
MOOCs, Jonathan Haber
Neuroplasticity, Moheb Costandi
Open Access, Peter Suber
Paradox, Margaret Cuonzo
Robots, John Jordan
Self-Tracking, Gina Neff and Dawn Nafus
Sustainability, Kent E. Portney
The Technological Singularity, Murray Shanahan
Understanding Beliefs, Nils J. Nilsson
Waves, Frederic Raichlen
Machine Translation
Thierry Poibeau
The MIT Press
Cambridge, Massachusetts
London, England
© 2017 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
This book was set in Chaparral Pro by Toppan Best-set Premedia Limited. Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data is available.
ISBN: 978-0-262-53421-5
eISBN 9780262342438
ePub Version 1.0
Series Foreword
The MIT Press Essential Knowledge series offers accessible, concise, beautifully produced pocket-size books on topics of current interest. Written by leading thinkers, the books in this series deliver expert overviews of subjects that range from the cultural and the historical to the scientific and the technical.
In today's era of instant information gratification, we have ready access to opinions, rationalizations, and superficial descriptions. Much harder to come by is the foundational knowledge that informs a principled understanding of the world. Essential Knowledge books fill that need. Synthesizing specialized subject matter for nonspecialists and engaging critical topics through fundamentals, each of these compact volumes offers readers a point of access to complex ideas.
Bruce Tidor
Professor of Biological Engineering and Computer Science
Massachusetts Institute of Technology
Acknowledgments
This book would not have been possible without the support of colleagues and friends. I want to thank Michelle Bruni, Elizabeth Rowley-Jolivet, Pablo Ruiz Fabo, and Bernard Victorri for their help during the preparation of this book. My gratitude also goes to the editorial and production staff at MIT Press, particularly Marie Lufkin Lee and Katherine A. Almeida. Finally, I want to thank the anonymous reviewers for their careful reading and their many insightful comments and suggestions.
Thierry Poibeau is a member of LATTICE, a research laboratory supported by CNRS, École normale supérieure (ENS), PSL Research University, Université Sorbonne Nouvelle, and USPC.
1 Introduction
In Douglas Adams's humorous saga The Hitchhiker's Guide to the Galaxy, the Babel fish, a small fish that, placed in the ear, lets its wearer instantly understand any language, points to the problem of translation, and more generally to the key problem of language diversity and comprehension. The name of the fish is a transparent allusion to the Biblical episode of Babel, when God scrambled language so that humans could no longer understand one another.
A significant number of thinkers, philosophers, and linguists (and, more recently, computer scientists, mathematicians, and engineers) have tackled the question of language diversity. They have also devised theories and devices intended to overcome the problems this diversity causes. Since the advent of computers after the Second World War, this research program has taken the form of machine translation tools, in other words, computer programs capable of automatically producing, in a target language, the translation of a text written in a source language.
This research program is very ambitious; it is even one of the most fundamental in the field of artificial intelligence. The analysis of languages cannot be separated from the analysis of knowledge and reasoning, which explains the interest in this field shown by philosophers, specialists in artificial intelligence, and cognitive scientists. This brings to mind the test Turing proposed in 1950: a machine passes the test if a person conversing with it (through a screen) is unable to tell whether her discussion partner is a computer or a human being. The test is foundational, because an operational conversational agent must not only understand what the discussion partner says (at least to some extent), but also infer from what has been said a relevant utterance that moves the conversation forward. For Turing, a machine that passes the test shows a certain degree of intelligence. This question has fueled considerable debate, but we can at least agree that a robust conversational system would involve formalizing some mechanisms of understanding and reasoning.
Machine translation involves processes that make it at least as challenging as developing an automatic dialogue system, in which the degree of understanding shown by the machine can be very partial. For example, the Eliza system, developed by Weizenbaum in 1966, was able to simulate a dialogue between a psychotherapist and a patient. The system in fact simply derived questions from the patient's utterances (for example, it could produce the question "Why are you afraid of X?" from the sentence "I am afraid of X"). It also included a series of ready-made sentences used when no predefined pattern seemed to apply (for example, "Could you specify what you have in mind?" or "Really?"). Despite its simplicity, Eliza was a great success, and some patients really thought they were conversing with a real doctor through a computer.
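To make the contrast with translation concrete, here is a minimal sketch of the Eliza-style mechanism described above, assuming just one pattern rule (the "I am afraid of X" example) and the two ready-made fallback replies. The names `RULES`, `FALLBACKS`, and `respond` are hypothetical, chosen for illustration; this is not Weizenbaum's code, and it leaves out the keyword ranking and pronoun substitutions of his actual DOCTOR script.

```python
import re

# Hypothetical rule set: a pattern captures part of the user's sentence,
# and the captured text is reused inside a canned question.
RULES = [
    (re.compile(r"\bi am afraid of (.+?)[.!?]*$", re.IGNORECASE),
     "Why are you afraid of {0}?"),
]

# Ready-made replies used when no pattern applies (taken from the examples in the text).
FALLBACKS = ["Could you specify what you have in mind?", "Really?"]


def respond(utterance: str, turn: int = 0) -> str:
    """Return a reply by surface pattern matching; no understanding is involved."""
    for pattern, template in RULES:
        match = pattern.search(utterance.strip())
        if match:
            # Echo the captured fragment back inside the question template.
            return template.format(match.group(1))
    # No rule fired: cycle through the canned replies.
    return FALLBACKS[turn % len(FALLBACKS)]


if __name__ == "__main__":
    print(respond("I am afraid of spiders."))
    print(respond("The weather is nice today."))
```

Calling `respond("I am afraid of spiders.")` yields "Why are you afraid of spiders?", while an unmatched sentence falls back to a canned reply. The program never represents what spiders are or why fear matters, which is exactly why this kind of shallow matching is not enough for translation.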
The situation is completely different when considering machine translation. Translation requires in-depth understanding of the text to be translated. Moreover, transposition into another language is a delicate and difficult process, even with news or technical texts. The aim of machine translation is not, of course, to address literature or poetry; rather, the idea is to give the most accurate translation of everyday texts. Even so, the task is immensely difficult, and current systems are still far from satisfactory.
Despite its limitations, machine translation also invites us, from a more theoretical point of view, to take a fresh look at old and widely investigated questions: What does it mean to translate? What kind of knowledge is involved in the translation process? How can a text be transposed from one language to another? These are some of the questions addressed in this book.