Table of Contents
- 1 Introduction: the Project
- 2 Formalizing the Alphabet
- 3 Defining Vocabulary
- 4 Electronic Dictionaries
- 5 Languages, Grammars, and Machines
- 6 Regular Grammars
- 7 Context-Free Grammars
- 8 Context-Sensitive Grammars
- 9 Unrestricted Grammars
- 10 Text Annotation Structure
- 11 Lexical Analysis
- 12 Syntactic Analysis
- 13 Transformational Analysis
Acknowledgments
I would like to thank the University of Franche-Comté and my colleagues in the ELLIADD laboratory for believing in the NooJ project and supporting the community of NooJ users unfailingly since its inception.
It would be impossible for me to mention every single one of the colleagues and students who have participated, in one way or another, in the extremely ambitious project described in this book: that of formalizing natural languages! The NooJ software has been in use since 2002 by a community of researchers and students; see www.nooj4nlp.net. NooJ was developed in direct cooperation with all its users who devoted their energy to this or that specific problem, or to one language or another. Spelling in Semitic languages, variation in Asian languages, intonation in Armenian, inflection in Hungarian, phrasal verbs in English, derivation in Slavic languages, composition in Greek and in Germanic languages, etc. pose a wide variety of linguistic problems, and without the high standards of these linguists the NooJ project would never have known the success it is experiencing today. Very often, linguistic questions that seemed trivial at the time have had a profound influence on the development of NooJ.
Among its users, there are some NooJ experts to whom I would like to give particular thanks, as they participated directly in its design, and had the patience to help me with long debugging sessions. I thank them for their ambition and their patience: Héla Fehri, Kristina Kocijan, Slim Mesfar, Cristina Mota, and Simonetta Vietri.
I would also like to thank Danielle Leeman and François Trouilleux for their detailed review of the original book, and Peter Machonis for his review of the English version, as well as for verifying the relevance of the English examples, which contributed greatly to the quality of this book.
Max SILBERZTEIN
November, 2015.
For Nadia Nooj Malinovich Silberztein, the Mensch of the family, without whom neither this book, nor the project named after her, would have happened.
And for my two children, Avram and Rosa, who remind me every day of the priorities in my life.
Series Editor
Patrick Paroubek
Formalizing Natural Languages
The NooJ Approach
Max Silberztein
First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George's Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2016
The rights of Max Silberztein to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2015957115
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-902-1
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley's ebook EULA.