1.1 Natural Language Processing: The Basics
Natural language processing (NLP) investigates the use of computers to process or to understand human (i.e., natural) languages for the purpose of performing useful tasks. NLP is an interdisciplinary field that combines computational linguistics, computing science, cognitive science, and artificial intelligence. From a scientific perspective, NLP aims to model the cognitive mechanisms underlying the understanding and production of human languages. From an engineering perspective, NLP is concerned with how to develop novel practical applications to facilitate the interactions between computers and human languages. Typical applications of NLP include speech recognition, spoken language understanding, dialogue systems, lexical analysis, parsing, machine translation, knowledge graphs, information retrieval, question answering, sentiment analysis, social computing, natural language generation, and natural language summarization. These NLP application areas form the core content of this book.
Natural language is a system constructed specifically to convey meaning or semantics, and it is by its fundamental nature a symbolic or discrete system. The surface or observable physical signal of natural language is called text, which always appears in symbolic form. The text signal has its counterpart in the speech signal; the latter can be regarded as the continuous correspondence of symbolic text, and both entail the same latent linguistic hierarchy of natural language. From the NLP and signal processing perspectives, speech can be treated as a noisy version of text, imposing the additional difficulty of de-noising when performing the task of understanding the common underlying semantics. Some chapters of this book cover the speech aspect of NLP in detail, while the remaining chapters start directly from text in discussing a wide variety of text-oriented tasks that exemplify the pervasive NLP applications enabled by machine learning techniques, notably deep learning.
The symbolic nature of natural language is in stark contrast to the continuous nature of language's neural substrate in the human brain. We defer this discussion to a later section of this chapter, where the future challenges of deep learning in NLP are discussed. A related contrast is how the symbols of natural language are encoded in several continuous-valued modalities, such as gesture (as in sign language), handwriting (as an image), and, of course, speech. On the one hand, a word as a symbol is used as a signifier to refer to a concept or a thing in the real world as the signified object, necessarily a categorical entity. On the other hand, the continuous modalities that encode the symbols of words constitute the external signals sensed by the human perceptual system and transmitted to the brain, which in turn operates in a continuous fashion. While of great theoretical interest, the subject of contrasting the symbolic nature of language with its continuous rendering and encoding goes beyond the scope of this book.
In the next few sections, we outline and discuss, from a historical perspective, the development of the general methodology used to study NLP as a rich interdisciplinary field. Much like several closely related sub- and super-fields such as conversational systems, speech recognition, and artificial intelligence, the development of NLP can be described in terms of three major waves (Deng), each of which is elaborated in a separate section next.
1.2 The First Wave: Rationalism
NLP research in its first wave lasted for a long time, dating back to the 1950s. In 1950, Alan Turing proposed the Turing test to evaluate a computer's ability to exhibit intelligent behavior indistinguishable from that of a human (Turing 1950). The test is based on natural language conversations between a human and a computer designed to generate human-like responses. In 1954, the Georgetown-IBM experiment demonstrated the first machine translation system, capable of translating more than 60 Russian sentences into English.
Approaches based on the belief that knowledge of language in the human mind is fixed in advance by genetic inheritance dominated most NLP research between about 1960 and the late 1980s. These approaches have been called rationalist approaches (Church). Postulating that key parts of language are hardwired in the brain at birth as part of the human genetic inheritance, rationalist approaches endeavored to design hand-crafted rules that incorporate knowledge and reasoning mechanisms into intelligent NLP systems. Up until the 1980s, the most notably successful NLP systems, such as ELIZA, which simulated a Rogerian psychotherapist, and MARGIE, which structured real-world information into concept ontologies, were based on complex sets of handwritten rules.
This period coincided approximately with the early development of artificial intelligence, characterized by expert knowledge engineering, in which domain experts devised computer programs according to the knowledge they had about their (very narrow) application domains (Nilsson). The main strength of these first-generation artificial intelligence systems was their transparency and interpretability, within their (limited) capability of performing logical reasoning. Like NLP systems such as ELIZA and MARGIE, the general expert systems of the early days used hand-crafted expert knowledge, which was often effective for narrowly defined problems, although the reasoning could not handle the uncertainty that is ubiquitous in practical applications.
In the specific NLP application areas of dialogue systems and spoken language understanding, to be described in more detail in later chapters of this book, the designs were centered on grammatical and ontological constructs, which, while interpretable and easy to debug and update, experienced severe difficulties in practical deployment. When such systems worked, they often worked beautifully; unfortunately, this did not happen very often, and the applicable domains were necessarily limited.
Likewise, during this rationalist era, speech recognition research and system design, another long-standing NLP and artificial intelligence challenge, were based heavily on the paradigm of expert knowledge engineering, as elegantly analyzed in (Church and Mercer). However, the lack of ability to learn from data and to handle uncertainty in reasoning was acutely recognized by researchers, leading to the second wave of speech recognition, NLP, and artificial intelligence, described next.