1.1 Intended Scope and Audience
The information technology revolution was made possible by the invention of electronic digital machines, but without programming languages their use would have been restricted to the few people able to write binary machine code. As a text, a programming language combines features of human languages with features of mathematical logic. The translation from a programming language to machine code is known as compilation. Language compilation is a very complex process, which would be impossible to master without systematic design methods. Such methods and their theoretical foundations are the subject of this book. They make up a consistent and largely consolidated body of concepts and algorithms, which are applied not just in compilers, but also in other fields. Automata theory is pervasively used in all branches of informatics to model situations or phenomena classifiable as discrete-time and discrete-space systems. Formal grammars, on the other hand, originated in linguistic research and are widely applied in document processing, in particular for the Web.
As for prerequisites, the reader should have a good background in programming, but detailed knowledge of a specific programming language is not required, because our presentation of algorithms uses self-explanatory pseudo-code. The reader is expected to be familiar with basic mathematical theories and notation, namely set theory, algebra, and logic. These prerequisites are typically met by computer science/engineering or mathematics students with two or more years of university education.
The selection of topics and the presentation, based on rigorous definitions and algorithms illustrated by many motivating examples, should qualify the book for a university course that aims to expose students to the importance of good theories and of efficient algorithms for designing effective systems. In our experience, about 50 hours of lecturing suffice to cover the entire book.
The authors' long experience in teaching the subject to different audiences brings out the importance of combining theoretical concepts and examples. Moreover, it is advisable that students take advantage of well-known and well-documented software tools (such as the classical Flex and Bison) to implement and experiment with the main algorithms on realistic case studies.
With regard to its reach and limits, the book covers the essential concepts and methods needed to design simple translators based on the syntax-directed paradigm. It goes without saying that a real compiler for a programming language involves other technological aspects and know-how, in particular related to processor and computer architecture, which are not covered. Such know-how is essential for automatically translating a program to machine instructions and for transforming a program so that it makes the best use of the computational resources of a computer. The study of program transformation and optimization methods is a more advanced topic, which follows the present introduction to compiler methods. The next section outlines the contents of the book.
1.2 Compiler Parts and Corresponding Concepts
There are two external interfaces to a compiler: the source language to be analyzed and translated, and the target language produced by the translator.
Chapter 2 describes the so-called syntactic methods that are generally adopted to provide a rigorous definition of the texts (or character strings) written in the source language. The methods to be presented are regular expressions and context-free grammars. Both belong to formal language theory, a well-established branch of theoretical computer science.
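By way of preview, the two formalisms can be glimpsed in a minimal sketch, here rendered in Python for concreteness; the tiny regular expression and grammar below are illustrative inventions, not the notation developed in Chapter 2.

```python
import re

# A regular expression defining unsigned decimal numerals.
NUMBER = re.compile(r"[0-9]+")
assert NUMBER.fullmatch("2024") and not NUMBER.fullmatch("20x4")

# A context-free grammar for sums of numbers with parentheses, written
# as a mapping from nonterminals to their alternative right-hand sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],       # an expression is a sum of terms
    "T": [["number"], ["(", "E", ")"]],  # a term is a number or a group
}
```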
The first task of a compiler is to check the correctness of the source text, that is, whether it complies with the syntactic definition of the source language given by the grammar rules. To perform the check, the algorithm scans the source text character by character, and at the end it rejects or accepts the input depending on the result of the analysis. Following a minimalist approach, such recognition algorithms can be conveniently described as mathematical machines or automata, in the tradition of the well-known Turing machine.
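To make the notion of a recognizer concrete, the following minimal sketch scans its input once, left to right, and accepts or rejects at the end; the two-state automaton and its language, binary strings with an even number of 1s, are purely illustrative choices.

```python
# Transition function of a deterministic finite automaton that accepts
# binary strings containing an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd",  "0"): "odd",  ("odd",  "1"): "even",
}

def accepts(text: str) -> bool:
    """Scan the text character by character; accept iff the scan ends
    in the designated final state 'even'."""
    state = "even"                      # initial state
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:      # symbol outside the alphabet
            return False
        state = TRANSITIONS[key]
    return state == "even"

assert accepts("1010") and not accepts("10")
```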
Chapter 3 covers finite automata, which are machines with a finite memory. They are the recognizers of the languages defined by regular expressions. Within compilation, they are used for lexical analysis or scanning, to extract from the source text keywords, numbers, and in general the pieces of text corresponding to the lexical units, or lexemes (or tokens), of the language.
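A crude but working scanner can be sketched with Python's standard re module; the token classes below are hypothetical and far simpler than those a generator like Flex produces.

```python
import re

# Token classes of a toy language, each defined by a regular expression.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else)\b"),
    ("NUMBER",  r"[0-9]+"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",      r"[+\-*/=<>]"),
    ("SKIP",    r"\s+"),                 # blanks are discarded
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokens(text):
    """Yield (class, lexeme) pairs for every lexical unit in the text."""
    for m in SCANNER.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()

print(list(tokens("if x1 < 10 then y = y + 1")))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '<'), ('NUMBER', '10'), ...]
```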
Chapter 4 is devoted to the recognition problem for languages defined by context-free grammars. Recognition algorithms are first modeled as finite automata equipped with an unbounded last-in-first-out memory, or pushdown stack. For a compiler, the language recognizer is an essential component known as the syntax analyzer or parser. Its job is to check the syntactic correctness of a source text already subdivided into lexemes, and to construct a structural representation called a syntax tree. To reduce the parsing time of very long texts, the parser can be organized as a parallel program.
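The role of the pushdown stack is easy to illustrate: the sketch below recognizes well-nested brackets, the archetypal context-free language, by pushing on each opening bracket and popping on the matching closing one. It is a toy recognizer, not one of the parsing algorithms developed in the chapter.

```python
def balanced(text: str) -> bool:
    """Recognize well-nested strings over two bracket pairs, using an
    explicit pushdown stack."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)            # push the opening bracket
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False            # mismatched or unexpected closer
        else:
            return False                # symbol outside the alphabet
    return not stack                    # accept iff the stack is empty

assert balanced("([])()") and not balanced("(]")
```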
The ultimate job of a compiler is to translate a source text to another language. The module responsible for completing the verification of the source language rules and for producing the translation is called the semantic analyzer. It operates on the structural representation produced by the parser.
The formal models of translation and the methods used to implement semantic analyzers are in Chap. 5, which describes two kinds of transformations. Pure syntactic translations are modeled by finite or pushdown transducers. Semantic translations are performed by functions or methods that operate on the syntax tree of the source text. Such translations are specified by a practical extension to context-free grammars, called attribute grammars. This approach, by combining the accuracy of formal syntax with the flexibility of programming, conveniently expresses the analysis and translation of syntax trees.
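The flavor of this approach can be previewed with a toy attribute grammar that computes a synthesized attribute, the value of an arithmetic expression, over the syntax tree; the tree representation and rule names are illustrative assumptions.

```python
# A syntax tree is a tuple (rule, children...); numeral leaves carry
# their lexeme. Each rule is paired with a semantic function computing
# the synthesized attribute "value" from the children's attributes.
SEMANTICS = {
    "num": lambda lexeme: int(lexeme),        # value of a numeral leaf
    "add": lambda left, right: left + right,  # rule E -> E + T
    "mul": lambda left, right: left * right,  # rule T -> T * F
}

def value(tree):
    """Evaluate the synthesized attribute bottom-up over the tree."""
    rule, *children = tree
    if rule == "num":
        return SEMANTICS["num"](children[0])
    return SEMANTICS[rule](*(value(child) for child in children))

# Syntax tree of "2 + 3 * 4", as a parser would build it.
assert value(("add", ("num", "2"), ("mul", ("num", "3"), ("num", "4")))) == 14
```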
To give a concrete idea of compilation, typical simple examples are included: the type consistency check between variables declared and used in a programming language, the translation of high-level statements to machine instructions, and semantic-directed parsing.
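The second of these examples can be hinted at with a sketch: a recursive walk of the syntax tree emits code for a hypothetical stack machine, whose instruction names are invented for illustration.

```python
def compile_expr(tree, code):
    """Emit code for a hypothetical stack machine by walking the
    syntax tree bottom-up, left to right (postfix order)."""
    rule, *children = tree
    if rule == "num":
        code.append(f"PUSH {children[0]}")
    else:
        for child in children:
            compile_expr(child, code)
        code.append({"add": "ADD", "mul": "MUL"}[rule])
    return code

# Syntax tree of "2 + 3 * 4", as a parser would build it.
tree = ("add", ("num", "2"), ("mul", ("num", "3"), ("num", "4")))
print(compile_expr(tree, []))
# ['PUSH 2', 'PUSH 3', 'PUSH 4', 'MUL', 'ADD']
```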
Of course, compilers do much more than syntax-directed translation. Static program analysis is an important example: it consists of examining a program to determine certain properties of it ahead of execution, or to detect errors not caught by semantic analysis. The purpose is to improve the robustness, reliability, and efficiency of the program. An example of error detection is the identification of uninitialized variables. For code improvement, an example is the elimination of useless assignment statements.
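As a taste of the first example, the sketch below flags variables that are read before ever being assigned in a straight-line program; the program representation is invented, and a real analysis must of course handle control flow.

```python
# A straight-line program as a list of (target, used_variables) pairs;
# e.g. the statement "z = x + y" becomes ("z", ["x", "y"]).
def uninitialized_uses(program):
    """Report variables that are read before being assigned."""
    assigned, errors = set(), []
    for target, used in program:
        errors += [v for v in used if v not in assigned]
        assigned.add(target)
    return errors

program = [
    ("x", []),          # x = 1
    ("z", ["x", "y"]),  # z = x + y  (y is read before any assignment)
    ("y", ["x"]),       # y = x
]
assert uninitialized_uses(program) == ["y"]
```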