A Note Regarding Supplemental Files
Supplemental files and examples for this book can be found at http://examples.oreilly.com/9781565922259/. Please use a standard desktop web browser to access these files, as they may not be accessible from all ereader devices.
All code files or examples referenced in the book will be available online. For physical books that ship with an accompanying disc, whenever possible, we've posted all CD/DVD content. Note that while we provide as much of the media content as we are able via free download, we are sometimes limited by licensing restrictions. Please direct any questions or concerns to .
Dedication
To Miriam, for your love and patience.
Arnold Robbins
Preface
This book is about a set of oddly named UNIX utilities, sed and awk. These utilities have many things in common, including the use of regular expressions for pattern matching. Since pattern matching is such an important part of their use, this book explains UNIX regular expression syntax very thoroughly. Because there is a natural progression in learning from grep to sed to awk, we will be covering all three programs, although the focus is on sed and awk.
Sed and awk are tools used by users, programmers, and system administrators: anyone working with text files. Sed, so called because it is a stream editor, is perfect for applying a series of edits to a number of files. Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming language that permits easy manipulation of structured data and the generation of formatted reports. This book emphasizes the POSIX definition of awk. In addition, the book briefly describes the original version of awk, before discussing three freely available versions of awk and two commercial ones, all of which implement POSIX awk.
The focus of this book is on writing scripts for sed and awk that quickly solve an assortment of problems for the user. Many of these scripts could be called "quick-fixes." In addition, we'll cover scripts that solve larger problems that require more careful design and development.
Scope of This Handbook
, is an overview of the features and capabilities of sed and awk.
, demonstrates the basic operations of sed and awk, showing a progression in functionality from sed to awk. Both share a similar command-line syntax, accepting user instructions in the form of a script.
, describes UNIX regular expression syntax in full detail. New users are often intimidated by these strange expressions, used for pattern matching. It is important to master regular expression syntax to get the most from sed and awk. The pattern-matching examples in this chapter largely rely on grep and egrep.
, begins a three-chapter section on sed. This chapter covers the basic elements of writing a sed script using only a few sed commands. It also presents a shell script that simplifies invoking sed scripts.
, divide the sed command set into basic and advanced commands. The basic commands are commands that parallel manual editing actions, while the advanced commands introduce simple programming capabilities. Among the advanced commands are those that manipulate the hold space, a set-aside temporary buffer.
, begins a five-chapter section on awk. This chapter presents the primary features of this scripting language. A number of scripts are explained, including one that modifies the output of the ls command.
, describes how to use common programming constructs such as conditionals, loops, and arrays.
, describes how to use awk's built-in functions as well as how to write user-defined functions.
, covers a set of miscellaneous awk topics. It describes how to execute UNIX commands from an awk script and how to direct output to files and pipes. It then offers some (meager) advice on debugging awk scripts.
, describes the original V7 version of awk, the current Bell Labs awk, GNU awk (gawk) from the Free Software Foundation, and mawk, by Michael Brennan. The latter three all have freely available source code. This chapter also describes two commercial implementations, MKS awk and Thomson Automation awk (tawk), as well as VSAwk, which brings awk-like capabilities to the Visual Basic environment.
, presents two longer, more complex awk scripts that together demonstrate nearly all the features of the language. The first script is an interactive spelling checker. The second script processes and formats the index for a book or a master index for a set of books.
, presents a number of user-contributed scripts that show different styles and techniques of writing scripts for sed and awk.
is a quick reference describing sed's commands and command-line options.
is a quick reference to awk's command-line options and a full description of its scripting language.
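The shared command-line form mentioned above, in which the script comes first and the input file follows, can be sketched briefly. This is only an illustration: the sample file, its contents, and the variable names are assumptions made for the example, not material from the book.

```shell
# Hypothetical sample data; the file name and contents are assumptions.
sample="${TMPDIR:-/tmp}/sample.$$"
printf 'John Daggett, MA\nAlice Ford, VA\n' > "$sample"

# sed: the script comes first, then the input file.
sed_out=$(sed 's/MA/Massachusetts/' "$sample")

# awk: the same shape, with -F setting the field separator to a comma.
awk_out=$(awk -F, '{ print $1 }' "$sample")

printf '%s\n' "$sed_out" "$awk_out"
rm -f "$sample"
```

In both cases the quoted first argument is the script. The sed script edits each line as it streams past, while the awk script treats each line as a record and prints its first field.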
Availability of sed and awk
Sed and awk were part of Version 7 UNIX (also known as "V7," and "Seventh Edition") and have been part of the standard distribution ever since. Sed has been unchanged since it was introduced.
GNU sed is available by FTP from the host ftp.gnu.ai.mit.edu, in the file ftp://ftp.gnu.ai.mit.edu/pub/gnu/sed-2.05.tar.gz. This is a tar file compressed with the gzip program, whose source code is available in the same directory. There are many sites world-wide that "mirror" the files from the main GNU distribution site; if you know of one close to you, you should get the files from there. Be sure to use "binary" or "image" mode to transfer the file(s).
In 1985, the authors of awk extended the language, adding many useful features. Unfortunately, this new version remained inside AT&T for several years. It became part of UNIX System V as of Release 3.1. It can be found under the name of nawk, for new awk; the older version still exists under its original name. This is still the case on System V Release 4 systems.
On commercial UNIX systems, such as those from Hewlett-Packard, Sun, IBM, Digital, and others, the naming situation is more complicated. All of these systems have some version of both old and new awk, but what each vendor names each program varies. Some have oawk and awk, others have awk and nawk. The best advice we can give is to check your local documentation. Throughout this book, we use the term awk to describe POSIX awk. Specific implementations will be referred to by name, such as "gawk," or "the Bell Labs awk."
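Alongside the local documentation, a short shell loop can show which of these program names are present on a given system. This is a rough sketch: the list of names is taken from those mentioned in this preface, the variable names are our own, and it assumes a shell that supports the POSIX `command -v` built-in. It reports only where each name is found, not which version of the language it implements.

```shell
# Report which of the awk program names mentioned in this preface are on the PATH.
# The list of names is an assumption; add or remove names to suit your system.
report=$(
    for name in awk nawk oawk gawk mawk; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s: %s\n' "$name" "$path"
        else
            printf '%s: not found\n' "$name"
        fi
    done
)
printf '%s\n' "$report"
```

A name that is found may still be a link to another implementation, so the documentation remains the final authority.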
discusses three freely available awks (including where to get them), as well as several commercial ones.
Note
Since the first edition of this book, the awk language was standardized as part of the POSIX Command Language and Utilities Standard (P1003.2). All modern awk implementations aim to be upwardly compatible with the POSIX standard.
The standard incorporates features that originated in both new awk and gawk. In this book, you can assume that what is true for one implementation of POSIX awk is true for another, unless a particular version is designated.
DOS Versions
Gawk, mawk, and GNU sed have been ported to DOS. There are files on the main GNU distribution site with pointers to DOS versions of these programs. In addition, gawk has been ported to OS/2, VMS, and Atari and Amiga microcomputers, with ports to other systems (Macintosh, Windows) in progress.