The Portable Document Format (PDF) is the world's leading page description language, and the first format equally useful for print and online use.
PDF documents are now almost ubiquitous in the printing industry, in document interchange, and in the online distribution of paginated content. They are, however, widely viewed as opaque and delicate, and they are poorly understood, even by those of a technical disposition.
This is partly due to a perplexing lack of documentation. The file format reference is freely available, but its size and complexity demand a time investment that most people working with PDF cannot realistically make.
This book aims to be an approachable introduction. It is suitable both for the technically minded and for those who just want to understand a little of the PDF format to give context to their work with tools that produce or process PDF documents.
Organization of Contents
Chapter 1 gives a history of the PDF format and puts it into context. We look at the advantages PDF has over similar technologies, introduce specialized kinds of PDF files such as PDF/X and PDF/A, and take a brief tour of the elements that make up a typical PDF document. We conclude by looking at how PDF is used in industry.
In Chapter 2, we begin in earnest by building a simple PDF file from scratch in a text editor. We show how to process it into a fully valid PDF and open it in a PDF viewer. We explain each component of the file, taking our first look at various parts of the PDF syntax.
Chapter 3 describes the layout and content of a PDF file, and the syntax of the objects from which it is built. We describe how a PDF document is read from a flat file into a structured format and, conversely, how it is written from that structured format back to a flat file.
In Chapter 4, we leave behind the bits and bytes of the PDF file and consider the logical structure of its objects, describing how pages and their resources are arranged into a document.
Chapter 5 describes how to create vector graphics and raster images in PDF, and how to deal with transparency, color spaces, and patterns. We illustrate with examples, showing both the code and the result in a PDF viewer.
In Chapter 6, we look at the PDF operators for building and showing text strings in different fonts and sizes, and at how to build lines and paragraphs. We describe the different types of fonts and encodings in PDF documents and how they are defined and used, and we look at the process of extracting text from a PDF document.
Chapter 7 discusses topics related not to the visual appearance of the document but to ancillary data: bookmarks, metadata, hyperlinks, annotations, and file attachments. For each, we describe how it is defined in PDF and give examples.
Chapter 8 looks at how encryption and document permissions work in PDF, and shows how to inspect encryption information in Adobe Reader. We describe how programs that process PDF files read, write, and edit encrypted documents.
In Chapter 9, we show how to use the popular pdftk program for command-line processing of PDF files, looking at common usage scenarios. We also describe what a program such as pdftk must do internally to achieve certain tasks (for example, merging or splitting documents).
Chapter 10 surveys both Adobe and open-source software for viewing, converting, editing, and programming with PDF files. We give sources of further documentation and other resources, such as support and discussion forums.