Chapter 1. SQL History and Implementations
A note for Early Release readers
With Early Release ebooks, you get books in their earliest formthe authors raw and unedited content as they writeso you can take advantage of these technologies long before the official release of these titles.
This will be the 1st chapter of the final book.
If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at rfernando@oreilly.com.
In the early 1970s, the seminal work of IBM research fellow Dr. E. F. Codd led to the development of a relational data model product called SEQUEL, or Structured English Query Language. SEQUEL ultimately became SQL, or Structured Query Language.
IBM, along with other relational database vendors, wanted a standardized method for accessing and manipulating data in a relational database. Although IBM was the first to develop relational database theory, Oracle was first to market the technology. Over time, SQL proved popular enough in the marketplace to attract the attention of the American National Standards Institute (ANSI) in cooperation with the International Standards Organization (ISO), which released standards for SQL in 1986, 1989, 1992, 1999, 2003, 2006, 2011, and 2016.
Since 1986, various competing languages have allowed developers to access and manipulate relational data. However, few were as easy to learn or as universally accepted as SQL. Programmers and administrators now have the benefit of being able to learn a single language that, with minor adjustments, is applicable to a wide variety of database platforms, applications, and products.
SQL in a Nutshell, Fourth Edition, provides the syntax for five common implementations of SQL:
- The ANSI/ISO SQL standard
- MySQL version 8 and MariaDB 10.5
- Oracle Database 19c
- PostgreSQL version 13
- Microsofts SQL Server 2019
The Relational Model and ANSI SQL
Relational database management systems (RDBMSs) such as those covered in this book are the primary engines of information systems worldwide, and particularly of web applications and distributed client/server computing systems. They enable a multitude of users to quickly and simultaneously access, create, edit, and manipulate data without impacting other users. They also allow developers to write useful applications to access their resources and provide administrators with the capabilities they need to maintain, secure, and optimize organizational data resources.
An RDBMS is defined as a system whose users view data as a collection of tables related to each other through common data values. Data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked (or related) to one another if they each have unique, identifying columns of data (called keys) that represent data values held in common. E. F. Codd first described relational database theory in his landmark paper A Relational Model of Data for Large Shared Data Banks, published in the Communications of the ACM (Association for Computing Machinery) in June, 1970. Under Codds new relational data model, data was structured (into tables of rows and columns); manageable using operations such as selections, projections, and joins; and consistent as the result of integrity rules such as keys and referential integrity. Codd also articulated rules that governed how a relational database should be designed. The process for applying these rules is now known as normalization.
Codds Rules for Relational Database Systems
Codd applied rigorous mathematical theories (primarily set theory) to the management of data, and he compiled a list of criteria a database must meet to be considered relational. At its core, the relational database concept centers around storing data in tables. This concept is now so common as to seem trivial; however, not long ago the goal of designing a system capable of sustaining the relational model was considered a long shot with limited usefulness.
Following are Codds Twelve Principles of Relational Databases:
- Information is represented logically in tables.
- Data must be logically accessible by table, primary key, and column.
- Null values must be uniformly treated as missing information, not as empty strings, blanks, or zeros.
- Metadata (data about the database) must be stored in the database just as regular data is.
- A single language must be able to define data, views, integrity constraints, authorization, transactions, and data manipulation.
- Views must show the updates of their base tables and vice versa.
- A single operation must be available to do each of the following operations: retrieve data, insert data, update data, or delete data.
- Batch and end-user operations are logically separate from physical storage and access methods.
- Batch and end-user operations can change the database schema without having to recreate it or the applications built upon it.
- Integrity constraints must be available and stored in the metadata, not in an application program.
- The data manipulation language of the relational system should not care where or how the physical data is distributed and should not require alteration if the physical data is centralized or distributed.