Preface
There used to be a time when what is known today as "Information Technology" or IT was less glamorously known as "Electronic Data Processing." And the truth is that, for all the buzz about trendy techniques, the processing of data is still at the core of our systems, all the more so as the volume of data under management seems to be increasing even faster than the speed of processors. The most vital corporate data is today stored in databases and accessed through the imperfect, but widely known, SQL language: a combination that began to gain acceptance in pinstriped circles at the beginning of the 1980s and has since wiped out the competition.
You can hardly interview a young developer today who doesn't claim a good working knowledge of SQL, the lingua franca of database access and a standard part of any basic IT course. This claim is usually reasonably true, if you define knowledge as the ability to obtain, after some effort, functionally correct results. However, enterprises all over the world are today confronted with exploding volumes of data. As a result, "functionally correct" results are no longer enough: they also have to be fast. Database performance has become a major headache in many companies. Interestingly, although everyone agrees that the source of performance issues lies in the code, it seems accepted everywhere that the first concern of developers should be to provide code that works, which seems a reasonable expectation. The thought seems to be that the database access part of their code should be as simple as possible, for maintenance reasons, and that "bad SQL" should be handed over to senior database administrators (DBAs), who will tweak it and make it run faster with the help of a few magic database parameters. And if such tweaking isn't enough, then upgrading the hardware seems to be the proper course to take.
Quite often, what appears to be the common-sense and safe approach ends up being extremely harmful. Writing inefficient code and relying on experts to tune the "bad SQL" is actually sweeping the dirt under the carpet. In my view, developers should be the first ones concerned with performance, and I see SQL issues as encompassing much more than the proper writing of a few queries. Performance seen from a developer's perspective is something profoundly different from "tuning," as practiced by DBAs. A database administrator tries to get the most out of a system: given hardware, processors and storage subsystem, or a given version of the database. A database administrator may have some SQL skills and be able to tune an especially poorly performing statement. But developers are writing code that may well run for 5 to 10 years, surviving several major releases (Internet-enabled, ready-for-the-grid, you name it) of the Database Management System (DBMS) it was written for, and several generations of hardware. Your code must be fast and sound from the start. It is a sorry assessment to make, but while many developers "know" SQL, very few have a sound understanding of the language and of relational theory.
Why Another SQL Book?
There are three main types of SQL books: books that teach the logic and the syntax of a particular SQL dialect, books that teach advanced techniques and take a problem-solving approach, and performance and tuning books that target experts and senior DBAs. On the one hand, books show how to write SQL code; on the other hand, they show how to diagnose and fix SQL code that has been badly written. In this book, I have tried to teach people who are no longer novices how to write good SQL code from the start and, most importantly, how to take a view of SQL code that goes beyond individual SQL statements.
Teaching how to use a language is difficult enough, but how can one teach how to use a language efficiently? SQL is a language that can look deceptively simple once you have been initiated, and yet it allows for an almost infinite number of cases and combinations. The first comparison that occurred to me was the game of chess, but it suddenly dawned on me that chess was invented to teach war. I have a natural tendency to consider every new performance challenge as a battle to be fought against an army of rows, and I realized that the problem of teaching developers how to use databases efficiently was similar to the problem of teaching officers how to conduct a war. You need knowledge, you need skills, and you need talent. Talent cannot be taught, but it can be nurtured. This is what most strategists, from Sun Tzu, who wrote his Art of War 25 centuries ago, to modern-day generals, have believed; so they tried to pass on the experience acquired in the field through simple maxims and rules that they hoped would serve as guiding stars amid the sound and fury of battle. I have tried to apply this method to more peaceful aims, and I have mostly followed the same plan as Sun Tzu; I've also borrowed his title. Many respected IT specialists claim the status of scientists; "Art" seems to me more appropriate than "Science" when it comes to defining an activity that requires flair, experience, and creativity as much as rigor and understanding. It is quite likely that my fondness for Art will be frowned upon by some partisans of Science, who claim that for each SQL problem there is one optimal solution, which can be attained by rigorous analysis and a good knowledge of the data. However, I don't see the two positions as being at odds. Rigor and a scientific approach will help you out of one problem at one given moment. In SQL development, even though you don't face the uncertainty of the adversary's next move, the big uncertainties lie in future evolutions.
What if, rather unexpectedly, the volume of this or that table increases? What if, following a merger, the number of users doubles? What if we want to keep several years of data online? How will a program behave on hardware totally different from what we have now? Some architectural choices are gambles on the future. You will certainly need rigor and a very sound theoretical knowledge, but those qualities are prerequisites of any art. Ferdinand Foch, the future Supreme Commander of the Allied armies in WWI, remarked in a lecture at the French École Supérieure de Guerre in 1900 that:
The art of war, like all other arts, has its theory, its principles; otherwise, it wouldn't be an art.
This book is not a cookbook, listing problems and giving "recipes." The aim is much more to help developers, and their managers, to raise good questions. You may well still write awful, costly queries after having read and digested this book. One sometimes has to. But, hopefully, it will be knowingly and with good reason.