This chapter gives a broad overview of deep learning and the historical context that led up to it. It also gives the reader a roadmap for navigating the book, the prerequisites, and further reading for diving deeper into the subject matter.
Historical Context
The field of Artificial Intelligence (AI), which can be considered the parent field of deep learning, has a rich history going back to 1950. While we will not cover this history in much detail, we will go over some of the key turning points in the field, which will lead us to deep learning.
Tasks that AI focused on in its early days were tasks that could be easily described formally, like the game of checkers or chess. This notion of being able to describe a task formally is at the heart of what can or cannot be done easily by a computer program. For instance, consider the game of chess. A formal description of chess consists of a representation of the board, a description of how each of the pieces moves, the starting configuration, and a description of the configurations in which the game terminates.
With these notions formalized, it's relatively easy to model chess playing as a search problem and, given sufficient computational resources, it's possible to produce a relatively good chess-playing AI.
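To make the idea of game playing as search concrete, here is a minimal, purely illustrative sketch in Python. The tiny hand-coded game tree (GAME_TREE, LEAF_VALUES) and the minimax routine below are our own toy construction, not any system described in this chapter; real chess programs search vastly larger trees with pruning and heuristics.

```python
# A toy game tree: from "start" the first player picks a move, then the
# opponent picks a reply. Leaves hold the outcome for the first (maximizing)
# player: +1 win, 0 draw, -1 loss. All values here are made up for illustration.
GAME_TREE = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}
LEAF_VALUES = {"a1": +1, "a2": 0, "b1": -1, "b2": +1}

def minimax(state, maximizing=True):
    """Return the best value achievable from `state` assuming optimal play."""
    if state in LEAF_VALUES:                       # terminal configuration
        return LEAF_VALUES[state]
    child_values = [minimax(c, not maximizing) for c in GAME_TREE[state]]
    return max(child_values) if maximizing else min(child_values)

# With optimal play from both sides, the first player can guarantee a draw
# (value 0) by choosing move "a", so this prints 0.
print(minimax("start"))
```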
The first era of AI focused on such tasks with a fair amount of success. At the heart of the methodology was a symbolic representation of the domain and the manipulation of symbols based on given rules (with increasingly sophisticated algorithms for searching the solution space to arrive at a solution).
It must be noted that such rules were defined manually. However, these early AI systems were fairly general-purpose task/problem solvers, in the sense that any problem that could be described formally could be solved with the same generic approach.
The key limitation of such systems is that chess is a relatively easy problem for AI simply because the problem setting is relatively simple and can be easily formalized. This is not the case with many of the problems human beings solve on a day-to-day basis (natural intelligence). For instance, consider diagnosing a disease (as a physician does) or transcribing human speech to text. These tasks, like most other tasks human beings master easily, are hard to describe formally, and they presented a challenge in the early days of AI.
Human beings address such tasks by leveraging a large amount of knowledge about the task/problem domain. Given this observation, subsequent AI systems relied on large knowledge bases that captured knowledge about the problem/task domain. One point to be noted is that the term used here is knowledge, not information or data. By knowledge we simply mean data/information that a program/algorithm can reason about. An example would be a graph representation of a map, with edges labeled with distances and current traffic conditions (constantly updated), which allows a program to reason about the shortest path between points.
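As a small, self-contained sketch of what "reasoning over knowledge" can look like, the snippet below encodes a made-up road map as a weighted graph and answers a shortest-path query with Dijkstra's algorithm. The map, node names, and distances are invented for illustration only; a real system would also fold in live traffic updates.

```python
import heapq

# The "knowledge": a weighted graph where nodes are locations and edge
# weights are distances. All names and numbers below are made up.
ROAD_MAP = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {},
}

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm: smallest total edge weight from source to target."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue                      # stale queue entry, skip it
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return float("inf")

# The "reasoning": A -> C -> B -> D = 2 + 1 + 5 = 8, so this prints 8.
print(shortest_distance(ROAD_MAP, "A", "D"))
```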
Such knowledge-based systems, wherein the knowledge was compiled by experts and represented in a way that allowed algorithms/programs to reason about it, represent the second generation of AI. At the heart of such approaches were increasingly sophisticated techniques for representing and reasoning about knowledge to solve tasks/problems that required such knowledge. Examples of this sophistication include the use of first-order logic to encode knowledge and probabilistic representations to capture and reason about domains where uncertainty is inherent.
One of the key challenges that such systems faced, and addressed to some extent, was the uncertainty inherent in many domains. Human beings are relatively good at reasoning in environments with unknowns and uncertainty. One key observation here is that even the knowledge we hold about a domain is not black or white but gray. A lot of progress was made in this era on representing and reasoning about unknowns and uncertainty. There were some limited successes in tasks like diagnosing a disease, which relied on leveraging and reasoning with a knowledge base in the presence of unknowns and uncertainty.
The key limitation of such systems was the need to hand compile the knowledge about the domain from experts. Collecting, compiling, and maintaining such knowledge bases rendered these systems impractical. In certain domains, it was extremely hard to even collect and compile such knowledge (for instance, transcribing speech to text or translating documents from one language to another). While human beings can easily learn to do such tasks, it's extremely challenging to hand compile and encode the knowledge related to them (for instance, knowledge of the English language and grammar, accents, and subject matter).
Human beings address such tasks by acquiring knowledge about a task/problem domain, a process referred to as learning. Given this observation, the focus of subsequent work in AI shifted over a decade or two to algorithms that improve their performance based on the data provided to them. The focus of this subfield was to develop algorithms that could acquire relevant knowledge for a task/problem domain from data. It is important to note that this knowledge acquisition relied on labeled data and on a suitable representation of that labeled data, both defined by a human being.
For instance, consider the problem of diagnosing a disease. For such a task, a human expert would collect a large number of cases where patients did and did not have the disease in question. Then, the human expert would identify a number of features that would aid in making the prediction like, say, the age of the patient, the gender, and results from a number of diagnostic tests like blood pressure, blood sugar, etc. The human expert would compile all this data and represent it in a suitable way, for example by scaling/normalizing it. Once this data was prepared, a machine learning algorithm could learn how to infer whether a patient has the disease or not by generalizing from the labeled data. Note that the labeled data consisted of patients that both have and do not have the disease. So, in essence, the underlying ML algorithm is doing the job of finding a mathematical function that can produce the right outcome (disease or no disease) given the inputs (features like age, gender, data from diagnostic tests, etc.). Finding the simplest mathematical function that predicts the outputs with the required level of accuracy is at the heart of the field of ML. Questions like how many examples are required to learn a task, or the time complexity of a learning algorithm, are specific questions to which the field of ML has provided answers with theoretical justification. The field has matured to a point where, given enough data, computational resources, and human resources to engineer features, a large class of problems is solvable.
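As a concrete, entirely synthetic illustration of this workflow, the sketch below builds a tiny labeled dataset from hand-chosen features (age, blood pressure, blood sugar), scales it, and fits a standard classifier from scikit-learn. Every value and label is made up; the point is only that the algorithm finds a function from the human-engineered features to the outcome.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic labeled data: each row is [age, systolic blood pressure, blood sugar]
# for one patient; y marks whether that patient had the disease (1) or not (0).
X = np.array([
    [35, 118, 90],
    [62, 150, 160],
    [45, 130, 110],
    [70, 160, 180],
    [29, 115, 85],
    [58, 145, 150],
])
y = np.array([0, 1, 0, 1, 0, 1])

# Human-driven preparation step: scale/normalize the hand-chosen features.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The ML algorithm's job: find a function from features to outcome
# that generalizes from the labeled examples.
model = LogisticRegression()
model.fit(X_scaled, y)

# Predict for a new, previously unseen patient (also made-up values).
new_patient = scaler.transform(np.array([[50, 140, 130]]))
print(model.predict(new_patient))   # 0 = no disease, 1 = disease
```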
The key limitation of mainstream ML algorithms is that applying them to a new problem domain requires a massive amount of feature engineering. For instance, consider the problem of recognizing objects in images. Using traditional ML techniques, such a problem requires a massive feature engineering effort wherein experts identify and generate features which are then used by the ML algorithm. In a sense, the true intelligence lies in the identification of features; what the ML algorithm does is simply learn how to combine these features to arrive at the correct answer. This identification of features, or of the representation of the data, which domain experts do before ML algorithms are applied, is both a conceptual and practical bottleneck in AI. A small illustrative sketch of such hand-engineered features follows.
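The snippet below is a hypothetical example of what hand-engineering image features can look like: expert-written code reduces a raw image array to a few summary numbers (mean brightness, a crude edge-strength measure, per-channel color averages), and only those numbers are what a traditional ML classifier would ever see. The particular features are our own illustrative choice, not a recommendation or anyone's published pipeline.

```python
import numpy as np

def hand_engineered_features(image):
    """Turn a raw H x W x 3 image array into a small, hand-designed feature vector.
    These particular features are chosen purely for illustration."""
    gray = image.mean(axis=2)                       # collapse color to grayscale
    mean_brightness = gray.mean()
    # Crude edge strength: average magnitude of horizontal/vertical differences.
    edge_strength = (np.abs(np.diff(gray, axis=0)).mean()
                     + np.abs(np.diff(gray, axis=1)).mean())
    color_means = image.mean(axis=(0, 1))           # average of each color channel
    return np.concatenate([[mean_brightness, edge_strength], color_means])

# A random "image" stands in for real data; the resulting 5-number vector is
# all that a traditional ML algorithm would be given to work with.
image = np.random.randint(0, 256, size=(64, 64, 3)).astype(float)
print(hand_engineered_features(image))
```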