CHAPTER 1
Deep Learning Frameworks
Deep learning is arguably the most popular aspect of AI, especially when it comes to data science (DS) applications. But what exactly are deep learning frameworks, and how are they related to other terms often used in AI and data science?
In this context, framework refers to a set of tools and processes for developing a certain system, testing it, and ultimately deploying it. Most AI systems today are created using frameworks. When a developer downloads and installs a framework on their computer, it is usually accompanied by a library. This library (or package, as it is often called in high-level languages) is implemented in the programming languages the framework supports. The library acts as a proxy to the framework, making its various processes available through a series of functions and classes in the programming language used. This way, you can do everything the framework enables you to do without leaving the programming environment where you have the rest of your scripts and data. So, for all practical purposes, that library is the framework, even if the framework can manifest in other programming languages too. A framework supported by both Python and Julia, for example, can be accessed through either one of these languages, making the language you use a matter of preference. Since enabling a framework to function in an additional language is a challenging task for its creators, the list of languages compatible with a given framework is often rather limited.
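For instance, here is a minimal sketch using TensorFlow, one popular DL framework, accessed entirely through its Python library; the layer sizes and settings are arbitrary, chosen only for illustration:

```python
# A minimal sketch, assuming TensorFlow is installed (pip install tensorflow).
# The tf.keras library exposes the framework's functionality as ordinary
# Python classes and functions, so we never leave the Python environment.
import tensorflow as tf

# Build a small neural network entirely through the library's API.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),                   # 4 input features
    tf.keras.layers.Dense(16, activation="relu"),        # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),      # 3 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```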
But what is a system, exactly? In a nutshell, a system is a standalone program or script designed to accomplish a certain task or set of tasks. In a data science setting, a system often corresponds to a data model. However, systems can include features beyond just models, such as an I/O process or a data transformation process.
The term model refers to a mathematical abstraction used to represent a real-world situation in a simpler, more workable manner. Models in DS are optimized through a process called training, and validated through a process called testing, before they are deployed.
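To make the training/testing distinction concrete, here is a minimal sketch using scikit-learn and its bundled iris dataset, purely for illustration:

```python
# A minimal sketch of the train/test workflow, assuming scikit-learn
# is installed. The dataset and model choice are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data so the model is validated on examples
# it never saw during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)             # training: optimize the model
print(model.score(X_test, y_test))      # testing: accuracy on held-out data
```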
Another term that often appears alongside these terms is methodology, which refers to a set of methods, and the theory behind those methods, for solving a particular type of problem in a certain field. Different methodologies are often geared towards different applications/objectives.
It's easy to see why frameworks are celebrities of sorts in the AI world. They help make the modeling aspect of the pipeline faster, and they make the data engineering demanded by deep learning models significantly easier. This makes AI frameworks great for companies that cannot afford a whole team of data scientists, or prefer to empower and develop the data scientists they already have.
These systems are fairly simple, but not quite plug and play. In this chapter we'll explore the utility behind deep learning models, their key characteristics, how they are used, their main applications, and the methodologies they support.
About deep learning systems
Deep Learning (DL) is a subset of AI that is used for predictive analytics, by means of an AI system called an Artificial Neural Network (ANN). Predictive analytics is a group of data science methodologies concerned with the prediction of certain variables; it includes various techniques such as classification, regression, and so on. An ANN, in turn, is a clever abstraction of the human brain, at a much smaller scale. ANNs can approximate virtually any function (mapping) that has been tried on them, making them well-suited to a wide range of data analytics tasks. In data science, ANNs are categorized as machine learning methodologies.
The main drawback of DL systems is that they are black boxes. It is exceedingly difficult, practically infeasible, to figure out exactly how their predictions come about, as the flow of data within them is extremely complicated.
Deep Learning generally involves large ANNs that are often specialized for specific tasks. Convolutional Neural Networks (CNNs), for instance, are better suited to processing images, video, and audio data streams. However, all DL systems share a similar structure. This involves elementary modules called neurons organized in layers, with various connections among them. These modules can perform some basic transformations (usually non-linear ones) as data passes through them. Since there is a plethora of potential connections among these neurons, organizing them in a structured way (much like real neurons are organized in networks in brain tissue) yields a more robust and functional form of these modules. This, in a nutshell, is an artificial neural network.
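As a minimal sketch of one such elementary module, consider a single neuron implemented in plain NumPy; the input values, weights, and bias below are arbitrary, chosen only for illustration:

```python
# A minimal sketch of a single artificial neuron in NumPy.
# The names (weights, bias, sigmoid) follow common convention,
# not any particular framework's API.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])         # incoming signals from the previous layer
weights = np.array([0.4, 0.1, -0.6])   # one weight per connection
bias = 0.2

# The neuron computes a weighted sum of its inputs, then applies a
# non-linear transformation before passing the result onward.
output = sigmoid(np.dot(weights, x) + bias)
print(output)   # approx. 0.18
```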
In general, DL frameworks include tools for building a DL system, methods for testing it, and various other Extract, Transform, and Load (ETL) processes; when taken together, these framework components help you seamlessly integrate DL systems with the rest of your pipeline. We'll look at this in more detail later in this chapter.
Although deep learning systems share some similarities with machine learning systems, certain characteristics make them sufficiently distinct. For example, conventional machine learning systems tend to be simpler and have fewer options for training. DL systems are noticeably more sophisticated; they each have a set of training algorithms, along with several parameters regarding the system's architecture. This is one of the reasons we consider them a distinct framework in data science.
DL systems also tend to be more autonomous than their machine learning counterparts. To some extent, DL systems can do their own feature engineering. More conventional systems tend to require more fine-tuning of the feature set, and sometimes require dimensionality reduction to provide any decent results.
In addition, the generalization of conventional ML systems doesn't improve as much as that of DL systems when additional data is provided. This is also one of the key characteristics that make DL systems a preferable option when big data is involved.
Finally, DL systems take longer to train and require more computational resources than conventional ML systems, due to their more sophisticated functionality. However, because the work of DL systems is easily parallelizable, modern computing architectures, as well as cloud computing, benefit DL systems the most compared to other predictive analytics systems.
How deep learning systems work
At their core, all DL frameworks work similarly, particularly when it comes to the development of DL networks. A DL network consists of several neurons organized in layers, many of which are connected to neurons in other layers. In the simplest DL networks, connections take place only between neurons in adjacent layers.
The first layer of the network corresponds to the features of our dataset; the last layer corresponds to its outputs. In the case of classification, each class has its own node, with the node's value reflecting how confident the system is that a data point belongs to that class. The layers in the middle involve some combination of these features. Since they aren't visible to the end user of the network, they are described as hidden (see Figure 1).
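For instance, here is a sketch of what the output layer of a trained 3-class classifier might yield; the raw numbers are made up, and a softmax is used to turn them into per-class confidence values:

```python
# A minimal sketch, assuming a 3-class classifier. The raw output-layer
# values below are arbitrary; softmax converts them to confidences
# that sum to 1.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max for numerical stability
    return e / e.sum()

output_layer = softmax(np.array([2.0, 0.5, -1.0]))
print(output_layer)           # approx. [0.79 0.18 0.04]: confidence per class
print(output_layer.argmax())  # 0: the predicted class
```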
The connections among the nodes are weighted, indicating the contribution of each node to the nodes it is connected to in the next layer. The weights are initially randomized when the network object is created, but are refined as the ANN is trained.
Moreover, each node contains a mathematical function that creates a transformation of the received signal before it is passed to the next layer. This is referred to as the transfer function (also known as the activation function). The sigmoid function is the most well-known of these, but others include softmax, tanh, and ReLU. We'll delve more into these in a moment.
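As a quick preview, here is a minimal NumPy sketch of a single layer's forward step: randomly initialized weights (as at network creation), a weighted sum at each neuron, and a few of these transfer functions applied to the result. All shapes and values are illustrative only:

```python
# A minimal sketch of one layer's forward step in NumPy;
# not any particular framework's implementation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes values to (0, 1)

def relu(z):
    return np.maximum(0.0, z)          # zeroes out negative signals

rng = np.random.default_rng(42)
W = rng.normal(scale=0.5, size=(3, 4))  # random initial weights: 4 inputs -> 3 neurons
b = np.zeros(3)                         # biases, one per neuron

x = np.array([1.0, 0.5, -0.5, 2.0])     # signal from the previous layer
z = W @ x + b                           # weighted sum at each neuron

print(sigmoid(z))    # one option for the transfer function
print(np.tanh(z))    # another
print(relu(z))       # and another
```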