Deep Learning in the World Today
Hello and welcome! This book will introduce you to deep learning via PyTorch, an open source library released by Facebook in 2017. Unless you've had your head stuck in the ground in a very good impression of an ostrich the past few years, you can't have helped but notice that neural networks are everywhere these days. They've gone from being the really cool bit of computer science that people learn about and then do nothing with to being carried around with us in our phones every day to improve our pictures or listen to our voice commands. Our email software reads our email and produces context-sensitive replies, our speakers listen out for us, cars drive by themselves, and the computer has finally bested humans at Go. We're also seeing the technology being used for more nefarious ends in authoritarian countries, where neural network-backed sentinels can pick faces out of crowds and decide whether they should be apprehended.
And yet, despite the feeling that this has all happened so fast, the concepts behind convolutional neural networks were being used to recognize the digits on checks back in the late '90s. There's been a solid foundation building up all this time, so why does it feel like an explosion occurred in the last 10 years?
There are many reasons, but prime among them has to be the surge in the power and affordability of graphics processing units (GPUs), along with custom hardware such as Google's tensor processing units (TPUs), devices built specifically to perform deep learning as fast as possible and available to the general public as part of the Google Cloud ecosystem.
Another way to chart deep learning's progress over the past decade is through the ImageNet competition. A massive database of over 14 million pictures, manually labeled into 20,000 categories, ImageNet is a treasure trove of labeled data for machine learning purposes. Since 2010, the yearly ImageNet Large Scale Visual Recognition Challenge has sought to test all comers against a 1,000-category subset of the database, and until 2012, error rates for tackling the challenge rested around 25%. That year, however, a deep convolutional neural network won the competition with an error of 16%, massively outperforming all other entrants. In the years that followed, that error rate got pushed down further and further, to the point that in 2015, the ResNet architecture obtained a result of 3.6%, which beat the average human performance on ImageNet (5%). We had been outclassed.
But What Is Deep Learning Exactly, and Do I Need a PhD to Understand It?
Deep learning's definition often is more confusing than enlightening. One way of defining it is to say that deep learning is a machine learning technique that uses many layers of nonlinear transforms to progressively extract features from raw input. That's true, but it doesn't really help much, does it? I prefer to describe it as a technique to solve problems by providing the inputs and desired outputs and letting the computer find the solution, normally using a neural network.
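To make that description concrete, here's a minimal, illustrative sketch (not taken from the book) of what "providing the inputs and desired outputs and letting the computer find the solution" looks like in PyTorch: we hand over some inputs and the answers we want, and gradient descent finds the mapping.

```python
import torch
import torch.nn as nn

# Inputs (x) and the desired outputs (y) we want the model to learn.
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 0.5

model = nn.Linear(1, 1)                     # a one-layer "network"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)             # how far off are we?
    loss.backward()                         # compute gradients
    optimizer.step()                        # nudge the weights toward the answers

print(model.weight.item(), model.bias.item())  # approaches 3.0 and 0.5
```

Real deep learning models stack many such layers with nonlinearities in between, but the loop of guess, measure error, and adjust is the same.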
One thing about deep learning that scares off a lot of people is the mathematics. Look at just about any paper in the field and you'll be subjected to almost impenetrable amounts of notation with Greek letters all over the place, and you'll likely run screaming for the hills. Here's the thing: for the most part, you don't need to be a math genius to use deep learning techniques. In fact, for most day-to-day basic uses of the technology, you don't need to know much at all to really understand what's going on; as you'll see, you'll be able to put together an image classifier that rivals what the best minds in 2015 could offer with just a few lines of code.
PyTorch
The library also comes with modules that help with manipulating text, images, and audio (torchtext, torchvision, and torchaudio), along with built-in variants of popular architectures such as ResNet (with weights that can be downloaded to provide assistance with techniques like transfer learning, which you'll see later in the book).
Aside from Facebook, PyTorch has seen quick acceptance by industry, with companies such as Twitter, Salesforce, Uber, and NVIDIA using it in various ways for their deep learning work. Ah, but I sense a question coming.
What About TensorFlow?
Yes, let's address the rather large, Google-branded elephant in the corner. What does PyTorch offer that TensorFlow doesn't? Why should you learn PyTorch instead?
The answer is that traditional TensorFlow works in a different way than PyTorch, and that difference has major implications for writing and debugging code. In TensorFlow, you use the library to build up a graph representation of the neural network architecture and then execute operations on that graph, which happens within the TensorFlow library. This method of declarative programming is somewhat at odds with Python's more imperative paradigm, meaning that Python TensorFlow programs can look and feel somewhat odd and be difficult to understand. The other issue is that the static graph declaration can make dynamically altering the architecture during training and inference time a lot more complicated and stuffed with boilerplate than with PyTorch's approach.
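To see what PyTorch's define-by-run style buys you, here's a small illustrative sketch (not from the book): the network's depth is decided by ordinary Python control flow at the moment the data passes through, something a static graph makes much more awkward to express.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """A toy network whose depth varies at runtime -- natural in
    PyTorch's define-by-run model, awkward in a static graph."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 10)

    def forward(self, x):
        # Plain Python control flow decides how many times to apply
        # the layer; the graph is built on the fly with each call.
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = torch.relu(self.layer(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 10))   # runs immediately; debug with ordinary Python tools
print(out.shape)
```

Because execution is immediate, you can drop a `print` or a debugger breakpoint anywhere in `forward` and inspect real tensors, rather than symbolic graph nodes.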
For these reasons, PyTorch has become popular in research-oriented communities. The number of papers submitted to the International Conference on Learning Representations that mention PyTorch has jumped 200% in the past year, and the number of papers mentioning TensorFlow has increased almost equally. PyTorch is definitely here to stay.
However, things are changing in more recent versions of TensorFlow. A new feature called