Intelligent video surveillance is one of the most essential application domains of computer vision and machine learning technologies, and human re-identification is one of the most challenging problems in intelligent video surveillance. In the first part of this book, we are going to introduce the background and the current state of human re-identification.
Springer International Publishing Switzerland 2016
Ziyan Wu Human Re-Identification Multimedia Systems and Applications 10.1007/978-3-319-40991-7_1
1. The Problem of Human Re-Identification
One of the ultimate goals of computer science is to endow computers, in addition to exceptional speed of calculation, with advanced intelligence, rich emotions, and accurate perception, just like those of humans. One major step toward that goal is to enable computers to sense like a human being. In particular, computer vision is the set of technologies that enable computers to see.
1.1 Surveillance and Computer Vision
Applications based on computer vision technologies are everywhere in our daily lives. Social network platforms tag and classify human faces in pictures that users upload. Cellphone camera applications can easily guide you in capturing a 360-degree panorama image. One can even unlock a computer without keying in a passcode, either by using a fingerprint scanner or by scanning his/her face in front of the web camera. 3D maps on your computer are reconstructed using many cameras and vision algorithms.
We enjoy the fun and convenience brought by these applications. In fact, computer vision techniques play far more crucial roles in our lives. Factories inspect products and instruments with accurate and nondestructive vision inspection approaches; the future of self-driving cars relies on vision perception methods; clinical physicians can conduct more efficient and effective operations with the help of automatic image-based device tracking and organ segmentation algorithms.
Another domain in which vision technologies are being developed is security surveillance, including target recognition, threat monitoring, and criminal activity investigation. Even today, most surveillance systems in mass transit environments, including bus stations, train stations, and airports, are still monitored by human operators. These environments face great challenges in protecting passengers from security threats and breaches. Even though increasingly advanced video surveillance systems have been deployed in tens of thousands of airports around the world, with analog video cameras gradually being replaced by digital cameras and video tapes being replaced by advanced compact digital recording systems, the final link of the system, the one responsible for triggering the alarms, still consists of human officers. Just imagine these staff members watching hundreds of video monitors for at least eight hours continuously every day. Previous research [] has shown that the attention of most human individuals is likely to fall far below acceptable levels over such long time spans.
The US Department of Homeland Security is spending billions of dollars for government agencies and public facilities to install advanced video surveillance systems, including the development of intelligent video analytic technologies that can relieve human operators of surveillance tasks. With the rapid development of computer vision and artificial intelligence, many surveillance tasks can already be accomplished by computer programs, such as pedestrian detection, change detection, and flow tracking. However, these tasks are all at a relatively low level, meaning that the decision on whether or not to trigger an alarm cannot be obtained directly from the output of these components. Although researchers have been working on higher-level surveillance problems, e.g., multiple object tracking and target identity association, most of these algorithms are not ready to be deployed in real-world scenarios because of insufficient reliability, inability to adapt to changes in the environment, and lack of computational scalability. One typical example is the human re-identification problem [], which is often considered the most challenging surveillance problem, not just for machines but sometimes also for humans.
1.2 Human Re-Identification
Recognizing the same human as he or she moves through a network of cameras with nonoverlapping fields of view is an important and challenging problem in security and surveillance applications. This is often called the re-identification or re-id problem. For example, in an airport security surveillance system, once a target has been identified in one camera by a user or program, we want to learn the appearance of the target and recognize him/her when he/she is observed by the other cameras. We call this type of re-id problem tag-and-track. The general process of human re-identification is shown in Fig. 1.1.
Fig. 1.1
Process of human re-identification
Once a target is tagged in a camera view, a sub-image (shown as a red bounding box in Fig. 1.1) is extracted from the raw video frame to eliminate the background and clutter. Visual features are extracted from the sub-image to form the signature of the target, which is matched against the signatures of all candidates from other camera views after the target has left the view where he/she was initially tagged. Finally, the candidate with the highest matching score is considered the top candidate for the target. In practice, additional reasoning is taken into account in the decision stage, including spatial and temporal constraints and priors.
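The steps described above (sub-image extraction, signature formation, matching, and ranking) can be illustrated with a small sketch. It is only a toy example under stated assumptions, not the method developed in this book: the signature is assumed to be a coarse RGB color histogram, the matching score is assumed to be histogram intersection, and the frames, camera identifiers, and all function names (`make_frame`, `crop`, `signature`, `match_score`, `rank_candidates`) are hypothetical. Spatial and temporal reasoning is omitted.

```python
from collections import Counter

BACKGROUND = (128, 128, 128)  # hypothetical gray background color
BOX = (1, 1, 5, 5)            # hypothetical box: (top, left, bottom, right)


def make_frame(person_color, size=6, box=BOX):
    """Build a toy RGB frame: gray background with a solid-colored 'person'."""
    top, left, bottom, right = box
    frame = [[BACKGROUND] * size for _ in range(size)]
    for y in range(top, bottom):
        for x in range(left, right):
            frame[y][x] = person_color
    return frame


def crop(frame, box):
    """Extract the sub-image inside the bounding box (end indices exclusive)."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def signature(sub_image, bins=4):
    """Form a signature: a normalized, coarsely quantized RGB histogram."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step)
        for row in sub_image
        for (r, g, b) in row
    )
    total = sum(counts.values())
    return {color_bin: n / total for color_bin, n in counts.items()}


def match_score(sig_a, sig_b):
    """Histogram intersection in [0, 1]; 1 means identical histograms."""
    return sum(min(p, sig_b.get(color_bin, 0.0)) for color_bin, p in sig_a.items())


def rank_candidates(target_sig, candidate_sigs):
    """Return (candidate id, score) pairs sorted from best to worst match."""
    scores = {cid: match_score(target_sig, sig) for cid, sig in candidate_sigs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Target tagged in one camera view (reddish clothing).
target_sig = signature(crop(make_frame((200, 30, 30)), BOX))

# Candidates observed later in other camera views.
candidate_sigs = {
    "cam2_a": signature(crop(make_frame((210, 40, 20)), BOX)),  # reddish
    "cam2_b": signature(crop(make_frame((30, 30, 200)), BOX)),  # bluish
    "cam3_a": signature(crop(make_frame((30, 200, 30)), BOX)),  # greenish
}

ranking = rank_candidates(target_sig, candidate_sigs)
print("Top candidate:", ranking[0])
```

The coarse quantization makes the signature tolerant of small color shifts between cameras, which is why the slightly different red of `cam2_a` still ranks first in this toy setting. A color histogram alone would not separate two people in similar clothing.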
1.3 Looking at People
Traditional biometric methods such as face recognition [] work best with high-definition video, where more sophisticated and complex features can be extracted. Unfortunately, in real-world scenarios, high-definition videos are usually not available.
That is why we have to look at people differently, and it is also why the human re-identification problem is so challenging. Researchers have reconsidered this problem by observing the human vision system. When a target is far away, one tends to identify it by extracting high-level attributes, from the style and color of hair to the color of clothes and belongings. Given a moving target, people also pay attention to the way he/she moves. These observations became the essence of how computer algorithms look at human targets in human re-identification applications.
1.3.1 Re-Identification and Image Retrieval
Image retrieval has been studied for a long time, and its scope is very similar to that of human re-identification. In general, both re-identification and image retrieval try to recognize the set of images belonging to the same category or object as the query image. Some approaches can potentially be shared by the two domains (e.g., []). However, in image retrieval, all possible query categories are usually included in the training dataset, while in re-identification, the probe targets are usually unseen in the training stage. Hence, most re-identification approaches try to learn a metric that determines whether any given pair of images belongs to the same object.
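To make the idea of learning a metric for unseen targets concrete, the following is a minimal sketch under strong simplifying assumptions, not a method from this book. It assumes a diagonal, Mahalanobis-style metric in which each feature is weighted by the inverse of its mean squared difference over training pairs known to show the same person. The two-dimensional features (a stable clothing-color feature and an illumination-sensitive brightness feature), the decision threshold, and all names are hypothetical. Practical approaches often learn from non-matching pairs as well.

```python
def learn_diagonal_metric(same_pairs, eps=1e-6):
    """Weight each feature by the inverse of its mean squared difference
    over same-identity pairs, so unstable features count less."""
    dim = len(same_pairs[0][0])
    mean_sq = [0.0] * dim
    for a, b in same_pairs:
        for i in range(dim):
            mean_sq[i] += (a[i] - b[i]) ** 2
    n = len(same_pairs)
    return [1.0 / (total / n + eps) for total in mean_sq]


def distance(weights, a, b):
    """Weighted squared distance between two feature vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def same_identity(weights, a, b, threshold):
    """Decide whether a pair of feature vectors shows the same person."""
    return distance(weights, a, b) <= threshold


# Training pairs of (clothing color, brightness) features; each pair shows
# one training identity seen in two different cameras.
training_same_pairs = [
    ((0.20, 0.10), (0.22, 0.90)),
    ((0.80, 0.70), (0.78, 0.05)),
    ((0.50, 0.30), (0.51, 0.95)),
]
weights = learn_diagonal_metric(training_same_pairs)

# Identities below never appeared in training.
probe = (0.35, 0.20)
gallery_same = (0.36, 0.85)  # same person, different illumination
gallery_diff = (0.60, 0.22)  # different person, similar illumination

THRESHOLD = 3.0  # hypothetical decision threshold
UNWEIGHTED = [1.0, 1.0]  # plain squared Euclidean distance for comparison

print("learned   same/diff:",
      distance(weights, probe, gallery_same),
      distance(weights, probe, gallery_diff))
print("unweighted same/diff:",
      distance(UNWEIGHTED, probe, gallery_same),
      distance(UNWEIGHTED, probe, gallery_diff))
print("same person?", same_identity(weights, probe, gallery_same, THRESHOLD))
```

In this toy setting the unweighted distance places the wrong person closer to the probe because brightness dominates, while the learned weights suppress brightness and recover the correct match. The weights transfer to the unseen probe because they describe how features vary between cameras, not who the training identities were.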