Thomas B. Moeslund, Introduction to Video and Image Processing: Building Real Systems and Applications. Undergraduate Topics in Computer Science. Springer-Verlag London Limited, 2012. DOI: 10.1007/978-1-4471-2503-7_1
1. Introduction
If you look at the image in Fig. 1.1 you can see three children. The two oldest children look content with life, while the youngest child looks a bit puzzled. We could detail this description further using adjectives, but we will never be able to write a textual description that captures all the details in the image. This is commonly expressed as "a picture is worth a thousand words".
Fig. 1.1
An image containing three children
So, our eyes and brain can extract detailed information far beyond what can be described in text, and it is this ability we want to replicate in the seeing computer. To this end, a camera replaces the eyes and the (video and image) processing software replaces the human brain. The purpose of this book is to present the basics of these two topics: cameras and video/image processing.
Cameras have been around for many years and were initially developed to freeze a part of the world, for example for use in newspapers. For a long time cameras were analog, meaning that video and images were captured on film. As digital technology matured, digital video and images became possible, and video and image processing became relevant and necessary sciences.
Some of the first applications of digital video and image processing aimed to improve the quality of captured images, but as computing power grew, so did the number of applications where video and image processing could make a difference. Today, video and image processing are used in many diverse applications, such as astronomy (to enhance image quality), medicine (to measure and understand parameters of the human body, e.g., blood flow in fractured veins), image compression (to reduce the memory required to store an image), sports (to capture the motion of an athlete in order to understand and improve performance), rehabilitation (to assess locomotion abilities), motion pictures (to capture actors' motion in order to produce graphics-based special effects), surveillance (to detect and track individuals and vehicles), production industries (to assess the quality of products), robot control (to detect objects and their pose so a robot can pick them up), TV production (to mix graphics and live video, e.g., weather forecasts), biometrics (to measure parameters that are unique to a person), photo editing (to improve the quality of photographs or add effects to them), etc.
Many of these applications rely on the same video and image processing methods, and it is these basic methods which are the focus of this book.
1.1 The Different Flavors of Video and Image Processing
The different video and image processing methods are often grouped into the categories listed below. There is no unique definition of these categories and, to make matters worse, they overlap significantly. Here is one set of definitions:
Video and Image Compression
This is probably the best-defined category and contains the methods used for compressing video and image data.
Image Manipulation
This category covers methods used to edit an image, for example rotating or scaling it, but also improving its quality by, for example, changing the contrast.
Image Processing
Image processing originates from the more general field of signal processing and covers methods used to segment the object of interest. Segmentation here refers to methods that in some way enhance the object while suppressing the rest of the image (for example, the edges in an image).
Video Processing
Video processing covers most of the image processing methods, but also includes methods where the temporal nature of video data is exploited.
Image Analysis
Here the goal is to analyze the image with the purpose of first finding objects of interest and then extracting some parameters of these objects, for example, an object's position and size.
Machine Vision
When video processing, image processing or image analysis is applied in production industries, it is normally referred to as machine vision or simply vision.
Computer Vision
Humans have human vision and, similarly, a computer has computer vision. When talking about computer vision we normally mean advanced algorithms similar to those a human can perform, e.g., face recognition. Computer vision also normally covers all methods that use more than one camera.
Even though this book is titled Video and Image Processing, it also covers basic methods from Image Manipulation and Image Analysis in order to provide the reader with a solid foundation for understanding and working with images and video.
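To make the two extra categories concrete, here is a minimal, dependency-free sketch; it is an illustration under stated assumptions, not the book's code. A gray-scale image is assumed to be a list of rows of 8-bit values. `adjust_contrast` is one Image Manipulation operation, modeled here (an assumption) as the linear point transform g = a·f + b clamped to 0..255. `object_position_and_size` is one Image Analysis operation: it takes a binary image and returns the object's size as its pixel count and its position as the centroid (x, y), with the origin at the top-left corner. Both function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: rows of 8-bit gray values


def adjust_contrast(img: Image, a: float, b: float) -> Image:
    """Image Manipulation (hypothetical helper): g = a*f + b per pixel, clamped to [0, 255]."""
    return [[min(255, max(0, round(a * p + b))) for p in row] for row in img]


def object_position_and_size(binary: Image) -> Tuple[int, Tuple[float, float]]:
    """Image Analysis (hypothetical helper): area and centroid (x, y) of non-zero pixels."""
    area = 0
    sum_x = sum_y = 0
    for y, row in enumerate(binary):
        for x, p in enumerate(row):
            if p:  # any non-zero pixel is treated as part of the object
                area += 1
                sum_x += x
                sum_y += y
    if area == 0:
        raise ValueError("no object pixels found")
    return area, (sum_x / area, sum_y / area)


if __name__ == "__main__":
    img = [[10, 20], [30, 40]]
    print(adjust_contrast(img, 2.0, 5.0))  # [[25, 45], [65, 85]]

    blob = [
        [0, 0, 0, 0],
        [0, 255, 255, 0],
        [0, 255, 255, 0],
        [0, 0, 0, 0],
    ]
    print(object_position_and_size(blob))  # (4, (1.5, 1.5))
```

Real systems would typically use an array library rather than nested lists; the lists are used here only to keep the sketch self-contained.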
1.2 General Framework
No matter which category you are working within (except for Video and Image Compression), you can very often apply the framework illustrated in Fig. 1.2. Sometimes not all blocks are included in a particular system, but the framework nevertheless provides a relevant guideline.
Fig. 1.2
The block diagram provides a general framework for many systems working with video and images
Underneath each block in the figure we have illustrated a typical output. The particular outputs are from a gesture-based human-computer interface system that counts the number of fingers a user is showing in front of the camera.
Below we briefly describe the purpose of the different blocks:
Image Acquisition
This block covers everything to do with the camera and the setup of your system, e.g., camera type, camera settings, optics, and light sources.
Pre-processing
This block does something to your image before the actual processing commences, e.g., converts the image from color to gray-scale or crops the most interesting part of the image (as seen in Fig. 1.2).
Segmentation
This is where the information of interest is extracted from the image or video data. This block is often the heart of a system. In the example in the figure, the information is the fingers. The image below the segmentation block shows that the fingers (together with some noise) have been segmented (indicated by white objects).
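The pre-processing and segmentation blocks described above can be sketched as plain functions chained together. This is a minimal illustration under assumptions, not the book's implementation: a color image is assumed to be a list of rows of (R, G, B) tuples; `to_gray` uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which are one common choice among several; `crop` keeps a rectangular region of interest; and `threshold` marks pixels brighter than an assumed threshold as white (255) and all others as black (0), mirroring the white objects in the segmentation output. The function names, the tiny synthetic frame standing in for image acquisition, and the threshold value 128 are all hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
ColorImage = List[List[RGB]]  # assumed representation: rows of (R, G, B)
GrayImage = List[List[int]]   # rows of 8-bit gray values


def to_gray(img: ColorImage) -> GrayImage:
    """Pre-processing: color to gray-scale using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def crop(img: GrayImage, x0: int, y0: int, width: int, height: int) -> GrayImage:
    """Pre-processing: keep the width x height region whose top-left corner is (x0, y0)."""
    return [row[x0:x0 + width] for row in img[y0:y0 + height]]


def threshold(img: GrayImage, t: int) -> GrayImage:
    """Segmentation: pixels brighter than t become white (255), the rest black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]


if __name__ == "__main__":
    # A tiny synthetic 3x4 "camera frame" standing in for Image Acquisition.
    frame = [
        [(0, 0, 0), (200, 200, 200), (0, 0, 0), (0, 0, 0)],
        [(0, 0, 0), (220, 210, 200), (250, 250, 250), (0, 0, 0)],
        [(30, 30, 30), (0, 0, 0), (0, 0, 0), (0, 0, 0)],
    ]
    gray = to_gray(frame)
    roi = crop(gray, 1, 0, 2, 2)
    print(threshold(roi, 128))  # [[255, 0], [255, 255]]
```

Keeping each block as a separate function reflects the framework's idea that blocks can be included, omitted, or swapped independently in a particular system.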