Sven J. Dickinson and Zygmunt Pizlo (eds.) Advances in Computer Vision and Pattern Recognition Shape Perception in Human and Computer Vision 2013 An Interdisciplinary Perspective 10.1007/978-1-4471-5195-1_1
Springer-Verlag London 2013
1. The Role of Mid-Level Shape Priors in Perceptual Grouping and Image Abstraction
Abstract
Perceptual grouping plays a critical role in both human and computer vision. However, with the object categorization community's preoccupation with object detection, interest in perceptual grouping has waned. The reason for this is clear: the object-independent, mid-level shape priors that form the basis of perceptual grouping are subsumed by the object-dependent, high-level shape priors defined by a target object. As the recognition community moves from object detection back to object recognition, a linear search through a large database of target models is intractable, and perceptual grouping will be essential for sublinear scaling. We review three approaches to perceptual grouping based on grouping superpixels. In the first, we use symmetry to group superpixels into symmetric parts, and then group the parts to form structured objects. In the second, we use contour closure to group superpixels, yielding a figure-ground segmentation. In the third, we use a vocabulary of simple parts to both group superpixels into parts and recover the abstract shapes of the parts.
1.1 Introduction
Have a look at the image in Fig. 1.1(a) and don't read any further until you recognize the object(s) in the scene. For most people, the image of a horse and rider quickly emerges. This is remarkable considering that each individual black fragment is practically meaningless in terms of its indexing power to suggest a horse or rider (or any object, for that matter). Only when the fragments are grouped together and abstracted to yield meaningful parts and relations do the objects begin to emerge. Moreover, these grouping and abstraction processes are primarily bottom-up, and do not require a priori knowledge of scene content. Nobody told you what object to look for, and you certainly didn't run through tens of thousands of category detectors to decide that it was a horse and rider and not a table and chair. Somehow, your visual system grouped the fragments to form a set of abstract parts, then grouped those parts into larger configurations, then queried your visual memory for similar configurations, and only then used a priori knowledge of a promising candidate to detect, i.e., verify, the object.
Fig. 1.1
(a) An illustration of the power of perceptual grouping. Individually, the black, amorphous blobs carry very little information. However, when grouped into parts, the emergent part structure allows the scene (horse and rider) to be quickly interpreted without any a priori knowledge of scene content (figure reproduced with kind permission from Teachers College Press, Columbia University, New York: A gestalt completion test: a study of a cross section of intellect, 1931, Roy F. Street, p. 55, Fig. 8); (b) The rise and fall of perceptual grouping. Tracking perceptual grouping papers in the computer vision community's four main conferences indicates a growing interest in perceptual grouping, peaking in the late 1990s. However, since then, interest in this critically important problem has waned.
Perceptual grouping is a critical function in the human visual system, offering a powerful heuristic for grouping together causally related image features in support of both figure-ground segmentation and 3-D inference. In the mid-to-late 1990s, perceptual grouping was a thriving subcommunity in computer vision, as illustrated in Fig. 1.1(b). However, over the past 10 years, there has been a steady decline in the number of perceptual grouping papers appearing in the computer vision community's main conferences. The reason for this is the reformulation of object recognition, historically cast as the problem of recognizing an object from a large database, as a detection problem, cast as the search for a particular target object.
The classical formulation of the object recognition problem, which defined the mainstream from the mid-1960s through to the late 1990s, was the recognition of an unexpected object from a database of objects. As illustrated in Fig. 1.2, a local edgel carries very little information with which to index into a database of objects in an attempt to select a small number of promising object models that might account for the edgels.
Fig. 1.2
In the classical recognition model, the desire to extract shape features, considered more generic than appearance, began with edge detection. Because edgels were not discriminative, they were perceptually grouped and abstracted to form distinctive indexing structures that could prune a large database of objects down to a small number of promising candidates. (Figure reproduced with kind permission from Springer Science+Business Media: Proceedings, 4th Mexican Conference on Pattern Recognition (MCPR), Perceptual Grouping using Superpixels, 2012, S. Dickinson, A. Levinshtein, and C. Sminchisescu, p. 14, Fig. 1)
The need for perceptual grouping in these early systems was critical, for only when the edgels were grouped into longer contours, perhaps parsed at high-curvature points, and grouped with other causally related contours, did distinctive indexing features emerge. Lowe's thesis [], the more powerful the resulting indexing structure and the fewer candidates that ultimately needed to be verified. Each candidate was verified, yielding a score (typically reflecting the degree to which a model could be aligned with image features), and the top-scoring candidate, if sufficiently strong, gave the final interpretation.
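The index-then-verify loop described above can be sketched in a few lines of Python. This is only a toy illustration, not the method of any particular classical system: the model database, the string-valued "grouped features", the voting scheme in `select_candidates`, the overlap score in `verify`, and the `threshold` value are all assumptions introduced here for the example.

```python
from collections import defaultdict

# Hypothetical model database: each model is a set of grouped, abstracted features.
models = {
    "horse": {"elongated_torso", "four_limbs", "neck_head", "tail"},
    "table": {"flat_top", "four_limbs"},
    "chair": {"flat_top", "four_limbs", "back_rest"},
}

def build_index(models):
    """Inverted index: indexing feature -> names of models containing it."""
    index = defaultdict(set)
    for name, features in models.items():
        for feature in features:
            index[feature].add(name)
    return index

def select_candidates(index, image_features, max_candidates=2):
    """Indexing step: vote for models that share features with the image."""
    votes = defaultdict(int)
    for feature in image_features:
        for name in index.get(feature, ()):
            votes[name] += 1
    # Most votes first; ties broken alphabetically for determinism.
    ranked = sorted(votes, key=lambda name: (-votes[name], name))
    return ranked[:max_candidates]

def verify(model_features, image_features):
    """Toy verification score: fraction of model features found in the image."""
    return len(model_features & image_features) / len(model_features)

def recognize(models, index, image_features, threshold=0.6):
    """Verify only the indexed candidates and accept the best if strong enough."""
    candidates = select_candidates(index, image_features)
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda name: verify(models[name], image_features))
    score = verify(models[best], image_features)
    return (best, score) if score >= threshold else (None, score)

index = build_index(models)
image_features = {"elongated_torso", "four_limbs", "neck_head"}
print(recognize(models, index, image_features))
```

In this sketch, verification runs only on the candidates returned by the index rather than on every model in the database. A single shared feature such as `four_limbs` votes for all three models, whereas the combination of grouped features singles out one strong candidate, which mirrors the point that richer groupings make for more discriminative indexing.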
The formulation of object recognition as the detection of a specific target object has dominated the recognition community over the past 10 years. As illustrated in Fig. 1.3, because the target is known in advance, verification (detection) can be applied directly to the ungrouped, low-level features to give the final score, thereby short-circuiting the entire perceptual grouping process.
Fig. 1.3
The classical formulation of object recognition from a large database has given way to a more recent formulation of object recognition as target detection: (top) rather than verifying a number of candidates, the target candidate is known, rendering the process of indexing (or model selection) obsolete. (Middle) Without the need for domain-independent recovery, grouping, and abstraction of structure in order to prune a large database down to a small number of promising candidates, perceptual grouping is unnecessary. (Bottom) As a result, verification (detection) can be applied directly to the ungrouped, low-level edge features. (Figure reproduced with kind permission from Springer Science+Business Media: Proceedings, 4th Mexican Conference on Pattern Recognition (MCPR), Perceptual Grouping using Superpixels, 2012, S. Dickinson, A. Levinshtein, and C. Sminchisescu, p. 14, Fig. 1)