
Moreover, most methods presented in the literature describe images with a single feature space, and the vast majority of them use SIFT features [7, 13, 21, 44]. These features are impressively robust to changes in orientation and scale, but their robustness to illumination and affine changes is limited. Fewer approaches use global signature features such as colour histograms [154, 201].

More recent methods rely directly on matching the features extracted from images [66, 113, 216]. In these methods, loop-closure detection is performed using the features themselves rather than their vector-quantised representation. The main challenge for these techniques is the computational cost of the search. Storing the extracted features in a data structure such as a KD-tree alleviates this problem: the search time is logarithmic in the number of stored features, since a query descends from the root of the tree to a leaf.
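The KD-tree search described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in the cited works: the node layout and the names `build_kdtree` and `nearest` are assumptions made for illustration, and the descriptors are toy 2-D points rather than SIFT vectors.

```python
from math import dist, inf

def build_kdtree(points, depth=0):
    """Build a KD-tree from (descriptor, index) pairs.

    Each node is a tuple (descriptor, index, left_subtree, right_subtree).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # the median keeps the tree balanced
    descriptor, index = points[mid]
    return (descriptor, index,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=(inf, None)):
    """Return (distance, index) of the stored descriptor closest to query."""
    if node is None:
        return best
    descriptor, index, left, right = node
    d = dist(descriptor, query)
    if d < best[0]:
        best = (d, index)
    axis = depth % len(query)
    diff = query[axis] - descriptor[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the other branch only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

# Toy 2-D "descriptors" standing in for features stored from earlier images.
descriptors = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree([(d, i) for i, d in enumerate(descriptors)])
distance, index = nearest(tree, (9.0, 2.0))
```

The pruning test in `nearest` is what avoids visiting most of the tree. In high-dimensional feature spaces such as that of SIFT descriptors the pruning is less effective, which is why approximate nearest-neighbour variants are commonly used in practice.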

7.2 Related Work

In the last decade, many appearance-based localisation and mapping solutions have been proposed. Although many solutions use stereo vision systems [201, 219], many others use monocular configurations [8, 9, 43, 113]. The proposed technique belongs to the latter category. The approach is motivated by the bag-of-words methods, which were introduced to computer vision by Sivic and Zisserman [180], and Nister and Stewenius [144]. However, we adopt an appearance-based technique that uses the local image features themselves.

Before the loop-closure concept was introduced to the computer vision community, earlier techniques were used for the localisation task, where the map is known and the vehicle is guaranteed to be at some location within this map. To use these techniques for loop-closure detection, the algorithm must also be able to distinguish views coming from unvisited locations, which makes the problem significantly more complicated. The most challenging problem for these techniques is perceptual aliasing, where different places appear similar to the vehicle's vision system. Many techniques have been proposed to deal with this problem, with some success. However, a complete and satisfactory approach has not yet been developed.

For image description, the BoW approach has gained increasing interest [140, 180]. This method introduced new concepts, such as visual vocabularies. The model treats an image as a collection of visual words, called a "bag of words", similarly to a text document (Figure 7.1) [180]. It adopts the technique used for text retrieval, which allows a fast search. However, the model ignores the image geometry, which limits its effectiveness. Other techniques proposed in the literature try to cope with this problem by using geometric information in a second verification step [38].

Chapter 7. Real-Time Loop-Closure Detection

[Figure panels: Original Image; The Dictionary; The Bag of Words representation. Steps: feature extraction and matching with the dictionary; image quantification with visual words.]

Fig. 7.1 Example of image representation with the "bag-of-words" concept. After visual words are extracted from the original image (left), they are matched against the words already existing in a dictionary (middle). The image is then represented by the occurrences of these dictionary words (right).
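The bag-of-words representation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `quantise` and `bag_of_words`, the three-word vocabulary and the 2-D descriptors are hypothetical, and a real vocabulary would be learned by clustering descriptors from training images.

```python
from math import dist

def quantise(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    return [min(range(len(vocabulary)), key=lambda w: dist(vocabulary[w], d))
            for d in descriptors]

def bag_of_words(descriptors, vocabulary):
    """Count how often each visual word occurs in the image."""
    histogram = [0] * len(vocabulary)
    for word in quantise(descriptors, vocabulary):
        histogram[word] += 1
    return histogram

# Hypothetical dictionary of three visual words and four image descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_descriptors = [(0.1, 0.1), (0.9, 0.2), (0.8, -0.1), (0.2, 0.9)]
histogram = bag_of_words(image_descriptors, vocabulary)  # [1, 2, 1]
```

Because the histogram records only word counts, any reordering of the descriptors yields the same representation. This is the loss of image geometry noted in the text.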

The authors of [140] proposed a solution that combines laser and vision systems. Angeli et al. [8, 9] used an incremental BoW scheme to estimate loop-closure probabilities. This work was later extended to use a second feature space, colour histograms, along with the SIFT features. Although this technique performs well, it has some drawbacks, especially in dealing with the perceptual aliasing problem. Schindler et al. proposed a more discriminative approach for building the visual words [169]. Cummins and Newman developed an algorithm for fast appearance-based mapping (FAB-MAP), in which a Chow-Liu tree models the dependencies between the visual words [43].

Other techniques use similarity matrices as an extension of the appearance-based methods [109, 176]. A similarity score is defined between pairs of selected images, and a square similarity matrix is computed that gathers the pairwise similarities between all images. Loop closures then appear as high-similarity off-diagonal entries of this matrix. Building on this technique, Newman et al. [85, 86, 141] addressed the perceptual aliasing problem by using the singular value decomposition (SVD) of the similarity matrix, with the aim of eliminating the effect of repetitive structures.
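As a rough illustration of the similarity-matrix idea, the sketch below builds the matrix from per-image histograms and reports off-diagonal entries with a high score. The choice of cosine similarity, the threshold, the minimum index gap and all function names are assumptions made for this example, not the scoring used in the cited works. The SVD-based treatment of repetitive structure is not shown.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_matrix(histograms):
    """Square matrix gathering the pairwise similarity between all images."""
    n = len(histograms)
    return [[cosine(histograms[i], histograms[j]) for j in range(n)]
            for i in range(n)]

def loop_closure_candidates(matrix, threshold=0.9, min_gap=2):
    """Return pairs (i, j), j < i, that lie off the diagonal band and score highly."""
    return [(i, j)
            for i in range(len(matrix))
            for j in range(i)
            if i - j >= min_gap and matrix[i][j] >= threshold]

# Toy word histograms for four images; image 3 revisits the place seen in image 0.
histograms = [[5, 1, 0], [0, 4, 2], [1, 0, 6], [5, 2, 0]]
matrix = similarity_matrix(histograms)
candidates = loop_closure_candidates(matrix)  # [(3, 0)]
```

The `min_gap` parameter excludes entries next to the diagonal, since consecutive images are similar simply because they were taken close together in time.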

Other techniques use global descriptors, such as Gist descriptors [148]. Siagian and Itti presented in [175] a biologically inspired scene classifier of low computational complexity using the Gist representation. In [112], Y. Liu and H. Zhang presented a method for visual loop-closure detection that applies principal component analysis (PCA) to Gabor-Gist descriptors. Similarly, Singh and Kosecka presented in [179] an approach for detecting loop closures in a large sequence of omni-directional images from urban environments.

Some methods require a prior offline learning phase, in which a substantial number of representative images of the environment must be analysed and encoded to obtain a relevant model of the environment [44, 140]. The main drawback of these methods is that robot navigation is limited in areas that were not properly learned. Other approaches try to build a model of the specific environment in which the robot is expected to navigate [31, 211]. However, these techniques prevent direct adaptation of the navigation algorithms to different environments.

Another category of loop-closure detection methods, apart from bag-of-words (BoW) and global descriptors, uses the local invariant features themselves. The authors of [100] proposed an approach that uses position-invariant robust features (PIRFs) to describe images. Zhang presented a method that uses a selective subset of visual features based on SIFT. These features are in turn matched consecutively across several images [216]. The main drawback of this technique is that its complexity grows with the number of images. To cope with this issue, Liu and Zhang presented a KD-tree-based approach for faster loop-closure detection in appearance-based robot SLAM [113]. This work relies heavily on the efficiency of tree structures for feature matching to achieve real-time processing. However, in real applications where a vehicle navigates over long distances, the number of features grows linearly as new images are acquired, which can become a serious problem for the navigation algorithm. Relying on the KD-tree data structure alone for feature matching therefore makes it considerably harder to guarantee real-time processing, especially for fast-moving vehicles.

The diversity of methods in the literature illustrates how difficult the loop-closure detection and global localisation problems are. Loop-closure detection has been addressed in many different ways, and no single solution is applicable in all cases.

In this chapter, new appearance-based techniques for the loop-closure problem are presented. These solutions extend the previously presented methods by specifically addressing the perceptual aliasing problem with a Gaussian mixture model (GMM) in combination with the KD-tree data structure. The proposed technique also achieves a significant reduction in search time. Furthermore, to ensure effectiveness in different environments, two feature spaces are used to describe the images: SIFT features and local colour histograms. In addition, an entirely GMM-based technique for loop-closure detection is presented first in this chapter.

