In Chapter 5, we presented a visual SLAM system that integrates feature measurements from multiple cameras. Multiple cameras looking in different directions can provide more reliable image features for pose tracking than a single monocular camera. Therefore, we achieve more robust pose tracking of an MAV, which is more resistant to tracking failures. This is especially useful when MAVs fly in complex environments. The visual SLAM system keeps the complete map during an exploration. Thus, it can utilize all previously visible measurements for localization and mapping, and it provides more accurate pose estimates and mapping results when working in small-scale environments. However, this SLAM system does not scale well to large-scale environments, where it can hardly build a consistent map when the trajectory contains loops, i.e., when certain places are revisited during an exploration.
Figure 6.1: (a) A local map built by our dual-camera visual odometry during an exploration, with nL = 4 (see Sec. 6.4.1). Map points from the forward-looking camera are marked in red, and those from the downward-looking camera in blue. The trajectory of the forward-looking camera is plotted in green. (b) A scene of the actual lab environment where this experiment was performed, from a similar perspective.
In this chapter, we modify our previous multi-camera visual SLAM system to operate as a robust visual odometry with constant-time cost during large-scale explorations. Our final implementation utilizes two cameras pointing forward and downward, respectively, as in Chapter 5. An example map built by the visual odometry in our dual-camera setting is shown in Fig. 6.1a. Moreover, we propose an efficient visual SLAM back-end that detects loop closures and uses pose-graph optimization (PGO) to correct the pose drift that is inherent in visual odometry.
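To illustrate how PGO can correct drift once a loop closure is available, the following is a minimal sketch and not the back-end described in this chapter. It assumes one-dimensional poses, unit weights on all constraints, and the first pose fixed at zero; the function names `solve_linear` and `optimize_pose_graph` and the measurement values are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def optimize_pose_graph(num_poses, edges):
    """Least-squares 1D poses; pose 0 is fixed at 0 (assumption of this sketch).

    Each edge (i, j, z) states that x_j - x_i was measured as z.
    """
    n = num_poses - 1  # free variables x_1 .. x_{num_poses - 1}
    H = [[0.0] * n for _ in range(n)]
    g = [0.0] * n
    for i, j, z in edges:
        # Residual r = x_j - x_i - z, so the Jacobian is +1 for x_j, -1 for x_i.
        jac = {}
        if j > 0:
            jac[j - 1] = 1.0
        if i > 0:
            jac[i - 1] = -1.0
        for a, ja in jac.items():
            g[a] += ja * z
            for b, jb in jac.items():
                H[a][b] += ja * jb
    return [0.0] + solve_linear(H, g)


# Four odometry steps that each overestimate a true step of 1.0 ...
odometry = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (3, 4, 1.1)]
# ... plus one loop-closure constraint between the first and the last pose.
loop_closure = [(0, 4, 4.0)]

drifted = optimize_pose_graph(5, odometry)
corrected = optimize_pose_graph(5, odometry + loop_closure)
```

With odometry alone, the last pose simply accumulates the drift (about 4.4 in this toy setting). Adding the loop-closure edge spreads the discrepancy evenly over the chain, so the last pose moves to about 4.08. A real back-end would operate on 6-DoF or 7-DoF poses and weight each constraint by its uncertainty.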
The back-end of our visual SLAM system maintains a global map organized in keyframes, each of which is associated with a set of map points whose positions are stored relative to it. In the global map, we thus keep map points in a relative representation and keyframes in an absolute representation. As a result, map points are implicitly updated by PGO operations, which optimize the poses of the keyframes in the global map. Strasdat et al. (2011) proposed a double-window graph structure for optimizing the global map, in which bundle adjustment within a small inner window and pose-graph optimization within a larger window are integrated into one optimization problem. In our work, we decouple bundle adjustment and PGO into a visual odometry front-end and a separate back-end, so that accurate pose tracking can be achieved in constant time without losing the benefit of PGO, which corrects the pose drift of the visual odometry in long-term explorations and thus ensures a consistent global map.
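The effect of the relative representation can be sketched as follows. This is an illustrative example rather than our implementation: it assumes keyframe poses restricted to a translation plus a rotation about the z-axis, and the names `yaw_pose`, `to_world`, `keyframes` and `map_points` are hypothetical.

```python
import math


def yaw_pose(x, y, z, yaw):
    """Absolute keyframe pose (R, t), with rotation about the z-axis only."""
    c, s = math.cos(yaw), math.sin(yaw)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return R, [x, y, z]


def to_world(pose, p_rel):
    """World position of a point stored relative to a keyframe: R * p + t."""
    R, t = pose
    return [sum(R[i][j] * p_rel[j] for j in range(3)) + t[i] for i in range(3)]


# Global map: keyframes hold absolute poses, map points hold a keyframe id
# and a position relative to that keyframe.
keyframes = {
    0: yaw_pose(0.0, 0.0, 1.0, 0.0),
    1: yaw_pose(2.0, 0.0, 1.0, 0.0),
}
map_points = [(0, [1.0, 0.5, 0.0]), (1, [1.0, 0.0, 0.0])]


def world_points():
    return [to_world(keyframes[kf_id], p_rel) for kf_id, p_rel in map_points]


before = world_points()

# Stand-in for a PGO result: only the pose of keyframe 1 is changed.
keyframes[1] = yaw_pose(2.0, 0.3, 1.0, math.pi / 2)

after = world_points()
```

The stored relative positions are never touched, yet the world position of the point attached to keyframe 1 follows the corrected pose, while the point attached to keyframe 0 stays where it was. This is why optimizing only keyframe poses is sufficient to keep the map points consistent.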
The remainder of this chapter is organized as follows. We review related work on map representation and loop-closure detection in visual SLAM in Sec. 6.2. We then present our visual SLAM back-end for managing the global map, loop-closure detection and pose-graph optimization in Sec. 6.3. We provide further details of the implementation of our SLAM system in Sec. 6.4. In Sec. 6.5, we validate our SLAM system by using it onboard an MAV. Finally, in the last section, we conclude the work of this chapter and discuss possible future work.
6.2 Related Work
In visual SLAM, different ways to represent the environment map have been proposed. The MonoSLAM system (Davison et al., 2007) adopts a probabilistic feature-based map. This map consists of the current estimates of the camera state and of all feature points, together with their uncertainties, which are updated by an Extended Kalman Filter (EKF). In PTAM, the map consists of a collection of map points and keyframes. More recently, work on keyframe-based visual SLAM has proposed organizing the map in a relative representation to improve efficiency. In Mei et al. (2011), robot positions and the map are represented in a continuous relative representation (CRR) framework, which allows relative bundle adjustment for map refinement and real-time loop closure. Lim et al. (2012) present a hybrid metric-topological map for large-scale online environmental mapping. The map is represented as a graph of keyframes and the relative poses between them. This work strictly enforces the metric property of the local sub-maps, which are optimized using bundle adjustment and assumed to be rigid segments in the global segment optimization.
Recently, a number of efficient loop-closure detection methods based on visual vocabularies (Sivic and Zisserman, 2003) have been proposed. In these methods, local features are extracted to represent the appearance of an image, and loop-closure detection is solved as a place-recognition problem. The visual vocabulary model treats an image as a bag of words (BOW), much like a text document. In this model, each word corresponds to a region in the space of invariant feature descriptors (Cummins and Newman, 2011). A comparison of the visual-vocabulary-based approach with map-to-map (Clemente et al., 2007) and image-to-map (Williams et al., 2008) approaches for loop-closure detection in monocular SLAM can be found in the work of Williams et al. (2009).
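The bag-of-words idea can be made concrete with a small sketch. It is a toy example under stated assumptions and does not reproduce any of the cited systems: the vocabulary is a hand-picked set of three 2D "descriptors", words are assigned by nearest Euclidean distance, and images are compared with cosine similarity. All names and values are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary word closest to the descriptor (Euclidean)."""
    def sq_dist(word):
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def bow_vector(descriptors, vocabulary):
    """Quantize descriptors into words; return a normalized word histogram."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(len(vocabulary))]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


# Toy vocabulary: each word is the center of a region in descriptor space.
vocabulary = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]

# Descriptors of three images; image_b is a noisy revisit of image_a.
image_a = [[1.0, 1.0], [9.0, 1.0], [0.5, -0.2]]
image_b = [[0.2, 0.3], [-1.0, 0.5], [10.5, 0.4]]
image_c = [[1.0, 9.0], [0.0, 11.0], [-0.5, 10.0]]

bow_a = bow_vector(image_a, vocabulary)
bow_b = bow_vector(image_b, vocabulary)
bow_c = bow_vector(image_c, vocabulary)

score_revisit = cosine_similarity(bow_a, bow_b)
score_other = cosine_similarity(bow_a, bow_c)
```

Here the revisited place obtains a high similarity score although none of its descriptors match exactly, whereas the unrelated image scores zero. Practical systems additionally weight the words, for example by their frequency, and verify candidate loop closures geometrically.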
In Cummins and Newman (2011), images are represented with a bag of words whose co-visibility probability is learned offline using a Chow-Liu tree. In Cadena et al. (2012), loop closures are detected based on the BOW method using SURF features, and the loop-closure verification is carried out using a method based on conditional random fields. The work in Gálvez-López and Tardós (2012) features a hierarchical BOW method. It uses a vocabulary tree that discretizes a binary descriptor space. This vocabulary tree speeds up both the retrieval of similar images and the verification of geometrical consistency for loop-closure detection. By using BRIEF descriptors (Calonder et al., 2010), which are binary and require very little time to compute, images can be converted into BOWs much faster than with SIFT or SURF descriptors. Rather than building the visual vocabulary based on prior knowledge of the environment, Nicosevici and Garcia (2012) proposed a method for loop-closure detection using visual vocabularies built online. To investigate the effect of the quantity and quality of visual information on place recognition, Milford (2013) presented comprehensive experiments with different datasets using SeqSLAM (Milford and Wyeth, 2012).
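How a vocabulary tree over binary descriptors assigns a word can be sketched as follows. This is a simplified illustration, not the data structure of the cited work: it assumes 8-bit descriptors stored as Python integers (BRIEF descriptors are much longer), a hand-built two-level tree, and a greedy descent by Hamming distance. The names `TREE` and `lookup_word` are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


# Toy two-level vocabulary tree over 8-bit descriptors.
# Inner nodes are (center, children); leaves are (center, word_id).
TREE = [
    (0b00001111, [(0b00000111, 0), (0b00011111, 1)]),
    (0b11110000, [(0b11100000, 2), (0b11111000, 3)]),
]


def lookup_word(descriptor, nodes):
    """Descend the tree, following the closest center at each level."""
    while True:
        _, payload = min(nodes, key=lambda n: hamming(descriptor, n[0]))
        if isinstance(payload, int):
            return payload  # reached a leaf: payload is the word id
        nodes = payload     # inner node: continue with its children


word_low = lookup_word(0b00000110, TREE)
word_high = lookup_word(0b11111001, TREE)
```

Each lookup compares the descriptor only against the children of one node per level instead of against every word, and each comparison is an XOR followed by a bit count. This is the main reason why binary descriptors combined with a hierarchical vocabulary are inexpensive to quantize.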