

Vision systems for mobile robots bring together two very challenging problem domains: image processing and autonomous mobile systems. Most state-of-the-art computer vision algorithms, for example, are computationally expensive even when efficiently implemented, so their individual applicability has to be assessed very carefully. This in turn often discourages computer vision experts from working on robot vision, as most of the advanced algorithms seem to be ruled out per se by timing constraints. As a consequence, solutions in robot vision are often: (1) hard-coded quick hacks that try to enable micro-optimizations by doing multiple operations at once, (2) heavily model-based or heuristic, exploiting special circumstances with little validity beyond the one scenario they target, and (3) as a result hardly maintainable and inflexible.

To mediate between the partially contradictory requirements of advanced vision processing in a real-time constrained environment, the vision processing architecture has to provide proper conceptual support for encapsulating the vision application within this application domain. To better understand the different requirements that need to be supported, we first take a brief look at the two problem domains: computer vision and robot vision.

CHAPTER 6. EXTENSIBLE FRAMEWORKS

Computer Vision and Image Understanding

The basic concept of computer vision is the application of operators to image data, such as converting a color image to gray-scale or filtering an image for edges. Operations often transform more than one input image into a new output image. A Canny edge detector [19], for example, usually needs two images, obtained by convolving the input with a horizontal and a vertical Sobel operator, respectively. Other operators use the same image result from different time stamps, for example an operator that uses two temporally consecutive images to detect the optical flow [52].
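The operator idea can be illustrated with a minimal pure-Python sketch. It shows a gray-scale conversion, the two Sobel gradient images, and an operator that combines these two inputs into one output image. This is only the gradient step that a Canny detector builds on, not the full detector; the function names and the list-of-rows image representation are assumptions made for illustration.

```python
import math

# Illustrative helper names; images are plain lists of rows.

def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical intensity changes

def apply_kernel(gray, kernel):
    """Correlate a 3x3 kernel with the image (no kernel flip); border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * gray[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

def gradient_magnitude(gx, gy):
    """Operator with two input images (both gradients) and one output image."""
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]
```

Calling `gradient_magnitude(apply_kernel(g, SOBEL_X), apply_kernel(g, SOBEL_Y))` on a gray image `g` yields large values along intensity edges and zero in flat regions.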

More sophisticated operations cover not only filter-like processing steps but arbitrary input-output mappings. The result of a computer vision operation therefore need not be an image again; it can be any kind of data, such as a color histogram, a similarity value between two images, or any other statistical measure of an image.
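As a sketch of operators whose output is not an image, the following computes a gray-level histogram (a simplification of the color histogram mentioned above) and a similarity value between two images based on histogram intersection. The bin count, the choice of histogram intersection, and the function names are assumptions made for illustration.

```python
def gray_histogram(gray, bins=4, max_value=256):
    """Image -> list of counts: an operator whose result is not an image."""
    hist = [0] * bins
    for row in gray:
        for v in row:
            index = min(int(v) * bins // max_value, bins - 1)
            hist[index] += 1
    return hist

def histogram_intersection(h1, h2):
    """Two histograms -> similarity in [0, 1]; 1.0 means identical distributions."""
    n1, n2 = sum(h1), sum(h2)
    return sum(min(a / n1, b / n2) for a, b in zip(h1, h2))
```

Both functions fit the same operator scheme as a filter: they take image-derived input and produce a result that later stages can consume.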

Sequences of such image operators reveal features within the image that can be used to identify regions of interest (ROIs). Subsequent image operations then need not be applied to the whole image but can be restricted to the relevant subwindows. This is done either to speed up the processing loop or to make sure that the result is not contaminated by unwanted image structures from outside the region. Further operators derive image features from these ROIs that enable reliable object recognition. Various feedback loops, such as integration over time [59], can speed up processing and improve classification results [80].
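The restriction to a subwindow can be sketched as follows: a threshold operator reveals a feature, its bounding box serves as the ROI, and a subsequent operator runs only inside that window. The threshold feature, the `(top, left, bottom, right)` ROI convention, and all names are illustrative assumptions.

```python
def threshold(gray, limit):
    """Binary mask of all pixels brighter than `limit`."""
    return [[v > limit for v in row] for row in gray]

def bounding_box(mask):
    """ROI as (top, left, bottom, right) with exclusive bottom/right, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)

def crop(image, roi):
    """Return the subwindow of `image` described by `roi`."""
    top, left, bottom, right = roi
    return [row[left:right] for row in image[top:bottom]]

def mean_intensity(image):
    """Example follow-up operator; cheaper and less contaminated when run on a ROI."""
    pixels = [v for row in image for v in row]
    return sum(pixels) / len(pixels)
```

Applying `mean_intensity` to `crop(image, roi)` touches only the pixels inside the ROI, so background structures outside the window cannot influence the result.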

Robot Vision

Performing the operations sketched above on the video stream of the robot's camera(s), on board an autonomous mobile robot and within a medium-sized robotics application, adds a number of further challenges to the problem set.

Efficient organization of control and data flow. Video image processing on a mobile robot is usually sensor-triggered: it starts as soon as a new image is available to the robot, because in a dynamic environment an image taken one second earlier does not necessarily reflect the current situation any more. At the same time, the processing needs to be demand-driven so that the available computational resources are not wasted.
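One way to picture the combination of sensor-triggered and demand-driven processing is a small filter graph in which a new frame id invalidates cached results, while a node is only evaluated when a client actually requests its output. This is a minimal sketch under these assumptions, not the design of any particular framework; `FilterNode` and the toy one-dimensional "images" are hypothetical.

```python
class FilterNode:
    """A filter that is evaluated lazily, at most once per frame."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)
        self.evaluations = 0      # counts how often func actually ran
        self._frame_id = None
        self._cache = None

    def get(self, frame_id, frame):
        # A new frame id (sensor trigger) makes the cached result stale.
        if self._frame_id != frame_id:
            # Source nodes (no inputs) receive the raw frame itself.
            args = [node.get(frame_id, frame) for node in self.inputs] or [frame]
            self._cache = self.func(*args)
            self._frame_id = frame_id
            self.evaluations += 1
        return self._cache


# Toy graph: a frame is a list of (r, g, b) pixels.
gray = FilterNode("gray", lambda frame: [sum(px) / 3 for px in frame])
edges = FilterNode("edges", lambda g: [abs(b - a) for a, b in zip(g, g[1:])], [gray])
mean = FilterNode("mean", lambda g: sum(g) / len(g), [gray])
```

If a client only asks for `mean`, the `edges` node is never evaluated for that frame, and `gray` is computed once even when both of its consumers are requested.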

6.1. VIDEO FILTER FRAMEWORK

Parallel and asynchronous evaluation. More and more robots are equipped with multiple cameras, for stereo vision or to extend their field of view. Multiple image sources, but also dual-CPU boards and the upcoming hyper-threading and multi-core processor technologies, call for asynchronous, parallel processing capabilities. Multiple image sources allow processing to be interleaved, and the true parallelism of the advanced hardware features remains unused by single-threaded applications. The actual challenge, however, lies in the proper synchronization between different image processing tasks for the fusion of their results.
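The synchronization problem can be sketched with two per-camera tasks that run asynchronously and a fusion step that must wait for both. The example uses Python's `concurrent.futures`; the per-camera task and the fusion rule are toy assumptions. Note that in standard CPython builds, threads do not execute pure-Python code in parallel, so the sketch illustrates the synchronization pattern rather than an actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def brightest_column(image):
    """Per-camera task: index of the column with the largest intensity sum."""
    sums = [sum(col) for col in zip(*image)]
    return sums.index(max(sums))

def fuse(left_col, right_col):
    """Fusion step: here simply the offset between the two column indices."""
    return left_col - right_col

def process_stereo_pair(left, right, pool):
    # Both camera images are submitted without waiting for each other.
    f_left = pool.submit(brightest_column, left)
    f_right = pool.submit(brightest_column, right)
    # result() blocks until the task has finished: the synchronization point
    # that guarantees both partial results exist before they are fused.
    return fuse(f_left.result(), f_right.result())
```

A caller would create the pool once, for example `with ThreadPoolExecutor(max_workers=2) as pool:`, and reuse it for every incoming image pair.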

Timeliness and resource management. Because most image operations are computationally costly and the CPU is also used by other concurrent tasks of the system, the available processing power will usually not suffice to perform all possible evaluations on every single image. To still meet the timeliness constraints of a reactive system, different perceptual tasks (e.g. obstacle avoidance and face recognition) need to be properly prioritized. The data for obstacle avoidance, for example, needs to be evaluated as often as possible, while face recognition for greeting known pedestrians can be evaluated whenever some CPU cycles are left. Additionally, not all image processing tasks have to be performed all the time. The robot's situatedness enforces the use of special vision routines for different purposes.
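A very simple form of such prioritization is a per-frame budget: tasks are considered in priority order and only run if their estimated cost still fits into the remaining time. The sketch below assumes fixed cost estimates and a greedy selection; the task names besides obstacle avoidance and face recognition, and all numbers, are invented for illustration.

```python
def schedule(tasks, budget_ms):
    """Select the tasks to run for one frame.

    tasks: list of (name, priority, estimated_cost_ms); a lower priority
           number means a more important task.
    Returns the names of the selected tasks in priority order.
    """
    selected = []
    remaining = budget_ms
    for name, _priority, cost in sorted(tasks, key=lambda t: t[1]):
        if cost <= remaining:       # skip tasks that no longer fit
            selected.append(name)
            remaining -= cost
    return selected


# Illustrative task set with assumed cost estimates in milliseconds.
TASKS = [
    ("face_recognition", 2, 30),
    ("obstacle_avoidance", 0, 15),
    ("landmark_tracking", 1, 10),   # hypothetical additional task
]
```

With a tight budget, obstacle avoidance always runs first and face recognition is dropped for that frame; with a larger budget, all three tasks are selected.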

Communication of results. Last but not least, images as well as extracted symbolic information about objects need to be accessible to the other modules of the robot software. Interfacing is an issue in the context of image processing on autonomous robots, as the information requested by client modules usually determines which information needs to be extracted from the image in a given situation. In addition, communicating whole images to client applications consumes large amounts of bandwidth and therefore requires a careful design.
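A rough back-of-the-envelope comparison makes the bandwidth argument concrete. The image size (640x480 RGB), the frame rate (25 frames per second), the JSON encoding, and the detection record are all assumptions chosen for illustration, not values from the text.

```python
import json

def raw_image_bytes(width, height, channels=3):
    """Size of one uncompressed image with one byte per channel."""
    return width * height * channels

def stream_bytes_per_second(width, height, fps, channels=3):
    """Bandwidth needed to ship every uncompressed frame to a client."""
    return raw_image_bytes(width, height, channels) * fps

# Hypothetical symbolic result extracted from an image.
detection = {"label": "ball", "x": 312, "y": 170, "radius": 14}
message = json.dumps(detection).encode("utf-8")
```

Under these assumptions a single frame occupies 921,600 bytes and a full stream roughly 23 MB per second, whereas the encoded symbolic message stays well below 100 bytes.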

Related Work

Common vision-related architectures and publications can be roughly divided into three types: subroutine libraries, command languages and visual programming languages.

Subroutine libraries are the most commonly used type. They mostly concentrate on the efficient implementation of image operators and therefore consist of ordinary functions, each responsible for a different image processing operation. Classical examples are the well-known SPIDER system [129] and NAG's IPAL package [20], written in C or Fortran. More recent approaches are LTI-Lib [122] and VXL [123], both of which are open source, written in C++ and provide a wide range of functionality, from image processing methods to visualization tools and I/O functions. The commercial Intel Integrated Performance Primitives (IPP) [56] are an example of highly optimized (MMX and SSE) processing routines with a plain C API. What they all have in common is their lack of support for flow control. At most, they offer yet another collection of mutex or semaphore helper classes and some kind of thread abstraction.

More advanced command languages for image processing are mostly implemented as scriptable command-line tools that a developer can use to direct the vision package. In the case of the imlib3d package [121], the image processing operators can be called from the Unix command line, while the CVIPtools [132] are delivered with an extended Tcl command language. Both packages can thus include conditional and looping facilities. But again, the programmer has not only flexible and complete control over the system but also full responsibility for the processing cycle. Additionally, the scripting approach makes it hard to meet the performance constraints of this application domain.

The most sophisticated solutions are the visual programming languages. They allow the user to connect a flow chart of the intended processing pipeline using the mouse. They combine the expressiveness and the flexibility of both groups above. Often they contain not only a large number of image processing functions and statistical tools, but also a complete integrated development environment. Most of these systems are commercial products. One of the most advanced is VisiQuest (formerly known as Khoros/Cantata). According to its web site, it supports distributed computing capabilities for deploying applications across a heterogeneous network, data transport abstractions (file, mmap, stream, shared memory) for efficient data movement, and some basic utilities for memory allocation and data structure I/O.

As of today, however, there is no concise design for image processing that combines all of the features described above, such as parallel and on-demand processing of parts of the filter tree, in a flexible yet powerful way that makes the system suitable for a wider range of image processing tasks, such as active vision problems on autonomous mobile robots.
