Nonuniform Video Coding by Means of Multifoveal Geometries
J.A. Rodríguez, C. Urdiales, A. Bandera, F. Sandoval
Departamento Tecnología Electrónica, ETSI Telecomunicación, Universidad de Málaga, Campus de Teatinos,
29071 Málaga, Spain. E-mail (J.A.R.): [email protected]
ABSTRACT: This paper presents a control mechanism for video transmission that relies on transmitting nonuniform resolution images depending on the delay of the communication channel. These images are built in an active way to keep the areas of interest of the image at the highest resolution available. In order to shift the areas of high resolution over the image and to achieve a data structure that is easy to process by using conventional algorithms, a shifted foveal multiresolution geometry of adaptive size is used. If delays are too high, the resolution areas of the image can be transmitted at different rates. A functional system has been developed for corridor surveillance with static cameras. Tests with real video images have proven that the method allows an almost constant rate of images per second as long as the channel is not collapsed. A new method for determining the areas of interest is also proposed, based on hierarchical object tracking by means of adaptive stabilization of pyramidal structures. © 2002 John Wiley & Sons, Inc. Int J Imaging Syst Technol, 12, 27–34, 2002; DOI 10.1002/ima.10005
I. INTRODUCTION
Some computer vision-based applications like surveillance, tracking, or traffic supervision must present simultaneously a wide field of view and a suitable image resolution to ensure proper functioning. Transmission of these video sequences requires a large bandwidth because they yield an enormous data load. To avoid this problem, temporal video compression techniques can be used. Conventional image sequence coding (e.g., MPEG-2; Haskell et al., 1997) has relied on a number of powerful concepts. An original image is converted into a digital format by sampling in space and time and by quantizing in brightness or color. For example, generated packets may include pairs of adjacent pixels, groups of pixels within a geometrical data structure (e.g., a square image block), or a linear reversible transform of these pixels. The compression performance of these schemes saturates quickly, due to the variability over space and time of the statistical properties of natural images (Aramvith and Sun, 2000). Interestingly, real-time adaptive sampling is impractical, due to computational limitations. Besides, data-independent structures such as square data blocks (used in MPEG-1 and MPEG-2) cannot describe nonstationarities and hence cannot serve as efficient data structures for image sequences.
Improved compression techniques represent visual data in terms of regions, possibly corresponding to objects. In this way, active
vision techniques (Urdiales et al., 2000) may provide an efficient solution because scenes usually present only some areas of interest. If noninteresting areas are represented at a low resolution and only critical areas present a high level of detail, the data volume of the resulting image is reduced considerably. Thus, video sequence size can be adapted to the network state by reducing the resolution of some image regions.
In video sequences, the most important areas of the frames are usually those that change often. Therefore, if those regions are detected in the sequence, they can be represented at a high resolution whereas static areas can be represented at a low resolution. This operation allows a severe data volume reduction when transmitting the video signal. Areas that present few changes from frame to frame do not need to be transmitted as often as moving areas because they will be very similar during a few frames. Thus, if delays are unacceptable even when transmitting multiresolution images, regions of different resolution may be sent at different rates, so that areas of interest are transmitted more often than the rest. In the receiver, the total multiresolution image is composed with previously received information.
This paper presents a system where video flow between two personal computers has been controlled by the aforementioned method. The complete system is described in Section II. Section III focuses on the motion detection method that actively selects the areas of interest of the image in an unsupervised way. Section IV briefly describes the multiresolution geometry that has been used to compress the sequence data volume. It also presents the mechanisms required to transmit the regions of the image at different speeds depending on their resolution when delays are too high. Finally, Section V presents tests and results and conclusions are given in Section VI.
II. SYSTEM DESCRIPTION
Several visually guided applications require a wide field of view, high resolution, and minimal processing times. The main problem of traditional vision systems is that uniform resolution images yield too large a data volume to be transmitted and processed in real time. Nonuniform resolution images provide a lower data volume by representing only the areas of interest of the image at a high resolution. Obviously, the position of those areas is not known a priori and it may change in time depending on the nature of the ongoing application. Therefore, it is necessary to use vision in an active way, so that information acquisition and data extraction processes are performed selectively depending on the task to be accomplished. The proposed system consists mainly of four modules (Fig. 1). Initially, the video stream is segmented using a new hierarchical spatiotemporal method. As a result of such a process, the regions present in the scene and their motion estimations are obtained. The second module uses this information in order to select the most relevant areas, according to their area and velocity, as regions of interest (ROIs). The position of these regions is used to construct a multiresolution image that preserves these areas at a high resolution.
Correspondence to: J.A. Rodríguez
Grant sponsor: Comisión Interministerial de Ciencia y Tecnología (CICYT); Grant number: TIC098-0562.
The coding module constructs a transmission packet including the different levels of each multiresolution image to be sent. The channel delay determines the number of resolution regions surrounding the ROIs to be transmitted. If the available channel bandwidth is high enough, the entire structure is sent over the channel. If it is not, consecutive resolution regions of the structure are removed from the packet until its size is small enough to achieve the desired frame rate of video transmission.
Finally, once the transmission packet has been received, the multiresolution image is rebuilt by the decoding module. Because the received packet can be incomplete, the reconstructed multiresolution image could include information received in previous packets.
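The composition performed by the decoding module can be sketched as follows. This is a minimal illustration only: it assumes that each resolution region is identified by an integer index (0 for the fovea, 1 and above for the rings) and that the receiver caches the last data received for each region. The function name `compose_frame` and the dictionary layout are hypothetical and are not taken from the implemented system; changes of fovea position between frames are ignored here.

```python
def compose_frame(cache, received):
    """Merge newly received resolution regions into the receiver cache.

    cache    -- dict mapping region index -> last data received for it
    received -- dict with the regions present in the current packet
                (may be incomplete when outer rings were dropped)
    Returns a new dict that represents the full multiresolution image.
    """
    composed = dict(cache)      # start from previously received information
    composed.update(received)   # fresh regions overwrite stale ones
    return composed


# Hypothetical example: the first packet is complete, the second one
# carries only the fovea (0) and the first ring (1).
state = compose_frame({}, {0: "fovea@t0", 1: "ring1@t0", 2: "ring2@t0"})
state = compose_frame(state, {0: "fovea@t1", 1: "ring1@t1"})
print(state)
```

In this sketch the second ring keeps the data from the first packet, which mirrors the idea that static outer regions can be reused for a few frames.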
III. TARGET DETECTION
In video transmitting applications, the most interesting part of the image is that which changes most often. Particularly, when dealing with static cameras, moving objects should be studied at a high resolution, while the background may remain at a low resolution. In order to estimate the areas of interest from the scene, a spatiotemporal segmentation is performed. As a result of this process, the different regions of the image and their displacement estimations are available. Only the areas that present non-null displacement vectors are selected to be presented at the highest resolution.
A. Structure Generation. Pyramids are currently used as a tool for motion estimation and spatiotemporal segmentation because of their multiresolution nature (Luthon et al., 1999; Mahzoun et al., 1999). To track a moving region through a sequence of images, a pyramid must be built onto each of them. Pyramids were proposed originally in Tanimoto and Pavlidis (1975) as a tool for low-pass filtering. The levels of these structures present progressively lower resolution, each having one fourth of the nodes of the level immediately below. To build a pyramid over a given frame, the following steps are required:
1. Let l = 0, level 0 being the original image.
2. For each set of 2 × 2 nodes (sons) at level l, generate a single node (parent) at level l + 1 whose gray level is equal to the average of the gray level of its sons.
3. Create a link from each son to its parent.
4. Let l = l + 1. Repeat step 2 until the pyramid is built.
Given a level L, it can be observed that each of its nodes is linked to a square area of nonhomogeneous cells at the base level presenting $4^L$ nodes.
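The four steps above can be sketched in a few lines. The following is a minimal illustration, assuming a square gray-level image whose side is a power of two and stored as a list of lists; the function name `build_pyramid` and the representation of links as dictionaries are choices made here for clarity only.

```python
def build_pyramid(image):
    """Build a 2 x 2 averaging pyramid over a square image.

    Returns (levels, links). levels[0] is the image and levels[l + 1] has
    half the side of levels[l]. links[l][(i, j)] gives the coordinates of
    the parent at level l + 1 of node (i, j) at level l.
    """
    levels = [[list(map(float, row)) for row in image]]
    links = []
    while len(levels[-1]) > 1:
        sons = levels[-1]
        half = len(sons) // 2
        parents = [[0.0] * half for _ in range(half)]
        level_links = {}
        for pi in range(half):
            for pj in range(half):
                # The 2 x 2 set of sons below parent (pi, pj)
                block = [(2 * pi + a, 2 * pj + b) for a in (0, 1) for b in (0, 1)]
                parents[pi][pj] = sum(sons[i][j] for i, j in block) / 4.0
                for son in block:
                    level_links[son] = (pi, pj)
        levels.append(parents)
        links.append(level_links)
    return levels, links


levels, links = build_pyramid([[0, 0, 4, 4],
                               [0, 0, 4, 4],
                               [8, 8, 12, 12],
                               [8, 8, 12, 12]])
print(levels[1])  # [[0.0, 4.0], [8.0, 12.0]]
print(levels[2])  # [[6.0]]
```

The single node at level 2 covers the whole 4 × 4 base, i.e. 4² = 16 cells.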
B. Structure Stabilization. Stabilization of a unique pyramid was proposed originally by Burt et al. (1981) as a segmentation method to enhance the speed of conventional nonhierarchical segmentation techniques. The process consists of rearranging the link structure of the pyramid so that a node gets linked to a homogeneous irregular area of nodes instead of to a square nonhomogeneous one. Therefore, any level of the structure provides a segmentation of the image into as many homogeneous regions as nodes it presents. Such a segmentation is correct as long as the number of classes is known a priori. If this information is not available, the solution consists of initially choosing a relatively high number of classes. A later merging procedure can group those classes that stand for contiguous areas, which present a similar gray level. If the initial number of classes to merge is too low, undesired fusions may have already happened and classes might be no longer homogeneous. Hence, a working level that presents a greater number of classes than the number of regions appearing at the scene is highly recommended.
A first approach to tracking might be to independently stabilize consecutive pyramids to segment their frames into regions. Then, a matching procedure might pair them to track their position from one
frame to the next. However, matching procedures are often slow and they may fail if regions are deformable, if illumination changes are present, or if objects are occluded. In fact, complex real images are hardly segmented into the same regions even if capture conditions change only slightly. Therefore, establishing such a correspondence is always difficult.
Our proposal is to modify the algorithm so that two consecutive images may be segmented at the same time. In this way, regions appearing at both frames are the same even for changing capture conditions because both frames have equal influence on the global process. Thus, once two basic pyramids t-1 and t are built over two consecutive frames captured at instants t-1 and t, the goal is that a node at pyramid t gets linked to an irregular homogeneous region at the base level of pyramid t, but also to an irregular homogeneous region at the base of pyramid t-1.
To stabilize two consecutive pyramids t-1 and t in a combined way, we propose the following steps:
1. Let l = 0.
2. This step concerns the pyramids at frames t-1 and t:
(a) For each node n(i, j, l, t), find the most similar parent at level l + 1 in a 3 × 3 vicinity above the node at pyramid t and establish a link between them.
(b) For each node n(i, j, l, t-1), find the most similar parent at level l + 1 in a 3 × 3 vicinity above node n(i + Δi, j + Δj, l, t) and establish a link between them. (Δi, Δj) is the displacement of the node between instant t-2 and instant t-1. It is equal to the displacement of the centroid of the base region linked to such node from t-2 to t-1.
(c) If no link changes, level l is stabilized and the algorithm proceeds to step 4. Otherwise, it continues to step 3.
3. Regenerate level l + 1 of pyramid t. The gray value of each parent cell is recomputed using the average of the sons linked to it, at level l, in pyramids t-1 and t. It must be noted that, in this case, a parent may have from zero to 36 sons at each pyramid. As the gray level of parent cells has changed, go to step 2. This iterative process is limited to a maximum number of iterations previously determined for each level.
4. Let l = l + 1. Return to step 2 until the whole structure is stabilized.
After steps 1–4 are accomplished, each of the bases of pyramids t-1 and t is segmented into as many regions as the number of cells of the working level. This process requires pyramid t-1 to be already stabilized with pyramid t-2. If t is equal to zero, then the pyramid must be stabilized on its own by using the classic hierarchical segmentation procedure (Burt et al., 1981).
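The relinking loop of steps 2 and 3 can be illustrated for the simpler single-pyramid case, i.e. the situation at t = 0. The sketch below is not the combined two-frame procedure: it omits the displaced search of step 2(b) and the averaging over both pyramids. The function name `relink_level`, the absolute gray-level difference as similarity measure, and the tie-breaking order are assumptions made for illustration.

```python
def relink_level(sons, parents, max_iter=10):
    """Iteratively link each son to its most similar candidate parent.

    sons, parents -- square 2-D lists of gray levels (parents has half the side)
    Returns (links, parents): links[(i, j)] is the parent chosen for son
    (i, j) and parents holds the regenerated gray levels.
    """
    n, half = len(sons), len(parents)
    parents = [row[:] for row in parents]   # do not modify the caller's data
    links = {}
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            for j in range(n):
                # 3 x 3 vicinity of candidate parents above node (i, j)
                cands = [(i // 2 + a, j // 2 + b)
                         for a in (-1, 0, 1) for b in (-1, 0, 1)
                         if 0 <= i // 2 + a < half and 0 <= j // 2 + b < half]
                best = min(cands,
                           key=lambda p: abs(parents[p[0]][p[1]] - sons[i][j]))
                if links.get((i, j)) != best:
                    links[(i, j)] = best
                    changed = True
        if not changed:
            break   # no link changed: the level is stabilized
        # Regenerate each parent as the average of the sons linked to it;
        # parents left without sons keep their previous gray level.
        sums = {}
        for (i, j), p in links.items():
            total, count = sums.get(p, (0.0, 0))
            sums[p] = (total + sons[i][j], count + 1)
        for (pi, pj), (total, count) in sums.items():
            parents[pi][pj] = total / count
    return links, parents


sons = [[0, 0, 10, 10] for _ in range(4)]
links, new_parents = relink_level(sons, [[0, 10], [0, 10]])
```

For this two-tone image every son ends up linked to a parent with exactly its own gray level, so the parents describe homogeneous regions rather than square blocks.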
C. Target Selection. When the process is accomplished, any region at frame t-1 may be tracked at frame t by following the link structure, because they are linked to the same node of pyramid t (Fig. 2). Besides, pyramid t-1 is also linked to pyramid t-2. The regions appearing may be tracked through the whole scene and the displacement of their nodes is equal to the displacement of their centroids from one frame to the next. The regions presenting nonzero displacements are candidates to have a fovea placed onto them, i.e., to be considered as ROIs.
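The selection rule can be sketched as follows, assuming that the segmentation is available as a mapping from each working-level node to the base cells linked to it in frames t-1 and t. The names `centroid` and `select_rois`, the data layout, and the optional `min_shift` threshold are hypothetical; the additional criteria on area and velocity mentioned in Section II are not modelled.

```python
def centroid(cells):
    """Centroid (row, col) of a list of base-level cell coordinates."""
    n = len(cells)
    return (sum(r for r, _ in cells) / n, sum(c for _, c in cells) / n)


def select_rois(regions_prev, regions_curr, min_shift=0.0):
    """Return {region_id: (d_row, d_col)} for the regions that moved.

    regions_prev, regions_curr -- dicts mapping a region id (the node at
    the working level) to the base cells linked to it at frames t-1 and t.
    """
    rois = {}
    for rid, cells_curr in regions_curr.items():
        cells_prev = regions_prev.get(rid)
        if not cells_prev or not cells_curr:
            continue   # region not present in both frames
        (r0, c0), (r1, c1) = centroid(cells_prev), centroid(cells_curr)
        shift = (r1 - r0, c1 - c0)
        if abs(shift[0]) > min_shift or abs(shift[1]) > min_shift:
            rois[rid] = shift
    return rois


prev = {"background": [(0, 0), (0, 1)], "object": [(2, 2), (2, 3)]}
curr = {"background": [(0, 0), (0, 1)], "object": [(2, 4), (2, 5)]}
print(select_rois(prev, curr))  # {'object': (0.0, 2.0)}
```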
IV. MULTIRESOLUTION COMPRESSION SCHEME
Although nonuniform resolution images may yield nonstructured profiles, foveal geometries have been developed in order to process these images efficiently. Foveal images emulate the retinotopology of some biological vision systems, which present a central region yielding a maximum density of photoreceptors and a peripheral region where the density decreases according to the distance to the central region. In this work, foveal structures with a cartesian-exponential lattice have been chosen because they allow an easy VLSI implementation of the sensor and because most multiresolution algorithms available can be easily adapted to work with these images (Coslado et al., 1999).
A. Foveal Geometries in Active Vision. The cartesian-exponential lattices were proposed initially by Bandera and Scott (1989). These geometries consist of a symmetric grid that presents a centered region of high resolution known as the fovea surrounded by a set of rings with decreasing resolutions. Their symmetry and discrete power-of-two acuities support hierarchical data structures and processing techniques that can be implemented efficiently with current technologies and that are rather complex in log-polar topologies (Capurro et al., 1997). The main drawback of cartesian-exponential images is that there is no continuity in the resolution profile. However, most applications work correctly despite this fact. In order to define a cartesian-exponential geometry, two parameters are required: m, the number of uniform resolution rings around the fovea, and d, the subdivision factor or number of subrings inside each resolution ring.
Figure 3(a) shows a cartesian-exponential foveal geometry with m = 3 and d = 8. It is important to note that there are efficient data structures to store multiresolution images so that conventional image processing algorithms do not need to be altered significantly to work over them (Camacho et al., 1997, 1998).
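As an illustration of how the two parameters fix the geometry, the sketch below maps every pixel of a centered geometry to its resolution level and counts the resulting resolution cells. It assumes that the fovea is 4d pixels wide, that the cells of ring k are 2^k × 2^k pixels, and that each ring doubles the width covered, so the field of view is 4d·2^m pixels wide. These assumptions agree with the 32 × 32 fovea and 256 × 256 image used later for m = 3 and d = 8, but they are not spelled out in the text; the function names are hypothetical.

```python
def ring_level(x, y, d, m):
    """Resolution level of pixel (x, y) in a centered geometry.

    0 is the fovea; k in 1..m is ring k, whose cells are 2**k x 2**k pixels.
    The field of view is assumed to be W x W pixels with W = 4 * d * 2**m.
    """
    width = 4 * d * 2 ** m
    half = width // 2
    # Distance to the image center along each axis, in whole pixels
    dx = x - half if x >= half else half - 1 - x
    dy = y - half if y >= half else half - 1 - y
    r = max(dx, dy)
    level = 0
    while r >= 2 * d * 2 ** level:
        level += 1
    return level


def count_cells(d, m):
    """Total number of resolution cells in the geometry."""
    width = 4 * d * 2 ** m
    pixels = [0] * (m + 1)
    for y in range(width):
        for x in range(width):
            pixels[ring_level(x, y, d, m)] += 1
    # Each cell at level k gathers 4**k pixels
    return sum(pixels[k] // 4 ** k for k in range(m + 1))


print(count_cells(8, 3))  # 3328
```

Under these assumptions the fovea contributes 1,024 cells and each of the three rings 768, for a total of 3,328.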
The main problem when working with foveal images is that in order to place the area of maximum resolution over an interest area, one or several camera movements, possibly including pan, tilt, and zooming, are required. The required hardware for that operation may not be available in all cameras. Also, mechanical movements are always slow and inaccurate when compared with the operation cycles of electronic devices. An additional advantage of cartesian-exponential geometries is that the areas of high resolution of the image can be reconfigured easily to change their position and size. Thus, instead of moving the camera, only the fovea is relocated efficiently and quickly within the field of vision. New geometries derived from the classic centered foveal one (Fig. 3a) have been developed for this purpose (Arrebola, 1998).
Figure 3(b) presents a basic shifted fovea multiresolution geometry (BSFMG). The fovea may be repositioned to different regions of the field of view in order to cover the areas of interest regardless of their location, with no need of mechanical movements. Two additional parameters are required to define the geometry: sh, the horizontal displacement of each resolution ring i with respect to a centered fovea, and sv, the vertical displacement of each resolution ring i with respect to a centered fovea.
In the BSFMG shown in Figure 3(b), m is equal to 3 and sh and sv are equal to 2 and 4, respectively, both parameters being expressed in terms of resolution cells (rexels). BSFMGs present the same number of cells, the same field of view, and the same compression factor as centered cartesian-exponential geometries. However, the fovea cannot be placed at all positions of the field of view because the maximum value of sh and sv is equal to the subdivision factor d of the geometry. Thus, there are only $(2d+1)^2$ possible fixations allowed. Because resolution changes by a power of 2, it can be deduced easily that the minimum displacement of the fovea in pixels is equal to $2(2^m-1)$.
Figure 3(c) presents an extended shifted fovea multiresolution geometry (ESFMG). In this geometry, the fovea can be placed at any position of the field of view by allowing different displacements between successive resolution rings. The final shifting of the fovea is determined by two arrays SH and SV, where each pair of elements SHk and SVk shows the relative shifting of ring k with respect to ring k + 1. The main advantages of this geometry when compared with the SFMG include the following:
1. The fovea can be positioned at any point of the field of view whose coordinates are a multiple of 2. Therefore, the minimum displacement of the fovea does not depend on m and the maximum positioning error is equal to 1 pixel.
2. The number of possible fixations allowed is now equal to $((W-4d)/2)^2$, W being the width of the field of view in pixels.
3. All regions of the image can now be examined at a high resolution.
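A short worked example may help to compare the number of fixations allowed by both shifted geometries. It uses d = 8 and W = 256 pixels, the values of the centered example in this section; the function names are introduced here for illustration only.

```python
def bsfmg_fixations(d):
    """Fovea positions in the basic shifted geometry: (2d + 1)^2."""
    return (2 * d + 1) ** 2


def esfmg_fixations(width, d):
    """Fovea positions in the extended geometry: ((W - 4d) / 2)^2."""
    return ((width - 4 * d) // 2) ** 2


print(bsfmg_fixations(8))        # 289
print(esfmg_fixations(256, 8))   # 12544
```

With these values the extended geometry allows roughly 43 times as many fovea positions as the basic one.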
In SFMGs, the fovea is always square and its dimension has great influence on the resulting geometry. Thus, irregular objects are not covered efficiently by the fovea and either part of them is captured at a low resolution or part of the high resolution area is uninteresting. If the object in the image is actually larger than the fovea, the camera needs to be moved backward to enclose it completely into the fovea. Figure 3(d) shows a shifted fovea multiresolution geometry of adaptive size (SFMGAS), whose main advantage is that the dimensions of the fovea are no longer constrained. In this case, five parameters are required to define the structure: m, the number of resolution rings; Ld and Rd, the left and right displacement factors or number of sensor elements on each side of each ring; and Td and Bd, the top and bottom displacement factors or number of sensor elements above and below each ring.
In the SFMGAS in Figure 3(d), m is equal to 3, and Ld, Rd, Td, and Bd are equal to 2, 10, 2, and 12, respectively. The main advantage of this geometry is that any object can be captured efficiently at a maximum resolution regardless of its dimensions and distance to the camera. Now, the compression factor depends basically on the size of the fovea and m.
Because the position and size of the ROIs in video sequences are not known a priori, SFMGAS are the most suitable for the problem at hand. Nevertheless, the presence of more than one target in a scene simultaneously is usual, and, therefore, the geometry must be able to handle the presence of several foveae at once.
In order to keep at a maximum resolution more than one area of interest in a scene, multifovea structures should be introduced (Camacho et al., 1998). Figures 4(a)-(c) show three configurations of SFMGAS with their corresponding foveae covering possible areas of interest in the scene. As can be appreciated in Figure 4(d), the multifoveal geometry consists basically of the superposition of the three single-fovea structures. This geometry is defined by the set of parameters that defines each individual SFMGAS.
Figure 3. Cartesian-exponential geometries. (a) Classic topology. (b) BSFMG. (c) ESFMG. (d) SFMGAS.
B. Active Data Reduction by Target Selection. In order to control appropriately the video flow in a channel presenting a variable delay, applying a reduction mechanism to the information to be sent is highly desirable. Using foveal images implies a first instance reduction in data flow when transmitting video sequences by focusing attention on interest areas. An image of 256 × 256 pixels (65,536 bytes in a gray scale scheme) can be reduced to a total of 3,328 rexels if a foveal version of such an image with a centered fovea of 32 × 32 and three rings is constructed. Thus, a 95% reduction of image information is achieved. The compression ratio (CR) of a foveal cartesian-exponential geometry is equal to $4^{m+1}/(4+3m)$ and it only depends on m (Camacho et al., 1997). However, no significant data volume reduction is achieved with more than six rings.
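The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes a fovea of (4d)² rexels plus 12d² rexels per ring, which reproduces the 1,024 + 3 × 768 = 3,328 rexels of the example; this per-ring count is inferred from the sizes given in the text and is not stated explicitly there. The function names are hypothetical.

```python
def rexel_count(d, rings):
    """Rexels in a centered geometry: a (4d)^2 fovea plus 12 d^2 per ring."""
    return (4 * d) ** 2 + rings * 12 * d * d


def compression_ratio(m):
    """CR = 4^(m+1) / (4 + 3m) for a geometry with m rings."""
    return 4 ** (m + 1) / (4 + 3 * m)


for m in range(1, 8):
    cr = compression_ratio(m)
    saved = 100 * (1 - 1 / cr)   # percentage of data removed
    print(m, round(cr, 2), round(saved, 2))
```

For m = 3 the ratio is 256/13, about 19.7, i.e. a reduction of about 94.9%. The percentage of data removed grows very slowly once m exceeds five or six, which is consistent with the remark that additional rings bring no significant further reduction.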
In a wide class of environments, especially those based on static cameras, it can be assumed that only the regions surrounding mobile objects present significant changes from frame to frame. According to the previously presented multiresolution image construction method, mobile regions are represented at a high resolution and are surrounded by a series of rings of progressively decreasing resolution. Thus, the information to transmit through the channel can be ordered in a priority scheme, giving maximum priority to the foveae and decreasing priorities to the rings of progressively lower resolution surrounding them.
Because the size and the shape of the regions of the multiresolution image depend on the size, shape, and position of the moving objects in the scene and because these regions change continuously, it is not possible to apply video compression schemes based on macroblocks (e.g., MPEG-2 or H.263) to the foveae or to the surrounding rings. The proposed scheme achieves data reduction as follows: (1) the proposed multiresolution topology implies a spatial reduction and (2) temporal reduction is achieved by applying a different frame rate to the successively decreasing resolution parts of the multifoveal image. Although nongeometrical compression techniques could have been used to obtain a further data reduction, no further compression techniques have been applied in order to evaluate the efficiency of the proposed data reduction mechanism.
Figure 5. Maximum frame rate vs. delay for different transmission schemes.
Figure 6. Application of the extended fovea reconstruction algorithm. (a) Image at instant t-1. (b) Image at instant t. (c) Received multiresolution image without extended fovea. (d) Received multiresolution image with extended fovea.
Figure 5 illustrates the data reduction obtained with nonuniform resolution images when compared with uniform images. It presents the transmission features when using uniform resolution images of 256 × 256 pixels (65,536 bytes), centered foveal images with a fovea of 32 × 32 pixels and three rings (3,328 bytes), and portions of the foveal images: fovea and two rings (2,560 bytes), fovea and one ring (1,792 bytes), and fovea alone (1,024 bytes). Thus, if the delay is equal to 2.5 ms, the maximum image per second rate is lower than 1 when using uniform resolution images but more than 12 images per second can be achieved if sending complete multiresolution images with three rings. The delay is defined as the time that a 50-byte sized packet requires to travel through the channel. Therefore, the system will transmit the whole foveal image if the channel bandwidth can hold it. As the transmitting speed slows down, a control algorithm can determine which resolution rings to transmit and this information is composed in the receiver with previously received data in order to complete the whole field of view.
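The decision taken by such a control algorithm can be sketched as a simple priority rule: regions are added from the fovea outward while they fit in the amount of data that the channel can carry per frame. The sketch assumes that this byte budget has already been derived from the measured delay and the desired frame rate; that derivation is not modelled here. The function name `regions_to_send` and the choice of always sending the fovea are assumptions.

```python
def regions_to_send(region_sizes, budget_bytes):
    """Return how many regions (fovea first) fit within the byte budget.

    region_sizes -- sizes in bytes ordered by priority: fovea, ring 1, ...
    budget_bytes -- bytes that can be sent per frame at the desired rate
    The fovea is always sent, even if it alone exceeds the budget.
    """
    sent, total = 1, region_sizes[0]
    for size in region_sizes[1:]:
        if total + size > budget_bytes:
            break   # this ring and all outer ones are dropped
        total += size
        sent += 1
    return sent


sizes = [1024, 768, 768, 768]          # fovea and three rings, in bytes
print(regions_to_send(sizes, 3328))    # 4: the whole image
print(regions_to_send(sizes, 2600))    # 3: fovea and two rings
print(regions_to_send(sizes, 500))     # 1: fovea alone
```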
Figure 8. (a) Multiresolution image for the proposed coding algorithm. (b) Detailed target region. (c) Detailed target region using MPEG-2.
Figure 9. Data sequence from a typical execution. (a) Frame 242. (b) Received multiresolution image 242. (c) Frame 266. (d) Received multiresolution image 266. (e) Size of the regions of the multiresolution images during the whole sequence.
Once motion has been estimated correctly in a frame, this process works properly because untransmitted areas have not changed from the previous frame and, therefore, the composed image in reception is very similar to what it should be. However, because each fovea keeps moving from frame to frame, there is an area that changes drastically and may not be transmitted in the worst case: the region where the previous fovea was located. This problem is illustrated in Figure 6, where only the fovea is transmitted. Figure 6(a) presents the image received at instant t-1, Figure 6(b) presents the image captured at instant t, and Figure 6(c) presents the image composed in reception at instant t after receiving the fovea in Figure 6(b). The area uncovered by the fovea presents false information because no data about the updated state of such an area have been received. In order to solve this problem, the fovea can be extended easily to include the area that has been uncovered because of the object movement, according to the speed vector that specifies how much it has moved since the previous frame. Therefore, if the displacement of the centroid of the area covered by the bounding box is (Dx, Dy), the new fovea is extended horizontally by Dx pixels. Similarly, the height of such a fovea is incremented by Dy pixels. Figure 6(d) shows how this extension allows the whole area of changes to be updated in reception in order to compose the new frame correctly.
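The fovea extension can be sketched as a bounding-box operation. The sketch assumes that the box is given as (left, top, right, bottom) in pixels and that it grows on the side the object came from, so that the vacated area is covered; the text only states the amount of the extension, so the side chosen here is an interpretation. The function name `extend_fovea` is hypothetical and clipping to the image limits is omitted.

```python
def extend_fovea(box, dx, dy):
    """Grow a bounding box (left, top, right, bottom) over the vacated area.

    (dx, dy) is the centroid displacement since the previous frame, in
    pixels. The width grows by |dx| and the height by |dy|.
    """
    left, top, right, bottom = box
    if dx > 0:
        left -= dx      # object moved right: vacated area is on the left
    else:
        right -= dx     # dx <= 0: extend the right side by |dx|
    if dy > 0:
        top -= dy       # object moved down: vacated area is above
    else:
        bottom -= dy    # dy <= 0: extend the bottom side by |dy|
    return (left, top, right, bottom)


print(extend_fovea((10, 10, 20, 20), 3, -2))  # (7, 10, 20, 22)
```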
C. Final Packet Structure. Figure 7 shows a possible multifovea configuration, which presents two foveae. The structure of the transmission packet is also shown. It consists of a number of subpackets, one for each fovea. Each subpacket includes a set of parameters defining the structure of each fovea configuration (Ld, Rd, Td, Bd, m, Sx, Sy), as well as the number of rings sent for each one (mode).
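One possible serialization of such a packet is sketched below. The field order follows the parameter list given above, but the field widths (one unsigned byte per parameter and a 16-bit payload length), the leading count of foveae, and the function names are assumptions made for illustration; the actual layout of Figure 7 is not reproduced here.

```python
import struct

# Hypothetical layout: Ld, Rd, Td, Bd, m, Sx, Sy, mode as unsigned bytes,
# then a 16-bit payload length, all in network byte order.
HEADER = struct.Struct("!8BH")


def pack_subpacket(ld, rd, td, bd, m, sx, sy, mode, payload):
    """Serialize one fovea configuration followed by its rexel data."""
    return HEADER.pack(ld, rd, td, bd, m, sx, sy, mode, len(payload)) + payload


def unpack_subpacket(data):
    """Return (parameter tuple, payload, remaining bytes)."""
    *params, length = HEADER.unpack_from(data)
    start = HEADER.size
    return tuple(params), data[start:start + length], data[start + length:]


def pack_packet(subpackets):
    """Concatenate the subpackets, prefixed by the number of foveae."""
    return bytes([len(subpackets)]) + b"".join(subpackets)


sub = pack_subpacket(2, 10, 2, 12, 3, 40, 60, 2, b"\x01\x02\x03")
params, payload, rest = unpack_subpacket(sub)
print(params, payload, rest)
```

Carrying the geometry parameters in every subpacket lets the receiver rebuild each fovea configuration without any state other than the previously received rings.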
V. TESTS AND RESULTS
In order to prove its validity, the system has been tested in a variety of environments. Figure 8(a) shows the foveal image for a video sequence of a traveling toy car. The target is selected correctly and the fovea surrounds it accordingly. In order to show one of the advantages of our compression scheme, the original sequence has been compressed using MPEG-2 with a constant bit rate (CBR) of 0.4 Mbps, which is needed to achieve the same data reduction factor as using the proposed method. Figure 8(b) shows a detailed version of the target selected using the proposed compression scheme. The MPEG-2 version of the region, shown in Figure 8(c), presents artifacts due to the block-oriented nature of the mentioned compression method, whereas the proposed algorithm preserves its original resolution. Our system is more suitable for applications that require a given CR plus the preservation of the original version of regions from the scene.
Figure 9 shows a typical video surveillance application with a single mobile at a time. Figures 9(a) and 9(c) show two frames from one of the studied sequences. The corresponding received multiresolution images are shown in Figures 9(b) and 9(d), respectively. In these images, the person is tracked correctly through the image and, therefore, the multiresolution image is generated efficiently.
The size of the regions that make up the multiresolution image at each frame of the video sequence is plotted in Figure 9(e). The sequence consists of 358 frames. The mobile appears around frame number 240. Before this instant, the region sizes are almost constant, except for small changes due to false detections. Similarly, local minima of the fovea size can be seen during the tracking of the mobile object (frames 240–325). Finally, the size of the regions increases when a mobile is present in the image, except for the third ring, whose size is reduced because part of it is covered by the increased size of the inner regions.
Figure 10 shows how the system behaves for the sequence presented in Figure 9 under the externally imposed delay conditions presented in Figure 10(a). There are three delay windows of 0.5, 1, 2, and 1.6 ms. In order to achieve a constant image rate per second of 10 (capture rate), the transmitted data volume must change through the sequence. To analyze this data volume through the different delay windows, Figure 10(b) shows which part of the multiresolution image is sent at each frame. Usually, the whole image is sent, especially if there are no mobile objects in the image or if foveae are small. The highest delay window has been forced purposefully when mobiles appear more frequently in the sequence. When a mobile is detected, some resolution rings are not sent only if delay conditions impose it, as can be appreciated during frames 230–240. When mobiles disappear or delay conditions are improved, the whole multiresolution image is sent again.
Figure 10. System application. (a) Forced delay of the channel. (b) Transmission scheme. (c) Data flow to be transmitted.
Finally, Figures 11(a)-(c) show three cases presenting more than one target at once. The images were captured in different scenarios and no special constraints have been applied. The targets are always received at a high resolution and they can be processed in their full detailed version at reception.
VI. CONCLUSIONS AND FUTURE WORK
This paper has proposed a simple flow control method to keep a bounded delay when transmitting video sequences through a shared medium. The system relies on actively decreasing the resolution of the static areas of the image while moving regions always present high resolution. The multiresolution image presents shifted fovea geometry of adaptive size, which allows easy processing on the receiver end by using conventional algorithms. Besides, if delays are too high, the resolution rings of the image may be transmitted at different rates. The method has been implemented for corridor surveillance and it has proven to be successful for complex real sequences.
Video compression standards have disadvantages. First, they are not lossless methods, so neither the original image nor parts of it are preserved at a high resolution. Second, they are based on macroblocks, which have a predefined size and shape that do not match the shape and size of the objects present in the scene. Third, they are computationally intensive.
The main advantages of the system proposed are that it preserves the original resolution of the targets present at the scene, it holds a constant frame rate even when the network presents variable bandwidth, and it can handle a variable number of targets.
Finally, it may be interesting to enable different priority channels for the different resolution areas of the image, so that each one may present a distinctive control mechanism according to available resources. Further compression techniques applied to the different parts of the multiresolution image might be tested.
REFERENCES
S. Aramvith and M.T. Sun, MPEG-1 and MPEG-2 video standards, Handbook of image and video processing, A. Bovik (Editor), Academic Press, San Diego, 2000, pp. 597–610.
F. Arrebola, Sistema de visión basado en imágenes multirresolución de fóvea desplazable, PhD thesis, Dpto. Tecnología Electrónica, Universidad de Málaga, Málaga, Spain, 1998.
C. Bandera and P. Scott, Foveal machine vision systems, Proc IEEE Int Conf on Systems, Man and Cybernetics, Cambridge, 1989, pp. 596–599.
P. Burt, T. Hong, and A. Rosenfeld, Segmentation and estimation on image region properties through cooperative hierarchical computation, IEEE Trans Systems Man Cybernetics 11 (1981), 802–809.
P. Camacho, F. Arrebola, and F. Sandoval, Adaptive fovea structures for space-variant sensors, Proc IEEE Int Conf on Image Analysis and Processing, Florence, Italy, 1997, pp. 422–429.
P. Camacho, F. Arrebola, and F. Sandoval, Multiresolution sensors with adaptive structure, Proc 24th Annual Conf of the IEEE Industrial Electronics Society, Aachen, Germany, 1998, pp. 1230–1235.
C. Capurro, F. Panerai, and G. Sandini, Dynamic vergence using log-polar images, Int J Comput Vision 24 (1997), 79–94.
F. Coslado, P. Camacho, M. González, F. Arrebola, and F. Sandoval, VLSI implementation of a foveal polygon segmentation algorithm, Proc 10th Int Conf on Image Analysis and Processing, Venice, Italy, 1999, pp. 185–190.
B.G. Haskell, A. Puri, and A.N. Netravali, Digital video: An introduction to MPEG-2, Chapman & Hall, New York, 1997.
F. Luthon, A. Caplier, and M. Liévin, Spatio-temporal MRF approach to video segmentation: Application to motion detection and lip segmentation, Signal Proc 76 (1999), 61–80.
M.R. Mahzoun, J. Kim, S. Sauazaki, K. Okazaki, and S. Tamura, A scaled multigrid optical flow algorithm based on the least RMS error between real and estimated second images, Pattern Recog 32 (1999), 657–670.
S. Tanimoto and T. Pavlidis, A hierarchical data structure for picture processing, Comput Graphics Image Proc 4 (1975), 104–119.
C. Urdiales, J.A. Rodríguez, A. Bandera, and F. Sandoval, Video flow active control by means of adaptive shifted foveal geometries, Proc SPIE, 4197 (2000), 229–240.