Algorithms, Performance

Keywords

visible, hidden, surface, determination

Supervisor: Gregor Klajnšek

1. INTRODUCTION

Visible surface detection algorithms have two applications. First, the algorithm can be used to detect which objects are visible and to approximate the percentage of visible area for each surface. Second, it can be used to remove occluded surfaces, which speeds up an application because those surfaces would otherwise pass through heavy processing with advanced effects. We must emphasize that this is not a culling method: culling methods tend to operate on groups of surfaces and try to determine the visibility of whole objects. Culling methods also tend to run almost exclusively on the central processing unit (CPU).

The method we propose can determine which surfaces of the object are visible and is more suited for processing on the GPU. We have implemented the method using the OpenGL API.

2. SURFACES

In our method the term surface denotes an individual polygon. The simplest polygon is a triangle, and since triangles are sufficient to construct geometry of any complexity, we saw no need for more complex polygons such as quadrilaterals or pentagons. We therefore use only triangles in our geometric data, although the method also works for arbitrary polygons. We nevertheless recommend keeping the geometry simple by using only triangles, since this reduces a shortcoming of our method that is described in Section 4.3.

3. DESCRIPTION OF THE ALGORITHM

The basic idea behind the algorithm is very simple and consists of three steps:

- render each triangle with a unique color,

- download the rendered 2D image from the GPU,

- iterate through the 2D image and, for every pixel, map its color back to the triangle it represents and mark that triangle as visible.

3.1 Using the graphics processing unit (GPU)

Our method works by exploiting the color and stencil buffers of the GPU. Together they hold 40 bits per pixel: 32 bits come from the color buffer and 8 bits from the stencil buffer. We reserved 1 bit for future use, which leaves 39 bits of data per pixel. The color buffer consists of four 8-bit channels, usually referred to as the RGBA channels (red, green, blue and alpha). The algorithm also exploits the Z-buffer, which automatically resolves the depth ordering of the geometry for every pixel. All of these buffers are available on any modern GPU [1].

When the scene is rendered we have to attach the object and triangle information to every pixel. This can be accomplished in many ways. We chose a simple system in which we first store the surface offset (0 ... N-1, where N is the number of surfaces of a given object) and then the object offset (0 ... M-1, where M is the number of objects in the scene) in the color and stencil buffers. The offsets are converted into an RGBA color, and any remaining data is stored in the accompanying stencil value. Once we have the color and stencil value we can render the surface, and the GPU will use the Z-buffer to determine whether any part of the surface is visible.
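The packing of the two offsets into a color and a stencil value can be sketched as follows. This is a minimal Python sketch rather than the OpenGL implementation: the bit layout (surface offset in the low bits, object offset above it, stencil holding whatever lies above bit 31), the channel order, and the names `encode`, `decode` and `surface_bits` are all assumptions made for illustration.

```python
COLOR_BITS = 32       # four 8-bit RGBA channels
STENCIL_BITS = 7      # 8-bit stencil minus the 1 reserved bit
TOTAL_BITS = COLOR_BITS + STENCIL_BITS   # 39 usable bits per pixel


def encode(object_index, surface_index, surface_bits):
    """Pack both offsets into an (r, g, b, a) tuple and a stencil value.

    Assumed layout: surface offset in the low `surface_bits` bits,
    object offset in the remaining high bits.
    """
    object_bits = TOTAL_BITS - surface_bits
    if not 0 <= surface_index < (1 << surface_bits):
        raise ValueError("surface offset does not fit")
    if not 0 <= object_index < (1 << object_bits):
        raise ValueError("object offset does not fit")
    packed = surface_index | (object_index << surface_bits)
    color = packed & 0xFFFFFFFF          # low 32 bits go to the color buffer
    rgba = (color & 0xFF,
            (color >> 8) & 0xFF,
            (color >> 16) & 0xFF,
            (color >> 24) & 0xFF)
    stencil = packed >> COLOR_BITS       # remaining bits go to the stencil
    return rgba, stencil


def decode(rgba, stencil, surface_bits):
    """Inverse of encode: return (object_index, surface_index)."""
    r, g, b, a = rgba
    packed = r | (g << 8) | (b << 16) | (a << 24) | (stencil << COLOR_BITS)
    surface_index = packed & ((1 << surface_bits) - 1)
    object_index = packed >> surface_bits
    return object_index, surface_index
```

With `surface_bits = 22`, for example, `encode(3, 5, 22)` followed by `decode` on its result returns the pair `(3, 5)` again; any other reversible layout would serve equally well.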

After the rendering of the geometry is finished we need to download the data stored in the color and stencil buffers from GPU memory to system memory [2], where we process the result as a 2D image. The color and stencil buffers together carry the information from which we can calculate the index of the object and the index of the object's surface to which the pixel belongs. Thus we can easily determine which surfaces are visible.

It is worth noting that these are the minimal steps required to implement the algorithm and that the presented method also works with scenes that include dynamic objects. In the following sections we describe the techniques required to process scenes with richer geometry.

For each model in level
    For each polygon in model
        Generate unique color information
        Add the color to an auxiliary color buffer
    End for
End for
Upload the data to the GPU

Pseudo code for initialization

Cull the scene using ordinary techniques
Draw onto a hidden screen using auxiliary color buffers
Retrieve the drawn scene
For each pixel in scene
    Decode pixel
    Mark the appropriate polygon as visible
End for

Pseudo code for rendering
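The per-pixel loop at the end of the rendering pseudo code could look roughly like the Python sketch below. It assumes the readback has already been converted into a list of `(color, stencil)` integer pairs, that 22 bits are used for the surface offset, and that the buffers were cleared to a sentinel value no triangle uses; `BACKGROUND`, `SURFACE_BITS`, `decode_pixel` and `mark_visible` are hypothetical names, not part of the original implementation.

```python
SURFACE_BITS = 22                    # assumed fixed division of the 39 bits
BACKGROUND = (0xFFFFFFFF, 0x7F)      # assumed clear value, used by no triangle


def decode_pixel(color, stencil, surface_bits=SURFACE_BITS):
    """Return (object_index, surface_index) for one downloaded pixel."""
    packed = color | (stencil << 32)
    return packed >> surface_bits, packed & ((1 << surface_bits) - 1)


def mark_visible(pixels):
    """Collect every (object, surface) pair that owns at least one pixel."""
    visible = set()
    for color, stencil in pixels:
        if (color, stencil) == BACKGROUND:
            continue                 # nothing was drawn onto this pixel
        visible.add(decode_pixel(color, stencil))
    return visible
```

A set is used here so that a triangle covering many pixels is recorded only once; a per-triangle boolean array indexed by the decoded offsets would work just as well.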

3.2 Utilizing the limited space

Since we have a fixed number of bits available per pixel and two values to store in them, there are two ways to divide the bits: either we fix the division in advance or we change the division when needed.

If we know beforehand the maximum number of objects and the maximum number of triangles per object that will appear in every future scene, we can use a fixed division. Since that scenario is highly unlikely, it is worth looking for a better approach.

To use the limited space of the GPU buffers (39 bits) more efficiently we can check how many objects will be sent to the GPU just before rendering and how many surfaces each of these objects contains. Using this information we can dynamically set the division before each rendering. This is done using the following algorithm:

1. Determine the number N' of objects that will be rendered.

2. Determine which of the N' objects has the most surfaces - M' surfaces.

3. Calculate the least number of bits needed to store the numbers N' (n bits) and M' (m bits).

4. Calculate the optimal division point using the numbers n and m.

Using this approach, a scene that renders 70000 objects leaves 22 bits available for the surface offsets: the number 70000 requires at least 17 bits to be stored, and 39 - 17 = 22. With the remaining space we can have up to $2^{22}$ (slightly over 4 million) surfaces per object.
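The four steps and the 70000-object example can be reproduced with a short Python sketch. The text does not spell out how the "optimal division point" is chosen, so the sketch makes an assumption: the object field gets exactly n bits and the surface field gets everything that is left. The helper names `bits_for` and `choose_division` are hypothetical.

```python
TOTAL_BITS = 39   # 32 color bits + 8 stencil bits - 1 reserved bit


def bits_for(count):
    """Least number of bits that can hold the offsets 0 .. count-1."""
    return max(1, (count - 1).bit_length())


def choose_division(surface_counts):
    """Pick the split for one frame.

    surface_counts holds the number of surfaces of every object that
    will be rendered. Returns (object_bits, surface_bits).
    """
    if not surface_counts:
        raise ValueError("nothing to render")
    n = bits_for(len(surface_counts))    # steps 1 and 3: N' objects
    m = bits_for(max(surface_counts))    # steps 2 and 3: M' surfaces
    if n + m > TOTAL_BITS:
        raise ValueError("scene does not fit into 39 bits per pixel")
    # step 4 (assumed rule): objects get n bits, surfaces get the rest
    return n, TOTAL_BITS - n
```

For a list of 70000 objects this yields 17 object bits and 22 surface bits, matching the figures above.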

3.3 Optimization

We recommend performing some basic geometry culling, such as frustum culling [3] and perhaps more advanced techniques such as occlusion culling [4], before executing the surface visibility detection algorithm. Culling usually reduces the number of objects that need to be rendered and therefore the number of bits required to store the object index. Some scenes cannot be processed correctly if the number of objects and surfaces in the geometry is too large for the available bits. Culling the scene before rendering reduces the likelihood of this issue.

In our frustum culling method we use an axis-aligned bounding box (AABB) coupled with a bounding sphere for each object to perform fast intersection testing. In addition, the geometry of every object is further divided with an octree, which can be used for more precise culling [5].
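A common way to combine the two bounding volumes is to run the cheap sphere test first and fall back to the box test only when the sphere is not rejected. The Python sketch below illustrates that idea under stated assumptions: the frustum is given as planes `(nx, ny, nz, d)` with unit normals pointing into the frustum, and the function names are hypothetical. It is not the implementation used for the measurements, and the octree refinement is not covered.

```python
def sphere_outside(planes, center, radius):
    """True if the sphere lies completely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return True
    return False


def aabb_outside(planes, box_min, box_max):
    """True if the box lies completely behind at least one plane."""
    for nx, ny, nz, d in planes:
        # pick the box corner that lies furthest along the plane normal
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return True
    return False


def is_culled(planes, center, radius, box_min, box_max):
    """Cheap sphere test first, tighter box test second."""
    return (sphere_outside(planes, center, radius)
            or aabb_outside(planes, box_min, box_max))
```

Both tests are conservative: an object that is reported as not culled may still turn out to be invisible, which is harmless because the visibility pass decides that later.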

We performed a test of the efficiency of the algorithm with and without culling. The comparison is shown in Figure 1. In the scene that was used for testing, culling had a mostly negative impact on performance because geometry of the scene was tightly localized. The only noticeable speedup occurred when just a small part of geometry was visible (viewpoint 7) and when no geometry was visible (viewpoint 8). Culling would have a more positive effect on the performance if the objects were more uniformly distributed.

Figure 1 - Comparison of the performance of the algorithm with and without usage of culling.

4. DRAWBACKS

The proposed method also has some drawbacks, which need to be considered when the algorithm is used. In this section we describe these drawbacks in detail.

4.1 Performance scaling

Retrieving (downloading) data from GPU memory into system memory is a slow operation and can therefore become the bottleneck of the method [2]. The higher the rendering resolution, the longer we have to wait for the transfer to finish before we can start processing the data. Increasing the resolution by a factor of $X$ increases the number of pixels in the rendered image by a factor of $X^2$. Figure 2 shows how the resolution of the image affects the actual performance of the algorithm.

Figure 2 - Performance scaling with resolution increase

In the figure each resolution is $\sqrt{2}$ times the previous resolution, so that the pixel count increases by a factor of 2 at each step ($X = \sqrt{2}$, $X^2 = 2$). The graph matches our prediction: the run time starts to rise sharply at a certain resolution, as we would expect from an exponential function. The exact point at which this sharp rise happens depends on the hardware configuration of the host PC and is especially influenced by the speed of the graphics card and the bus bandwidth.
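The quadratic growth of the readback can be checked with a few lines of Python. The sketch assumes 5 bytes per pixel (4 bytes of RGBA color plus 1 byte of stencil, tightly packed) and uses a 640 x 480 starting resolution purely as an example; neither value comes from the measurements, and transfer time itself is not modelled.

```python
import math

BYTES_PER_PIXEL = 5   # assumed: 4 bytes RGBA color + 1 byte stencil


def readback_bytes(width, height):
    """Amount of data that has to be downloaded for one frame."""
    return width * height * BYTES_PER_PIXEL


def scale_resolution(width, height, x):
    """Scale both image dimensions by the factor x."""
    return round(width * x), round(height * x)


base = (640, 480)                                  # example resolution only
doubled = scale_resolution(*base, 2)               # X = 2, so X^2 = 4
step = scale_resolution(*base, math.sqrt(2))       # X = sqrt(2), so X^2 = 2

print(readback_bytes(*doubled) / readback_bytes(*base))
print(readback_bytes(*step) / readback_bytes(*base))
```

The first ratio is exactly 4, and the second is 2 up to the rounding of the image dimensions to whole pixels.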

4.2 Size of the viewport

The algorithm we propose requires two rendering passes and two rendering surfaces (images). One render surface is used by the visibility algorithm, and the second is used to render the view of the scene from the selected viewpoint. In the following we call the first surface the visibility surface and the second one the view surface.

The sizes of the visibility surface and the view surface need not be equal, but the choice affects the precision of the algorithm. A visibility surface smaller than the view surface produces many false negatives, since some surfaces that are visible in the view are not rendered to the visibility surface. An example is shown in Figure 3: the red surfaces have all gone undetected although they are visible, whilst the light green surfaces have been marked. Figure 4 shows the results of the visibility detection algorithm when the visibility surface is the same size as the view surface. Figure 5 shows the results when the visibility surface is bigger than the view surface. There are some undetected surfaces at the selected viewing resolution, but there are many more marked surfaces that we do not actually see at that resolution. Figures 3, 4 and 5 show the same scene rendered from the same viewpoint with the same view surface size; only the size of the visibility surface differs.

Generally, there is no reason to use a visibility surface smaller than the view surface if the goal is to discover surface visibility. The visibility surface should be at least the same size as the view surface. A larger visibility surface is easy to use and is also recommended, since it captures the small surfaces in the distance that we would otherwise miss.

Figure 3 - Using the visibility detection algorithm with a view window coupled with a smaller visibility surface results in many false negatives (red colored surfaces).

Figure 4 - Result of the visibility detection algorithm using a view window coupled with a visibility surface of the same size. There are no false negatives or false positives.

Figure 5 - Result of the visibility detection algorithm using a view window coupled with a larger visibility surface. There are a few false negatives and a few false positives, which we do not see here, since the size of the view window is constant.

4.3 Triangle size

The method is also sensitive to the size of the surfaces, since just one visible pixel of a large surface is enough to mark the whole surface as visible. This is especially pronounced at the edge of the view area, where a large number of surfaces are marked as visible even though only a tiny part of each is visible. If this is undesired, the following methods can be used to reduce this noise:

1. We can attempt to suppress the noise by counting how many pixels belong to each visible surface and marking the surface as invisible if the number of pixels is under some preset threshold. The threshold has to be determined according to the type of geometry and the distance from the geometry. This method needs more post-processing of the data, but does not require any changes to the geometry of the scene.

2. We can preprocess each surface and break up large surfaces into smaller ones. This suppresses the noise at the expense of more complex scene geometry, but the post-processing time is the same as for the basic algorithm.

Which method is more appropriate depends on the host hardware and the scene geometry. However, even if we use these methods some noise in the output is unavoidable as it is very unlikely that every triangle detected by the algorithm will be fully contained inside the view frustum. Therefore we can only try to prevent the extreme case where large surfaces get marked as visible although only one pixel of the surface is visible.
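The first noise-reduction method (counting pixels per surface) needs only a small change to the post-processing step. The Python sketch below assumes the pixels have already been decoded into `(object_index, surface_index)` pairs; the function name `visible_surfaces` and the parameter `min_pixels` are hypothetical, and a suitable threshold would have to be tuned per scene as noted above.

```python
from collections import Counter


def visible_surfaces(decoded_pixels, min_pixels=1):
    """Return the surfaces that cover at least `min_pixels` pixels.

    decoded_pixels is an iterable of (object_index, surface_index)
    pairs, one per non-background pixel of the visibility surface.
    With min_pixels=1 this reduces to the basic algorithm.
    """
    counts = Counter(decoded_pixels)
    return {surface for surface, n in counts.items() if n >= min_pixels}
```

The per-surface counts also give the approximate visible area mentioned in the introduction, so the same pass can serve both purposes.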

5. OTHER METHODS

Today, many algorithms for the detection of visible (or hidden) surfaces exist. One of the more popular methods is based on ray casting [6]. The method casts rays from the viewpoint into the scene geometry and, for each ray, finds the intersected surface that is closest to the ray's point of origin; if such a surface exists, it is marked as visible. It is also possible that a ray does not intersect any geometry. Ray casting usually works in image space, so one or more rays are cast into the scene for each pixel of the output image. Because of this the method is subject to the same resolution scaling problem as our method, since increasing the resolution by a factor of $X$ requires at least $X^2$ times more rays to be cast. Each ray also has to perform many costly intersection tests before it finds the closest surface.

The method has several benefits compared to our method:

- does not require a GPU,

- very precise, since it calculates the points of ray-surface intersection,

- not limited by how many objects or triangles it can check,

but it also has several drawbacks:

- computationally heavy,

- complex implementation is needed for a fast solution,

- scales poorly with bigger scenes and resolutions.
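For comparison, the core of a ray-casting visibility test can be sketched in Python as below. The intersection routine is the well-known Möller-Trumbore ray-triangle test, chosen here only as an example since the text does not prescribe one. The brute-force loop over all triangles also shows why the approach is computationally heavy without an acceleration structure. All names are hypothetical.

```python
EPS = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, tri):
    """Distance along the ray to the triangle, or None if it is missed."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:               # ray is parallel to the triangle
        return None
    inv = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv          # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv      # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None    # ignore hits behind the origin


def closest_hit(origin, direction, triangles):
    """Index of the nearest triangle hit by the ray, or None."""
    best_index, best_t = None, float("inf")
    for i, tri in enumerate(triangles):
        t = ray_triangle(origin, direction, tri)
        if t is not None and t < best_t:
            best_index, best_t = i, t
    return best_index
```

Calling `closest_hit` once per pixel and collecting the returned indices gives the same kind of visible-surface set as the GPU method, at a cost proportional to the number of pixels times the number of triangles.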

5.1 CONCLUSION

We have designed and implemented a fairly robust and fast solution for finding all visible surfaces from a given point of view. Early results are promising and we can achieve real-time visualization of quite complex scenes, although the speed of the execution is very dependent on the host hardware and target resolution of the view window. Our future work will be oriented toward speeding up the method, so that we could perform real time rendering even for higher resolutions.

6. REFERENCES

[1] Buffers on GPU, Website

http://jerome.jouvie.free.fr/OpenGl/Lessons/Lesson5.php (10.2.2010)

[2] Data readback from GPU, Whitepaper

http://developer.nvidia.com/object/fast_texture_transfers.html (15.2.2010)

[3] Frustum culling, Website

http://www.flipcode.com/archives/Frustum_Culling.shtml (3.6.2010)

[4] Occlusion culling, Website

http://http.developer.nvidia.com/GPUGems/gpugems_ch29.html (5.6.2010)

[5] Octree data structure, Website

http://en.wikipedia.org/wiki/Octree (13.5.2010)

[6] M. Berg. Ray shooting, depth orders and hidden surface
