CHAPTER VI: DISCUSSION OF RESULTS
6.2. Comparison of results with other similar studies
The most popular high-performance graphics architectures are currently all very similar and fall under the sort-last-sparse classification (Molnar et al., 1994). The NVIDIA 8300GS used in validating the model falls into this category (Lindholm et al., 2008). In the mobile market, however, the leading architecture is known as a tiled renderer (Imagination Technologies Ltd., 2010), a type of sort-middle architecture (Molnar et al., 1994). Briefly, a tiled renderer processes the scene one screen-space tile at a time: all geometry is sorted into the appropriate tiles, each tile is rendered to a local rather than global framebuffer memory, and only the finished tile is sent to global memory for scan out.
When designing for the mobile realm, battery life is of the utmost importance, and proponents of tiled renderers claim that the very coherent writes to local framebuffer memory will offset any overhead in reading and transforming some fraction of the input geometry more than once. However, there has been no published verification of this claim. By adapting my energy model’s underlying architecture to that of a tiled renderer, I can gain insight into the veracity of this assertion. Since validating this new model would be impossible without fabricating an entirely new architecture, I perform several sanity checks before applying the model to my test scenes.
The possible search space for this question is enormous, so I simplify the problem somewhat. I explore only three parameters to check the validity of the new model: (i) input geometry count, (ii) screen size, and (iii) the depth complexity of the finished scene. (Depth complexity is a measure of how much work is performed shading pixels that do not appear in the final image. The three test applications have a depth complexity of between 4 and 5. In some limited testing I performed, very complex scenes reached a depth complexity as high as 30.) The baseline scene has 100,000 triangles, a screen size of 1280x1024 pixels, and a depth complexity of 3. Other scene assumptions are: 48B of data per input vertex, 16B per fragment, 1.8 vertices per primitive, a depth fail rate of 0.5, an alpha blending rate of 0.25, framebuffer and depth compression ratios of 1.5, and vertex and pixel shaders with equal complexity. Additionally, I make the following assumptions in my tiled renderer’s energy model:
1. The added cost of pre-sorting the geometry will be one read of the input geometry, a pass through the vertex shader, and a write of a batch ID for every 32 input primitives,
2. There is a local framebuffer storage of 2MB,
3. The tile size will be fixed at 128x128 pixels, and
4. The depth buffer is not stored to global memory after a tile is finished processing.
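As a rough check on the assumptions above, the following sketch derives a few quantities they imply for the baseline scene. The function name and dictionary keys are hypothetical, 2MB is read as 2 MiB, and storing 16B per pixel in the local tile buffer is an assumption made only for this illustration.

```python
import math

# Baseline scene and tiled-renderer values quoted in the text.
TRIANGLES = 100_000
SCREEN_W, SCREEN_H = 1280, 1024
VERTS_PER_PRIM = 1.8
BYTES_PER_VERTEX = 48
BYTES_PER_FRAGMENT = 16
TILE = 128                       # tile edge length in pixels
PRIMS_PER_BATCH = 32             # one batch ID per 32 input primitives
LOCAL_STORAGE = 2 * 1024 ** 2    # 2MB, read here as 2 MiB (assumption)


def baseline_quantities():
    """Derive tile count, geometry size, and batch count for the baseline scene."""
    tiles_x = math.ceil(SCREEN_W / TILE)
    tiles_y = math.ceil(SCREEN_H / TILE)
    vertex_count = round(TRIANGLES * VERTS_PER_PRIM)
    # Assumption: one 16B fragment record is kept per pixel of the tile.
    tile_bytes = TILE * TILE * BYTES_PER_FRAGMENT
    return {
        "tiles": tiles_x * tiles_y,
        "geometry_bytes": vertex_count * BYTES_PER_VERTEX,
        "batch_ids": math.ceil(TRIANGLES / PRIMS_PER_BATCH),
        "tile_bytes": tile_bytes,
        "tile_fits_locally": tile_bytes <= LOCAL_STORAGE,
    }


if __name__ == "__main__":
    for name, value in baseline_quantities().items():
        print(f"{name}: {value}")
```

Under these readings, the baseline screen divides into 80 tiles, the input geometry occupies 8,640,000 bytes per read, and pre-sorting writes 3,125 batch IDs.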
The results of my three experiments are shown in Figure 3.4.
1. Input geometry count. Increasing the input geometry directly increases the overheads seen by a tiled renderer. Thus, the tiled renderer becomes relatively less energy-efficient as the geometry count increases. (I assume that when increasing the triangle count of a scene, the extra geometry will be put towards refining meshes, decreasing the size of the average triangle, therefore keeping the generated fragments and depth complexity the same.)
[Figure 3.4 panels: (a) Triangle Count (thousands), (b) Screen Size, (c) Depth Complexity. Vertical axis: Untiled:Tiled Energy Ratio.]
Figure 3.4: Energy efficiency of tiled versus untiled renderers for different scene parameters. A ratio greater than one indicates that the tiled renderer is more efficient.
2. Resolution. The strength of the tiled renderer comes from its ability to write to local memory during framebuffer operations. As resolution increases, the locality of these writes, set against the scattered writes of the full-screen renderer, causes the relative efficiency of the tiled renderer to increase.
3. Depth complexity. Closely related to resolution is depth complexity. I note a similar trend: as more pixel processing and framebuffer writing is performed, the relative efficiency of the tiled renderer increases.
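The three trends above can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not the energy model used in this work: every per-operation energy cost and the batch ID size are hypothetical placeholders, and the depth fail rate, alpha blending, and compression are ignored. It does not reproduce the values in Figure 3.4; it only shows why the ratio moves in the directions described.

```python
# Hypothetical relative energy costs (arbitrary units); NOT values from this work.
E_GLOBAL_BYTE = 1.0     # per byte moved to or from global memory
E_LOCAL_BYTE = 0.1      # per byte moved to or from local tile memory
E_VERTEX_SHADE = 20.0   # per shaded vertex
E_PIXEL_SHADE = 20.0    # per shaded fragment (equal shader complexity)
BATCH_ID_BYTES = 4      # hypothetical size of one batch ID

# Scene assumptions quoted in the text.
VERTS_PER_PRIM = 1.8
BYTES_PER_VERTEX = 48
BYTES_PER_FRAGMENT = 16
PRIMS_PER_BATCH = 32


def geometry_pass(triangles):
    """Energy for one read of the input geometry plus vertex shading."""
    vertices = triangles * VERTS_PER_PRIM
    return vertices * (BYTES_PER_VERTEX * E_GLOBAL_BYTE + E_VERTEX_SHADE)


def untiled_energy(triangles, width, height, depth):
    """One geometry pass; every fragment is shaded and written to global memory."""
    fragments = width * height * depth
    return geometry_pass(triangles) + fragments * (
        E_PIXEL_SHADE + BYTES_PER_FRAGMENT * E_GLOBAL_BYTE
    )


def tiled_energy(triangles, width, height, depth):
    """Pre-sort pass, render pass with local writes, then one global write-back."""
    pixels = width * height
    fragments = pixels * depth
    batch_writes = (triangles / PRIMS_PER_BATCH) * BATCH_ID_BYTES * E_GLOBAL_BYTE
    presort = geometry_pass(triangles) + batch_writes
    shading = fragments * (E_PIXEL_SHADE + BYTES_PER_FRAGMENT * E_LOCAL_BYTE)
    writeback = pixels * BYTES_PER_FRAGMENT * E_GLOBAL_BYTE
    return presort + geometry_pass(triangles) + shading + writeback


def energy_ratio(triangles=100_000, width=1280, height=1024, depth=3):
    """Untiled:tiled energy ratio; above one means the tiled renderer is more efficient."""
    return untiled_energy(triangles, width, height, depth) / tiled_energy(
        triangles, width, height, depth
    )
```

With any positive costs in which local traffic is cheaper than global traffic, this toy ratio falls as `triangles` grows and rises as the screen size or `depth` grows, matching the direction of the three trends.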
When applying the new tiled renderer model to my existing test applications, I see that they are all less efficient with a tiled renderer (by 13% on average). While this would seem to suggest that this architecture is less efficient, there are several things to consider before accepting this outcome at face value. Firstly, my naïve approach to the tiled renderer’s architecture certainly lacks optimizations employed by implemented hardware. For example, it is doubtful that the price paid to presort the geometry is a doubling of the initial effort, and the sorted geometry may even fit in on-chip memory. Secondly, the test scenes I examined all had complex input geometry; they were not meant to be run on mobile hardware! Developers would likely optimize their geometry and applications for such an environment.
To test how these applications would consume energy on an existing mobile platform, I adapted them to have a workload more characteristic of mobile applications. First, I scaled the applications to the resolution of a current mobile device, the iPhone 4: 640x960 pixels. Next, I set the amount of input geometry to match the peak triangle rate of this device at 30 frames per second. The results after these modifications are shown in Figure 3.5. The main difference is that reading geometry and vertex shading steal some energy away from pixel shading due to the relatively small screen size; this is an expected result.
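The rescaling described above amounts to two small calculations, sketched below. The peak triangle rate is left as a parameter because the text does not state it, and the function name is hypothetical. The comparison against 1280x1024 uses the baseline screen size from the sanity checks, which is an assumption; the original resolution of the test applications is not given here.

```python
BASELINE_PIXELS = 1280 * 1024   # baseline screen size from the sanity checks
IPHONE4_PIXELS = 640 * 960      # iPhone 4 resolution quoted in the text
TARGET_FPS = 30                 # frame rate quoted in the text


def triangles_per_frame(peak_triangles_per_second, fps=TARGET_FPS):
    """Geometry budget per frame when running at the device's peak triangle rate."""
    return peak_triangles_per_second / fps


# Fraction of the baseline pixel count that the mobile screen covers.
pixel_scale = IPHONE4_PIXELS / BASELINE_PIXELS
```

The mobile screen has 614,400 pixels, a little under half of the baseline pixel count. Fewer pixels per frame means less pixel shading, which is consistent with geometry reading and vertex shading taking a larger share of the total energy.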