
The subdivision of a problem amongst many processing elements introduces the notion of the task, which is the unit of work assigned to a single processor in the system. The task granularity of a problem determines the computational effort associated with a task. During problem decomposition, when task granularity is determined, a number of considerations are made, such as the communication frequency between processing elements, bandwidth requirements, and computation and data dependencies, amongst others. These considerations must be made in light of the infrastructure that will run the parallel algorithms. In shared-memory systems, the need to communicate data between processing elements is reduced; nevertheless, access to resources shared by all processing elements must be controlled to avoid inconsistencies caused by unpredictable access, such as race conditions. In distributed-memory systems, on the other hand, communication is necessary because each processing element has its own private memory, which cannot be seen or directly accessed by other processing elements.

Crockett (1997) provides a survey of parallel rendering for rasterisation and ray tracing methods amongst others; Chalmers et al. (2002) focus specifically on parallel rendering for ray tracing methods. In general, there are a number of approaches used in parallelising the rendering process, although the most common in interactive scenarios are the functional and data parallelism approaches. In the functional approach, the rendering process is split into several stages, with each stage mapping to some function, or group of functions, that can be applied to an individual data item. Each stage of the pipeline is mapped to a processing element, establishing a data path between the individual processing elements in true producer-consumer style. This leads to the formation of a sequential pipeline, also known as the rendering pipeline, where a processing element forwards each completed work item to the processing element assigned to the next stage, while receiving a new datum to process from the neighbour mapped to the previous stage of the pipeline. The functional approach has two significant limitations: the overall speed of the pipeline is determined by its slowest stage and the available parallelism is limited to the number of stages in the pipeline (Crockett, 1997).
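The two limitations of the functional approach can be illustrated with a small numerical sketch. The stage names and per-item times below are hypothetical values chosen for illustration, not measurements from the cited work, and the model assumes one processing element per stage with no communication cost.

```python
def pipeline_throughput(stage_times):
    """Steady-state items per second with one processing element per stage."""
    # Once the pipeline is full, an item leaves every max(stage time) seconds.
    return 1.0 / max(stage_times.values())


def pipeline_speedup(stage_times):
    """Speed-up over running all stages in sequence on one processing element."""
    return sum(stage_times.values()) / max(stage_times.values())


# Hypothetical per-item stage times in seconds (illustrative only).
stages = {"transform": 0.002, "clip": 0.001, "rasterise": 0.005}

throughput = pipeline_throughput(stages)  # bounded by the slowest stage
speedup = pipeline_speedup(stages)        # can never exceed len(stages)
```

Under these assumptions the speed-up equals the number of stages only when every stage takes the same time; any imbalance leaves the faster stages idle while they wait for the slowest one.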

4. Parallel and Distributed Rendering 65

The data-parallel approach transposes the functional approach; instead of processing a single data item stream, data is split into multiple streams and operated upon simultaneously. The advantage of this approach lies in its scalability first and foremost. The number of processing elements employed in any rendering task can vary depending on the problem size, distilled in factors such as scene complexity, image resolution or desired performance levels (Crockett, 1997). The data-parallel approach is further subdivided into object and image parallelism. Object parallelism denotes operations which are independently carried out on the geometric primitives that make up a scene. Image parallelism, on the other hand, refers to operations used to compute individual pixel values of a synthesised image.
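As a minimal sketch of image parallelism, the image plane can be cut into rectangular tiles that are then handed to processing elements. The function names, the tile size and the static round-robin assignment are assumptions made for illustration; the text does not prescribe any particular partitioning scheme.

```python
def split_into_tiles(width, height, tile_size):
    """Yield (x0, y0, x1, y1) tiles covering the image; x1 and y1 are exclusive."""
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            yield (x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height))


def assign_round_robin(tiles, num_workers):
    """Statically assign tiles to workers in round-robin order."""
    assignment = [[] for _ in range(num_workers)]
    for index, tile in enumerate(tiles):
        assignment[index % num_workers].append(tile)
    return assignment


# Hypothetical 640x480 image, 64-pixel tiles, four processing elements.
tiles = list(split_into_tiles(640, 480, 64))
per_worker = assign_round_robin(tiles, 4)
```

Because every pixel belongs to exactly one tile, the tiles can be computed independently; changing the number of workers only changes how the same tiles are distributed.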

4.2.1 Task and Data Management for Ray Tracing

Task granularity is defined in terms of the smallest unit of computation possible with respect to the problem domain. In the domain of ray tracing-based rendering, Chalmers et al. (2002) define the ray-object intersection operation to be the smallest element of computation, or the atomic element giving the finest level of granularity. A task is defined as the tracing of one complete path, from the eye to a light source. Path-level tasks provide the fine granularity required by a scalable algorithm; nevertheless, such a level of granularity might result in an accumulation of sequential elements, such as excessive communication per task or frequent accesses to synchronised data structures amongst others, swamping computation time by the effort required to set up the task itself. Thus, it is common practice to adopt an agglomerative strategy to amortise the sequential elements with respect to computation time without seriously compromising the scalability of the algorithm. Chalmers et al. (2002) refer to such an agglomeration as the task packet, a collection of one or more path-level tasks to be computed.
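The agglomeration of path-level tasks into task packets can be sketched as follows. The helper names, the packet size and the fixed per-packet overhead are hypothetical; the sketch only shows how grouping reduces the number of times the set-up cost is paid.

```python
from itertools import islice


def make_task_packets(path_tasks, packet_size):
    """Group path-level tasks into packets of at most packet_size tasks."""
    if packet_size < 1:
        raise ValueError("packet_size must be at least 1")
    iterator = iter(path_tasks)
    while True:
        packet = list(islice(iterator, packet_size))
        if not packet:
            return
        yield packet


def total_overhead(num_tasks, packet_size, overhead_per_packet):
    """Fixed cost when the overhead is paid once per packet, not once per path."""
    num_packets = -(-num_tasks // packet_size)  # ceiling division
    return num_packets * overhead_per_packet


# Hypothetical workload: one path-level task per pixel of a 4x4 image.
path_tasks = [(x, y) for y in range(4) for x in range(4)]
packets = list(make_task_packets(path_tasks, packet_size=5))
```

Larger packets amortise the overhead over more paths, but they also leave fewer units of work to distribute, which is the trade-off against scalability noted above.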

Ray-casting for visibility determination operates upon geometric object primitives and thus requires scene information to be readily available at the processing element. World data models represent problem sizes that fit in the individual private memories of processing elements in a distributed system, where all required scene information is replicated (Chalmers et al., 2002). In scenarios employing this model, no data management is required. On the other hand, problem sizes that do not fit entirely in memory require special external memory (out-of-core) algorithms for efficiently managing data and mapping parts of it to what memory is available at a processing element. This may entail working with data stored on secondary storage or on remote repositories that have to be accessed via a network interconnection. Such implementations may benefit from a virtual shared memory (VSM) system, which not only provides all processing elements with a single unified address space, but also allows each one to operate on data sets which are larger than their individual private memories, by presenting a virtual world model view of the problem (Li, 1988; Li & Hudak, 1989; Chalmers et al., 2002).

A VSM may be provided at different levels, from application all the way down to hardware. For instance, the memory management unit (MMU) in a non-uniform memory access (NUMA) architecture would transparently determine whether a read or write operation is directed at a local or remote memory address and redirect the request accordingly. At operating system level, VSM is usually implemented using mechanisms similar to paging, where the address space is divided into fixed-size chunks, and any access to a chunk that is not available at the local machine would trigger a page-fault, leading the VSM system to fetch the chunk before restarting the faulting instruction. At compiler level, data item sizes may be arbitrary, with the compiler providing data transport and consistency while trying to maximise locality. Finally, at the application level, VSM is provided via data management middleware, which is responsible for servicing any data requests on behalf of the application.
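The paging-like mechanism described for the operating system level can be mimicked with a toy model. The class below is a hypothetical, read-only simulation of a single processing element: the `remote_store` list stands in for memory held elsewhere, the FIFO eviction policy is an assumption, and no consistency protocol is modelled.

```python
class PagedVSM:
    """Toy model of page-based virtual shared memory for one processing element."""

    def __init__(self, remote_store, page_size, capacity):
        self.remote_store = remote_store  # stands in for memory held by other nodes
        self.page_size = page_size        # number of items per fixed-size chunk
        self.capacity = capacity          # maximum number of locally resident pages
        self.local_pages = {}             # page number -> items (insertion ordered)
        self.page_faults = 0

    def _fetch_page(self, page_no):
        """Simulate fetching a chunk from a remote node after a page fault."""
        start = page_no * self.page_size
        return self.remote_store[start:start + self.page_size]

    def read(self, address):
        page_no, offset = divmod(address, self.page_size)
        if page_no not in self.local_pages:
            # Page fault: the chunk is not available locally and must be fetched.
            self.page_faults += 1
            if len(self.local_pages) >= self.capacity:
                # Evict the oldest resident page (FIFO, an assumed policy).
                oldest = next(iter(self.local_pages))
                del self.local_pages[oldest]
            self.local_pages[page_no] = self._fetch_page(page_no)
        # The access is retried now that the page is resident.
        return self.local_pages[page_no][offset]


remote = list(range(100))  # hypothetical shared data set of 100 items
vsm = PagedVSM(remote, page_size=10, capacity=2)
values = [vsm.read(address) for address in (3, 4, 15, 42, 3)]
```

The data set is five times larger than the local capacity, yet every address can be read; the cost appears as page faults whenever an access falls outside the resident pages.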

Fundamentally, a VSM provides a shared memory abstraction to systems that communicate via message passing, interpreting and executing special read and write operations such that the shared memory is made consistent across all participants in the system. The efficiency of a VSM is highly dependent on the level of coupling of the processing elements in the distributed system, the interconnecting infrastructure, and the underlying memory consistency model. Strong models, which attempt to order individual operations implicitly, are usually less efficient than weaker models, which provide explicit primitives to enforce synchronisation of local and remote memories, thus explicitly ordering groups of operations instead (Mosberger, 1993; Chai, 2002).

4.2.2 Master-Worker Paradigm

A work distribution model that is often used for parallel computing is the master-worker (or replicated-worker) paradigm, which is suited to solving computational problems that can be decomposed into a number of smaller nearly identical independent tasks. In the basic master-worker structure, a single master process divides the problem at hand into a number of smaller tasks and then makes them available to worker processes. The workers, who spend their time waiting for tasks to compute, request tasks from the master, process them, and respond with the results. The master is then responsible for collecting the results and combining them into a meaningful solution (Andrews, 1991; Freeman et al., 1999).

Traditionally, the master-worker paradigm is task-driven, although data-driven approaches have also been researched (Labidi et al., 2012). An inherent advantage of the master-worker paradigm, provided that the chosen task granularity is not excessively coarse, is load balancing. Each worker process may compute a number of tasks, one after the other; as soon as a task is complete, another is requested from the master’s centralised pool. While workers are occupied with large tasks, others might be completing several smaller tasks, naturally distributing work across the workers, based on their availability and the size of the workload. This approach favours the unpredictable workload nature of high-fidelity rendering using ray tracing methods. Furthermore, the class of applications that are suited for master-worker scale naturally, such that additional workers can be effortlessly added to a computation, generally speeding it up. The ability to change the number of workers during the course of a computation is a very important property of this paradigm, when taking time constraints into consideration. Computations which consistently fail to meet assigned deadlines can be augmented with more workers. Replication can be used to make the system fault-tolerant by assigning failed tasks to other workers.
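A minimal task-driven master-worker sketch is given below, using threads and a shared queue as the master's centralised pool. The function name and the workload are hypothetical, and threads are used only to keep the example self-contained; a distributed implementation would exchange tasks and results by message passing instead.

```python
import queue
import threading


def run_master_worker(tasks, compute, num_workers):
    """Master fills a central task pool; workers pull tasks until it is empty."""
    task_pool = queue.Queue()
    results = queue.Queue()
    for task_id, task in enumerate(tasks):
        task_pool.put((task_id, task))

    def worker():
        while True:
            try:
                task_id, task = task_pool.get_nowait()  # request the next task
            except queue.Empty:
                return  # no work left in the pool
            results.put((task_id, compute(task)))       # respond with the result

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # The master combines the results, restoring the original task order.
    collected = {}
    while not results.empty():
        task_id, value = results.get()
        collected[task_id] = value
    return [collected[i] for i in range(len(tasks))]


# Hypothetical workload with tasks of uneven cost.
tasks = [10, 1000, 20, 5000, 30]
output = run_master_worker(tasks, compute=lambda n: sum(range(n)), num_workers=3)
```

Load balancing emerges without any scheduling logic: a worker that draws a small task simply returns to the pool sooner and takes the next one, while a worker holding a large task stays busy.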

A single master process may be unable to handle an increasing number of workers, introducing a bottleneck in the system and adversely affecting its ability to scale. Banino (2006) showed that using multiple masters, arranged hierarchically, can achieve good performance on large-scale platforms where a large number of independent tasks need to be managed. The asymmetry between the master and workers in the paradigm creates a single point of failure at the master, which is undesirable in systems requiring high availability or reliability. Replication and checkpointing techniques can be used to improve fault-tolerance of the system.

4.2.3 Peer-to-Peer Systems

Peer-to-Peer (P2P) architectures have been used in data sharing, collaboration and for information dissemination. The decentralised nature of these systems addresses scalability problems in distributed applications that exist when the number of clients starts to grow. P2P approaches aimed at sharing resources and information require efficient search mechanisms to locate required information in a timely manner. In local-area solutions, unstructured systems use multicasting facilities provided by the underlying hardware to broadcast queries for specific data. In large scale networks, implementing reliable multicasting is notoriously difficult (Jelasity & van Steen, 2002). An approach adopted by unstructured P2P systems was that of query flooding, whereby all reachable nodes are contacted to determine the availability of a resource on the network. Structured P2P systems such as Chord (Stoica et al., 2001) and Tapestry (Zhao et al., 2001) avoid the traffic caused by query flooding via the adoption of key-based routing and searching. Specifically, a distributed hash table system is used to provide a lookup service similar to an associative array; the search space is partitioned and the search criteria are associated with hosts holding the required resources.
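The idea of partitioning the search space among hosts can be sketched with a toy hash ring, in which each key is owned by the first node identifier at or after the key's identifier. This is a simplified illustration with a global view of all nodes, not the routing algorithm of Chord or Tapestry; the node names, the resource key and the 16-bit identifier space are hypothetical.

```python
import bisect
import hashlib


def key_id(name, bits=16):
    """Map a string to an identifier on a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class HashRing:
    """Toy distributed hash table: a key is owned by its successor on the ring."""

    def __init__(self, node_names):
        self.ring = sorted((key_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]

    def lookup(self, key):
        """Return the node responsible for key, wrapping around the ring."""
        index = bisect.bisect_left(self.ids, key_id(key))
        return self.ring[index % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])  # hypothetical hosts
owner = ring.lookup("scene/teapot.obj")                     # hypothetical key
```

Any participant that hashes the same key arrives at the same owner, so a resource is located with a directed lookup rather than by contacting every reachable node.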

A series of randomised algorithms for replicated database maintenance based on epidemic principles was introduced by Demers et al. (1987). This addressed problems of high traffic and database inconsistency, and was later exploited by Demers et al. (1994) in Bayou, a system providing support for data sharing and collaboration among weakly connected users, which used peer-to-peer anti-entropy for the propagation of updates. Jelasity & van Steen (2002) conceived the newscast model of computation, providing effective and reliable probabilistic multicasting, large-scale distributed file-sharing, and resource discovery and allocation, with the distinguishing feature being the membership protocol employed. A peer may contact any arbitrarily chosen member and simply copy that member’s list of neighbours in order to join a group. Leaving a group is achieved by that peer merely ceasing its communication as opposed to notifying other members in the group about its decision.
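The join and leave behaviour described above can be sketched as follows. This is not the newscast protocol itself: the `Peer` class is hypothetical, adding the contacted member to the copied list is an assumption, and the way stale entries are eventually discarded is left out.

```python
import random


class Peer:
    """Toy peer that keeps only a list of neighbour names."""

    def __init__(self, name):
        self.name = name
        self.neighbours = set()
        self.active = True

    def join(self, member):
        """Join a group by copying the neighbour list of any existing member."""
        # Assumption: the newcomer also records the member it contacted.
        self.neighbours = set(member.neighbours) | {member.name}
        self.neighbours.discard(self.name)

    def leave(self):
        """Leave silently: stop communicating and notify nobody."""
        self.active = False


rng = random.Random(0)  # fixed seed so the example is repeatable
group = [Peer("p0")]
for i in range(1, 5):
    newcomer = Peer(f"p{i}")
    newcomer.join(rng.choice(group))  # contact an arbitrarily chosen member
    group.append(newcomer)

group[2].leave()  # other peers are not informed and keep their lists unchanged
```

Joining costs a single contact and leaving costs nothing, which is what makes this style of membership attractive when peers come and go frequently.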
