Capítulo 3. UEPS secuencia para enseñar la representación gráfica de la función
3.2.3 Etapa tres Situaciones problema iniciales
Most of the work in this dissertation has focused on addressing wavefront irregularity that arises as the result of input-dependent dataflow in an application. Another source of wavefront irregularity alluded to in our discussion of TLAV processing above is that which arises from multiple granularities of operation being required in the same application. In the case of TLAV applications, operations over vertices and operations over incident edges of each vertex represent two separate granularities at which SIMD execution is performed. In the case of our Woodstocc application, operations over candidate DNA read matches and operations over virtual suffix trie nodes represented separate execution granularities, with reads being (irregularly) grouped by suffix trie node.
Figure 5.8: Example dataflow between three application module instances, one of which may filter data items. The top and bottom nodes operate on pods of data items, represented as rectangles labeled with their cardinalities. In practice these could represent, e.g., vertices with their number of incident edges. The middle node operates on individual data items. The mismatch in computation granularity between the modules requires one or more of them to internally implement data transformation operations such as pod explosion/repacking, and makes the data interface boundaries between them a priori ambiguous, as indicated by the ‘?’s.
Items and pods In general, individual fine-grained input items are grouped into pods, with some application modules operating on individual items and some operating on pods. Assuming module queues hold work units at their native granularities, supporting pods introduces a datatype mis- match between module queues, as some queues will store individual items and some will store pods. This mismatch must be rectified at some point during execution to expose work to each module at its native granularity. Executing an application with multiple granularities of computation efficiently, then, requires dynamic thread remapping to maintain utilization when operating on both pods and items, and careful data management to expose work appropriately but keep all items grouped into their respective pods throughout execution. Figure 5.8 illustrates these challenges.
Explosion and collection modules We propose implementing generalized versions of the ex- plode and collect operations used in our Woodstocc application to address the irregularity caused by multiple granularities of computation. These operations will be implemented as two special infrastructure modules in the MERCATOR framework and will operate as follows:
1. Explode module: given a queue of pods and a target number of data items to extract, this module will operate in two steps. (1) pull work: Mark the maximum number of consecutive pods whose cumulative number of items is less than the target number. (2) explode indices: Given the marked set of pods with counts of how many (densely-stored valid) data items are in each, calculate the target pod and item indices for each worker thread.
2. Collect module: given a worklist of items associated with pods and an indicator for which pod each item is associated with, this module will operate in two steps: (1) collect indices: Given an additional validity indicator variable for each item, densely store the valid items back in their pods. (2) push work: Given a set of pods, add the pods with at least one valid item back to the queue.
Figure 5.9 shows the application modules from Figure 5.8 augmented with appropriate infrastruc- ture module instances to solve the mismatched granularity problem, and Figure 5.10 isolates those instances and describes their operations in more detail.
We developed efficient concurrency-safe implementations of the explosion and collection operations for Woodstocc using parallel-scan primitives, and we anticipate that these implementations will generalize well.
For scheduling and execution, explode and collect module instances will be inserted into the appli- cation’s (conceptual) dataflow graph between the nodes they connect and will be treated identically to the original application modules. To illustrate this concept, Figure 5.11 shows an example appli- cation DFG annotated with the native execution granularity of each of its modules, and Figure 5.12 shows the example dataflow graph and possible worklist states for this application augmented with explode and collect modules.
The infrastructure modules we have described are novel in the context of existing data structures used in streaming application frameworks; we refer the interested reader to Section 5.2.8 as an extended footnote for details.
Figure 5.9: Dataflow between the three applica- tion module instances from Figure 5.8 (white), but with special infrastructure module instances (grey) inserted between them to implement item explosion and repacking. These infrastructure modules allow the application modules to ab- stract from data transformations and operate at their native granularities.
Figure 5.10: Infrastructure module instances iso- lated from Figure 5.9, with operations shown as- suming 128 threads assigned to operate on data items at one per item. The ‘Pull Work’ phase selects the maximum number of pods that may be pulled from the head of the queue which col- lectively contain fewer than 128 items. In the ‘explode’ phase, each of the 128 threads self- discovers which pod contains its corresponding data item and the index of that item within the pod. This information will be exposed to the downstream application module, allowing its threads to directly operate on data items. In the ‘collect’ phase, non-filtered items are stored compactly within their pods. The ‘Push Work’ phase pushes pods containing unfiltered items back onto the queue.
Figure 5.11: Dataflow graph for an example streaming application composed of three modules A, B, and C, each multiply instantiated. Nodes have been annotated to show the native execution granularity of their modules (item or pod).