RECOMMENDATIONS FOR OTHER STUDIES AND LIMITATIONS OF THE STUDY



The workloads used in this evaluation are intended to be representative of the computation performed by vision-oriented interactive workloads such as those used in computational photography.


Kernel      Description
sobel       Edge detection filter; parallelized with OpenMP
feature     Feature extraction (SURF) from [38]
kmeans      Partition-based clustering; parallelized with OpenMP
disparity   Stereo image disparity detection; adapted from [168]
texture     Image composition; adapted from [168]
segment     Image feature classification; adapted from [168]

Table 4.3: Parallel kernels used in the evaluation

• Feature extraction. feature from MEVBench [38] is a feature extraction application that is representative of the processing performed in camera-based search [58]. The distributed MEVBench implementation of SURF (Speeded-Up Robust Features) uses pthreads for parallelism and is executed almost unmodified in this evaluation (barring static linking and simulator callbacks for timing). The input is divided vertically and horizontally into regions, which are handed to worker threads. Each worker thread first localizes feature points, then builds descriptors, next conditionally redistributes the extracted information, and finally writes back the extracted data. Threads synchronize at a barrier after each of these steps. Computation length varies with image size.

• Disparity comparison. disparity from SD-VBS [168] computes a disparity map that can be used to derive the relative depth of objects in a scene given two stereo images of the scene. The input to the workload is a pair of images (left and right) taken from slightly different positions, and a “shift” window. The sequential implementation shifts the right image by each shift value and correlates it with the left image pixel by pixel to find the minimum disparity between the images. The parallel implementation computes the disparity map for each shift window across the 2D images in parallel, synchronizing in the last step to combine the minimum values. Computation length varies with image and window size.

• Edge detection. sobel is an edge detection filter which convolves a constant 3 × 3 matrix across an input image. The computation is hence both regular and independent of pixel values. The parallel implementation tiles the image and applies the convolution step independently to each tile, using OpenMP for the outer for loop. Computation length varies with image dimensions.

• Clustering. kmeans partitions a set of points into k clusters such that the sum of distances between each point and the center of its cluster (mean) is minimum. This kernel is used in vision applications like image segmentation. The sequential algorithm first assigns the first k points to be cluster centers, then iteratively recomputes the nearest cluster for each point, recomputing the center of a cluster if its membership has changed. Based on previous implementations used as architectural benchmarks [86, 116], the “for-each-object” loop is parallelized, with thread-local clusters which are aggregated (array reduction) by the main thread. Computation length varies with the number of clusters, points, and dimensions.

• Texture synthesis. texture takes as input a small image and creates a larger image that is similar in texture to the small image. Such algorithms are used to fill holes or blank spots in image regions that exhibit some uniformity. The sequential implementation from SD-VBS [168] fills the output image from left to right and top to bottom, scanning for each pixel the candidate matches from the input texture and from neighboring (already filled) regions in the output image. The parallel implementation tiles the image into rectangular blocks, with each block filled independently. However, a caveat is that the filling proceeds with a neighboring “L-shaped” causality: blocks can only be processed if their preceding tiles have been filled.

• Segmentation. segment from SD-VBS [168] partitions an image into segments that share similar visual characteristics. The sequential implementation blurs the input image using an edge detection kernel (similar to sobel), next assigns edge weights between neighboring pixels based on the smoothed image to create a graph, then sorts the edges based on these weights, and finally creates clusters by collapsing nodes that share an edge if the edge weight between nodes (pixels or collapsed clusters) is less than an acceptable threshold (similar to kmeans). Each of the pieces is task-parallelized. Computation length varies with image size.
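The tiling strategy described for sobel can be sketched roughly as follows. This is an illustrative Python sketch, not the evaluated OpenMP code: the function names, the use of horizontal row bands as tiles, the choice of the horizontal Sobel operator, and the zeroed image border are all assumptions made for the example. The kernel is applied without flipping (cross-correlation), as image filters commonly do.

```python
from concurrent.futures import ThreadPoolExecutor

# Horizontal Sobel operator: a constant 3 x 3 matrix (assumed choice of kernel).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def convolve_rows(image, kernel, row_start, row_end):
    """Apply the 3 x 3 kernel to interior pixels of rows [row_start, row_end)."""
    width = len(image[0])
    out = []
    for y in range(row_start, row_end):
        row = [0] * width  # border columns stay zero
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            row[x] = acc
        out.append(row)
    return out


def sobel_tiled(image, num_tiles=4):
    """Split the interior rows into bands and filter each band independently."""
    height, width = len(image), len(image[0])
    first, last = 1, height - 1  # border rows are left as zeros
    step = max(1, -(-(last - first) // num_tiles))  # ceiling division
    bands = [(s, min(s + step, last)) for s in range(first, last, step)]
    result = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=num_tiles) as pool:
        futures = [(s, pool.submit(convolve_rows, image, SOBEL_X, s, e))
                   for s, e in bands]
        for s, future in futures:
            rows = future.result()
            result[s:s + len(rows)] = rows  # bands never overlap
    return result
```

Because every output pixel depends only on the read-only input image, the bands need no synchronization beyond the final join. In standard CPython the threads mainly illustrate the structure; they are not expected to reproduce the speedups of a compiled OpenMP loop.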
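The parallel structure described for kmeans (a parallel per-point loop with worker-local accumulators that the main thread reduces) might look like the sketch below. It is a simplified illustration under stated assumptions: squared Euclidean distance, a fixed iteration cap, and the names nearest, partial_assign, and kmeans are all hypothetical and not taken from the benchmark sources.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((p - q) ** 2 for p, q in zip(point, center))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def partial_assign(chunk, centers):
    """Worker: label one chunk of points and accumulate worker-local sums."""
    k, dims = len(centers), len(centers[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    labels = []
    for point in chunk:
        j = nearest(point, centers)
        labels.append(j)
        counts[j] += 1
        for d in range(dims):
            sums[j][d] += point[d]
    return labels, sums, counts


def kmeans(points, k, num_workers=4, max_iters=100):
    centers = [list(p) for p in points[:k]]  # the first k points seed the centers
    dims = len(centers[0])
    step = max(1, -(-len(points) // num_workers))  # ceiling division
    chunks = [points[i:i + step] for i in range(0, len(points), step)]
    labels = None
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(max_iters):
            # Parallel "for each object" loop over chunks of points.
            parts = list(pool.map(lambda c: partial_assign(c, centers), chunks))
            new_labels = [label for part in parts for label in part[0]]
            if new_labels == labels:
                break  # no membership changed, so the centers are stable
            labels = new_labels
            # Array reduction on the main thread over the worker-local results.
            for j in range(k):
                count = sum(part[2][j] for part in parts)
                if count:
                    centers[j] = [sum(part[1][j][d] for part in parts) / count
                                  for d in range(dims)]
    return centers, labels
```

Keeping the sums and counts private to each worker avoids locking inside the per-point loop; the only shared step is the reduction, which runs after all workers have returned.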
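The “L-shaped” causality noted for texture constrains the order in which blocks may be filled. The text does not spell out the exact dependence set, so the sketch below assumes that a block depends only on its left and upper neighbors; under that assumption, all blocks on the same anti-diagonal are independent and can be filled in parallel. The names wavefronts and fill_blocks, and the fill_block callback, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefronts(rows, cols):
    """Group block coordinates (r, c) by anti-diagonal index r + c."""
    waves = [[] for _ in range(rows + cols - 1)]
    for r in range(rows):
        for c in range(cols):
            waves[r + c].append((r, c))
    return waves


def fill_blocks(rows, cols, fill_block, num_workers=4):
    """Call fill_block(r, c) for every block, one anti-diagonal at a time."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for wave in wavefronts(rows, cols):
            # list() waits for the whole wave, so it acts as a barrier:
            # the left and upper neighbors of the next wave are all filled.
            list(pool.map(lambda rc: fill_block(*rc), wave))
```

Available parallelism under this schedule grows and then shrinks along the diagonals, so the first and last waves contain a single block each. If the real dependence also included the upper-right neighbor, the waves would have to be regrouped accordingly.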

Although these workloads are not complete applications, they represent the computation stages performed by interactive applications. Edge detection, clustering, and segmentation are preliminary steps in image processing and analysis, pattern recognition, and several computer vision techniques. Clustering (kmeans) is used not only for signal processing, but also for applications such as supervised and semi-supervised feature/dictionary learning. Segmentation and feature extraction aid interactive applications like face and character recognition. Chapter 6 additionally introduces a speech recognition workload (not included here due to vagaries of the simulator). With human-computer interaction becoming increasingly prevalent in mobile devices (especially phones and tablets), these workloads together capture some of the computation expected of such applications, as well as some tasks which are today offloaded to the “cloud” due to computational complexity (for example, feature performs some of the computation representative of Google Goggles). However, because their most common uses entail human activity, performing such computation locally (on the mobile device) within human-acceptable response times could potentially improve user experience, and further enable new applications composed of such computation kernels.

[Figure 4.3 plot: normalized speedup (axis ticks 0, 1, 2, 4, 8, 16) for the parallel feature, disparity, sobel, texture, segment, and kmeans kernels; legend: 1.5mg PCM, 150mg PCM, Parallel]

Figure 4.3: Parallel speedup on different workloads for 16 cores.
