RESULTS, CONCLUSIONS AND PERSPECTIVES FOR FUTURE WORK

STUDY OF AN INTERACTIVE LEARNING EXPERIENCE FOR THE STRUCTURE COURSE



Parallel programming describes designing a program to run on more than one processor of a computer simultaneously. This is achieved by breaking the program down into smaller components that can run independently of one another and communicate through fixed channels. Implementing a program in parallel is desirable because it can reduce the run-time: either the program takes less time to run, or larger problems can be handled in the same time frame as before. For the purposes of this thesis, parallel programming is of interest because it may allow larger problems to be solved exactly, something that would not be feasible without it.

The way in which the program is broken down into individual components depends heavily on the algorithm being implemented, the data it is to process, and the architecture or environment in which it is to be deployed. However, the decomposition can be classified into two broad categories: task parallelism, in which separate parts of the algorithm run concurrently, and data parallelism, in which the same part of the algorithm operates on multiple pieces of the input data simultaneously. In general, however, programs use elements of both paradigms and fall somewhere on a spectrum between the two.

At this point we also define the term execution pipeline. A pipeline enables a level of parallelism by allowing multiple pieces of data to be in the pipeline simultaneously. For example, within a single clock cycle, one piece of data could have an operation performed on it by stage one of the pipeline while a different piece of data has a different operation performed on it by stage two, so the two operations are carried out in parallel.
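The distinction between the two categories, and the pipeline idea, can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `data_parallel`, `task_parallel` and `pipeline_trace` are hypothetical, and a thread pool from the standard library stands in for independent processors. Python threads show the structure of each approach rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """The single operation applied to every piece of data."""
    return x * x


def data_parallel(values, workers=4):
    """Data parallelism: the same operation runs on many inputs at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def task_parallel(values):
    """Task parallelism: two different operations run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(sum, values)    # first task
        largest = pool.submit(max, values)  # second, unrelated task
        return total.result(), largest.result()


def pipeline_trace(items):
    """Simulate a two-stage pipeline, one list entry per clock cycle.

    Each entry is (item in stage one, item in stage two); None means idle.
    Items themselves are assumed not to be None.
    """
    queue = list(items)
    stage1 = None
    trace = []
    while queue or stage1 is not None:
        stage2 = stage1                           # stage one hands its item on
        stage1 = queue.pop(0) if queue else None  # next item enters stage one
        trace.append((stage1, stage2))
    return trace
```

In the trace returned by `pipeline_trace`, every cycle in which both entries are occupied is a cycle in which two different pieces of data are being processed at the same time.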

Because NVIDIA CUDA, which this thesis uses, primarily adopts a data-parallel paradigm, data parallelism is discussed next.

Data Parallelism

Data parallelism is the process of breaking large data down into smaller blocks and passing these blocks to independent processors to be processed simultaneously. It is therefore most directly applicable when different sections of the input data have no relationship to each other. For example, consider our case of a dynamic programming scoring grid. If a processor needs data from a different cell of the scoring grid in order to process the current cell, and that second cell is not in the block the processor has been allocated, an error can occur unless the dependency is handled. Handling such a scenario requires communication between the two processors, which can often be slow. The viability of data-parallel processing therefore depends heavily on the structure of the algorithm and on the relationships between the different points of input data.

The size of the blocks, and hence the granularity to which the data is broken down, can be chosen by the programmer to suit the algorithm and the architecture on which the program is to be deployed. If sufficient processors are available, or a massively parallel architecture is used, the data can be broken down such that each processor is allocated a single piece of data. In our example of a scoring grid, each processor would be allocated a single cell of the grid.
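The dependency problem in a scoring grid can be illustrated with a small Python sketch. It rests on assumptions not taken from this thesis: the recurrence is the textbook longest-common-subsequence one, in which each cell reads its left, upper and upper-left neighbours, and the function name `lcs_grid_wavefront` is hypothetical. The grid is filled by anti-diagonals, a common ordering for such grids rather than necessarily the one used here. The sketch runs sequentially and only marks where the independent work lies.

```python
def lcs_grid_wavefront(a, b):
    """Fill an LCS scoring grid one anti-diagonal at a time.

    Cell (i, j) depends on (i-1, j), (i, j-1) and (i-1, j-1), all of which
    lie on earlier anti-diagonals. Cells sharing an anti-diagonal therefore
    never read one another and could each be given to a separate processor.
    """
    rows, cols = len(a) + 1, len(b) + 1
    grid = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0

    # d = i + j identifies an anti-diagonal of interior cells.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))  # keeps j = d - i within the grid
        i_hi = min(rows - 1, d - 1)    # keeps j >= 1
        # The iterations of this inner loop are independent of each other.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                grid[i][j] = grid[i - 1][j - 1] + 1
            else:
                grid[i][j] = max(grid[i - 1][j], grid[i][j - 1])
    return grid
```

The bottom-right cell of the returned grid holds the final score. Allocating one processor per cell, as described above, would correspond to running each iteration of the inner loop on its own processor, with synchronisation between successive anti-diagonals.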

In nearly all parallel processing settings, some level of communication is required between the independent processors. Examples include updating each processor with information about progress in other blocks, or synchronising some shared value between processors. A common cycle in a parallel program is to do a fixed amount of work, pause for communication, and then continue working. There are two approaches to communication in a parallel setting: explicit communication through a message-passing system, or implicit communication through a region of shared memory to which multiple processors have access.
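The work, pause, continue cycle can be sketched with Python's standard `threading.Barrier`. This is an illustration under assumptions: the name `run_rounds` and the toy workload are hypothetical, and threads stand in for processors. Each worker updates only its own slot of a shared list, waits until all workers have done so, reads every slot, and waits again before the next round begins.

```python
import threading


def run_rounds(num_workers=3, rounds=2):
    """Run workers through repeated work / communicate phases.

    Returns, for each worker, the totals it observed after each round.
    """
    shared = [0] * num_workers               # one slot per worker, readable by all
    barrier = threading.Barrier(num_workers)
    seen = [[] for _ in range(num_workers)]

    def worker(wid):
        for _ in range(rounds):
            shared[wid] += wid + 1          # work phase: write own slot only
            barrier.wait()                  # pause until every worker has written
            seen[wid].append(sum(shared))   # communication phase: read all slots
            barrier.wait()                  # nobody writes until all have read

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

The second barrier matters as much as the first: without it, a fast worker could begin the next work phase and change its slot while a slower worker is still reading.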

Implicit communication through shared memory is the method that NVIDIA CUDA adopts, and therefore the method adopted during development of the model in this thesis. Communication via shared memory takes place through multiple threads monitoring the same area of memory and watching for changes. As a simple example, the processors could monitor the state of some shared variable; when it transitions to a specified state, they know that other processors have written data for communication to the shared area, and that it is now safe to read this data back.
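The shared-variable signalling described above can be sketched in Python as follows. This is not CUDA code, and the names `exchange`, `mailbox` and `ready` are hypothetical. A `threading.Event` plays the role of the monitored shared variable; it lets readers block until the state changes rather than polling in a loop, which is a simplification of the monitoring described in the text.

```python
import threading


def exchange(payload, num_readers=2):
    """One writer publishes data in shared memory; readers wait on a flag."""
    mailbox = {"data": None}         # shared area visible to all threads
    ready = threading.Event()        # shared state variable, initially unset
    received = [None] * num_readers

    def writer():
        mailbox["data"] = payload    # write the data first
        ready.set()                  # then move the flag to the "ready" state

    def reader(rid):
        ready.wait()                 # monitor the flag until it changes state
        received[rid] = mailbox["data"]  # now safe to read the data back

    readers = [threading.Thread(target=reader, args=(r,)) for r in range(num_readers)]
    for t in readers:
        t.start()
    w = threading.Thread(target=writer)
    w.start()
    w.join()
    for t in readers:
        t.join()
    return received
```

The ordering inside `writer` is the essential point: the data is written before the flag changes, so any reader that sees the flag set is guaranteed to find the data in place.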

In a parallel setting, the time spent on communication can be a bottleneck, because at these points all processors have to cease computation and wait until the communication finishes. The issue is magnified when processors are at different stages of work, since one may idle for a considerable time whilst it waits for another to finish and be ready to communicate. Therefore, for a parallel program to be effective, the amount of communication should be kept to a minimum.

Because dynamic programming is organised around a central scoring grid, dynamic programming algorithms lend themselves naturally to data-parallel approaches.
