CHAPTER 6. ANALYSIS OF PEDAGOGICAL DISCOURSE: HOW KNOWLEDGE IS CONSTRUCTED IN
6.1 How knowledge is constructed in the EFL classroom
6.1.5 Elaboration: What is it like in Chile?
Directive-based solutions empower the application developer to explicitly parallelise their application by introducing directives, or pragmas, into existing source code. Primarily, this takes the form of identifying suitably computationally intensive loops and parallelising each in turn by splitting its iterations among the threads available on the target hardware.
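As an illustration of the loop-level style described above, the following minimal C sketch parallelises a single independent loop with one OpenMP directive. The function name `saxpy` and its signature are illustrative choices, not taken from any particular code base.

```c
#include <stddef.h>

/* Scale-and-add (SAXPY-style) loop: y[i] = a * x[i] + y[i].
 * Every iteration is independent, so one directive is enough to split
 * the iteration space among the available threads. Compiled without
 * OpenMP support, the pragma is ignored and the loop runs serially. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the directive is the only change, the serial and parallel versions share one source; a compiler invoked without OpenMP enabled (for example, GCC without `-fopenmp`) simply ignores the pragma.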
When heterogeneous hardware platforms are targeted, directives are also used to transfer the relevant code loops or sections, and their associated data, between the different processors that make up the system.
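The sketch below shows, assuming an OpenMP 4.0 compiler with device support, how directives can describe both the offloaded loops and the data movement around them. The function `scale_then_shift` is hypothetical; `target data`, `target teams distribute parallel for` and `map` are standard OpenMP 4.0 constructs and clauses.

```c
#include <stddef.h>

/* Two offloaded loops sharing one device data region (OpenMP 4.0).
 *   map(to: ...)     copies x to the device on entry only;
 *   map(tofrom: ...) copies y to the device on entry and back on exit.
 * The inner map clauses find the arrays already present on the device,
 * so no extra transfer happens between the two loops. Without a device
 * (or without OpenMP) both regions simply execute on the host. */
void scale_then_shift(size_t n, const double *x, double *y)
{
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
        for (size_t i = 0; i < n; ++i)
            y[i] = 2.0 * x[i];

        #pragma omp target teams distribute parallel for map(tofrom: y[0:n])
        for (size_t i = 0; i < n; ++i)
            y[i] += 1.0;
    }
}
```

Keeping both loops inside one data region is the design point: the arrays cross the host/device boundary once rather than once per loop.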
For over two decades, the industry-standard directive-based model OpenMP [24] has dominated. However, with the emergence of heterogeneous hardware, a number of alternative solutions, such as OpenHMPP (Hybrid Multicore Parallel Programming) [70], OpenACC® and Intel® Language Extensions for Offload [152], are coming to the fore. Convergence between these models is nevertheless beginning to occur: OpenMP 4.0 introduces new directives to target heterogeneous platforms, and OpenACC, in the latest PGI release, provides the ability to execute on the cores of a CPU.
Additionally, there are a number of niche solutions that extend the ideas developed in these models. StarSs [62] and OmpSs [86], for example, hide the synchronisation operations that the developer would otherwise have to perform explicitly, inferring when these operations should take place from the data dependencies inherent in the application code.
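StarSs and OmpSs have their own clause syntax, so the sketch below instead uses the closely related `depend` clause of OpenMP 4.0 to show the same idea: task ordering is derived from declared data dependences rather than from hand-written synchronisation. The function `pipeline` is a hypothetical example, not code from either project.

```c
/* Task graph whose ordering follows from declared data dependences
 * (OpenMP 4.0 depend clauses, used here as a standard analogue of the
 * StarSs/OmpSs approach). The two producer tasks are independent and
 * may run concurrently; the runtime holds back the consumer task until
 * both producers have written a and b. */
int pipeline(int seed)
{
    int a = 0, b = 0, c = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = seed + 1;

        #pragma omp task depend(out: b) shared(b)
        b = seed * 2;

        /* Consumer: reads a and b, writes c. */
        #pragma omp task depend(in: a, b) depend(out: c) shared(a, b, c)
        c = a + b;

        #pragma omp taskwait   /* wait for all three tasks before returning */
    }
    return c;
}
```

No barrier or lock appears between the producers and the consumer; the dependence annotations alone determine the order.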
3.2.1 OpenMP
Since the mid-1990s, OpenMP [24] has been the industry-accepted paradigm for directive-based parallel programming. By inserting its directives into existing Fortran or C/C++ applications, primarily around computationally intensive loops, the developer can express how resources are shared within the system and thereby write a shared-memory parallel application.
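To make "expressing the sharing of resources" concrete, the following sketch spells out the data-sharing attributes of a parallel loop explicitly. `default(none)`, `shared` and `reduction` are standard OpenMP clauses; the `dot` function itself is illustrative.

```c
#include <stddef.h>

/* Dot product with explicit data-sharing attributes.
 *   default(none)  : every variable used in the region must be classified;
 *   shared(...)    : a single copy visible to all threads (read-only here);
 *   reduction(+:s) : each thread accumulates a private partial sum, and
 *                    OpenMP combines the partial sums into s at the end.
 * The loop index i is private to each thread automatically. */
double dot(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    #pragma omp parallel for default(none) shared(n, x, y) reduction(+: s)
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

Using `default(none)` makes the compiler reject any variable whose sharing was left implicit, which catches accidental sharing early.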
By its nature, its main target has been shared-memory, CPU-based systems, taking advantage of the memory shared within CPU nodes. The 4.0 release [21] added support for specifically targeting attached accelerator devices. Although OpenMP is supported by nearly all commercial and open-source compilers, at the time of writing the accelerator-specific features have been implemented by only a subset of compiler vendors' offerings.
OpenMP allows an application to be parallelised incrementally; the whole application need not be parallelised at once. However, care is needed to ensure that OpenMP regions are thread safe (i.e. data independent): any modifications to global variables must be appropriately synchronised and controlled. Limitations exist relating to the nesting of OpenMP directives, and issues arise when using OpenMP in some object-oriented (OO) C++ applications. The lack of thread-safe interoperability with the C++ Standard Template Library (STL) has led to the emergence of dedicated abstraction libraries (see Section 3.3) that support such applications via OpenMP.
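The thread-safety concern can be illustrated with a hypothetical histogram kept in a global array. Without synchronisation, two threads incrementing the same bin could lose an update; the sketch below uses `#pragma omp atomic` as one possible remedy (a `critical` section would also work, typically with more overhead). The names `bins`, `NBINS` and `histogram` are illustrative.

```c
#include <stddef.h>

#define NBINS 4

/* Global histogram shared by all threads. An unsynchronised bins[k]++
 * would be a data race whenever two threads hit the same bin. */
static long bins[NBINS];

void histogram(size_t n, const int *values)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        int k = values[i] % NBINS;   /* assumes non-negative input values */
        #pragma omp atomic
        bins[k]++;                   /* indivisible read-modify-write */
    }
}
```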
3.2.2 OpenACC
The OpenACC® Application Program Interface (API) [23] is a high-level programming model based on the use of directives. By applying directives to original Fortran, C or C++ source code, it aims to provide increased architectural portability with minimal code modification. This portability is offered without compromising code maintainability, a key consideration for complex existing industrial applications. At the time of writing, three compiler vendors, CAPS¹ [26], Cray [31] and PGI² [131], support the initial OpenACC release.
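For comparison with the OpenMP loop examples, a minimal sketch of the same kind of loop written with OpenACC directives follows, assuming an OpenACC-capable compiler. `saxpy_acc` is a hypothetical name; `parallel loop`, `copyin` and `copy` are standard OpenACC directives and clauses.

```c
#include <stddef.h>

/* SAXPY-style loop annotated with OpenACC.
 *   copyin(x[0:n]) : x is moved to the accelerator only;
 *   copy(y[0:n])   : y is moved to the accelerator and back.
 * A compiler without OpenACC support ignores the pragma, leaving the
 * original serial loop unchanged. */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The loop body is identical to the serial code, which is where the maintainability argument comes from: only the annotation differs between targets.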
Before a common OpenACC standard was supported, Cray, PGI and CAPS each had their own bespoke set of accelerator directives, from which their OpenACC implementations are derived. A brief overview of each vendor's implementation, along with its limitations, follows.
¹ As of June 27, 2014, CAPS Enterprise has ceased trading, and the CAPS Compiler is no longer available.
Cray originally proposed accelerator extensions to the OpenMP standard [21] to target GPGPUs through their Cray Compiling Environment (CCE) compiler suite. These evolved into the "parallel" construct of the OpenACC standard. Rather than generating CUDA source for the kernels, CCE translates them directly into NVIDIA's low-level Parallel Thread Execution (PTX) [155], a pseudo-assembly language that the graphics driver subsequently compiles into binary code. CCE is currently available only on Cray architectures, which restricts portability.
From version 10.4 of its compiler, PGI supported the PGI Accelerator model [19] for NVIDIA GPUs. This model provided PGI's own bespoke directives for accelerating regions of source code; in particular, its "region" construct evolved into PGI's implementation of the OpenACC "kernels" construct.
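The two compute constructs whose origins are traced above can be compared side by side. In this sketch (function names hypothetical), `kernels` leaves the parallelisation decision to the compiler's analysis, whereas `parallel loop` is an explicit assertion by the programmer that the loop may run in parallel.

```c
#include <stddef.h>

/* kernels: the compiler analyses the region and decides which loops it
 * can safely parallelise. restrict promises that x and y do not alias,
 * which helps that analysis prove the iterations independent. */
void add_kernels(size_t n, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] += x[i];
}

/* parallel loop: the programmer asserts the loop is parallel and the
 * compiler parallelises it as instructed, without proving it safe. */
void add_parallel(size_t n, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] += x[i];
}
```

For a simple independent loop like this both forms give the same result; they differ in who takes responsibility for proving the loop parallel.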
Initially, CAPS (Compiler and Architecture for Embedded and Superscalar Processors) provided support for the OpenHMPP directive model [70], which served as the basis for its implementation of the OpenACC standard. A major difference with CAPS is that it requires a separate host compiler: accelerated code is translated directly into the developer's choice of either CUDA or OpenCL [125]. Choosing OpenCL increases the range of architectures that can be targeted.
3.2.3 Intel® Language Extensions for Offload (LEO)
Intel's Language Extensions for Offload (LEO) [152] consist of directive-based pragmas for C/C++ and Fortran applications. These constructs are Intel-specific and were introduced into the Intel compilers to target the Intel Xeon Phi: source code runs on a host Xeon CPU, and marked sections are "offloaded" onto the Xeon Phi coprocessor through the offload pragma.
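A minimal sketch of the offload pragma follows, assuming the Intel compiler and a Xeon Phi coprocessor; the function name `saxpy_offload` is hypothetical. The `in`, `inout` and `length` modifiers describe the data transferred to and from the coprocessor, and an ordinary OpenMP directive parallelises the loop once it runs there.

```c
#include <stddef.h>

/* Offloading a loop with Intel LEO (Intel compilers only). Other
 * compilers treat "offload" as an unknown pragma, usually with a
 * warning, and run the block on the host.
 *   target(mic)           : run the block on a Xeon Phi coprocessor;
 *   in(x : length(n))     : copy n elements of x to the coprocessor;
 *   inout(y : length(n))  : copy n elements of y there and back. */
void saxpy_offload(int n, float a, float *x, float *y)
{
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```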