3.3 Blocking Point-to-Point Communication
Figure 3.3: Schematics of a blocking point-to-point MPI communication using the SendRecv construct of the SAC notation. Process a and Process b both execute b.y = SendRecv(a.x): a variable x on process a is sent to process b and saved in the variable y, amounting to the PGAS assignment b.y = a.x.
The foundations of adjoining single sends and receives have been presented in [62]. Here, this case is used to illustrate the method of combining PGAS, SAC and MPI semantics introduced in Chapter 2. The proof of the adjoint pattern is therefore given in detail; the proofs for the patterns in the following sections are more condensed. Blocking sends and receives are the most basic form of communication in MPI. They represent a data array being sent from one node to another. Each send on one process must be matched by a receive on another process and vice versa. Such a pair exhibits the core properties of a message communication: it has a source and a destination (or sink). Both are specified at runtime through the send and receive MPI calls, defined as follows:
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int src, int tag, MPI_Comm comm, MPI_Status *status)
with
• buf: address of send or receive buffer
• count: number of elements in the buffer
• datatype: data type of each buffer element
• dest/src: rank of destination or source
• tag: message tag
• comm: communicator
• status: (only receive) status object
The arguments are divided into the three aforementioned categories (buffer definition, remote rank and MPI specifics). buf, count and datatype define the data in the buffer that is being sent or received. dest and src define the destination and the source of the message, respectively. tag, comm and status are MPI-specific arguments that are of no relevance for the adjoining of the message. By calling a send or receive, the execution dives into the MPI library, where the buffer buf is sent to or received from a remote process. Execution returns from the MPI library only once this communication has finished.
Theorem 3.1. The adjoint communication of a blocking send (MPI_Send) with buffer $x$ is a receive (MPI_Recv) with buffer $t_{(1)}$ followed by an increment of the adjoints $x_{(1)} \mathrel{+}= t_{(1)}$. The adjoint communication of a blocking receive (MPI_Recv) with buffer $y$ is a send (MPI_Send) with buffer $y_{(1)}$ followed by a nullification of the adjoints $y_{(1)} = 0$.
Proof. Without loss of generality, we assume that the communication buffer is elemental, holding a single value. As laid out in Section 3.2, MPI functions are treated as intrinsic functions. From a global perspective, a send/receive is one intrinsic function that is executed on two processes. The output of the function is on the receiving side, whereas the input of the function is on the sending side. Moreover, the PGAS notation is used by prepending to every variable the process it belongs to. The intrinsic function corresponding to a send/receive pair may then be written as:
b.y = SendRecv(a.x)
b.y is the output whereas a.x is the input. Note that the SendRecv is executed on both processes (see Figure 3.3). According to the semantics of the blocking communication in MPI, the communication amounts to the following operation:
b.y ← a.x
By interpreting the semantics, the SendRecv function is decomposed, yielding the following PGAS code:
b.y = a.x
According to the adjoint model, we end up with the corresponding adjoint statements:
$a.x_{(1)} \mathrel{+}= b.y_{(1)}$
$b.y_{(1)} = 0$
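As a worked step, the adjoint statement above can be read off from the standard incremental adjoint rule for a single assignment. The symbol $\varphi$ for the right-hand side is introduced here only for illustration; the sketch assumes the elemental (scalar) buffer of the proof.

```latex
\begin{align*}
y = \varphi(x)
  \quad &\Longrightarrow \quad
  x_{(1)} \mathrel{+}= \frac{\partial \varphi}{\partial x} \cdot y_{(1)},
  \qquad y_{(1)} = 0 \\
% For b.y = a.x, \varphi is the identity, so its derivative is 1:
b.y = a.x
  \quad &\Longrightarrow \quad
  a.x_{(1)} \mathrel{+}= 1 \cdot b.y_{(1)},
  \qquad b.y_{(1)} = 0
\end{align*}
```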
The nullification of adjoint variables (e.g. $b.y_{(1)} = 0$) was introduced in Section 2.4.1. Nullifying is only required if the variable b.y was previously used in the original code. The SAC assumes that each intermediate variable is only used once. This is of course not the case in practice, where programming languages do not put such restrictions on variables. However, it is unclear whether such a restriction applies to the MPI buffers after they were adjoined using an AD tool. All the implementations in this work (see Chapter 5) use single-use buffers, as it is assumed that the impact on the performance is negligible. Hence, nullifications are not considered in the pattern benchmarks. For completeness, however, they are included in the reverse communication patterns in case an AD tool relies on correct adjoint nullifications.
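To illustrate why nullification matters once a buffer is reused, the following minimal sketch hand-writes the reverse section of a small forward code in which one receive buffer y takes two messages in turn. The forward code, the factors 2 and 3, and the names `Adjoints` and `reverse_reused_buffer` are assumptions made for this example and are not taken from the implementations in Chapter 5.

```cpp
#include <cassert>

// Adjoints of the two sent values after the reverse section (hypothetical type).
struct Adjoints { double x1, x2; };

// Assumed forward code; the receiver reuses one buffer y for two messages:
//   y = x1;  z1 = 2*y;   y = x2;  z2 = 3*y;
// The reverse section below processes these statements bottom-up,
// seeded with the adjoints z1_adj and z2_adj.
Adjoints reverse_reused_buffer(double z1_adj, double z2_adj, bool nullify) {
    double x1_adj = 0.0, x2_adj = 0.0, y_adj = 0.0;

    y_adj += 3.0 * z2_adj;     // adjoint of z2 = 3*y
    x2_adj += y_adj;           // adjoint of y = x2 (second message)
    if (nullify) y_adj = 0.0;  // y was overwritten at this point in the forward code

    y_adj += 2.0 * z1_adj;     // adjoint of z1 = 2*y
    x1_adj += y_adj;           // adjoint of y = x1 (first message)
    if (nullify) y_adj = 0.0;

    return {x1_adj, x2_adj};
}
```

With both seeds set to 1, the nullified version yields the adjoints 2 and 3 for x1 and x2, matching the derivatives of z1 and z2. Without nullification, the adjoint of the second message leaks into x1, which then receives 5 instead of 2. With single-use buffers this situation does not arise.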
The adjoint interprocess communication is modelled by an incremental assignment in the adjoint section. In the end, every interprocess communication has to be implemented in MPI. However, there exists no incremental assignment by point-to-point communication in MPI. Therefore the interprocess incremental assignment has to be split into an assignment and a local increment.
$a.t_{(1)} = b.y_{(1)}$
$a.x_{(1)} \mathrel{+}= a.t_{(1)}$
$b.y_{(1)} = 0$
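The split into a plain assignment and a local increment can be sketched without MPI by modelling the two processes as separate structs. This is a single-process simulation under stated assumptions: `Process`, `send_recv`, `forward` and `reverse` are hypothetical names, and `send_recv` stands in for a matched blocking send/receive pair that can only assign, never increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one MPI process: it owns its own copy of every variable.
struct Process {
    std::vector<double> x, y;          // primal buffers
    std::vector<double> x_adj, y_adj;  // adjoints x_(1) and y_(1)
    std::vector<double> t_adj;         // temporary receive buffer t_(1)
};

// Model of a matched blocking send/receive pair: dst_buf = src_buf.
// Like MPI point-to-point communication, it cannot increment remotely.
void send_recv(const std::vector<double>& src_buf, std::vector<double>& dst_buf) {
    dst_buf = src_buf;
}

// Forward section: b.y = SendRecv(a.x)
void forward(Process& a, Process& b) {
    send_recv(a.x, b.y);
}

// Reverse section: a.t_(1) = SendRecv(b.y_(1)); a.x_(1) += a.t_(1); b.y_(1) = 0
void reverse(Process& a, Process& b) {
    send_recv(b.y_adj, a.t_adj);                      // assignment across processes
    for (std::size_t i = 0; i < a.x_adj.size(); ++i)  // local increment on process a
        a.x_adj[i] += a.t_adj[i];
    b.y_adj.assign(b.y_adj.size(), 0.0);              // nullification on process b
}
```

After `reverse`, the adjoints on process a hold their previous values plus the adjoints that were stored on process b. This is the same result the interprocess incremental assignment would have produced in one step.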
A SendRecv construct, comprising a send and a receive, is used for the communication. This leads to the following single assignment code:
Process a
             forward section ↓        reverse section ↑
             . . .                    . . .
$s_{i-1}$:   noop                     $a.x_{(1)} \mathrel{+}= a.t_{(1)}$
$s_i$:       b.y = SendRecv(a.x)      $a.t_{(1)} = \text{SendRecv}(b.y_{(1)})$

Process b
             forward section ↓        reverse section ↑
             . . .                    . . .
$s_{i-1}$:   noop                     $b.y_{(1)} = 0$
$s_i$:       b.y = SendRecv(a.x)      $a.t_{(1)} = \text{SendRecv}(b.y_{(1)})$
             . . .                    . . .
The adjoint communication described by this extended SAC for the sender a and the receiver b fulfills the claim of the theorem. The arrows hint at the execution order of the code. First the forward section is executed from top to bottom, followed by a bottom-up execution of the reverse section. The arrows will be omitted in future extended SAC listings.
Adjoining the blocking point-to-point communication mainly involves circumventing the incremental communication not available in MPI. Besides this, it essentially consists of reversing the underlying data flow. An incremental point-to-point communication in MPI would facilitate the adjoint implementation and potentially improve efficiency.
3.3.1 Pattern Runtime
The pattern is implemented as a straightforward send/receive pair, with the forward pattern and reverse pattern fulfilling the premises of the forward section and reverse section as deduced in this section. As has been motivated in the proof, the nullification of the adjoint $b.y_{(1)} = 0$ is never implemented in the pattern benchmarks since it is considered to be tool specific.
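#include <mpi.h>

// rank is assumed to hold this process's rank in MPI_COMM_WORLD (set elsewhere).

// Passive pattern: rank 0 sends n doubles to rank 1.
void passive_pattern(double *x, int &n) {
  if(rank==0)
    MPI_Send(x,n,MPI_DOUBLE,1,0,MPI_COMM_WORLD);
  if(rank==1)
    MPI_Recv(x,n,MPI_DOUBLE,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
}

// Forward section of the adjoint pattern: identical to the passive pattern.
void adjoint_forward_pattern(double *x, int &n) {
  if(rank==0)
    MPI_Send(x,n,MPI_DOUBLE,1,0,MPI_COMM_WORLD);
  if(rank==1)
    MPI_Recv(x,n,MPI_DOUBLE,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
}

// Reverse section: the data flow is reversed. Rank 1 sends its adjoints z
// back to rank 0, which receives them into the temporary z and then
// increments its own adjoints x locally.
void adjoint_reverse_pattern(double *x, double *z, int &n) {
  if(rank==1) {
    MPI_Send(z,n,MPI_DOUBLE,0,0,MPI_COMM_WORLD);
  }
  if(rank==0) {
    MPI_Recv(z,n,MPI_DOUBLE,1,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
    for(int i=0;i<n;i++) x[i]+=z[i];
  }
}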
Compared to the passive pattern, the combined forward and reverse patterns amount to a doubling of the communication and an additional operation for the increment of the adjoint. Both a send and a receive have a complexity of O(n), with n being the message length. The increment is assumed to have a complexity of O(n) too. Thus, the passive pattern has a runtime complexity of O(n), whereas the combined adjoint pattern has a runtime complexity of O(3n). Hence, a slowdown factor of $\delta_c = 3$ is to be expected. Runtime tests yield a slowdown factor $\delta_c$ ranging from 2.9 to 3.4, depending on the length of the message.
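The expected slowdown factor can be written out as a small cost model. The per-element constant $c$ is a symbol introduced here for illustration; using the same $c$ for a communication and for the local increment is the simplifying assumption behind the count of three, and the measured range of 2.9 to 3.4 indicates that it holds only approximately.

```latex
\begin{align*}
T_{\text{passive}}(n) &= c\,n \\
T_{\text{adjoint}}(n) &= \underbrace{c\,n}_{\text{forward send/receive}}
                       + \underbrace{c\,n}_{\text{reverse send/receive}}
                       + \underbrace{c\,n}_{\text{local increment}}
                       = 3\,c\,n \\
\delta_c &= \frac{T_{\text{adjoint}}(n)}{T_{\text{passive}}(n)} = 3
\end{align*}
```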