As with the Hopfield net, the possibilities for data and task parallelism in the SOM have been explored. CAT provided a line-by-line computational profile (Figure 8.11). In this case, the distribution of the computations is more even, although some operations, such as matrix root mean squared (mrms) and lateral matrix update (mlat), are relatively more demanding than the other calculations.
Figure 8.11. Line by Line Computations in SOM (execution time in seconds against line number)
The same pattern recognition problem used in the Hopfield net simulations is solved using the SOM. A SOM topology with 64 input neurons and 12 output neurons is used to cluster 64-dimensional input vectors into 12 classes. The network is trained in an unsupervised fashion on 12 perfect patterns. After training, noisy input patterns are presented to the network and are recognised by identifying the nearest output vector.
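The recall step described above can be sketched as a nearest-vector search. This is only an illustration, not the MATLIB listing: the function name `nearest_class`, the use of Euclidean distance, and the toy 4-dimensional data are assumptions made here, since the text does not state the distance measure.

```python
import math

def nearest_class(weights, pattern):
    """Return the index of the output neuron whose weight vector is
    closest to the pattern (Euclidean distance is an assumption)."""
    best_index, best_dist = -1, math.inf
    for index, w in enumerate(weights):
        dist = math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, pattern)))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Toy data (hypothetical): 3 output neurons, 4-dimensional inputs.
weights = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
]
noisy = [0.9, 0.8, 0.2, 0.1]  # a corrupted copy of the first pattern
print(nearest_class(weights, noisy))
```

In the problem described in the text, `weights` would hold 12 vectors of 64 components each.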
Data Parallelism in the SOM
CAT projections for the SOM show that little speed-up can be achieved through data parallelism compared with the Hopfield net at the same problem size and communications speed. This is due to the relatively even distribution of computations in the SOM, and to the parallel communications costs, which offset the gains of parallel execution. Figure 8.12 shows data parallel CAT projections for the SOM with 64 input and 12 output neurons, on a number of parallel VMs with communications speeds between 10 and 40 Mbits/sec. Significant gains in performance occur only at communication speeds higher than 20 Mbits/sec.
Figure 8.12. Projections on Data Parallel SOM (execution time in seconds against number of VMs, for communication speeds of 10, 20, 30 and 40 Mbit/sec)
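The trade-off between computation and communication can be illustrated with a toy cost model. This is not the model used by CAT, which is not reproduced in this section. The function `projected_time`, the assumption that each additional VM adds one fixed-size data exchange, and all numbers below are hypothetical and chosen only to show the shape of the effect.

```python
def projected_time(seq_time_s, n_vms, bits_exchanged, link_mbit_s):
    """Toy projection: compute time shrinks with the number of VMs,
    while communication time grows with it (assumed model)."""
    compute = seq_time_s / n_vms
    # Assumption: every extra VM costs one exchange of bits_exchanged bits.
    comm = (n_vms - 1) * bits_exchanged / (link_mbit_s * 1e6)
    return compute + comm

# Hypothetical inputs: 1 s of sequential work, 4 Mbit per exchange.
for speed in (10, 20, 30, 40):
    times = [projected_time(1.0, p, 4e6, speed) for p in (1, 2, 3, 4)]
    print(speed, [round(t, 3) for t in times])
```

With these made-up inputs, four VMs on the slowest link are slower than a single VM, while on the fastest link they are faster, which is the qualitative behaviour the projections describe.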
Task Parallelism on the SOM
CAT results in Appendix D.2 point to the existence of a backward data stream on the weight matrix (SW). This prevents a straightforward instruction-pipelining type of parallelism. The reason for the backward data flow is that the weight matrix is read and written at every pattern presentation step. If the changes are accumulated and batches of weight updates are carried out after each epoch, the network can be pipelined through a number of systolic processors, reducing the execution time considerably. The SOM was modified for this purpose, so that the weight changes are accumulated in a matrix of the same size as the weight matrix and the weight updates are carried out in batches. This new definition of the SOM was also processed by CAT (Appendix D.2), and a cutting point in the MATLIB listing for an instruction pipeline was found. This partitioning resulted in a load balance of approximately 30% to 70% on two parallel processors.
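The difference between the two update schemes can be sketched as follows. This is a simplified illustration rather than the MATLIB listing: only the winning neuron is updated (the lateral update, mlat, is left out), and the function names, learning rate and toy data are assumptions.

```python
def winner(weights, x):
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_single_step(weights, patterns, rate):
    """Single-step scheme: weights are read AND written at every
    pattern presentation, creating the backward data flow."""
    weights = [list(w) for w in weights]  # work on a copy
    for x in patterns:
        k = winner(weights, x)
        weights[k] = [wi + rate * (xi - wi) for wi, xi in zip(weights[k], x)]
    return weights

def train_batch(weights, patterns, rate):
    """Batch scheme: changes are accumulated in a matrix of the same
    size as the weight matrix and applied once, after the epoch."""
    delta = [[0.0] * len(w) for w in weights]
    for x in patterns:
        k = winner(weights, x)  # weights are only read inside the loop
        for j, (wi, xi) in enumerate(zip(weights[k], x)):
            delta[k][j] += rate * (xi - wi)
    return [[wi + di for wi, di in zip(w, d)] for w, d in zip(weights, delta)]

# Toy data (hypothetical): 2 output neurons, 2-dimensional inputs.
w0 = [[0.0, 0.0], [1.0, 1.0]]
pats = [[0.2, 0.0], [0.0, 0.2]]
print(train_single_step(w0, pats, 0.5))
print(train_batch(w0, pats, 0.5))
```

Because `train_batch` never writes to the weights inside the pattern loop, the winner search and the accumulation could in principle run as separate pipeline stages, which is the property the modification is intended to provide.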
Task Parallel SOM on the SUN LAN
The batch-update version of the SOM was partitioned into two self-contained but interlinked MATLIB programs, each with its own data definition and data transfer instructions. Two client programs and one scheduler program were compiled and executed on the LAN. This time the scheduler does not take part in neural network related tasks. It only loads data from the file server, distributes the data and waits for data routing instructions. The scheduler, as a passive server, is in an infinite loop, and it runs the
send data to other VMs, through the server. The server parses these data transmission requests and carries out the orders by rerouting the data.
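The rerouting role of the server can be sketched with in-memory mailboxes. This is only an illustration of the idea: the `SEND <destination> <payload>` request format, the function name `route` and the VM names are hypothetical, and the real scheduler runs as an infinite loop over network connections rather than over a finite list.

```python
from collections import deque

def route(requests, vm_names):
    """Parse data transmission requests and deliver each payload
    to the mailbox of the destination VM (hypothetical format)."""
    mailboxes = {name: deque() for name in vm_names}
    for sender, request in requests:
        parts = request.split(" ", 2)
        if len(parts) != 3:
            continue  # malformed request: ignored in this sketch
        command, dest, payload = parts
        if command != "SEND" or dest not in mailboxes:
            continue  # unknown command or destination: ignored
        mailboxes[dest].append((sender, payload))
    return mailboxes

# Hypothetical traffic between two client VMs.
mail = route(
    [("vm1", "SEND vm2 delta 0.1 0.2"),
     ("vm2", "SEND vm1 done"),
     ("vm1", "PING vm2 hello")],
    ["vm1", "vm2"],
)
print({name: list(box) for name, box in mail.items()})
```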
Figure 8.13 shows the simulation results for a number of SOM configurations executing sequentially, and in parallel on 2 SUN4s. The results confirm that even on general-purpose computing platforms it is possible to improve execution speed by task parallel techniques.
Figure 8.13. Sequential versus Pipelined SOM (execution time in seconds against number of output nodes, for one sun4 and two sun4s)
To achieve task parallelism for the SOM, a modification had to be made to the weight update procedure of the algorithm. It has been reported that this change can delay or prevent learning in this model. For this reason, the change in RMS error is monitored on the same pattern recognition problem, both in the parallel SOM with batch weight updates and in the sequential SOM with single step weight updates. Figure 8.14 shows that, although the single step SOM needs fewer iterations for the error to drop to an acceptable level, the error profile in the batch update case follows the single step SOM error very closely. For this dataset, both methods produce similar results, and the batch weight updates can be used.
Figure 8.14. RMS Error in Sequential and Parallel SOM (RMS error against iterations, for single update and batch update)
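An error measure of this kind can be sketched as below. The exact definition of the RMS error in the MATLIB listing is not given in this section, so the formula here (root of the mean squared difference between each pattern and its nearest weight vector, averaged over all components) is an assumption, as is the name `rms_error`.

```python
import math

def rms_error(weights, patterns):
    """Assumed definition: RMS difference between every pattern and
    the weight vector nearest to it, averaged over all components."""
    total, count = 0.0, 0
    for x in patterns:
        squared = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
        total += min(squared)  # squared distance to the nearest vector
        count += len(x)
    return math.sqrt(total / count)

# Toy data (hypothetical): 2 output neurons, 2-dimensional inputs.
weights = [[0.0, 0.0], [1.0, 1.0]]
print(rms_error(weights, [[0.0, 0.0], [1.0, 1.0]]))
print(rms_error(weights, [[0.0, 0.2]]))
```

Recording this value once per iteration for both update schemes would give two error profiles that can be compared directly.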
The results show that data parallelism on the SOM can be useful on high-speed communications links. Task parallelism is also feasible if the algorithm is modified to carry out batches of weight updates instead of the standard single step weight updates.
8.3.3. The Backpropagation Model
The same steps are applied to the Backpropagation model. The MATLIB representation of the Backpropagation-with-momentum model was written and processed by CAT to detect possible parallelism and identify computational bottlenecks (Appendix D.3). The resulting computational profile is even, indicating that all operations are computationally demanding (Figure 8.15).
Figure 8.15 (plot omitted): execution time in seconds against line number.
The same pattern recognition problem was used with a three layered Backpropagation network with 64 input, 12 hidden and 64 output neurons. The simulation involved training the network for an auto-associative recall with the 12 base patterns. Once trained, the network was required to generate the same patterns from noisy or incomplete inputs.
CAT projections for a data parallel Backpropagation execution show that the network can be executed faster on fast communications links between the parallel VMs (Figure 8.16). Actual parallel simulations were not implemented on the SUN LAN, as this medium would not meet the communications requirements outlined by the parallel projections. The variable/loop analysis for the Backpropagation-with-momentum model shows a tightly coupled network architecture. Global matrix variables, such as the input weights (between the input neurons and the hidden layer) and the hidden weights (between the hidden layer and the output layer), are all on backward flow data streams. As in the SOM case, the single step weight update procedure does not allow profitable task parallelism for the Backpropagation MATLIB listing. The only possibility is to modify the algorithm to allow batch updates, and to carry out coarse-grained instruction pipelining on a small number of processors. Task pipeline simulations were not carried out for the Backpropagation model, as they involve steps similar to those of the SOM simulation described previously.
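The difference between the single step and batch schemes can be written out for the momentum update. The form below is the commonly used one and is given as an illustration, not as a transcription of the MATLIB listing in Appendix D.3. The symbols are assumptions introduced here: $\eta$ is the learning rate, $\alpha$ the momentum coefficient, $E_p$ the error on pattern $p$, $P$ the number of patterns, $t$ the pattern presentation index and $n$ the epoch index.

```latex
% Single step: the weights change after every pattern presentation
\Delta w_{ij}(t) = -\eta \, \frac{\partial E_p}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1)

% Batch: gradients are accumulated over the epoch, then applied once
\Delta w_{ij}(n) = -\eta \sum_{p=1}^{P} \frac{\partial E_p}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(n-1)
```

In the batch form the weights stay fixed while the sum is accumulated, which removes the per-pattern write to the weight matrices that causes the backward data flow.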
Figure 8.16 (plot omitted): execution time in seconds against number of VMs, for communication speeds of 10, 20, 30 and 40 Mbit/sec.