In this section we show the progression of the automated modelling technique. We start by generating a model for a simplistic 2D magnetohydrodynamics (MHD) application, Lare2D, with good memory scaling. This enables us to demonstrate the methodology working in a controlled environment.
Due to the simplicity of Lare2D we attempt to model the strong scaling of a large problem size, 4096². We do this by analysing the HWM traces from execution on two and four core runs; the HWM values for these runs are shown in Table 7.1.

Cores  Prediction (MB)  Actual (MB)  Error (%)
1            4505.22      4495.02       0.22
2*           2259.88      2259.91      -0.00
4*           1137.22      1138.13      -0.08
8             575.89       577.37      -0.26
16            295.23       296.47      -0.42
32            154.91       149.36       3.70
64             84.78        85.67      -1.04
128            49.75        58.98     -15.64
256            32.34        42.32     -23.59

Table 7.2: Model prediction results for Lare2D 4096²
We can see that at this scale the memory scaling is highly efficient (1.99× reduction in HWM for a 2× increase in core count). Whilst this does not express any of the more complex behaviour the code may exhibit at scale, it suggests that our model will be largely based around a local problem size value.
\[ F(P, N) = 280.7\,\frac{P}{N} + 1016N + 15257783 \tag{7.2} \]
The model generated from these two trace files is shown in Equation 7.2. As we can see, Lare2D has very good memory scalability: only a very small component increases with core count, and the constant consumption is relatively low (14.5 MB). As predicted, there is a large component dedicated to local problem size, indicating that the code should scale well.
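As a quick check of Equation 7.2, the sketch below evaluates the model for the 4096² problem and compares it with measured values copied from Table 7.2. It assumes that P is the global problem size in cells, that N is the core count, that the model returns bytes, and that 1 MB is 2^20 bytes; this reading is an inference, chosen because it is consistent with the 14.5 MB constant and with the tabulated predictions. The signed relative error formula is likewise inferred from the tables, and the function names are illustrative rather than part of the analysis tool.

```python
MIB = 1024 * 1024  # assumption: the tables report MB as 2**20 bytes


def compute_phase_bytes(problem_cells: int, cores: int) -> float:
    """Equation 7.2 as read here: local-problem term + core-count term + constant."""
    return 280.7 * problem_cells / cores + 1016 * cores + 15257783


def percent_error(predicted: float, actual: float) -> float:
    """Signed relative error in percent; negative means under-prediction."""
    return (predicted - actual) / actual * 100.0


if __name__ == "__main__":
    cells = 4096 ** 2
    # Measured HWM values (MB) copied from the "Actual" column of Table 7.2.
    actual_mb = {16: 296.47, 128: 58.98, 256: 42.32}
    for cores, actual in actual_mb.items():
        predicted = compute_phase_bytes(cells, cores) / MIB
        print(f"{cores:>3} cores: {predicted:7.2f} MB predicted, "
              f"{percent_error(predicted, actual):+6.2f}% error")
```

Because the published coefficients are rounded, the values produced this way can differ from the tabulated predictions in the last digits, most visibly at low core counts where the local problem term is largest.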
Using the model we can then predict the memory consumption of Lare2D at scale. Table 7.2 shows these predictions validated against experimental results. We see that the model error is very small, with a slight under-prediction, until 128 cores, where the error jumps to over 15%, indicating a change in behaviour. We can see from Figure 7.1 that our model prediction for Lare2D 4096² at 16 cores is accurate for the HWM. We can also see that there are three distinct phases to the execution: an initial startup phase, a compute phase and finally a concluding phase. The start and end phases represent the problem composition and I/O operations of the application, rather than the actual compute phase.

Figure 7.1: Model prediction against temporal trace of Lare2D on 16 cores (memory consumption in MB against time in %; series: memory consumption, model prediction)

If we plot one of the higher core count runs where our model predictions were less accurate, such as 128 cores (Figure 7.2), we gain an insight into the cause of the model inaccuracy. The compute phase, which previously dominated memory consumption, is now overshadowed by the surrounding I/O phases. Whilst still inaccurate at this scale, the model is actually predicting the memory consumption of the compute phase, rather than that of the newly dominant I/O phase. Our model is, in fact, over-predicting the compute phase by ≈10 MB, suggesting that we are not factoring in sufficient scalability.

Figure 7.2: Model prediction against temporal trace of Lare2D on 128 cores (memory consumption in MB against time in %; series: memory consumption, model prediction)
Cores  Prediction (MB)  Actual (MB)  Error (%)
64*            85.55        85.67      -0.14
128*           54.71        58.98      -7.23
256            39.35        42.32      -7.02

Table 7.3: Second model prediction results for Lare2D 4096²
7.3.1 Multiple Models
By modelling the I/O phase and the compute phase separately, invoking the analysis tool on different runs for each, we can generate a second model for Lare2D. We can then simply take the maximum of these models at any given scaling point as our prediction.
\[ F(P, N) = 246.9\,\frac{P}{N} + 566.5N + 24939843 \tag{7.3} \]
Equation 7.3 represents our new model for the I/O phase, based on the 64 and 128 core runs. Its validation is presented in Table 7.3 and shows a significant improvement over the results in Table 7.2.
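The "maximum of the models" rule can be sketched as follows. This is a minimal illustration, not the analysis tool itself: the function names are hypothetical, and it assumes that P is the global problem size in cells, that N is the core count, that both equations return bytes, and that 1 MB is 2^20 bytes. Under those assumptions it also reports which phase model dominates at each core count.

```python
MIB = 1024 * 1024  # assumption: the tables report MB as 2**20 bytes


def compute_model(problem_cells, cores):
    """Equation 7.2 (compute phase), in bytes."""
    return 280.7 * problem_cells / cores + 1016 * cores + 15257783


def io_model(problem_cells, cores):
    """Equation 7.3 (I/O phase), in bytes."""
    return 246.9 * problem_cells / cores + 566.5 * cores + 24939843


def compound_model(problem_cells, cores):
    """Return (predicted bytes, name of the phase model that dominates)."""
    candidates = {
        "compute": compute_model(problem_cells, cores),
        "io": io_model(problem_cells, cores),
    }
    dominant = max(candidates, key=candidates.get)
    return candidates[dominant], dominant


if __name__ == "__main__":
    for cores in (16, 32, 64, 128, 256):
        predicted, phase = compound_model(4096 ** 2, cores)
        print(f"{cores:>3} cores: {predicted / MIB:7.2f} MB ({phase} model dominates)")
```

With the rounded coefficients given in the two equations, the compute model provides the maximum for the 4096² problem up to 32 cores and the I/O model takes over from 64 cores, which is where Table 7.3 begins.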
We note that we first observed this I/O phase at very low core count, but did not anticipate that it would exhibit different scaling behaviour to the compute phase. Another approach would have been to generate three models from our initial traces, one for each of the obvious phases. This would allow us to plot the behaviour of the phases over time; however, our approach of sampling the traces when the new phase becomes dominant allows us to be more confident about the magnitude of this consumption.
A future extension of these capabilities would be to allow the user to specify regions of interest and generate models solely for those regions.
7.3.2 Increased Problem Size
Using our new compound model we can now attempt to predict the memory consumption of a larger problem (8192²). Table 7.4 validates our predictions for this new problem size, and we observe a generally high level of accuracy. As we have increased the global problem size, it is easier for the models to track the memory consumption, as a higher percentage of memory is consumed by the local problem.

Cores  Prediction (MB)  Actual (MB)  Error (%)
1           17977.43     17942.23       0.20
2            8995.99      8978.81       0.19
4            4505.27      4495.13       0.23
8            2259.91      2253.38       0.29
16           1137.24      1131.62       0.50
32            575.92       570.83       0.90
64            295.28       289.97       1.83
128           155.01       149.64       3.59
256            85.65        86.18      -0.61

Table 7.4: Compound model prediction results for Lare2D 8192²
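The observation that a larger global problem leaves a higher percentage of memory to the local problem can be illustrated directly from the model coefficients. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that the first term of Equations 7.2 and 7.3 (the coefficient multiplied by problem size over core count) is the local problem contribution. It computes the share of each model's prediction that comes from that term at 256 cores.

```python
def local_share(coeff, per_core, constant, problem_cells, cores):
    """Fraction of a model's predicted bytes that comes from the P/N term."""
    local = coeff * problem_cells / cores
    total = local + per_core * cores + constant
    return local / total


if __name__ == "__main__":
    # Coefficients of Equation 7.2 (compute phase) and Equation 7.3 (I/O phase).
    models = {"compute": (280.7, 1016, 15257783), "io": (246.9, 566.5, 24939843)}
    for side in (4096, 8192):
        for name, (a, b, c) in models.items():
            share = local_share(a, b, c, side ** 2, 256)
            print(f"{side}^2, {name} model, 256 cores: {share:.1%} from the local problem")
```

Under these assumptions the local problem share rises for both phase models when moving from 4096² to 8192², so the constant and core-count terms, which the models fit less reliably, matter proportionally less.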
Our simplistic model of Lare2D does not capture the behaviour of ghost cells, but rather encompasses them within the local domain behaviour. This failure to capture the nuances of behaviour is more problematic when modelling smaller problems, as the ghost cells make up a larger percentage of memory consumption. As we move to the larger problem, the ratio of ghost cells to data cells falls, increasing the accuracy of our model. With a 4096² problem on 256 cores, we have a local problem size of 256×256 cells with a local boundary of 1024 cells; for the 8192² problem, we have a local problem size of 256×512 cells with 1536 boundary cells. Thus the ratio of volume to boundary cells has increased from 192:3 to 256:3, giving more efficient memory utilisation and making it easier for our basic model to predict memory consumption without explicit consideration of ghost cells.
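The boundary arithmetic above can be reproduced with a few lines of code. The sketch assumes the boundary is counted as the perimeter 2(nx + ny) of the local domain, which matches the 1024 and 1536 figures quoted; the actual ghost cell width used by Lare2D is not stated here, and the function names are illustrative. The local domain sizes are taken as given in the text.

```python
from fractions import Fraction


def boundary_cells(nx: int, ny: int) -> int:
    """Cells on the perimeter of an nx-by-ny local domain, counted as 2*(nx + ny)."""
    return 2 * (nx + ny)


def volume_to_boundary(nx: int, ny: int) -> Fraction:
    """Exact ratio of local data cells to boundary cells."""
    return Fraction(nx * ny, boundary_cells(nx, ny))


if __name__ == "__main__":
    # Local domain sizes as quoted in the text for the two problem sizes.
    for nx, ny in ((256, 256), (256, 512)):
        ratio = volume_to_boundary(nx, ny)
        print(f"{nx}x{ny}: {boundary_cells(nx, ny)} boundary cells, "
              f"volume to boundary ratio {ratio * 3}:3")
```

Using exact fractions keeps the second ratio as 256:3 rather than a rounded decimal, which makes the comparison with 192:3 direct.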