When parallelizing a sequential program, the speedup in computation can be calculated using Amdahl’s Law [139], defined in Equation 5.5.

\[ \text{Speedup} = \frac{1}{(1 - P) + \frac{P}{N}} \tag{5.5} \]

where P represents the portion of the sequential program that can be parallelized, expressed as a fraction between 0 and 1, and N represents the number of computers used in the computation. Theoretically, when a sequential program can be fully parallelized (P = 1), as was the case with PDFA, the speedup of the parallelized program should equal the number of computers used in the computation, N. Therefore:

\[ \text{Speedup} = \frac{1}{(1 - P) + \frac{P}{N}} \leq N \tag{5.6} \]
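As a minimal sketch of Equations 5.5 and 5.6, the function below evaluates Amdahl's Law for a given parallelizable fraction and number of computers. The function name `amdahl_speedup` and the sample values of P and N are illustrative assumptions; they are not taken from the experiments.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's Law (Equation 5.5).

    p: fraction of the sequential program that can be parallelized (0 <= p <= 1).
    n: number of computers used in the computation (n >= 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # Illustrative values only; these are not measurements from the experiments.
    for p in (0.5, 0.9, 1.0):
        for n in (2, 4, 8):
            s = amdahl_speedup(p, n)
            # Equation 5.6: the speedup never exceeds n, with equality at p = 1.
            print(f"P={p:.1f}, N={n}: speedup={s:.2f} (bound N={n})")
```

With P = 1 the function returns N, which is the upper bound stated in Equation 5.6; any P below 1 gives a strictly smaller value.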

However, as shown in Figure 5.15, the PDFA came closest to the speedup of Equation 5.6, relative to the sequential DFA, when 4 VMs were used in the process. The speedup of the PDFA never reached N times in a Hadoop cluster with N computers, even though the sequential DFA was fully parallelized. This means that Amdahl’s Law in the form of (5.6) is not sufficient for calculating the speedup of a parallelized program that is executed in a cluster computing environment, because in this form it does not consider the communication overhead of a user job in cluster computing. For this purpose, a revision to Amdahl’s Law is proposed in the form of Equation 5.7, to better reflect the speedup gained when parallelizing a sequential program in cluster computing.

\[ \text{Speedup} = \frac{1}{(1 - P) + \frac{P}{N} + R} < N \tag{5.7} \]

where R represents the ratio of the communication overhead to the computation of a user job, and R > 0.
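The effect of the overhead term can be sketched numerically. The function below assumes that the denominator of Equation 5.7 reads (1 − P) + P/N + R, which is the form consistent with the stated bound Speedup < N for R > 0. The name `revised_speedup` and the sample values of R are hypothetical and chosen only for illustration.

```python
def revised_speedup(p: float, n: int, r: float) -> float:
    """Speedup under the revised Amdahl's Law (Equation 5.7).

    p: fraction of the program that can be parallelized (0 <= p <= 1).
    n: number of computers used in the computation (n >= 1).
    r: ratio of communication overhead to computation (r >= 0);
       r = 0 recovers the classical form of Equation 5.5.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    if r < 0.0:
        raise ValueError("r must be non-negative")
    return 1.0 / ((1.0 - p) + p / n + r)


if __name__ == "__main__":
    n = 4  # number of computers, illustrative
    # Hypothetical overhead ratios; not measured values from the experiments.
    for r in (0.0, 0.05, 0.25):
        print(f"P=1.0, N={n}, R={r:.2f}: speedup={revised_speedup(1.0, n, r):.2f}")
```

Even for a fully parallelized program (P = 1), any positive R keeps the result strictly below N, and a smaller R moves it closer to N.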

The revised Amdahl’s Law (5.7) better explains the speedup of a parallel program running in cluster computing. The larger a dataset is, the higher the computational workload incurred. As a result, the ratio of communication to computation becomes lower, which leads to a higher speedup in computation. This explains the speedup of the PDFA in computation when processing the 3 datasets of varied sizes.
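For the fully parallelized case the dependence on R can be written out explicitly. The following is a short worked step, assuming the denominator of Equation 5.7 is (1 − P) + P/N + R and setting P = 1:

```latex
\[
\text{Speedup} = \frac{1}{\frac{1}{N} + R} = \frac{N}{1 + N R},
\qquad\text{so}\qquad
R = \frac{1}{\text{Speedup}} - \frac{1}{N}.
\]
```

The speedup decreases monotonically as R grows and approaches N as R tends to 0. The second expression suggests one way an effective R could be inferred from a measured speedup on N computers, under the same assumptions.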

To achieve optimal speedup, the ratio of communication to computation of a parallel program should be minimized. In the case of Hadoop MapReduce clusters, this means the segmented data blocks should be large. On one hand, a large data block size generates a small number of tasks, which incurs a small communication overhead. On the other hand, a large data block size leads to a high computational workload per task. Therefore, a large data block size leads to a low communication to computation ratio, generating a high speedup. To evaluate how the size of a data block affects the computational performance of PDFA, the algorithm was run on a dataset of 352MB using 8 VMs, with data block sizes varied from 2MB to 32MB.

Figure 5.16: Computational overhead of PDFA against data block size. [Plot: execution time (minutes) against block size (MB), for block sizes of 2, 4, 8, 16 and 32 MB.]

Figure 5.17: The speedup of PDFA against data block size. [Plot: speedup against block size (MB), for block sizes of 2, 4, 8, 16 and 32 MB.]

Figure 5.16 shows that the execution time of PDFA decreases with an increasing size of data block. Correspondingly, the speedup of PDFA in computation goes up with an increasing size of data block, as shown in Figure 5.17. It can be seen that PDFA is 2.04 times faster in computation using 32MB data blocks than when using 2MB data blocks, thus confirming a greater improvement in performance with larger data blocks.
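The link between block size and task count can be made concrete with a small calculation. The sketch below assumes one task per data block, which is a simplification of how Hadoop assigns map tasks to input splits; the names `block_count` and `task_counts` are hypothetical.

```python
import math

DATASET_MB = 352                    # dataset size used in the experiment
BLOCK_SIZES_MB = (2, 4, 8, 16, 32)  # block sizes on the axes of Figures 5.16 and 5.17


def block_count(dataset_mb: int, block_mb: int) -> int:
    """Number of blocks needed to hold the dataset (the last block may be partial)."""
    return math.ceil(dataset_mb / block_mb)


# Assumption: one task per block, so fewer blocks means fewer tasks to
# schedule and less communication overhead overall.
task_counts = {b: block_count(DATASET_MB, b) for b in BLOCK_SIZES_MB}

if __name__ == "__main__":
    for block_mb, tasks in task_counts.items():
        print(f"block size {block_mb:>2} MB -> {tasks:>3} tasks")
```

Under this assumption, moving from 2MB to 32MB blocks reduces the number of tasks by a factor of 16 while each task processes 16 times more data, which is the mechanism the text gives for the lower communication to computation ratio.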

5.6 Concluding Remarks

In this chapter a novel event detection methodology based on Detrended Fluctuation Analysis (DFA) was presented. The method was demonstrated as a basic means of event source location through identifying the closest PMU to an event, and also the determination of the exact event start time (t = t0), deemed vital to inertia estimation methodologies.

The suitability of a transmission system event to provide an estimate of the total inertia of the power system was defined, the key requirement being the instantaneous nature of the loss. The identification of such events was demonstrated, again using the DFA algorithm.

The approach was then further expanded to be parallelized for use on massive volumes of PMU data. To support this, an introduction to Big Data analytics was provided and the implementation of the Parallel DFA (PDFA) approach in the MapReduce programming model was presented.

The experimental results have shown the speedup of PDFA in computation whilst maintaining relative accuracy in comparison with the sequential DFA. Based on the analysis of the speedup in computation, an improvement to Amdahl’s Law was proposed, introducing the ratio of communication to computation to enhance its capability to analyse the performance gain in computation when parallelizing data-intensive applications in a cluster computing environment.

Further work is proposed to investigate methodologies to automatically optimize the configuration settings of Hadoop MapReduce parameters. This will further improve the performance of the PDFA algorithm.

Inertia Estimation of the GB Power System

6.1 Introduction

The GB system is required to accommodate an increasing volume of renewable energy, predominantly in the form of offshore wind, asynchronously connecting to the periphery of the transmission system. This displacement of traditional thermal generation is leading to a significant reduction in system inertia, thus making the task of system operation more challenging.

The inevitable shift towards a more dynamic system compounds the existing issues of calculating generator response and reserve requirements, which traditionally assume that system inertia varies linearly with demand. With demand being met by a growing percentage of asynchronous generation, such as renewables and HVDC interconnectors, this assumption is becoming increasingly invalid. Frequency services are becoming more complicated and less predictable throughout the day, forcing reassessment of generation patterns and limitations on single circuit risks, making it more difficult to maintain security for all credible contingencies.

It is therefore necessary to gain an improved understanding of both the inertial frequency response of the power system and the security of the system in real-time. This will ensure the impact of incidents to specific areas of the network is understood, facilitating more economically efficient operation of the power system.

In this chapter a method is proposed for estimating the total inertia of the GB power system, by dividing the network into groups or regions of generation based around the constraint boundaries of the GB network [11]. The inertia is first estimated at a regional level before it is combined to provide a total estimate for the whole network. This estimate is then compared with the known contribution to inertia from generation, to provide an estimate for the currently unknown contribution to inertia from residual sources, namely synchronously connected demand and embedded generation. The approach is first demonstrated on the full dynamic model of the GB power system before results are presented from analysing the impact of a number of instantaneous transmission in-feed loss events, using phase-angle data provided by the 3 PMUs from the GB transmission network and also the devices installed at the domestic supply at 4 GB Universities.