
3.3. STUDY OF ALTERNATIVES FOR THE SOFTWARE ARCHITECTURE OF THE PUBLICATION SYSTEM OF

3.3.3. Database management system

Previous research makes the simplistic and restrictive assumption that all possible arcs have identical reliability and unit cost, which limits the applicability of those approaches.

In real world design problems there are generally multiple choices for arcs, each with an associated reliability and unit cost, and other design attributes. When considering the economics of network design, it is important to allow designs with arcs of differing unit costs. The research presented in this section is from Deeter and Smith (1997, 1997a) and makes the significant relaxation that there are multiple choices of arc type for each possible arc, and the final network may have a heterogeneous combination of differing arc reliabilities and costs. While greatly improving the relevance of the problem to real world economic design, this complicates the network reliability calculation and exponentially increases the search space.

2.6.1 Encoding, Genetic Algorithm and Parameters

The following example, for a problem with |N| = 5 nodes and k = 4 levels of connection, shows how a candidate network design is encoded. The chromosome {0100203102} encodes the network illustrated in Figure 2.3. There are (5×4)/2 = 10 possible arcs in this example, but only five are present; the other five are at level of connection lᵢⱼ = 0. Each allele of the chromosome takes one of the values 0, 1, ..., k–1.
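The encoding can be illustrated with a minimal Python sketch. The allele ordering is an assumption: node pairs (i, j) with i < j are taken in row-major order, (1,2), (1,3), ..., (4,5). Under that ordering the five nonzero alleles fall on the same five node pairs that are labelled in Figure 2.3. The function names `decode` and `encode` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

Arc = Tuple[int, int]


def decode(chromosome: str, n_nodes: int) -> Dict[Arc, int]:
    """Map each allele to a node pair (i, j), i < j.

    Assumption: alleles are listed in row-major order of the node pairs.
    """
    pairs = list(combinations(range(1, n_nodes + 1), 2))
    if len(chromosome) != len(pairs):
        raise ValueError("chromosome length must equal |N|(|N|-1)/2")
    return {pair: int(allele) for pair, allele in zip(pairs, chromosome)}


def encode(levels: Dict[Arc, int], n_nodes: int) -> str:
    """Inverse of decode: missing arcs are written as level 0."""
    pairs = combinations(range(1, n_nodes + 1), 2)
    return "".join(str(levels.get(pair, 0)) for pair in pairs)


levels = decode("0100203102", 5)
# Keep only the arcs that are switched on (level > 0).
present = {pair: lv for pair, lv in levels.items() if lv > 0}
print(present)
```

Storing only the level per possible arc keeps the chromosome at a fixed length of |N|(|N|–1)/2, which is what allows the crossover and mutation operators to work allele by allele.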

The objective function of sections 2.4 and 2.5 (equation 2.2) is modified to

Cp(x) = [expression not recoverable; surviving fragments: 0 ( )) 50]   (2.3)

where Cp(x) is the penalized cost, C(x) is the unpenalized cost and C(x*) is the cost of the best feasible solution in the population. This is a dynamic penalty that depends upon the length of search, g.

Table 2.4 Complete results comparing performance and CPU time on 79 test problems.

Problem                 B+B              Section 2.4 GA   Section 2.5 GA
No   N   L     p    Ro  Best
11   6  15  0.95  0.95   227       3.90  0.0357    57.98  0.0023    14.08
12   6  15  0.95  0.95   213       0.11  0.0235    47.83  0.0193    10.03
13   6  15  0.95  0.95   190       0.00  0.0280    42.32       0    10.09
14   6  15  0.95  0.95   200       0.44  0.0238    57.54  0.0173    13.04
15   6  15  0.95  0.95   179       0.66  0.0193    46.97  0.0256    11.36
16   7  21  0.90  0.90   189      11.26  0.0177   130.71  0.0175    21.77
17   7  21  0.90  0.90   184       0.17       0    76.74       0    18.80
18   7  21  0.90  0.90   243       0.50  0.0167   135.98  0.0202    26.93
19   7  21  0.90  0.90   129       1.21  0.0121   122.46  0.0195    28.91
20   7  21  0.90  0.90   124       0.05       0    83.45       0    23.77
21   7  21  0.90  0.95   205       0.83  0.0406   301.41  0.0337    71.40
22   7  21  0.90  0.95   209       0.06       0     71.4       0    37.06
23   7  21  0.90  0.95   268       0.06  0.0310   255.73  0.0187    56.39
24   7  21  0.90  0.95   143       0.17  0.0264   280.26  0.0193    78.72
25   7  21  0.90  0.95   153       0.01       0   160.43       0    52.93
26   7  21  0.95  0.95   185      22.85  0.0333   112.26  0.0111    28.89
27   7  21  0.95  0.95   182       1.27  0.0046    81.78  0.0035    16.99
28   7  21  0.95  0.95   230       1.76  0.0090   109.47  0.0072    26.64
29   7  21  0.95  0.95   122       2.31  0.0265   112.62  0.0259    27.82
30   7  21  0.95  0.95   124       0.39       0    74.49       0    19.64
31   8  28  0.90  0.90   208       21.9  0.0211   260.86  0.0161    79.55
32   8  28  0.90  0.90   203      20.37       0   175.06       0    75.37
33   8  28  0.90  0.90   211     140.66  0.0149   198.80  0.0119    79.67
34   8  28  0.90  0.90   291     173.01  0.0204   210.95  0.0108    83.66
35   8  28  0.90  0.90   178     159.34  0.0112   230.70       0    67.34
36   8  28  0.90  0.95   247   10162.53  0.0152   611.28  0.0140   168.79
37   8  28  0.90  0.95   247   15207.83  0.0274   808.94  0.0183   226.08
38   8  28  0.90  0.95   245   12712.21  0.0124   663.99  0.0034   184.31
39   8  28  0.90  0.95   336    9616.80  0.0169   743.39  0.0177   303.50
40   8  28  0.90  0.95   202    9242.10  0.0231   629.13  0.0235   266.47

¹ Over 10 runs.

Table 2.5 Complete results comparing performance and CPU time on 79 test problems.

Problem                 B+B              Section 2.4 GA   Section 2.5 GA
No   N   L     p    Ro  Best
42   8  28  0.95  0.95   194       2.69  0.0053   202.57  0.0033    40.56
43   8  28  0.95  0.95   197      26.97  0.0052   173.74  0.0080    58.04
44   8  28  0.95  0.95   276      20.76  0.0133   187.02  0.0100    50.64
45   8  28  0.95  0.95   173      72.78  0.0190   189.02  0.0206    53.51
46   9  36  0.90  0.90   239       8.02  0.0105   324.38  0.0066    98.19
47   9  36  0.90  0.90   191      23.78  0.0277   365.31  0.0081   153.77
48   9  36  0.90  0.90   257     702.05  0.0301   530.37  0.0171   176.79
49   9  36  0.90  0.90   171       0.82  0.0255   292.01       0    81.18
50   9  36  0.90  0.90   198      12.36  0.0228   378.91       0    90.49
51   9  36  0.90  0.95   286    8321.87  0.0821  1215.28  0.0325   404.93
52   9  36  0.90  0.95   220   14259.48  0.0330   998.79  0.0309   358.28
53   9  36  0.90  0.95   306    9900.87  0.0313  1256.82  0.0163   560.89
54   9  36  0.90  0.95   219   17000.04  0.0457   865.38  0.0226   340.13
55   9  36  0.90  0.95   237    7739.99  0.0760  1024.77  0.0778   391.52
56   9  36  0.95  0.95   209       4.95  0.0576   274.83       0    59.24
57   9  36  0.95  0.95   171      21.75  0.0137   293.43  0.0092    99.98
58   9  36  0.95  0.95   233     525.03  0.0375   372.18  0.0268    97.95
59   9  36  0.95  0.95   151       0.99  0.0471   252.71       0    65.78
60   9  36  0.95  0.95   185      25.92  0.0381   385.59       0    71.67
61  10  45  0.90  0.90   131    4623.19  0.0518  1047.60  0.0231   375.14
62  10  45  0.90  0.90   154    2118.75  0.0651   794.83  0.0223   214.63
63  10  45  0.90  0.90   267    1860.74  0.0142   999.01  0.0061   415.53
64  10  45  0.90  0.90   263    1466.73  0.0126   678.02       0   171.04
65  10  45  0.90  0.90   293    2212.70  0.0329  1093.36  0.0182   488.12
66  10  45  0.90  0.95   153    5712.97  0.0257  1718.45  0.0150   982.98
67  10  45  0.90  0.95   197    7728.21  0.0203  1689.51  0.0177   726.31
68  10  45  0.90  0.95   311    8248.16  0.0367  1967.61  0.0136   984.30
69  10  45  0.90  0.95   291    6802.16  0.0404  1529.61  0.0244   825.45
70  10  45  0.90  0.95   358   12221.39  0.0276  2662.34  0.0048  1071.99
71  10  45  0.95  0.95   121    3492.17  0.0563   793.22  0.0124   177.31
72  10  45  0.95  0.95   136    1125.89  0.0291   615.29  0.0185    81.87
73  10  45  0.95  0.95   236     987.64  0.0276   781.68  0.0160   139.53
74  10  45  0.95  0.95   245    2507.89  0.0369   632.11       0    98.31
75  10  45  0.95  0.95   268    1359.91  0.0513   630.37  0.0120   131.55
76  11  55  0.90  0.90   246   59575.49  0.0499  1532.34       0   472.11

NON FULLY CONNECTED NETWORKS
77  14  21  0.90  0.90  1063   23950.01  0.0129  7293.97  0.0079  1672.75
78  16  24  0.90  0.95  1022  131756.43  0.0204  2699.38  0.0185  2334.15
79  20  30  0.95  0.95   596          ²  0.0052  5983.24  0.0152  4458.81

¹ Over 10 runs.
² Optimum solution taken from Jan et al. (1993). CPU time unknown.

Figure 2.3 Example network design for chromosome 0100203102 (section 2.6).

Below is the GA algorithm, followed by a more detailed description of the key steps.

1. Randomly Generate Initial Population
   Send initial population to the reliability and cost calculation functions and calculate fitness using equation 2.3
   Check for initial Best Solution
      if no solution is feasible the best infeasible solution is recorded
2. Begin Generational Loop
   Select and Breed Parents
      copy Best Solution to new population
      two distinct parents are chosen using the rank based procedure of Tate and Smith (1995)
      children are generated using uniform crossover
      children are mutated
      when enough children are created the parents are replaced by the children
   Send new population to the reliability and cost calculation functions, and calculate fitness using equation 2.3
   Check for new Best Solution
      if no solution is feasible the best infeasible solution is recorded
   Repeat until gmax generations have elapsed.
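The control flow of the steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. Several pieces are assumptions: the linear rank weighting in `rank_select` is a stand-in for the rank based procedure of Tate and Smith (1995), the `fitness` and `breed` callables are hypothetical placeholders for the penalized cost of equation 2.3 and for crossover plus mutation, and the best solution is tracked by fitness alone rather than by separate feasible and infeasible records.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def rank_select(ranked: List[Chromosome], rng: random.Random) -> Chromosome:
    """Pick a parent with probability decreasing linearly in rank (stand-in)."""
    n = len(ranked)
    weights = [n - r for r in range(n)]  # best gets weight n, worst gets 1
    return rng.choices(ranked, weights=weights, k=1)[0]


def run_ga(fitness: Callable[[Chromosome], float],
           breed: Callable[[Chromosome, Chromosome, random.Random], Chromosome],
           pop_size: int, n_alleles: int, k: int, g_max: int,
           seed: int = 0) -> Chromosome:
    """Generational GA with elitism; lower fitness is better."""
    if pop_size < 2:
        raise ValueError("need at least two members to pick distinct parents")
    rng = random.Random(seed)
    # Step 1: random initial population and initial best solution.
    pop = [[rng.randrange(k) for _ in range(n_alleles)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    # Step 2: generational loop.
    for _ in range(g_max):
        ranked = sorted(pop, key=fitness)
        children = [best[:]]  # copy the best solution to the new population
        while len(children) < pop_size:
            p1 = rank_select(ranked, rng)
            p2 = rank_select(ranked, rng)
            while p2 is p1:  # the two parents must be distinct members
                p2 = rank_select(ranked, rng)
            children.append(breed(p1, p2, rng))
        pop = children  # parents are replaced by the children
        challenger = min(pop, key=fitness)
        if fitness(challenger) < fitness(best):
            best = challenger
    return best


def toy_fitness(x: Chromosome) -> float:
    """Hypothetical objective used only to exercise the loop."""
    return float(sum(x))


def toy_breed(p1: Chromosome, p2: Chromosome, rng: random.Random) -> Chromosome:
    """Uniform crossover only; mutation is omitted in this toy example."""
    return [rng.choice(pair) for pair in zip(p1, p2)]


best = run_ga(toy_fitness, toy_breed, pop_size=20, n_alleles=10, k=4, g_max=30)
print(best, toy_fitness(best))
```

Because the best solution is copied into every new population, the recorded best fitness can never get worse from one generation to the next.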

Crossover is uniform: each allele of the child is formed by randomly taking the corresponding allele from one of the two parents. This is done for each allele of the chromosome. For example, a potential crossover of parents x1 and x2 is illustrated below.

x1 {0120131011}

x2 {1111012002}



child {0110132001}
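A short Python sketch of uniform crossover is given here for illustration. The function names and the explicit `mask` argument are hypothetical; the mask is included only so that the example child above can be reproduced deterministically. Where the parents agree (alleles 2 and 8) either mask value gives the same result.

```python
import random


def uniform_crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Each child allele is copied from one parent chosen at random."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    return "".join(rng.choice((a, b)) for a, b in zip(p1, p2))


def crossover_with_mask(p1: str, p2: str, mask: str) -> str:
    """Deterministic variant: mask[i] == '1' copies from p1, '0' from p2."""
    if not (len(p1) == len(p2) == len(mask)):
        raise ValueError("parents and mask must have the same length")
    return "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))


x1 = "0120131011"
x2 = "1111012002"

# One mask that yields the child shown in the text.
child = crossover_with_mask(x1, x2, "1101110101")
print(child)

# A random child: every allele comes from x1 or x2 at the same position.
random_child = uniform_crossover(x1, x2, random.Random(1))
print(random_child)
```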

[Figure 2.3: node labels 1, 2, 4, 5; arc labels x₁₃ = 1, x₃₄ = 1, x₂₃ = 1, x₂₅ = 3, x₄₅ = 2]

After a new child is created it goes through mutation. Solutions are selected for mutation according to the percentage of the population mutated, m%. For example, if m% = 20% and s = 30, then six members are randomly chosen and mutated. Once a solution is chosen to be mutated, the probability of mutation per allele is equal to the mutation rate, rm. So if rm = 0.3 then each allele will be mutated with probability 0.3. When an allele is mutated its value must change. If an arc was turned off (lᵢⱼ = 0), it will be turned on, with an equal probability of being set to any of the states 1 through k–1. If an allele is originally on, it will either be turned off (lᵢⱼ = 0) or changed to one of the other on levels, with equal probability. An example is given below. The solution has been mutated by changing the seventh allele from a 2 to a 0 and changing the ninth allele from a 0 to a 1.

solution {0110132001}

mutated solution {0110130011}
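The mutation step might be sketched in Python as below. One reading of the text is assumed: a mutated allele is redrawn uniformly from the k–1 values other than its current one, which covers both the off-to-on and the on-to-other cases. The function names and the rounding of the number of chosen members are hypothetical.

```python
import random
from typing import List


def mutate(solution: List[int], k: int, rm: float,
           rng: random.Random) -> List[int]:
    """Mutate each allele with probability rm; a mutated allele must change."""
    mutated = solution[:]
    for i, allele in enumerate(solution):
        if rng.random() < rm:
            # Assumption: uniform choice among the k-1 other levels.
            mutated[i] = rng.choice([v for v in range(k) if v != allele])
    return mutated


def mutate_population(pop: List[List[int]], k: int, m_pct: float, rm: float,
                      rng: random.Random) -> List[int]:
    """Mutate m_pct percent of the population in place; return chosen indices."""
    n_chosen = round(len(pop) * m_pct / 100)
    chosen = rng.sample(range(len(pop)), n_chosen)
    for idx in chosen:
        pop[idx] = mutate(pop[idx], k, rm, rng)
    return chosen


rng = random.Random(7)
population = [[0, 1, 1, 0, 1, 3, 2, 0, 0, 1] for _ in range(30)]
chosen = mutate_population(population, k=4, m_pct=20, rm=0.3, rng=rng)
print(len(chosen))  # with s = 30 and m% = 20, six members are selected
```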

2.6.2 Test Problem 1 – Ten Nodes

The ten node test problem was designed by randomly picking ten sets of (x,y) coordinates and using each of the points as nodes on a 100 by 100 grid. The Euclidean distances between the nodes were calculated, and the unit costs and reliabilities were taken from Table 2.6. The ten node problem was examined with a system reliability requirement of 0.95.

Because of the network size, reliability could not be calculated exactly. The Monte Carlo estimator of reliability used both dynamic and static parameters. For the ‘general’ reliability check, which was used on every new population member, the total number of replications was dynamic. At the first generation, the estimator replicated each network 1000 times (t = 1000). After every hundredth generation the number of replications used in the general reliability check was incremented by 1000 (t = t + 1000), so that the reliability estimates would improve as the search progressed. Whenever a network was created that met the reliability constraint using the general reliability estimator, and had a better cost than the best found so far, a ‘best check’ reliability estimator was employed. This replicated the given system t = 25000 times, and was used to help ensure the feasibility and accuracy of the very best candidate designs.
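A minimal Python sketch of such an estimator follows. It assumes all-terminal reliability (every node must be connected to every other node), independent arc failures, nodes numbered from 0, and generations numbered from 1. The function names are hypothetical, and this plain sampling scheme is only one possible Monte Carlo estimator.

```python
import random
from typing import Dict, List, Tuple

Arc = Tuple[int, int]

BEST_CHECK_REPLICATIONS = 25000  # static 'best check' sample size from the text


def is_connected(n_nodes: int, arcs: List[Arc]) -> bool:
    """Depth-first search from node 0 over the undirected working arcs."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n_nodes)}
    for i, j in arcs:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    stack = [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n_nodes


def estimate_reliability(n_nodes: int, arc_rel: Dict[Arc, float], t: int,
                         rng: random.Random) -> float:
    """Fraction of t sampled network states in which all nodes are connected."""
    hits = 0
    for _ in range(t):
        up = [arc for arc, r in arc_rel.items() if rng.random() < r]
        if is_connected(n_nodes, up):
            hits += 1
    return hits / t


def general_replications(g: int) -> int:
    """t = 1000 for generations 1..100, then 1000 more per 100 generations."""
    return 1000 + 1000 * ((g - 1) // 100)


triangle = {(0, 1): 0.9, (1, 2): 0.9, (0, 2): 0.9}
rng = random.Random(42)
print(estimate_reliability(3, triangle, general_replications(1), rng))
```

Using a cheap estimate for most candidates and reserving the 25000-replication check for designs that would replace the incumbent keeps the total sampling effort focused on the solutions that matter.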

Initial experimentation led to the settings s = 90, m% = 25, rc = 1.00, rm = 0.25, rp = 6, and gmax = 1200. Since the problem has 4⁴⁵ ≈ 1.24 × 10²⁷ possible designs (45 possible arcs, each at one of four levels), enumeration was impossible, so a random greedy search was used as a comparison. Ten runs of each algorithm using the same set of random number seeds were averaged and plotted as shown in Figure 2.4.

Table 2.6 Arc unit costs and reliabilities for problems in section 2.6.

Connection Type (k)   Reliability   Unit Cost
0 (not connected)     0.00          0
1                     0.70          8
2                     0.80          10
3                     0.90          14
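As an illustration of how Table 2.6 could be used, the sketch below computes the cost of a design from node coordinates and arc levels. It assumes that the cost of an arc is its unit cost multiplied by the Euclidean distance between its end nodes; the text does not state this formula explicitly. The coordinates and the names `ARC_TYPES` and `network_cost` are hypothetical.

```python
import math
from typing import Dict, Tuple

Arc = Tuple[int, int]

# Connection type -> (reliability, unit cost), taken from Table 2.6.
ARC_TYPES: Dict[int, Tuple[float, int]] = {
    0: (0.00, 0),
    1: (0.70, 8),
    2: (0.80, 10),
    3: (0.90, 14),
}


def network_cost(coords: Dict[int, Tuple[float, float]],
                 levels: Dict[Arc, int]) -> float:
    """Sum of unit cost times arc length (assumed cost model)."""
    total = 0.0
    for (i, j), level in levels.items():
        unit_cost = ARC_TYPES[level][1]
        total += unit_cost * math.dist(coords[i], coords[j])
    return total


# Hypothetical three-node example on the 100 by 100 grid.
coords = {1: (0.0, 0.0), 2: (30.0, 40.0), 3: (30.0, 0.0)}
levels = {(1, 2): 2, (1, 3): 0, (2, 3): 3}
print(network_cost(coords, levels))
```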

Notice in Figure 2.4 that the GA best cost drops much more rapidly than the best cost of the greedy algorithm, indicating that the GA finds good solutions much more efficiently than a myopic approach. Both curves appear to be approaching an asymptote, but the GA curve is approaching a much better solution than the greedy search curve.

Figure 2.4 GA vs. greedy search averaged over 10 runs for the problem of section 2.6.2.

2.6.3 Test Problem 2 – Source-Sink Reliability

This problem demonstrates the flexibility of the GA approach in two respects. First, the calculation of reliability is different. Secondly, the architecture of arcs is restricted; 18 of 36 arcs are unavailable for the network design as shown in Figure 2.5. The GA easily accommodates these rather fundamental changes. The change in the reliability calculation is accomplished by simply modifying the backtracking algorithm of Ball and Van Slyke (1977) – this problem is small enough to calculate reliability exactly during search. The fact that not all possible arcs are allowed is accommodated by simply leaving these out of the chromosome string, as was done in some of the problems of sections 2.4 and 2.5.
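To make the source-sink calculation concrete, the following Python sketch computes exact two-terminal reliability by enumerating every up/down state of the available arcs. This brute-force enumeration is a swapped-in technique chosen for brevity; it is not the backtracking algorithm of Ball and Van Slyke (1977). Arcs are assumed to be undirected with independent failures, only the allowed arcs appear in the input, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def reachable(source: str, sink: str, up_arcs: List[Arc]) -> bool:
    """Depth-first search from source over the working (undirected) arcs."""
    adj: Dict[str, List[str]] = {}
    for i, j in up_arcs:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    seen = {source}
    stack = [source]
    while stack:
        v = stack.pop()
        if v == sink:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def st_reliability(arc_rel: Dict[Arc, float], source: str, sink: str) -> float:
    """Sum the probability of every arc state in which sink is reachable."""
    arcs = list(arc_rel)
    total = 0.0
    for states in product((False, True), repeat=len(arcs)):
        prob = 1.0
        up = []
        for arc, is_up in zip(arcs, states):
            prob *= arc_rel[arc] if is_up else 1.0 - arc_rel[arc]
            if is_up:
                up.append(arc)
        if prob > 0.0 and reachable(source, sink, up):
            total += prob
    return total


# Two disjoint two-arc paths from s to t; unavailable arcs are simply absent.
two_paths = {("s", "a"): 0.9, ("a", "t"): 0.9, ("s", "b"): 0.9, ("b", "t"): 0.9}
print(st_reliability(two_paths, "s", "t"))
```

The enumeration visits 2^m states for m arcs, so it is only practical for small networks; it serves here to show that changing the reliability measure touches the evaluation function and not the encoding or the genetic operators.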

This problem is taken from the literature (Kumamoto et al., 1977) and has 6.9×10¹⁰ possible topologies, thus precluding enumeration to identify the optimal design. A system reliability requirement of Ro = 0.99 is set. After some initial experimentation it was determined that s = 40, m% = 80, rc = 1.00, rm = 0.05, rp = 6 and gmax = 2000. Seven of the 10 runs found a best cost of 4680. The other three runs found a best cost of 4726. Since the GA found only two distinct solutions over 10 runs, it is likely that both are at least near-optimal, even if 4680 is not the optimum.
