MOISTURE CONTENT (D2216) - TESTS ON THE SAMPLE TO DETERMINE ITS PROPERTIES

3.5. TESTS ON THE SAMPLE TO DETERMINE ITS PROPERTIES

3.6.1. MOISTURE CONTENT (D2216)

Chapter 2 - Genetic Algorithms Revisited

Problem representation is one of the key factors in the success of a genetic algorithm (GA). A great deal of study has been conducted in this area, but the issue is still a matter of controversy in the GA community. There are two main groups: those who defend Holland’s original binary representation and those who argue that a GA should use the most appropriate representation according to the problem’s characteristics and requirements. The latter approach accommodates binary and non-binary (e.g. real value) representations, and is becoming more popular with the use of GAs to tackle increasingly complex problems. This section presents both approaches to stress the importance of supporting them in a general-purpose programming environment.

Binary representation, as originally proposed by Holland, uses the lowest possible cardinality. Holland demonstrated, through the Schema theorem (see Figure 2.5), that fixed-length strings over low-cardinality alphabets provide an effective means to search a solution space. Since genetic algorithms work essentially by promoting recombinations of genetic material, the binary representation provides the largest possible number of building blocks, or schemata (per population member), to be recombined. The most common encoding method maps the original representation of a problem variable, say an integer, into its binary value. This encoding method, however, is very sensitive to the mutation operator, since the effect of a mutation depends strongly on the position of the bit being modified. An alteration in a high-order bit produces a large variation in the original value, whereas modifications in low-order bits produce only slight variations. This tends to be a problem in the later stages of a simulation, when the population of solutions is close to the best value. Many researchers adopted Gray encoding to overcome this problem. Gray encoding has the property that the codes of adjacent values in the original representation differ in only one bit. For instance, the Gray representation for 7 is 0100 while 8 is coded as 1100, whereas their standard binary codes (0111 and 1000) differ in all four bits. Some researchers [85] adopted this method as standard in their GAs, after performance improvements on the five De Jong [25] test functions were reported in the literature. Goldberg, however, states that Gray encoding may reduce the degree of implicit parallelism demonstrated by the Schema theorem for the standard binary encoding. Other forms of encoding have also been proposed, including adaptive techniques such as dynamic parameter encoding (DPE) by Schraudolph and Belew [84], which addresses the problem of representing floating-point numbers in a fixed-length string without losing precision.
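The difference between standard binary and Gray encoding can be checked with a few lines of Python. This is a minimal sketch that assumes the reflected binary Gray code (the variant consistent with the 7 and 8 example above); the function names are illustrative and do not come from the original text.

```python
def binary_to_gray(n: int) -> int:
    """Return the reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    for value in (7, 8):
        # Print the value, its standard binary code and its Gray code.
        print(value, format(value, "04b"), format(binary_to_gray(value), "04b"))
    # Bits that must change to move from 7 to 8 under each encoding.
    print(hamming(7, 8), hamming(binary_to_gray(7), binary_to_gray(8)))
```

Under standard binary, moving from 7 to 8 requires all four bits to change at once, which single-bit mutation is unlikely to achieve. Under Gray encoding the same step is a single bit flip.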

Binary strings have been shown to be capable of usefully encoding a wide variety of information. The properties of binary representations for genetic algorithms have been extensively studied, and a good deal is known about the genetic operators and parameters that work well with them [23].

Real value representation, on the other hand, emerged from the increasingly complex problems being addressed by GAs. It is also the standard representation used by Evolutionsstrategie-based algorithms, and is the only means to effectively represent genetic programming problems. GAs are extremely sensitive to different mappings, which, added to the overhead imposed by the string encoding and decoding process, may prevent their use on a large number of real-world optimisation problems. Many industrial problems already solved with traditional techniques, for instance, have a well-defined representation and evaluation function. In most cases, these functions would impose constraints on other representations, severely compromising the performance of any GA.

Figure 2.5 - The Schema Theorem

Holland introduced the notion of a schema as a collection of genomes that share certain gene values, or alleles. For instance, the schema 1**0* represents the set of chromosomes with a 1 at the first locus and a 0 at the fourth locus. A group of schemas, or schemata, provides a way of describing underlying similarities between successful strings. Since each schema is likely to be represented by many strings in the population, it is possible to work out an average score for each. Such explicit calculation is, nevertheless, unnecessary, as the process is automatically handled by the selection of whole strings. The net result of the selection process is an increase in the number of strings containing good schemata in the population. Therefore, the number of copies of a schema $S$ in the next generation, $N_S(g+1)$, can be expressed in terms of its current number of copies $N_S(g)$, its fitness $f_S$, the average fitness of the whole population $\bar{f}$, and the probability that the schema survives the genetic operators, $p(S)$. It is given by the following expression:

$$N_S(g+1) \ge N_S(g)\,\frac{f_S}{\bar{f}}\,p(S)$$

Before expanding $p(S)$ into terms that express the influence of the mutation and crossover operators, it is necessary to define the order ($o_S$) of a schema and its defining length ($d_S$). The order of a schema is the number of positions that are not occupied by the ‘*’ symbol. The defining length is the distance between the two extreme defined bits of the schema; a ‘*’ cannot be an extreme. Therefore, the schema 0**01*1 has $o_S = 4$ and $d_S = 6$.
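The two definitions can be made concrete with a short Python sketch. Schemata are assumed here to be plain strings over the characters 0, 1 and *; the function names are hypothetical helpers introduced for illustration only.

```python
def matches(schema: str, chromosome: str) -> bool:
    """True if the chromosome is an instance of the schema."""
    return len(schema) == len(chromosome) and all(
        s == "*" or s == c for s, c in zip(schema, chromosome)
    )


def order(schema: str) -> int:
    """Number of defined (non-'*') positions in the schema."""
    return sum(1 for s in schema if s != "*")


def defining_length(schema: str) -> int:
    """Distance between the first and last defined positions."""
    fixed = [i for i, s in enumerate(schema) if s != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


if __name__ == "__main__":
    print(order("0**01*1"), defining_length("0**01*1"))
    print(matches("1**0*", "10101"), matches("1**0*", "10111"))
```

For the schema 0**01*1 the defined positions are 0, 3, 4 and 6 (counting from zero), which gives an order of 4 and a defining length of 6 - 0 = 6, in agreement with the values quoted in the text.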

A schema with defining length $d_S$ would be destroyed by the crossover operator (assuming a uniform distribution for choosing any point along the string length $l$) if the crossover point falls between the two extremes of the schema. Hence, the probability to survive crossover is given by:

$$1 - p_c\,\frac{d_S}{l-1}$$

Where:

$p_c$ = probability of applying the crossover operator

$d_S$ = schema defining length

$(l-1)$ = possible positions for a crossover site

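If $p_m$ is the probability of applying the mutation operator, then the probability that a schema of order $o_S$ survives mutation (for the typical low values of $p_m$) is:

$$(1 - p_m)^{o_S} \approx 1 - o_S\,p_m$$

The Schema Theorem can then be finally re-written as follows:

$$N_S(g+1) \ge N_S(g)\,\frac{f_S}{\bar{f}}\left[1 - p_c\,\frac{d_S}{l-1} - o_S\,p_m\right]$$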
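To get a feel for the bound, the sketch below evaluates it numerically. It assumes the usual form of the theorem, N_S(g+1) >= N_S(g) (f_S / f_avg) [1 - p_c d_S / (l - 1) - o_S p_m], with the symbols as defined above. The function and parameter names, and the numbers in the example, are illustrative assumptions and not data from the original study.

```python
def crossover_survival(p_c: float, d_s: int, length: int) -> float:
    """Lower bound on the probability that a schema survives one-point crossover."""
    return 1.0 - p_c * d_s / (length - 1)


def mutation_survival(p_m: float, o_s: int) -> float:
    """Probability that none of the o_s defined bits is mutated."""
    return (1.0 - p_m) ** o_s


def schema_lower_bound(n_s, f_s, f_avg, p_c, p_m, d_s, o_s, length):
    """Lower bound on the expected copies of a schema in the next generation.

    Uses the small-p_m approximation (1 - p_m)**o_s ~ 1 - o_s * p_m and drops
    the cross term, as in the final statement of the theorem.
    """
    survival = 1.0 - p_c * d_s / (length - 1) - o_s * p_m
    return n_s * (f_s / f_avg) * survival


if __name__ == "__main__":
    common = dict(n_s=10, f_s=1.2, f_avg=1.0, p_c=0.6, p_m=0.01, length=7)
    # Long, high-order schema (e.g. 0**01*1): easily disrupted by crossover.
    print(schema_lower_bound(d_s=6, o_s=4, **common))
    # Short, low-order schema: likely to survive and gain copies.
    print(schema_lower_bound(d_s=1, o_s=2, **common))
```

With these illustrative numbers the long schema is only guaranteed roughly 4.3 expected copies despite its above-average fitness, while the short one is guaranteed about 10.6. This is the sense in which short, low-order, fit schemata act as building blocks.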


Other problems, especially those using genetic programming (GP), cannot be easily represented with flat, fixed-length strings. In general, more complex representations are required, involving mixed types of data (integers, floating-point, and even binary). Representations used in GP problems, for instance, invariably assume a tree configuration, whose width and depth are modified during simulation, expanding and shrinking many times. Therefore, Radcliffe [69] and others consider that “if there is no benefit to be gained from changing to a special genetic representation, it would seem perverse to do so”.

From the theoretical point of view, Antonisse [2], Radcliffe [68] and Vose [97] demonstrated that extensions of the Schema theorem can also be applied to real value representation. Their arguments are based on the fact that a GA’s search is guided by the quality of the information it collects about the space (through observed schema fitness averages in the population). They have contributed to the demolition of strong beliefs that only low cardinality (binary) representations offer intrinsic parallelism.

One of the major disadvantages of real value representations is the need to define representation-dependent genetic operators. This, however, does not seem to be considered a problem by most practitioners of this method, since the majority of problems already require genetic operators specially designed to cope with their complexities.

The conclusion of the above discussion is that a general-purpose programming environment must support, at least, binary and some real value representations to satisfy the requirements of diverse problems, using any of the three major evolutionary techniques. It would be even more interesting, however, if the genetic manipulations could take place over a representation-independent data structure. This would allow the user to choose freely the most suitable representation for a problem, and implement problem-independent genetic algorithms and operators.
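One possible reading of this idea is sketched below in Python. It is not the environment described in this work: the `Genome` interface, both concrete classes, the operator choices (one-point crossover with bit-flip mutation, arithmetic crossover with Gaussian mutation) and `next_generation` are all hypothetical, chosen only to show a generation step that never inspects the underlying representation.

```python
import random
from abc import ABC, abstractmethod


class Genome(ABC):
    """Hypothetical interface: the GA only ever calls these two methods."""

    @abstractmethod
    def crossover(self, other, rng): ...

    @abstractmethod
    def mutate(self, rate, rng): ...


class BinaryGenome(Genome):
    def __init__(self, bits):
        self.bits = list(bits)  # assumes at least two bits

    def crossover(self, other, rng):
        # One-point crossover at one of the l - 1 interior sites.
        cut = rng.randint(1, len(self.bits) - 1)
        return BinaryGenome(self.bits[:cut] + other.bits[cut:])

    def mutate(self, rate, rng):
        # Flip each bit independently with probability `rate`.
        return BinaryGenome(b ^ 1 if rng.random() < rate else b for b in self.bits)


class RealGenome(Genome):
    def __init__(self, values, sigma=0.1):
        self.values = list(values)
        self.sigma = sigma  # assumed standard deviation of the mutation step

    def crossover(self, other, rng):
        # Arithmetic crossover: a random convex blend of the two parents.
        w = rng.random()
        blend = [w * a + (1 - w) * b for a, b in zip(self.values, other.values)]
        return RealGenome(blend, self.sigma)

    def mutate(self, rate, rng):
        # Add Gaussian noise to each gene with probability `rate`.
        return RealGenome(
            [v + rng.gauss(0, self.sigma) if rng.random() < rate else v
             for v in self.values],
            self.sigma,
        )


def next_generation(population, fitness, p_c, p_m, rng):
    """One generation step, written without reference to the representation.

    Assumes fitness values are non-negative and not all zero.
    """
    scores = [fitness(g) for g in population]
    children = []
    for _ in population:
        # Fitness-proportionate selection of two parents.
        mum, dad = rng.choices(population, weights=scores, k=2)
        child = mum.crossover(dad, rng) if rng.random() < p_c else mum
        children.append(child.mutate(p_m, rng))
    return children
```

The same `next_generation` function can then be applied to a population of `BinaryGenome` or `RealGenome` objects; only the fitness function and the genome class change. The representation-dependent operators stay inside each class, which is the separation the paragraph above argues for.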

In document Mejoramiento de la resistencia al corte de suelos finos utilizando la técnica de electroósmosis (pages 54-59)