
In this section, we describe our method for computing a set of values used for the refinement of numeric data properties such that: i) redundant values are eliminated to reduce the refinement space of the data properties, and ii) the set of values contains the values necessary to construct the definitions of the positive examples. Because the computed set of values depends on the sets of examples and their relations to the property values, we call it the adaptive segmentation method.

Figure 4.3 gives an example of an inappropriate segmentation that misses some values necessary for constructing the definitions. In this segmentation, the set of values for the refinement of the property p is {1, 3.5, 8, 18, 30}. However, some value between 4 and 5 and some value between 20 and 28 are needed to construct the definitions of the positive examples. If the above values are segmented into 6 parts, the set of values for the segmentation is {1, 2.5, 4.5, 8, 14, 24, 30}, and the necessary values are obtained. However, this segmentation produces redundant values, e.g. {2.5, 8, 14}.
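The two value sets above are consistent with a fixed-size segmentation that splits the 12 sorted values of Figure 4.3 (1, 2, 3, 4, 5, 6, 10, 12, 16, 20, 28, 30, as listed in Figure 4.6) into parts containing equal numbers of values, then takes the midpoint between adjacent parts together with the minimum and maximum. The Python sketch below reproduces both sets under that assumption. The function names are hypothetical, and so are the type labels: values 5 to 20 are marked “+” and the remaining values are assumed to be “–” (written "-" in the code). A cut is flagged as redundant when both of its neighbouring values share the same “+” or “–” type.

```python
from typing import Dict, List


def fixed_size_values(values: List[float], k: int) -> List[float]:
    """Hypothetical helper: split the sorted values into k parts of equal size
    and return the minimum, the midpoints between adjacent parts, and the maximum."""
    xs = sorted(values)
    if k <= 0 or len(xs) % k != 0:
        raise ValueError("this sketch requires len(values) to be divisible by k")
    size = len(xs) // k
    cuts = [(xs[i * size - 1] + xs[i * size]) / 2 for i in range(1, k)]
    return [xs[0]] + cuts + [xs[-1]]


def redundant_cuts(values: List[float], k: int, types: Dict[float, str]) -> List[float]:
    """Cuts whose two neighbouring values have the same '+' or '-' type."""
    xs = sorted(values)
    size = len(xs) // k
    redundant = []
    for i in range(1, k):
        left, right = xs[i * size - 1], xs[i * size]
        if types[left] == types[right] and types[left] != "±":
            redundant.append((left + right) / 2)
    return redundant


values = [1, 2, 3, 4, 5, 6, 10, 12, 16, 20, 28, 30]
# Assumed types: 5..20 relate only to positive examples, the rest only to negative ones.
types = {v: "+" if 5 <= v <= 20 else "-" for v in values}

print(fixed_size_values(values, 4))      # [1, 3.5, 8.0, 18.0, 30]
print(fixed_size_values(values, 6))      # [1, 2.5, 4.5, 8.0, 14.0, 24.0, 30]
print(redundant_cuts(values, 6, types))  # [2.5, 8.0, 14.0]
```

The 6-part split recovers 4.5 and 24 but also produces three cuts that fall between values of the same type, which is the redundancy the adaptive method aims to remove.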

Therefore, to eliminate the redundancy without missing any necessary values, the segmentation of data property values requires information about the relations between the values of the data properties and the examples. These relations are identified by using a relation graph. A relation graph is a directed graph that represents the relations between the individuals and the literals based on the assertions in the ABox. The nodes of a relation graph are the class assertions (individuals) or the literals of data properties, and the edges are the property assertions that connect their domain values (individuals) to their range values (individuals or literals). Figure 4.4 shows a simple relation graph that describes the relations between some examples and the values of the datatype property p.

Figure 4.4: A relation graph of examples and numeric values of the datatype property p. Shaded ellipse nodes are examples, with solid lines representing positive examples and dashed lines representing negative examples. Unshaded ellipse nodes are instances. Rectangular nodes represent values (literals) of the datatype property p. The superscript +, – or ± on a value indicates that the value is related to only positive example(s), only negative example(s), or both positive and negative examples, respectively.

[Figure 4.4 graph: example nodes example_01 to example_05, instance nodes instance_01 to instance_04, and value nodes 1, 2, 10 and 18 of property p, connected by object-property and property-p edges.]

Given a relation graph, a value is said to be related to an example if there exists a path from the example to the value. Each value of a datatype property may be related only to positive example(s), only to negative example(s), or to both types of example. For example, in Figure 4.4, the literals “1” and “2” of the datatype property p relate only to positive examples (denoted by +), the literal “10” relates only to negative examples (denoted by –), and the literal “18” relates to both positive and negative examples (denoted by ±).
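The typing of values can be sketched as a reachability check over the relation graph: a value is “+” if it is reachable only from positive examples, “–” (written "-" in the code) if only from negative examples, and “±” if from both. The Python sketch below assumes an adjacency-list representation in which individuals are strings and literals are numbers; edge labels are dropped because only the existence of a path matters. The graph is a hypothetical one chosen to produce the same types as Figure 4.4, not a transcription of that figure's edges.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Union

Node = Union[str, float]  # individuals as strings, literals as numbers
Graph = Dict[Node, List[Node]]


def reachable(graph: Graph, start: Node) -> Set[Node]:
    """All nodes reachable from start (including start), via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def value_types(graph: Graph, positives: Iterable[Node],
                negatives: Iterable[Node], literals: Iterable[Node]) -> Dict[Node, str]:
    """Label each literal '+', '-' or '±' by which kinds of example reach it.
    Literals reached by no example are left out of the result."""
    from_pos: Set[Node] = set()
    for e in positives:
        from_pos |= reachable(graph, e)
    from_neg: Set[Node] = set()
    for e in negatives:
        from_neg |= reachable(graph, e)
    types: Dict[Node, str] = {}
    for lit in literals:
        if lit in from_pos and lit in from_neg:
            types[lit] = "±"
        elif lit in from_pos:
            types[lit] = "+"
        elif lit in from_neg:
            types[lit] = "-"
    return types


# Hypothetical graph: object-property edges lead to instances, property-p edges to literals.
graph: Graph = {
    "example_01": [1],
    "example_02": ["instance_01"], "instance_01": [2],
    "example_03": ["instance_02"], "instance_02": [10],
    "example_04": ["instance_03"], "instance_03": [18],
    "example_05": ["instance_04"], "instance_04": [18],
}
print(value_types(graph, ["example_01", "example_02", "example_04"],
                  ["example_03", "example_05"], [1, 2, 10, 18]))
# {1: '+', 2: '+', 10: '-', 18: '±'}
```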

To segment the values of each datatype property, they are first sorted. Then, for values of type “+” or “–”, stepping through consecutive values of the same type during specialisation does not change whether the refined concept is overly general or overly specific. For example, given the values of the data property p in Figure 4.3 and the expression p SOME double[>= 1], specialising the expression by increasing 1 to 2, 3 and 4 does not result in an overly specific expression. This only changes when we move to a value of another type, i.e. from 4 to 5. Therefore, these intermediate values are redundant for specialisation, and only the values at the boundaries of each group are needed.

Values of type “±” cannot, on their own, distinguish the positive examples from the negative ones. Therefore, no redundancy elimination is applied to them: they are segmented value by value.
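The grouping rule can be sketched as a single pass over the sorted values: consecutive values of the same “+” or “–” type are merged into one segment, while every “±” value forms a segment of its own. In the Python sketch below, the function name and input format are hypothetical, and the types for the values of Figure 4.3 are assumed (5 to 20 positive, the rest negative, written "-").

```python
from typing import List, Tuple

Segment = Tuple[str, List[float]]


def segment(typed_values: List[Tuple[float, str]]) -> List[Segment]:
    """Group (value, type) pairs, sorted by value, into segments.

    Consecutive '+' or '-' values of the same type share a segment;
    each '±' value is placed in a segment of its own.
    """
    segments: List[Segment] = []
    for value, t in sorted(typed_values):
        if segments and t != "±" and segments[-1][0] == t:
            segments[-1][1].append(value)  # extend the current same-type run
        else:
            segments.append((t, [value]))  # type changed, or a '±' value
    return segments


values = [1, 2, 3, 4, 5, 6, 10, 12, 16, 20, 28, 30]
typed = [(v, "+" if 5 <= v <= 20 else "-") for v in values]
for t, vs in segment(typed):
    print(t, vs)
# - [1, 2, 3, 4]
# + [5, 6, 10, 12, 16, 20]
# - [28, 30]
```

Two adjacent “±” values therefore end up in separate segments, matching the value-by-value treatment described above.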

Figure 4.5: Segmentation of data property values. Values are sorted and then grouped by type. There are 6 segments from s1 to s6 and 7 values are computed for specialisation.

[Figure 4.5 contents: sorted values v1, v2, v3, v4+, v5+, v6+, v7+, v8±, v9±, v10+, v11, v12, v13 grouped into segments s1 to s6; values for specialisation include (v3+v4)/2, (v7+v8)/2, (v8+v9)/2, (v9+v10)/2 and (v11+v12)/2.]

Figure 4.6: Applying our segmentation method to the values in Figure 4.3 and computing the values for specialisation. There are 3 segments and 4 values are computed.

[Figure 4.6 contents: values 1, 2, 3, 4, 5+, 6+, 10+, 12+, 16+, 20+, 28, 30 grouped into segments s1 to s3; values for specialisation include 4.5 and 24.]

Finally, the set of values used for the specialisation of each data property is computed from the values at the boundaries of the segments; for example, each may be the average of the two values on either side of a boundary. Rounding may be needed for integer datatype properties. Figure 4.5 demonstrates a segmentation in which 7 values are computed for specialisation, and Figure 4.6 gives a concrete example of segmenting the values in Figure 4.3, where there are 3 segments and only 4 values are computed for specialisation.
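The boundary computation can be sketched as taking the midpoint between the last value of each segment and the first value of the next. Two details are assumptions rather than statements from the text. First, the minimum and maximum values are kept as well; this matches the counts of 7 values for 6 segments in Figure 4.5 and 4 values for 3 segments in Figure 4.6, and the fixed-size value sets of Figure 4.3, which include 1 and 30. Second, for integer properties the midpoint is rounded up, which suits lower-bound (>=) restrictions; a <= restriction would round down instead. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Segment = Tuple[str, List[float]]


def specialisation_values(segments: List[Segment], integer: bool = False) -> List[float]:
    """Minimum value, one midpoint per boundary between adjacent segments, maximum value.

    Assumption: for integer properties each midpoint is rounded up, so that for
    integers a < b the result c satisfies a < c <= b and '>= c' separates a from b.
    """
    result: List[float] = [segments[0][1][0]]
    for (_, left), (_, right) in zip(segments, segments[1:]):
        mid = (left[-1] + right[0]) / 2
        result.append(math.ceil(mid) if integer else mid)
    result.append(segments[-1][1][-1])
    return result


# Segments of Figure 4.6, with assumed types: 5..20 positive, the rest negative.
segments = [("-", [1, 2, 3, 4]), ("+", [5, 6, 10, 12, 16, 20]), ("-", [28, 30])]
print(specialisation_values(segments))                # [1, 4.5, 24.0, 30]
print(specialisation_values(segments, integer=True))  # [1, 5, 24, 30]
```

Compared with the 6-part fixed-size split of the same values, this keeps 4.5 and 24 while dropping the redundant cuts 2.5, 8 and 14.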

A disadvantage of this strategy compared with the fixed-size segmentation strategy is the extra computational cost of building the relation graph.