FRAMEWORK FOR GEOSPATIAL DATA QUALITY EVALUATION
3.2. Internal matching
propose to use the shape context descriptor developed by Belongie et al. (2002). This descriptor was used in this study as a basis for a similarity measure (see Section 3.1.1.1).
The method to classify the features according to their geographic context works as follows:
First, we calculate the shape context signature (histogram) for all objects in a dataset (using the centroid for areas). Then we compare the shape context of each feature against all other shape contexts using the cost function, which varies from 0 (similar) to 1 (dissimilar), and we select the smallest cost. This yields a list of tuples (object1, object2, minimum cost). Using this list, we can classify the objects according to our three context classes (uncertain, intermediary, distinct).
In order to define the 'cut-off' values that determine the context class, we prepared an on-line quiz with 50 randomly selected samples and asked ten GIS experts to answer it. The aim was to identify cut-off values that reflect human judgement about geographic context. The experts were instructed to indicate the context class (uncertain, intermediary, or distinct) for each sample. The results of this quiz were used to define the cut-off values. In this study, the experts were invited from academia (Universidad de Jaén, Spain) and industry (Brazilian Army Geographic Service).
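The steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not stated in this section: the histogram uses log-polar bins in the style of Belongie et al. (2002), the cost is the chi-squared distance between normalized histograms (which does range from 0 to 1), a low minimum cost is read as an "uncertain" context, and the cut-off values `low` and `high` are hypothetical placeholders rather than the values obtained from the expert quiz. All function and parameter names are our own.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Normalized log-polar histogram of the other points around points[index].

    Requires at least two points. Bin counts and radii are assumptions.
    """
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_d = (sum(dists) / len(dists)) or 1.0  # scale normalization, avoids /0
    hist = [0.0] * (n_r * n_theta)
    for (dx, dy), d in zip(offsets, dists):
        r = min(max(d / mean_d, r_min), r_max)  # clamp into the binned range
        r_bin = min(int(math.log(r / r_min) / math.log(r_max / r_min) * n_r), n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1.0 / len(offsets)
    return hist

def chi2_cost(g, h):
    """Chi-squared distance between normalized histograms: 0 (similar) to 1."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)

def min_cost_pairs(points):
    """For each object i, return (i, j, cost) for the most similar other object j."""
    hists = [shape_context(points, i) for i in range(len(points))]
    pairs = []
    for i, h in enumerate(hists):
        cost, j = min((chi2_cost(h, hists[k]), k) for k in range(len(hists)) if k != i)
        pairs.append((i, j, cost))
    return pairs

def classify(cost, low=0.1, high=0.3):
    """Map a minimum cost to a context class (cut-offs are hypothetical)."""
    if cost < low:
        return "uncertain"      # another object has a very similar context
    if cost < high:
        return "intermediary"
    return "distinct"

# Example: centroids of five objects
centroids = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5)]
for i, j, cost in min_cost_pairs(centroids):
    print(i, j, round(cost, 3), classify(cost))
```

In practice the cut-offs passed to `classify` would be replaced by the values derived from the experts' answers.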
3.1.3.5. Factor: systematic disturbance
Systematic disturbance is the fifth controlled factor investigated in this study. The aim is to identify the influence of intentional systematic perturbations on geospatial data matching procedures. In our method we generated controlled systematic disturbances of the original data using an affine transformation, which can represent translations, rotations, scaling (dilation), and shear. The systematic disturbance method is detailed in Section 3.1.2.3. This factor has many levels, corresponding to the distinct transformations generated.
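As an illustration of how an affine transformation combines these components, the following Python sketch builds the six affine coefficients and applies them to a list of coordinates. The function names, the parameterization, and the composition order (scale, then shear, then rotation, then translation) are assumptions made for this example; the procedure actually used is the one detailed in Section 3.1.2.3.

```python
import math

def affine_coefficients(tx=0.0, ty=0.0, angle_deg=0.0, sx=1.0, sy=1.0, shear=0.0):
    """Return (a, b, c, d, e, f) such that x' = a*x + b*y + c and y' = d*x + e*y + f.

    Assumed composition order: scale -> shear (along x) -> rotation -> translation.
    """
    cos_t = math.cos(math.radians(angle_deg))
    sin_t = math.sin(math.radians(angle_deg))
    a = cos_t * sx
    b = cos_t * shear * sy - sin_t * sy
    d = sin_t * sx
    e = sin_t * shear * sy + cos_t * sy
    return (a, b, tx, d, e, ty)

def apply_affine(coords, coeffs):
    """Apply the affine coefficients to every (x, y) coordinate."""
    a, b, c, d, e, f = coeffs
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in coords]

# Example level: 10 m shift east, 2 degree rotation, 1% dilation
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
disturbed = apply_affine(square, affine_coefficients(tx=10.0, angle_deg=2.0, sx=1.01, sy=1.01))
print(disturbed)
```

Each level of the factor would then correspond to one set of coefficients.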
3.1.3.6. Factor: random disturbance
The last controlled factor in our experimental design for geospatial data matching is the random disturbance. The aim is to assess the robustness of matching methods in the presence of controlled random perturbations. In our study we propose a new method to disturb geospatial data using vector fields created for a given standard displacement.
The random disturbance method is detailed in Section 3.1.2.4. The levels of this factor are directly related to the predetermined standard displacements.
shape context descriptor from Belongie et al. (2002). The proposed method makes it possible to establish the correspondences between vertices of linear or areal data.
The method works as follows: As with other internal matching procedures (Huh et al. 2011, Ruiz et al. 2015), this method requires a previous matching at feature level in order to identify pairs of objects (lines or areas) (Figure 3.13(b)). Any feature matching method can be used, including those that support many-to-many correspondences (m:n). After determining the feature pairs, the next step is to extract the relevant vertices of those objects (Figure 3.13(c)). In this method, we consider as relevant those vertices that, together with the preceding and following vertices, form an angle greater than a given threshold ang_th (Figure 3.13(d)).
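The extraction of relevant vertices can be sketched as follows. We assume here that the angle at a vertex is the deflection angle, i.e., the change of direction between the incoming and the outgoing segment (0 degrees for collinear vertices); the text does not fix this definition. We also assume that the end points of open lines are not tested and that rings are passed without repeating the closing vertex. The function names and the `closed` flag are hypothetical.

```python
import math

def deflection_angle(prev, cur, nxt):
    """Change of direction at `cur`, in degrees, in the range [0, 180]."""
    a_in = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a_out = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    d = abs(a_out - a_in) % (2 * math.pi)
    return math.degrees(min(d, 2 * math.pi - d))

def relevant_vertices(coords, ang_th, closed=False):
    """Return the 0-based indices of vertices whose deflection angle exceeds ang_th.

    closed=True treats coords as a ring given without a repeated closing vertex.
    """
    n = len(coords)
    candidates = range(n) if closed else range(1, n - 1)
    result = []
    for i in candidates:
        prev, nxt = coords[(i - 1) % n], coords[(i + 1) % n]
        if deflection_angle(prev, coords[i], nxt) > ang_th:
            result.append(i)
    return result

line = [(0, 0), (1, 0), (2, 0), (2, 1)]
print(relevant_vertices(line, ang_th=30.0))  # only the corner at (2, 0) is kept
```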
This list of relevant vertices is used to compose a list of points that will be submitted to a point matching procedure based on the geographic context measure described in Section 3.1.1.1. Each point receives an object identifier (OID) based on the source object (line or area) from which the point originated. For points from areas, the OID consists of the area's OID, followed by the ring number (0 for the exterior ring, and 1..n for interior rings), and then the vertex number within the ring, beginning with 1, as in the SFS specification (Herring 2011). A similar procedure is used for points from lines, where the OID consists of the line's OID followed by the vertex number within the line. The separator between values is the period ('.'). For instance, the OID of a point that is the second vertex in the exterior ring of the polygon whose OID is '1234' is '1234.0.2'.
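The identifier scheme is simple enough to show directly. The helper below is a hypothetical illustration of the naming rule described above, not code from the framework itself.

```python
def point_oid(source_oid, vertex, ring=None):
    """Build the OID of a point extracted from a line or an area.

    source_oid: OID of the source line or area
    vertex:     1-based vertex number within the line or ring
    ring:       None for lines; 0 for the exterior ring, 1..n for interior rings
    """
    parts = [str(source_oid)]
    if ring is not None:
        parts.append(str(ring))
    parts.append(str(vertex))
    return ".".join(parts)

print(point_oid("1234", 2, ring=0))  # 1234.0.2 (second vertex, exterior ring)
print(point_oid("87", 5))            # 87.5 (fifth vertex of line 87)
```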
Figure 3.13. Internal matching method. (a) Initial datasets. (b) Feature matching. (c) Extracting relevant vertices. (d) Vertices with an angle below the angular threshold are not used for point matching.
Often the list of points from compound objects (lines or areas) does not contain enough points to form the context signature of each one. These point sets are therefore densified in order to reach a predetermined number of points, or a minimum distance between coordinates, which is necessary for the effectiveness of the geographic context measure. Figure 3.14 illustrates how this densification generates sufficient points for the geographic context measure, since this measure uses the number of points in each bin to create the context signature of each point.
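One possible way to densify a point set by spacing is sketched below. It assumes that the extra points are interpolated linearly along the segments of the line or ring and that `spacing` is the largest distance allowed between consecutive points; both are assumptions of this example, and the exact densification rule of the method may differ.

```python
import math

def densify(coords, spacing):
    """Insert evenly spaced points so that no segment is longer than `spacing`."""
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / spacing))  # number of sub-segments
        for k in range(1, n + 1):
            t = k / n
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

print(densify([(0.0, 0.0), (10.0, 0.0)], spacing=2.5))
# [(0.0, 0.0), (2.5, 0.0), (5.0, 0.0), (7.5, 0.0), (10.0, 0.0)]
```

The original vertices are always preserved, so the relevant vertices keep their coordinates and only the surrounding context becomes denser.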
With the sets of points representing the parts of the features in each dataset, we can now use the geographic context measure to calculate the cost function between point pairs and find the correspondences. In order to increase the precision of this method, we apply an exclusion criterion after finding the corresponding point pairs. We propose to use the difference between the gradients of the points in a pair as the exclusion criterion, i.e., if the difference between the gradients is greater than the threshold grad_th, the two points are not considered a pair. Figure 3.15 shows an example of how the gradient threshold can be applied to discard two matches that have a similar geographic context but do not in fact represent the same points.
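The exclusion criterion can be sketched as a filter over candidate point pairs. The text does not define how the gradient of a point is computed; in this sketch we assume it is the undirected direction (0 to 180 degrees) of the line through the neighbouring vertices, and that each candidate match already carries the gradient of both points. The tuple layout and function names are hypothetical.

```python
import math

def direction_deg(prev, nxt):
    """Assumed gradient of a vertex: undirected direction through its neighbours."""
    return math.degrees(math.atan2(nxt[1] - prev[1], nxt[0] - prev[0])) % 180.0

def gradient_difference(g1, g2):
    """Smallest difference between two undirected directions, in degrees."""
    d = abs(g1 - g2) % 180.0
    return min(d, 180.0 - d)

def filter_matches(matches, grad_th):
    """Keep only matches whose gradient difference does not exceed grad_th.

    matches: list of (oid_a, oid_b, gradient_a, gradient_b) tuples
    """
    return [m for m in matches if gradient_difference(m[2], m[3]) <= grad_th]

candidates = [
    ("1234.0.2", "77.0.3", 5.0, 8.0),    # similar context and similar gradient
    ("1234.0.5", "77.0.9", 5.0, 95.0),   # similar context, different gradient
]
print(filter_matches(candidates, grad_th=15.0))  # only the first pair is kept
```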
Figure 3.14. Densification of points inside an area feature in order to obtain points to compose the context signature of the relevant points, which are submitted to the internal matching procedure.
In this framework for quality evaluation using web services, we primarily adopt the quality model described in the Brazilian standard CQDG (DCT 2016a). This quality model provides a point-based method to control positional quality. Thus, an internal matching procedure becomes a crucial element of this framework.