FRAMEWORK FOR GEOSPATIAL DATA QUALITY EVALUATION

3.3. Quality evaluation module

This framework for quality evaluation using web services primarily adopts the quality model described in the Brazilian standard CQDG (DCT 2016a). This quality model provides a point-based method to control positional quality. Because the method operates on points, linear and areal features can only be assessed after an internal matching procedure, which makes that procedure a crucial element of this framework.

Table 3.1. Quality evaluation procedures considered in the evaluation tier.

ID   Method type  Sampling         Scope       Quality element          Measure
211  Internal     Full inspection  All points  Topological consistency  Rate of invalid points (CQDG:211)
212  Internal     Full inspection  All lines   Topological consistency  Rate of invalid simple lines (CQDG:212)
213  Internal     Full inspection  All areas   Topological consistency  Rate of invalid polygons (CQDG:213)
215  Internal     Full inspection  All areas   Topological consistency  Rate of invalid overlaps (CQDG:215)
101  External     Sampling         Dataset     Commission               Rate of excess items (CQDG:101)
103  External     Sampling         Dataset     Omission                 Rate of missing items (CQDG:103)
301  External     Sampling         Dataset     Positional accuracy      Planimetry evaluation (CQDG:301)
302  External     Sampling         Dataset     Positional accuracy      Altimetry evaluation (CQDG:302)

The selected internal procedures refer to topological consistency. Online versions of evaluation methods for this quality element have been implemented in some data integration projects (e.g. Tagg 2015). The four evaluation procedures considered (ID 211, 212, 213 and 215) do not require any sampling, so a full inspection is performed. The first three procedures (ID 211-213) check the validity of geometries against the Simple Features Standard (SFS) (Herring 2011), with the additional requirement that a line be a simple line (without self-intersections). The fourth procedure (ID 215) finds overlaps between areas in the same layer, which may represent an error according to the data specification.
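The simple-line check behind procedure 212 can be sketched in a few lines. This is a minimal illustration that assumes lines are given as lists of (x, y) tuples; the function names (`is_simple_line`, `rate_of_invalid`) are hypothetical and are not taken from the framework, which may rely on a GIS library instead.

```python
from itertools import combinations


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, p):
    """True if p, assumed collinear with a-b, lies inside the box of a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _segments_touch(p1, p2, p3, p4):
    """True if the closed segments p1-p2 and p3-p4 share at least one point."""
    o1, o2 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    o3, o4 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, p3))
            or (o2 == 0 and _on_segment(p1, p2, p4))
            or (o3 == 0 and _on_segment(p3, p4, p1))
            or (o4 == 0 and _on_segment(p3, p4, p2)))


def is_simple_line(coords):
    """Check that no two non-adjacent segments of the line touch.

    Simplification: adjacent segments are not compared, so a line that
    doubles back on itself along the same path is not detected here.
    """
    segs = list(zip(coords, coords[1:]))
    closed = coords[0] == coords[-1]
    for i, j in combinations(range(len(segs)), 2):
        if j == i + 1:
            continue  # adjacent segments share a vertex by construction
        if closed and i == 0 and j == len(segs) - 1:
            continue  # a closed line meets itself at its start/end point
        if _segments_touch(*segs[i], *segs[j]):
            return False
    return True


def rate_of_invalid(features, is_valid):
    """Fraction of features rejected by the validity predicate."""
    if not features:
        return 0.0
    return sum(1 for f in features if not is_valid(f)) / len(features)


lines = [
    [(0, 0), (1, 0), (2, 1)],                  # simple
    [(0, 0), (2, 2), (2, 0), (0, 2)],          # crosses itself at (1, 1)
    [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],  # closed and simple
]
rate = rate_of_invalid(lines, is_simple_line)
```

Because no sampling is involved, the rate is computed over every line in the layer, which matches the full-inspection scope listed in Table 3.1.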

There are two direct external evaluation procedures (ID 101, 103) that refer to completeness: one for commission and the other for omission. Methods in this quality category require an object sampling based on a tessellation of the test data according to its scale (4 cm at the test data scale). There are three sampling strategies: isolated lot (ISO 2859-2), lot-by-lot (ISO 2859-1), and full inspection (100%). The ISO strategies are provided in the quality model (DCT 2016a), and full inspection was added as a third option.
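To make the tessellation size concrete, the following sketch converts the 4 cm map distance into ground units and assigns a coordinate to a grid cell. It assumes projected coordinates in metres and a grid anchored at a caller-supplied origin; the function names and the 1:25,000 scale are illustrative and do not come from the source.

```python
import math


def cell_size_m(scale_denominator, cell_on_map_cm=4.0):
    """Ground size in metres of a square cell measuring cell_on_map_cm on the map."""
    return scale_denominator * cell_on_map_cm / 100.0


def cell_index(x, y, origin_x, origin_y, size):
    """Column and row of the grid cell that contains the point (x, y)."""
    return (math.floor((x - origin_x) / size),
            math.floor((y - origin_y) / size))


size = cell_size_m(25_000)                       # 4 cm at 1:25,000
cell = cell_index(2500.0, 300.0, 0.0, 0.0, size)
```

At 1:25,000, 4 cm on the map corresponds to 1,000 m on the ground, so the example point falls in the third column and first row of the grid.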

The object sampling is implemented as described in the quality model. The first step is to determine the sample size according to the sampling strategy. The second step is to create a tessellation over the test data according to its scale (Figure 3.16(b)). Then cells of this tessellation are randomly selected, and all objects inside each selected cell are counted until the initial sample size is reached (Figure 3.16(c)). Once the test sample is complete, the same cells are used to extract the reference sample from the reference dataset (Figure 3.16(d)). With the sampling done, the method calls the feature matching module and finds the matching pairs between test and reference. For the quality evaluation, it calculates the rate of excess items (commission) and the rate of missing items (omission) (Figure 3.16(e)).
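The sampling and rate computation described above can be sketched as follows. This is a minimal illustration, not the framework's implementation: the sample size is taken as an input rather than looked up in the ISO 2859 tables, the matching pairs are assumed to come from the feature matching module, and both rates are divided by the number of reference items, which is an assumption based on the ISO 19157 definitions of these measures. All function names are hypothetical.

```python
import random


def sample_cells(objects_by_cell, sample_size, rng):
    """Draw cells at random, keeping all their objects, until the
    accumulated object count reaches sample_size."""
    cells = list(objects_by_cell)
    rng.shuffle(cells)
    chosen, count = [], 0
    for cell in cells:
        if count >= sample_size:
            break
        chosen.append(cell)
        count += len(objects_by_cell[cell])
    return chosen


def completeness_rates(test_ids, reference_ids, matches):
    """Return (commission, omission) for one sample.

    matches is an iterable of (test_id, reference_id) pairs.
    Assumption: both rates use the number of reference items
    (items that should have been present) as denominator.
    """
    matched_test = {t for t, _ in matches}
    matched_ref = {r for _, r in matches}
    excess = len(set(test_ids) - matched_test)        # in test, no counterpart
    missing = len(set(reference_ids) - matched_ref)   # in reference, not found
    n_ref = len(set(reference_ids))
    return excess / n_ref, missing / n_ref


objects_by_cell = {(0, 0): ["a", "b"], (0, 1): ["c"], (1, 1): ["d", "e", "f"]}
chosen = sample_cells(objects_by_cell, 3, random.Random(42))

commission, omission = completeness_rates(
    {"t1", "t2", "t3"},
    {"r1", "r2", "r3", "r4"},
    [("t1", "r1"), ("t2", "r2")],
)
```

In the usage example one test object has no counterpart and two reference objects are not found, so with four reference items the commission rate is 0.25 and the omission rate is 0.5.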

Figure 3.16. Completeness evaluation procedure. (a) Test data. (b) Creating cells over test data using the scale. (c) Cells are randomly selected, all objects are included. (d) Reference data within sample cells are used to compare the test dataset. (e) Quality measures.

The last external evaluation procedures (ID 301, 302) refer to positional accuracy. The quality model (DCT 2016a) provides different methods to assess planimetric and altimetric quality using points. These methods return a quality category named PEC (Padrão de Exatidão Cartográfica, the cartographic accuracy standard) according to the 90th percentile of the errors and the corresponding root mean square. The PEC can assume five values: A, B, C, D and nonconforming (or '0'). Because the method is point-based, linear or areal features can only be used after the internal matching described in Section 3.2. The matching should occur prior to the sampling to avoid selecting unmatched points in the sampling phase. These quality methods require a positional sampling procedure that uses a tessellation over the test data according to the test data scale, similar to that used in object sampling. The population to which the sampling procedure is applied consists of the cells in this tessellation that have at least one point to be assessed. There are four sampling strategies: isolated lot (ISO 2859-2), lot-by-lot (ISO 2859-1), one-by-cell, and full inspection of points. The ISO strategies are provided in the norm, while one-by-cell means full inspection of cells, i.e., all cells with points are used. The last strategy, full inspection of points, considers all available points in the positional accuracy assessment.
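The two statistics that drive the PEC category can be sketched as below. The percentile uses the nearest-rank definition, which is an assumption since the source does not state which definition applies. The threshold values are hypothetical placeholders chosen only to make the example run; the real limits are defined in the quality model (DCT 2016a) and depend on the map scale. The function names are illustrative.

```python
import math


def percentile_90(errors):
    """Nearest-rank 90th percentile: the smallest error such that at
    least 90% of the errors are less than or equal to it."""
    ordered = sorted(errors)
    rank = (9 * len(ordered) + 9) // 10   # integer ceil(0.9 * n)
    return ordered[rank - 1]


def root_mean_square(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def classify_pec(errors, thresholds):
    """Return the first category whose two limits are both satisfied.

    thresholds: list of (category, max_p90, max_rms), strictest first.
    Returns '0' (nonconforming) when no category is satisfied.
    """
    p90 = percentile_90(errors)
    rms = root_mean_square(errors)
    for category, max_p90, max_rms in thresholds:
        if p90 <= max_p90 and rms <= max_rms:
            return category
    return "0"


# Hypothetical limits in metres, NOT the values from the CQDG standard.
EXAMPLE_THRESHOLDS = [
    ("A", 1.0, 0.6),
    ("B", 2.0, 1.2),
    ("C", 4.0, 2.4),
    ("D", 5.0, 3.0),
]

errors_m = [0.5, 0.8, 1.2, 0.3, 1.5, 0.9, 1.1, 0.7, 0.4, 1.8]
category = classify_pec(errors_m, EXAMPLE_THRESHOLDS)
```

With these ten errors the 90th percentile is 1.5 m and the root mean square is about 1.03 m, so the sample fails the placeholder limits for category A and is assigned category B.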

Figure 3.17 illustrates how the quality evaluation for positional accuracy works with one of the ISO sampling strategies.

Figure 3.17. Positional accuracy evaluation procedure. (a) Creating cells over test data. (b) Cells are randomly selected according to sample size. (c) In each selected cell, one point is randomly selected as sample. (d) Reference data are used to compare the samples. (e) Quality measures.

Following Figure 3.17(a), the first step is to create a tessellation over the test data according to its scale, with a resolution of 4 cm at the scale of the assessed data (DCT 2016a). The second step is to determine the sample size according to the sampling strategy, taking into account the number of cells with points. This sample size represents the number of cells that will be considered in the evaluation. Then cells of this tessellation are randomly selected according to the sample size (Figure 3.17(b)). Next, for each selected cell, the system randomly selects one point inside the cell as a sample (Figure 3.17(c)). The result is a list of points, as many as the sample size, that will be compared with the reference data (Figure 3.17(d)). Finally, the system calculates the 90th percentile of the planimetric errors and of the altimetric errors (when available), together with the corresponding root mean square (Figure 3.17(e)). These two values are used to determine the quality category (PEC) of each test dataset.
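The cell-then-point selection can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping `points_by_cell` only contains cells with at least one matched point, and the sample size is passed in rather than derived from the ISO 2859 tables. The function name and the example data are hypothetical.

```python
import random


def sample_one_point_per_cell(points_by_cell, sample_size, rng):
    """Pick sample_size cells at random, then one random point in each.

    The population is the set of cells that contain points, so the
    sample size is capped at the number of such cells.
    """
    cells = sorted(points_by_cell)   # fixed order so a seeded rng is repeatable
    chosen = rng.sample(cells, min(sample_size, len(cells)))
    return [(cell, rng.choice(points_by_cell[cell])) for cell in chosen]


points_by_cell = {
    (0, 0): ["p1", "p2", "p3"],
    (0, 1): ["p4"],
    (1, 0): ["p5", "p6"],
    (2, 2): ["p7", "p8"],
}
sample = sample_one_point_per_cell(points_by_cell, 3, random.Random(7))
```

Under the one-by-cell strategy mentioned earlier, the same function would be called with a sample size equal to the number of cells, so that every cell with points contributes exactly one point.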

In document: Automatic evaluation of geospatial data quality using web services (pages 65-69)