COMPARISON WITH THE STANDARDS REQUIRED BY THE NEXT LINK
4.1 DATA ON CURRENT STATE INDUSTRIALISERS
No single measure exists for establishing the credibility of VGI volunteers. This is another challenge in assessing VGI quality and a hindrance to its adoption in authoritative systems. When a consumer can attribute information to a known source, that information is likely to be trusted more, and to be of higher reliability and quality, than most VGI, which is largely produced by unknown volunteers. According to Antoniou and Skopeliti (2015), VGI quality assessment has focused on characterising the contributed geospatial datasets, with less emphasis on volunteer credibility.
Statistical methods can be used to analyse and model the relationship between volunteers and their contributions. To establish the credibility of volunteers, a Latent Class Analysis (LCA) methodology is proposed. LCA is recognised as an effective methodology for analysing the patterns and quality of multiple contributions from volunteers (Huang and Bandeen-Roche, 2004), and has been widely used to assess the accuracy of volunteers in land cover mapping (Foody and Boyd, 2012; Foody et al., 2013). It uses observed variables provided by volunteers to infer an unobserved (latent) variable, which here represents volunteer reputation. Moreover, LCA can be used to evaluate diagnostic tests without ground-truth validation. Volunteers' contributions were cross-tabulated against consensus-based classifications from trusted intermediaries, and the resulting land parcel tags were used as input for computing volunteer reputation in the Mplus statistical analysis software (Muthén, 2004; Jung and Wickrama, 2008). LCA assumes local (conditional) independence: within each latent class, the observed variables are statistically independent of one another. Foody et al. (2013), for example, used latent class models to measure the accuracy of four volunteers labelling tropical forest in a ‘Globcover’ map of West Africa, deriving each contributor's accuracy from the contributed data alone, without reference to ground truth.
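The cross-tabulation step can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow used with Mplus: the parcel identifiers and land-use labels are hypothetical, and the diagonal of the table (volunteer label equal to consensus label) is taken as the volunteer's agreement with the consensus.

```python
from collections import Counter


def cross_tabulate(volunteer_tags, consensus_tags):
    """Count (volunteer label, consensus label) pairs over the same set of parcels."""
    if volunteer_tags.keys() != consensus_tags.keys():
        raise ValueError("both tag sets must cover the same parcels")
    return Counter((volunteer_tags[p], consensus_tags[p]) for p in consensus_tags)


def agreement_rate(table):
    """Share of parcels on the diagonal, i.e. where the volunteer matched the consensus."""
    total = sum(table.values())
    agreed = sum(n for (vol, ref), n in table.items() if vol == ref)
    return agreed / total if total else 0.0
```

Off-diagonal cells show which land uses a volunteer confuses, which a single accuracy figure would hide.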
A standard latent class model can be constructed from the probability of observing patterns of class allocations made by a series of classifiers applied to a dataset (Foody, 2012). These class allocations are the observed variables (here, land parcel classifications), and they provide information on the unobserved variable (here, volunteer reputation). Volunteer reputation was established using Bayes' theorem, which gives the probability of an event based on prior knowledge of conditions related to it (Vermunt and Magidson, 2003). For example, a person's ability to correctly identify and classify several land parcel parameters indicates the reputation category to which they belong. Each volunteer was therefore allocated to the reputation class with the highest posterior probability of class membership (Vermunt and Magidson, 2003; Foody, 2012).
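The Bayes' theorem step and the highest-posterior allocation can be illustrated as below. The sketch assumes binary indicators (1 = parcel attribute classified correctly) and already-estimated class sizes and per-class probabilities of a correct classification; the parameter names are illustrative, and under local independence the likelihood of a response pattern is the product over indicators.

```python
def posterior_membership(y, priors, theta):
    """Bayes' theorem: P(class k | response pattern y) for binary indicators.

    y      -- list of 0/1 indicators (1 = parcel attribute classified correctly)
    priors -- class sizes pi_k, summing to 1
    theta  -- theta[k][j] = P(y_j = 1 | class k)
    """
    joint = []
    for pi_k, theta_k in zip(priors, theta):
        likelihood = 1.0
        for y_j, t in zip(y, theta_k):
            # local independence: multiply the per-indicator probabilities
            likelihood *= t if y_j == 1 else (1.0 - t)
        joint.append(pi_k * likelihood)
    evidence = sum(joint)  # P(y), the normalising constant
    return [j / evidence for j in joint]


def modal_class(posterior):
    """Allocate to the class with the highest posterior probability."""
    return max(range(len(posterior)), key=posterior.__getitem__)
```

For long response patterns the product can underflow, so a production version would work with log-probabilities.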
In this study, LCA was used to estimate the reputation of volunteers from their multiple classifications (land use, occupancy and development status) of different land parcels. To achieve this, experts can either (a) assess and rate how well volunteers classified land parcels in the study area, or (b) tag the parcels independently, with their aggregated outputs serving as a reference for determining how well volunteers classified land parcel parameters. The latter approach was adopted in this study. Volunteers with good reputations are those who produce good-quality geospatial data. An advantage of LCA is that it can characterise the accuracy of each contributor's labelling regardless of the number of contributions made (Foody et al., 2015). One of the main challenges in LCA is determining the number of classes and statistically assessing how well each class solution fits the data, so that the results are representative (Nylund et al., 2007). A four-class model was selected to compute volunteer reputations, meaning that volunteers can fall into one of four categories according to the accuracy of their classifications. For example, a volunteer with the most correct classifications would belong to the very good reputation category, while a volunteer with the fewest would belong to the very bad category.
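The independent expert tagging adopted in this study can be sketched as a per-parcel majority vote followed by 0/1 correctness indicators for each volunteer. Majority voting is an assumption (the text only says the experts' outputs were aggregated into a consensus), ties are left unresolved rather than guessed, and the attribute field names are hypothetical.

```python
from collections import Counter

# Hypothetical field names for the three land parcel parameters
ATTRIBUTES = ("land_use", "occupancy", "development_status")


def expert_consensus(expert_labels):
    """Majority vote per parcel and attribute; ties are left unresolved (None).

    expert_labels -- list of dicts, one per expert: parcel_id -> {attribute: label}
    """
    consensus = {}
    for parcel in expert_labels[0]:
        consensus[parcel] = {}
        for attr in ATTRIBUTES:
            counts = Counter(e[parcel][attr] for e in expert_labels).most_common()
            tied = len(counts) > 1 and counts[0][1] == counts[1][1]
            consensus[parcel][attr] = None if tied else counts[0][0]
    return consensus


def correctness_indicators(volunteer, consensus):
    """1/0 agreement with the consensus per (parcel, attribute); unresolved cells skipped."""
    return [
        int(volunteer[p][a] == consensus[p][a])
        for p in sorted(consensus)
        for a in ATTRIBUTES
        if consensus[p][a] is not None
    ]
```

The resulting indicator vectors, one per volunteer, are the kind of observed variables a latent class model takes as input.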
Entropy is used in LCA to assess how well the model assigns individuals to membership (reputation) categories. An entropy value close to one indicates good model fit and a clear separation of categories (Nylund et al., 2007; Jung and Wickrama, 2008). In short, this methodology investigates volunteer reputation by examining how accurately each volunteer classifies 30 pre-defined land parcels. Each volunteer's labels were then compared with the aggregated experts' labels through cross-tabulation to determine the accuracy of their classifications. LCA then analyses the patterns in the cross-tabulation results (observed variables) to compute the reputation of the volunteers (unobserved variable) using Bayes' theorem. Volunteer reputation relies on data provenance; as such, 30 pre-defined land parcels were considered sufficient to investigate and compute how well participants classify objects of different land uses in the study area (Foody et al., 2013).
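The entropy statistic can be computed from the matrix of posterior class-membership probabilities. The sketch below uses the normalised (relative) entropy commonly reported for latent class models, E = 1 - [sum over i and k of (-p_ik ln p_ik)] / (n ln K), where p_ik is volunteer i's posterior probability for class k, n the number of volunteers and K the number of classes. It is an illustration under that definition, not output from Mplus.

```python
import math


def relative_entropy(posteriors):
    """Normalised entropy of class assignment (1 = perfectly separated classes).

    posteriors -- one row per volunteer, each row the posterior class probabilities.
    """
    n, k = len(posteriors), len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is undefined for a one-class model")
    # 0 * ln(0) is taken as 0, so zero probabilities are skipped
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))
```

A value near 1 means almost every volunteer has one dominant posterior probability; a value near 0 means the classes are barely distinguishable.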
Implementation of the Latent Class Analysis in the study area
Volunteer reputation addresses the reputation element of TRM. The Latent Class Analysis (LCA) methodology uses an individual's multiple contributions to infer the quality and reliability of the data they produce. Using Bayes' theorem, it takes observed variables provided by volunteers to compute information on the unobserved (latent) variable, which here represents volunteer reputation. First, contributions from volunteers were compared against consensus-based classification values of trusted intermediaries via cross-tabulation, to produce the final land parcel tags. These tags were then used as input for computing volunteer reputation in the Mplus statistical analysis software. The basis of LCA is that a set of observed labels derived from volunteers' contributions conveys information on the true value of the unobserved (latent) variable. The methodology therefore identifies the reputation classes (very good, good, bad and very bad) to which participants belong, based on the accuracy of their classifications.
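Mplus fits the latent class model by maximum likelihood. The idea can be illustrated with a plain expectation-maximisation (EM) loop in NumPy, assuming binary correct/incorrect indicators per parcel attribute and several random starting values with the best log-likelihood kept. This is a minimal sketch, not Mplus's implementation: the function names and defaults are hypothetical, and it omits fit statistics and standard errors.

```python
import numpy as np


def _em(Y, n_classes, rng, n_iter, tol):
    """One EM run from random starting values; returns (loglik, pi, theta, posteriors)."""
    n, J = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, J))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(y_i, class k) under local independence, normalised by Bayes' theorem
        log_joint = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T + np.log(pi)
        peak = log_joint.max(axis=1, keepdims=True)
        weights = np.exp(log_joint - peak)
        total = weights.sum(axis=1, keepdims=True)
        post = weights / total
        ll = float((peak + np.log(total)).sum())
        # M-step: re-estimate class sizes and per-class probabilities of a correct label
        nk = post.sum(axis=0) + 1e-12  # guard against empty classes
        pi = nk / nk.sum()
        theta = np.clip((post.T @ Y) / nk[:, None], 1e-6, 1 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return ll, pi, theta, post


def fit_lca(Y, n_classes, n_starts=10, n_iter=1000, tol=1e-8, seed=0):
    """Fit a K-class model to 0/1 indicators, keeping the best of several random starts.

    Returns class sizes pi (K,), item probabilities theta (K, J) and the
    posteriors (n, K) from the final E-step of the best run.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    runs = (_em(Y, n_classes, rng, n_iter, tol) for _ in range(n_starts))
    best = max(runs, key=lambda run: run[0])
    return best[1], best[2], best[3]
```

Multiple random starts matter because EM can stop at a local maximum of the likelihood.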
A user's class was evaluated by computing entropy, obtained through an iterative process in Mplus. Entropy, which ranges from 0 to 1, is used in LCA to examine how well the model assigns individuals to membership (reputation) classes; a value close to 1 shows good model fit and a clear separation of classes (Nylund et al., 2007; Jung and Wickrama, 2008). In this study, the highest entropy value (0.986) was obtained when four classes were specified, so a four-class model was adopted. Its four classes, Green, Yellow, Orange and Red, distinguish the reputation classes to which volunteers belong according to how well they classified land parcel parameters, and correspond to the reputation categories very good, good, poor and very poor, respectively. For example, a cross-tabulation of participant 4's contributions against the consensus-based classifications placed them in the ‘Yellow’ class (the ‘good’ reputation category), since they correctly classified an above-average proportion of land parcels. In short, the ‘very good’ and ‘good’ reputation categories indicate trustworthiness, which participants can use to establish credibility when interacting with others in participatory initiatives such as VGI.
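The class-count selection and the colour labelling can be sketched as two small helpers. Both rules are illustrative assumptions: `select_by_entropy` simply picks the number of classes with the highest entropy among models that have already been fitted, and `label_classes` ranks the four classes by their mean probability of a correct classification before attaching the Green/Yellow/Orange/Red categories.

```python
# Reputation categories in descending order of classification accuracy
REPUTATION = (("Green", "very good"), ("Yellow", "good"),
              ("Orange", "poor"), ("Red", "very poor"))


def select_by_entropy(entropy_by_k):
    """Return the number of classes whose fitted model had the highest entropy."""
    return max(entropy_by_k, key=entropy_by_k.get)


def label_classes(theta):
    """Map latent classes to reputation categories by mean P(correct classification).

    theta -- theta[k][j] = P(item j correct | class k); exactly four classes expected.
    Returns {class index: (colour, category)}.
    """
    if len(theta) != len(REPUTATION):
        raise ValueError("this mapping assumes a four-class model")
    mean_correct = [sum(row) / len(row) for row in theta]
    ranked = sorted(range(len(theta)), key=lambda k: mean_correct[k], reverse=True)
    return {k: REPUTATION[rank] for rank, k in enumerate(ranked)}
```

Ranking is needed because latent class indices are arbitrary: class 0 in one run may be the best-performing group and in another the worst.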
A covariance structure analysis methodology can also be used to evaluate a user's class (van Hell et al., 1996). According to Ployhart and Oswald (2004), it tests how precisely a class can reproduce the sample covariances, using fit functions to measure the overall goodness of fit of the model to the observed data. In short, the methodology seeks to describe the relationships among a set of observed variables in terms of unobserved or latent variables.
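One widely used fit function in covariance structure analysis is the maximum-likelihood discrepancy F_ML = ln|Σ| + tr(S Σ⁻¹) - ln|S| - p, where S is the p × p sample covariance matrix and Σ the covariance matrix implied by the model. It is zero when the model reproduces S exactly, and the model chi-square statistic is commonly obtained by scaling it by (N - 1). The sketch below assumes this is the kind of fit function meant; it is an illustration, not the procedure of the cited studies.

```python
import numpy as np


def ml_discrepancy(S, Sigma):
    """Maximum-likelihood fit function between sample (S) and model-implied (Sigma) covariances.

    F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p; it equals 0 when Sigma reproduces S exactly.
    """
    S, Sigma = np.asarray(S, dtype=float), np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    sign_s, logdet_s = np.linalg.slogdet(S)
    sign_m, logdet_m = np.linalg.slogdet(Sigma)
    if sign_s <= 0 or sign_m <= 0:
        raise ValueError("covariance matrices must be positive definite")
    # tr(S Sigma^-1) equals tr(Sigma^-1 S); solve() avoids forming the inverse explicitly
    return logdet_m + np.trace(np.linalg.solve(Sigma, S)) - logdet_s - p
```

Smaller values indicate that the model-implied covariances are closer to the observed ones.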