5 IDENTIFICATION OF POTENTIAL EFFECTS
5.3 Description and assessment of potential effects
5.3.1 Potential effects on the physical environment
The evaluation results of the observed communities approach revealed effects of the spatial properties of observation data on similarity values (section 5.1.5). As the OSM environments approach uses the same similarity indices, these effects are also present in its similarity values. However, because OSM data are an extrinsic source of geographic context, this has important consequences for how these effects must be interpreted when similarity values serve as plausibility indicators.
a) ArtenFinder, Simpson index vs. candidate OSM context size: Spearman’s Rho 0.73 (p-value < 2.2 × 10⁻¹⁶), n = 49,798
b) iNaturalist, Simpson index vs. candidate OSM context size: Spearman’s Rho 0.63 (p-value < 2.2 × 10⁻¹⁶), n = 76,675
c) ArtenFinder, Simpson index vs. OSM environment size: Spearman’s Rho 0.16 (p-value < 2.2 × 10⁻¹⁶), n = 49,798
d) iNaturalist, Simpson index vs. OSM environment size: Spearman’s Rho 0.04 (p-value < 2.2 × 10⁻¹⁶), n = 76,675
Figure 5.2.2: ArtenFinder and iNaturalist data, Simpson index values vs. candidate OSM context size and OSM environment size (no. of tags).
a) ArtenFinder, Jaccard index vs. candidate OSM context size: Spearman’s Rho 0.43 (p-value < 2.2 × 10⁻¹⁶), n = 49,798
b) iNaturalist, Jaccard index vs. candidate OSM context size: Spearman’s Rho 0.09 (p-value < 2.2 × 10⁻¹⁶), n = 76,675
c) ArtenFinder, Jaccard index vs. OSM environment size: Spearman’s Rho 0.46 (p-value < 2.2 × 10⁻¹⁶), n = 49,798
d) iNaturalist, Jaccard index vs. OSM environment size: Spearman’s Rho 0.39 (p-value < 2.2 × 10⁻¹⁶), n = 76,675
Figure 5.2.3: ArtenFinder and iNaturalist data, Jaccard index values vs. candidate OSM context size and OSM environment size (no. of tags).
Simpson similarity values show a positive correlation with the number of tags in candidate OSM contexts (Figure 5.2.2). If a candidate observation is situated in a place where many different OSM tags can be found in the relevant neighborhood, the chances of a higher similarity value with the target species’ OSM environment increase. Such observations are therefore more likely to appear plausible in light of Simpson similarity. There is no pronounced correlation between Simpson index values and OSM environment size. For ArtenFinder data, the correlation of Jaccard index values with candidate OSM context size is lower than that of Simpson index values, and Jaccard index values correlate with OSM environment size at about the same level (Figure 5.2.3). Jaccard index values for iNaturalist data are only very weakly associated with candidate OSM context size. Basically, however, these findings are analogous to the results obtained with the observed communities approach. The Simpson index correlates with candidate context size because of the asymmetry in size between OSM environments and candidate contexts: the former is usually the smaller set, in 74.1% of ArtenFinder cases and in 82.5% of iNaturalist cases. Because the Simpson index relates the number of shared tags to the size of the smaller set, a larger candidate context can contribute additional matching tags without enlarging the denominator. The Jaccard index, which uses the union of OSM environment and candidate context as its denominator, shows a weaker correlation with candidate context size and only moderate correlations with OSM environment size.
What do these findings mean for the use of similarity index values as indicators of the plausibility of casual citizen science observations if OSM, instead of observed communities, provides the geographic context? Observation data of organisms and OSM data come from completely separate projects. Although the basic factors governing data collection are similar to a high degree (as was demonstrated in section 2.2), the actual data collection processes are not connected to one another. In the observed communities case, a candidate observation located in an area of high observation density (which may lead to a larger candidate context, raising the probability of a higher Simpson similarity) rightly has a higher probability of a higher plausibility estimate, because it is situated in a place from which an observation is per se more likely to come. This rationale, however, does not hold if an extrinsic source of geographic context is involved whose spatial properties are governed by completely independent processes. If a candidate observation is located where a large number of tags is found, the probability of a higher plausibility evaluation for that candidate increases, which may bias plausibility estimation based on the Simpson index.
The Jaccard index, with its weaker association between index values and candidate OSM context size, suffers less from this effect and therefore seems, from this perspective, to be the more suitable index for estimating the plausibility of casual citizen science observations of organisms with the OSM environments approach. However, it too shows positive correlations with candidate OSM context size and with OSM environment size.
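The asymmetry argument can be illustrated with a minimal Python sketch. It assumes the standard set-based definitions, Simpson = |A ∩ B| / min(|A|, |B|) and Jaccard = |A ∩ B| / |A ∪ B|, and uses invented tag sets rather than ArtenFinder or iNaturalist data. As in most of the cases reported above, the OSM environment is deliberately smaller than both candidate contexts.

```python
def simpson(a: set[str], b: set[str]) -> float:
    """Simpson (overlap) index: shared tags divided by the size of the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index: shared tags divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical OSM environment of a target species (3 tags).
environment = {"natural=wood", "landuse=meadow", "waterway=stream"}

# Hypothetical candidate contexts: the larger one adds five more tags,
# one of which happens to match the environment.
small_context = {"natural=wood", "landuse=meadow", "highway=track",
                 "building=yes", "amenity=bench"}
large_context = small_context | {"waterway=stream", "landuse=farmland",
                                 "highway=residential", "leisure=park",
                                 "shop=bakery"}

for name, ctx in (("small", small_context), ("large", large_context)):
    print(name, len(ctx),
          round(simpson(environment, ctx), 2),
          round(jaccard(environment, ctx), 2))
```

With these made-up sets, the larger context covers all three environment tags, so its Simpson value rises from about 0.67 to 1.0 although 7 of its 10 tags do not occur in the environment. Its Jaccard value, by contrast, drops slightly from about 0.33 to 0.30, because the additional tags enlarge the union.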
Detailed analysis of evaluation results with the observed communities approach suggested that observation density around candidate observations is a factor influencing candidate context size and observed community size (see section 5.1.5). An analogous factor cannot, however, be determined for the OSM environments approach: OSM data have no proper spatial “tag density” comparable to observation density. The OSM environments approach uses the tag information attached to OSM objects to characterize geographic context, and there is a many-to-many (n:m) relationship between OSM objects and tags, as already discussed in the description of OSM data use cases in this work (see section 2.1.3): an object may (and usually does) carry several tags, and an object may also be segmented into several parts, all carrying the same tag or tags. Instead of spatial object density, a possible substitute is OSM information density, that is, the number of different tags found around an observation (including nonspecific tags). For the reasons reiterated above, this parameter was already used to characterize the spatial properties of OSM data within the respective areas of interest (see section 2.1.3).
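OSM information density as described above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this work: the `OsmObject` class, the reduction of each object to a single representative point in projected metric coordinates, and the fixed search radius are hypothetical assumptions. What it shows is that counting different tags through a set absorbs the n:m relationship, so a tag carried by several objects, or by several segments of one object, is counted only once.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class OsmObject:
    """Hypothetical, simplified OSM object: one representative point
    (projected coordinates in metres) plus its key=value tags."""
    x: float
    y: float
    tags: frozenset[str]


def information_density(obs_x: float, obs_y: float,
                        objects: list[OsmObject], radius: float) -> int:
    """Number of different tags on objects within `radius` of an observation.

    Tags are collected in a set, so repeated tags from several objects or
    object segments contribute only once. Nonspecific tags are included."""
    found: set[str] = set()
    for obj in objects:
        if hypot(obj.x - obs_x, obj.y - obs_y) <= radius:
            found |= obj.tags
    return len(found)


objects = [
    OsmObject(10, 0, frozenset({"highway=track", "surface=gravel"})),
    # Second segment of the same track, carrying the same tags.
    OsmObject(40, 0, frozenset({"highway=track", "surface=gravel"})),
    OsmObject(0, 30, frozenset({"natural=wood", "leaf_type=broadleaved"})),
    # Outside a 100 m radius around the origin.
    OsmObject(500, 0, frozenset({"building=yes"})),
]

print(information_density(0, 0, objects, radius=100))  # 4
```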
A candidate OSM context is, of course, essentially the list of different OSM tags found around a candidate observation, with only the nonspecific tags removed. Unsurprisingly, candidate OSM context size therefore correlates very strongly with OSM information density around a candidate (see Figure 5.2.4). Consequently, Simpson similarity is more likely to take a higher value if a candidate observation is situated in a neighborhood with a high number of different OSM tags.
a) ArtenFinder: Spearman’s Rho 0.99 (p-value < 2.2 × 10⁻¹⁶), n = 49,798
b) iNaturalist: Spearman’s Rho 1.00 (p-value < 2.2 × 10⁻¹⁶), n = 76,675
Figure 5.2.4: ArtenFinder and iNaturalist data, candidate OSM context size (no. of tags) vs. no. of surrounding tags (OSM information density).
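The near-perfect rank correlation in Figure 5.2.4 follows from how a candidate OSM context is derived. The following Python sketch assumes a hypothetical, deliberately short list of nonspecific tags (the list actually used in this work is not reproduced here) and shows that the candidate context differs from the set of surrounding tags only by the nonspecific tags present.

```python
# Hypothetical, deliberately short list of nonspecific tags; the list
# actually used in this work is not reproduced here.
NONSPECIFIC_TAGS = frozenset({"building=yes", "highway=residential",
                              "landuse=residential"})


def candidate_context(surrounding_tags: list[str]) -> set[str]:
    """Different tags around a candidate observation minus nonspecific tags."""
    return set(surrounding_tags) - NONSPECIFIC_TAGS


# Tags collected around one hypothetical candidate observation; duplicates
# stand for tags carried by several objects or object segments.
surrounding = ["natural=wood", "highway=track", "building=yes",
               "natural=wood", "waterway=stream", "highway=residential"]

density = len(set(surrounding))          # OSM information density
context = candidate_context(surrounding)
print(density, len(context))  # 5 3
```

Because only the few nonspecific tags present are removed, candidate context size closely tracks OSM information density across observations, which is consistent with the Spearman’s Rho values of 0.99 and 1.00.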
Which factors influence OSM environment size? Parameters similar to those already examined for the observed communities approach come to mind. On the one hand, there is OSM information density around observations of target species: more diverse OSM contexts around target species observations (which are used for OSM environment extraction) may lead to larger OSM environments. On the other hand, there is the number of target species observations used for OSM environment extraction: more target species observations represent more OSM context situations, which might lead to larger OSM environments. In contrast to the findings with the observed communities approach, Spearman’s Rho does not indicate a pronounced correlation between OSM information density (expressed as the mean number of different tags found around a target species’ observations) and OSM environment size (Figure 5.2.5). Differences in OSM environment size are therefore probably not influenced by spatial differences in how many different tags can be found around the observations of a target species used for OSM environment extraction. A candidate observation of a target species whose earlier observations (used for OSM environment extraction) are predominantly situated in places with many different tags around them is thus not likely to appear more plausible when the Jaccard index is used as a plausibility indicator. As with the observed communities approach, the sizes of OSM environments also show no pronounced correlation with the number of observations used for extracting them: the correlations are weak and negative (Figure 5.2.6), although both are significant at the p ≤ 0.05 level.
a) ArtenFinder, size of OSM environments vs. mean number of different tags surrounding observations used for OSM environment extraction: Spearman’s Rho -0.02 (p-value 0.6187), n = 402
b) iNaturalist, size of OSM environments vs. mean number of different tags surrounding observations used for OSM environment extraction: Spearman’s Rho 0.12 (p-value 0.003051), n = 635
Figure 5.2.5: ArtenFinder and iNaturalist data, OSM environment size (no. of tags) vs. mean no. of surrounding tags of target species observations up to 2015 (observations used for OSM environment extraction).
b) ArtenFinder, size of OSM environments vs. observation numbers of target species: Spearman’s Rho -0.10 (p-value 0.04919), n = 402
d) iNaturalist, size of OSM environments vs. observation numbers of target species: Spearman’s Rho -0.25 (p-value 1.627 × 10⁻¹⁰), n = 635
Figure 5.2.6: ArtenFinder and iNaturalist data, OSM environment size (no. of tags) vs. observation numbers of target species up to 2015 (observations used for OSM environment extraction).
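All correlations in Figures 5.2.2 to 5.2.6 are given as Spearman’s Rho. Below is a minimal pure-Python sketch of this statistic, assuming the usual definition as the Pearson correlation of ranks, with tied values assigned their average rank. The per-species values are invented for illustration and do not reproduce Figure 5.2.6. P-values are not computed here; in Python, `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined if one variable is constant")
    return cov / (sx * sy)


# Hypothetical values for five target species.
env_size = [12, 30, 18, 25, 9]   # OSM environment size (no. of tags)
obs_count = [40, 5, 22, 60, 8]   # observations used for extraction

print(round(spearman_rho(env_size, obs_count), 2))  # -0.1
```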