
Evaluation results for the observed communities approach with iNaturalist data are comparable, in all respects, to the results obtained with ArtenFinder data. Figure 4.1.4 shows the distributions of Simpson similarity index values with iNaturalist data, and Table 4.1.6 presents the numeric results of the statistical tests used to assess differences between the distributions of similarity values of plausible and implausible observations. The distributions of similarity values differ significantly between approved or synthetic plausible observations (iNat_A and iNat_SP) and the sets of synthetic implausible observations (iNat_SI1, iNat_SI2 and iNat_SI3): the alternative hypothesis of the Mann-Whitney-U-Test can be accepted (p ≤ 0.05) for all comparisons of sets. According to this evaluation, the distributions of Simpson index values of accepted and synthetic plausible candidate observations are statistically different from those of synthetic implausible observations. The same is true for Jaccard index values (see Figure 4.1.4 and Table 4.1.7). The Fligner-Killeen-Test found the variances of the compared distributions to be non-homogeneous (see section 3.3.1 for details).

a) iNaturalist, Simpson index b) iNaturalist, Jaccard index

Figure 4.1.4: iNaturalist, distributions of Simpson and Jaccard similarity index values, observed communities approach. n(iNat_A) = 34,821; n(iNat_SP) = 2,415; n(iNat_SI1) = 2,216; n(iNat_SI2) = 34,485; n(iNat_SI3) = 4,768.
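The two similarity indices compared in Figure 4.1.4 can be illustrated with a minimal sketch. It assumes the common set-based definitions (Simpson: shared species divided by the size of the smaller set; Jaccard: shared species divided by the size of the union); the exact definitions used in this work are given elsewhere in the text and may differ in detail. The function names and the example species sets are hypothetical.

```python
def simpson_similarity(a, b):
    """Shared species divided by the size of the smaller set (assumed definition)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard_similarity(a, b):
    """Shared species divided by the size of the union (assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical example: species in a candidate context vs. species in the
# observed community of the candidate's target species.
candidate_context = {
    "Quercus agrifolia",
    "Aphelocoma californica",
    "Sceloporus occidentalis",
    "Calypte anna",
}
observed_community = {
    "Aphelocoma californica",
    "Calypte anna",
    "Melozone crissalis",
}

simpson = simpson_similarity(candidate_context, observed_community)
jaccard = jaccard_similarity(candidate_context, observed_community)
print(f"Simpson: {simpson:.3f}, Jaccard: {jaccard:.3f}")
```

Because the Simpson index normalizes by the smaller set, it is less sensitive than the Jaccard index to differences in size between the candidate context and the observed community.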

Table 4.1.6: iNaturalist, results (p-Values) of Fligner-Killeen-Tests and of Mann-Whitney-U-Tests with iNat_A, iNat_SP and the three different sets of synthetic implausible candidate observations, Simpson index, observed communities approach.

Simpson index, iNat_A vs.     iNat_SI1        iNat_SI2        iNat_SI3
Fligner-Killeen-Test          0.0002067       < 2.2*10^-16    < 2.2*10^-16
Mann-Whitney-U-Test           < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16

Simpson index, iNat_SP vs.    iNat_SI1        iNat_SI2        iNat_SI3
Fligner-Killeen-Test          < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16

Table 4.1.7: iNaturalist, results (p-Values) of Fligner-Killeen-Tests and of Mann-Whitney-U-Tests with iNat_A, iNat_SP and the three different sets of synthetic implausible candidate observations, Jaccard index, observed communities approach.

Jaccard index, iNat_A vs.     iNat_SI1        iNat_SI2        iNat_SI3
Fligner-Killeen-Test          < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16
Mann-Whitney-U-Test           < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16

Jaccard index, iNat_SP vs.    iNat_SI1        iNat_SI2        iNat_SI3
Fligner-Killeen-Test          < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16
Mann-Whitney-U-Test           < 2.2*10^-16    < 2.2*10^-16    < 2.2*10^-16
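To make the comparison reported in Tables 4.1.6 and 4.1.7 concrete, the following is a minimal sketch of a two-sided Mann-Whitney U test. It is not the implementation used for the results above (the text does not state which software was used). It assumes the normal approximation with tie correction and no continuity correction, so p-values can differ slightly from those of statistical packages. The function names and the similarity values are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: t is the size of each group of equal values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical similarity values for plausible and implausible candidates.
plausible = [0.62, 0.71, 0.80, 0.55, 0.90, 0.67]
implausible = [0.10, 0.25, 0.05, 0.30, 0.18, 0.22]

u_stat, p_value = mann_whitney_u(plausible, implausible)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The test is rank-based and does not assume normally distributed similarity values.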

Properties of observed communities

iNaturalist data from California up to 2015 contain 549 species with 100 or more research grade observations. Filtering for frequently associated species (with a threshold of 0.5) leaves 484 non-zero observed communities and 234 observed communities with 10 or more species (ranging from 10 to 534 species, mean: 86.9 species per observed community). Applying a threshold of 0.5 for finding and eliminating nonspecific species yields no nonspecific species in the observed communities for this dataset. This means that no species is present in 50% or more of the observed communities. Compared to the ArtenFinder data, this points to higher species diversity in the observed communities, which is already visible in the much higher number of species represented in the data and in the higher mean number of species in observed communities. This is to be expected in a much larger and ecologically much more diverse area of interest. To keep the methodology comparable to the evaluation with ArtenFinder data, the threshold for nonspecific species was not lowered. Table 4.1.8 provides an overview of the numbers cited above.

Table 4.1.8: iNaturalist, key numbers describing valid observed communities.

No. of valid observed communities              234
Mean no. of species in observed communities    86.9
No. of nonspecific species                     0
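The two filters behind these numbers (a minimum of 10 species per valid observed community, and the 0.5 threshold for nonspecific species) can be sketched as follows. This is an illustration under assumptions, not the implementation used in this work: observed communities are represented as a mapping from target species to a set of associated species, and all names and the toy data are hypothetical.

```python
from collections import Counter


def valid_communities(communities, min_species=10):
    """Keep only observed communities with at least `min_species` species."""
    return {
        target: members
        for target, members in communities.items()
        if len(members) >= min_species
    }


def nonspecific_species(communities, threshold=0.5):
    """Return species present in at least `threshold` of all communities."""
    if not communities:
        return set()
    counts = Counter()
    for members in communities.values():
        counts.update(set(members))
    total = len(communities)
    return {
        species for species, count in counts.items() if count / total >= threshold
    }


# Hypothetical toy data: target species -> associated species.
toy_communities = {
    "A": {"x", "y"},
    "B": {"x", "z"},
    "C": {"w"},
    "D": {"x"},
}

# "x" occurs in 3 of 4 communities (75%), so it counts as nonspecific at 0.5.
print(nonspecific_species(toy_communities, threshold=0.5))
# Only communities with at least 2 species remain.
print(sorted(valid_communities(toy_communities, min_species=2)))
```

With the iNaturalist data described above, the equivalent of `nonspecific_species` would return an empty set at a threshold of 0.5.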

41.0% of valid observed communities are of birds, followed by plants (29.5%) and mollusks (mostly marine mollusks, 14.5% of observed communities). The target species of the valid observed communities represent 41,576 of the accepted candidate observations (iNat_A), or roughly one fourth of these candidates.

Properties of sets of candidate observations

Some key parameters characterizing the sets of valid candidate observations resulting from the evaluation with iNaturalist data are presented in Table 4.1.9. Because the source sets differ in size (see section 3.3.2), the sets of valid candidates also differ in size. iNat_SP has, on average, the largest candidate contexts as well as the largest observed communities associated with its cases. Its candidates are also situated in locations with very high observation densities. Compared to the average iNat_A case, iNat_SI1, iNat_SI2 and iNat_SI3 all have lower mean numbers of species in candidate contexts and lower mean numbers of context observations (iNat_SI3 extremely so). iNat_SI1 also has a lower mean number of species in the observed communities associated with its cases.

Table 4.1.9: iNaturalist, key numbers describing sets of valid candidate observations, observed communities approach.

Set of        No. of valid       Mean no. of species      Mean no. of species in     Mean no. of
candidates    candidate cases    in candidate contexts    observed communities,      context obs.
                                                          per set of candidates
iNat_A        34,821             132.3                    65.2                       753.1
iNat_SP       2,415              316.9                    260.1                      3,957.5
iNat_SI1      2,216              106.4                    23.7                       377.3
iNat_SI2      34,485             127.6                    65.5                       487.9
iNat_SI3      4,768              34.4                     71.8                       56.0

Looking at the species group compositions of the sets of valid candidate observations (that is, the observations actually used in the evaluation, Table 4.1.10), iNat_A, iNat_SI2 and iNat_SI3 again have very similar compositions, which is due to the method used for creating iNat_SI2 and iNat_SI3. Selecting valid research grade candidate observations during the evaluation markedly changed the thematic properties of iNat_A: the set is now clearly dominated by bird observations, which make up about half of the valid candidate observations. Plants now rank only second, followed by mollusks, butterflies, and mammals. The class of “other species” (species not assignable to any of the groups used in these data) holds ca. 6-7% of valid candidates in this set and in iNat_SI2 and iNat_SI3. Reptiles, beetles and others make up the remaining candidates. Set iNat_SI1, which contains only species selected for their special properties (physically similar species living in different habitats), consists mostly of birds and some plant species. iNat_SP is dominated by mollusks, followed by birds, plants and “other species”.

Table 4.1.10: iNaturalist, portions of species groups in sets of valid candidate observations, observed communities approach.

Species group (%)              iNat_A    iNat_SP    iNat_SI1    iNat_SI2    iNat_SI3
plants                         22.7      14.0       1.6         22.7        21.9
mammals                        2.0       1.9        0.0         2.0         2.1
birds                          51.3      31.7       98.4        51.3        48.9
reptiles                       0.8       1.1        0.0         0.7         0.6
butterflies and moths          2.7       0.0        0.0         2.7         3.1
hymenopterans                  0.1       0.0        0.0         0.1         0.1
beetles                        1.3       0.0        0.0         1.3         1.5
dragonflies and damselflies    0.8       0.0        0.0         0.8         0.8
crustaceans                    1.3       0.0        0.0         1.3         1.6
mollusks                       10.7      43.4       0.0         10.7        12.5
other species                  6.1       7.2        0.0         6.1         6.7
spiders                        0.2       0.0        0.0         0.2         0.3

a) iNat_A (n = 34,821) b) iNat_SP (n = 2,415)

Figure 4.1.5: Spatial distribution of valid candidate observations in sets of research grade and synthetic plausible iNaturalist candidates, observed communities approach. (No. of points in 20x20 km raster. Classified by Natural Breaks. Source of state line: U.S. Geological Survey 2016.)

Figure 4.1.5 and Figure 4.1.6 show that the spatial distribution in all sets of valid candidate observations remains similar to the distribution in the original dataset. It can also be seen that valid research grade candidate observations in this experiment (set iNat_A) are more strongly concentrated in the San Francisco Bay area than the original research grade candidate observations from 2016. iNat_SP observations are more strongly concentrated in the high-observation-density areas, while iNat_SI3 observations are more dispersed than those in the other sets. Both effects are caused by the method used to produce these two synthetic sets (see section 3.3.2). In iNat_SP, candidates are located close to high-plausibility observations from iNat_A, which are predominantly found in high-observation-density regions. In contrast, placing iNat_SI3 candidates away from known observations of their target species pushes them away from existing clusters of observations. However, they still concentrate in regions with relatively high observation densities, because valid candidate cases need a sufficient number of context observations to produce valid candidate contexts with 10 or more species.

a) iNat_SI1 (n = 2,216) b) iNat_SI2 (n = 34,485)

c) iNat_SI3 (n = 4,768)

Figure 4.1.6: Spatial distribution of valid candidate observations in sets of synthetic implausible iNaturalist candidates, observed communities approach. (No. of points in 20x20 km raster. Classified by Natural Breaks. Source of state line: U.S. Geological Survey 2016.)
