
All GIS procedures were performed in MapInfo Professional 7.5. The vegetation files were combined into a single vegetation map of the North Pennines, and a 1 km² grid file was then created. Within each square the following were calculated: the area of each raw and of each broad vegetation type, and three measures of patchiness (the number of broad vegetation types, the percent cover of non-dominant types, and the edge distance between raw types). The first two patchiness measures are highly correlated, so the percent cover of non-dominant types was removed from subsequent analyses. Management data were incorporated into these grid files, so that each cell included the area shot over, the area enrolled in a government agri-environment scheme, and the grazing level (high, medium or low). "High" grazing denotes high intensity within the North Pennines, not necessarily relative to other areas.
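As an illustration of the two patchiness measures retained, the sketch below computes, for each grid cell, the number of broad vegetation types present and the length of edge between raw types. It assumes, purely for illustration, that the vegetation map has been rasterised into NumPy arrays of type codes; the study itself worked with vector data in MapInfo, so the function `patchiness` and its parameters are hypothetical.

```python
import numpy as np

def patchiness(raw, broad, cell_px, pixel_size_m):
    """Per-cell patchiness from rasterised vegetation maps (hypothetical representation).

    raw, broad: 2-D integer arrays of the same shape holding the raw and broad
        vegetation type code of each pixel.
    cell_px: pixels along one side of a grid cell (e.g. 1000 m / pixel size).
    pixel_size_m: side length of one pixel in metres.
    Returns (n_broad_types, edge_m), one value per grid cell.
    """
    rows, cols = raw.shape[0] // cell_px, raw.shape[1] // cell_px
    n_broad_types = np.zeros((rows, cols), dtype=int)
    edge_m = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = np.s_[i * cell_px:(i + 1) * cell_px, j * cell_px:(j + 1) * cell_px]
            # Number of distinct broad vegetation types in this cell
            n_broad_types[i, j] = np.unique(broad[window]).size
            r = raw[window]
            # Pixel sides shared by two different raw types, counted within this cell only
            horizontal = np.count_nonzero(r[:, 1:] != r[:, :-1])
            vertical = np.count_nonzero(r[1:, :] != r[:-1, :])
            edge_m[i, j] = (horizontal + vertical) * pixel_size_m
    return n_broad_types, edge_m
```

Edges that coincide with grid-cell boundaries are not counted in this sketch; a vector-based edge calculation may treat them differently.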

This grid file was used as the highest-resolution grid (vegetation). With further processing it provided grids at the approximate home range of species for which this is known to be larger than 1 km² (buffer). A grid with 4 km² pixels was also produced to represent farm units (farms; see Chapter 3). Finally, a grid was produced at the 1 km² resolution but with moving-window averages used for the vegetation and management variables (mwa). This takes the wider area into account while giving more weight to the focal cell: the focal cell receives a weight equal to the sum of the weights given to all surrounding cells in a queen's-case arrangement, as used in …
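The moving-window weighting can be sketched as follows. This is a minimal Python/NumPy illustration, not the R/spdep code used in the study; it assumes that each queen's-case neighbour has weight 1, so the focal weight equals the number of neighbours, and that cells on the edge of the grid simply have fewer neighbours.

```python
import numpy as np

def moving_window_average(grid):
    """Queen's-case moving-window average in which the focal cell's weight
    equals the summed weight of its neighbours (each neighbour weighted 1)."""
    rows, cols = grid.shape
    out = np.empty((rows, cols), dtype=float)
    for i in range(rows):
        for j in range(cols):
            neigh_sum, n_neigh = 0.0, 0
            # Visit the up to eight queen's-case neighbours that lie inside the grid
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        neigh_sum += grid[ni, nj]
                        n_neigh += 1
            if n_neigh == 0:
                # A 1 x 1 grid has no neighbours: keep the focal value
                out[i, j] = grid[i, j]
            else:
                # Focal weight n_neigh plus neighbour weights n_neigh gives a total of 2 * n_neigh
                out[i, j] = (n_neigh * grid[i, j] + neigh_sum) / (2 * n_neigh)
    return out
```

Because the focal cell carries half of the total weight, a cell's own value always dominates the average, while the surrounding squares still pull it towards the local context.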

The Maxent output provides training and test AUCs (with a standard deviation for the test AUC), binomial tests of omission that assess the statistical significance of the prediction, response curves for each predictor variable, and jackknife bar charts showing the AUC of a model run with each variable in isolation and with each variable excluded in turn.
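Two of these summary statistics can be illustrated in a few lines. The sketch below, an assumption-laden illustration rather than Maxent's own code, computes a presence-versus-background AUC as the probability that a randomly chosen presence scores higher than a randomly chosen background point (ties count one half), and a one-sided binomial p-value of the kind used to test whether more test presences fall inside the predicted area than a random prediction of the same fractional area would achieve. The function names are hypothetical.

```python
from math import comb

def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def binomial_omission_p(n_test, n_predicted, fractional_area):
    """One-sided p-value P(X >= n_predicted) for X ~ Binomial(n_test, fractional_area):
    the chance that a random prediction covering the same fractional area would
    capture at least n_predicted of the n_test test presences."""
    return sum(comb(n_test, k) * fractional_area ** k * (1 - fractional_area) ** (n_test - k)
               for k in range(n_predicted, n_test + 1))
```

An AUC of 0.5 corresponds to a model that ranks presences no better than background points, which is the benchmark used in the checks described next.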

First, Black Grouse and the narrow-bordered bee hawk-moth (Hemaris tityus) were used for two checks. The first was that the Maxent model performed well: AUC values above 0.5 (the value expected from random prediction), and response curves that did not contradict what would be expected from the literature and the known habitat use of each species. The second was a comparison of test predictions obtained using random and strip assignment of test records. Random assignment can be performed automatically in the Maxent software by specifying the percentage of records to be retained for testing. Strip assignment was performed using MapInfo and Excel, by choosing a strip in the centre of the study area just wide enough to enclose the required number of presence records.
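The two ways of assigning test records can be sketched as follows. This is a hypothetical illustration: records are assumed to be (x, y) coordinate pairs, and the strip is assumed to run north-south, centred on a chosen x coordinate, so that taking the records closest to that centre line yields a strip just wide enough to hold the required number of presences. Neither function reproduces the MapInfo/Excel workflow used in the study.

```python
import random

def random_split(records, test_fraction, seed=None):
    """Randomly retain a fraction of records for testing; returns (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def strip_split(records, n_test, x_centre):
    """Hold out the n_test records nearest (in x) to x_centre, i.e. a central
    north-south strip just wide enough to enclose them; returns (train, test).
    Records tied at the strip edge are resolved by input order."""
    ranked = sorted(records, key=lambda r: abs(r[0] - x_centre))
    return ranked[n_test:], ranked[:n_test]
```

Strip assignment makes the test records spatially separate from the training records, so it is usually a harsher test of transferability than random assignment.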

Models for each BAP species were run with each of the groups of predictor variables described in Table 5.1, and with combinations of these groups. Predictor variables such as climate, used in previous habitat suitability papers, were not included because, at the scale at which data are available, they vary little across the study area. There is likely to be some multicollinearity between management and vegetation variables; for example, management for red grouse shooting maintains heather. By comparing how accurately these predictor groups predict BAP species presences, we can assess whether additional information is gained by keeping both sets in the model. The combinations of predictor groups are referred to as “treatments” from this point forward. Ten runs of each treatment were performed for each species, each with a different random 25% of records retained for testing, and the mean AUC (one value per treatment per species) was used for analysis.
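The treatment design can be sketched as below. The sketch assumes, for illustration only, that every non-empty combination of the five predictor groups in Table 5.1 is a treatment (the text does not state that all combinations were used), and `fit_and_test_auc` is a hypothetical callable standing in for a single Maxent run that returns the test AUC.

```python
import random
from itertools import combinations
from statistics import mean

PREDICTOR_GROUPS = ["raw vegetation", "broad vegetation continuous",
                    "broad vegetation categorical", "patchiness", "management"]

def treatments(groups):
    """Every non-empty combination of predictor groups (assumed design)."""
    return [combo for r in range(1, len(groups) + 1) for combo in combinations(groups, r)]

def mean_test_auc(records, treatment, fit_and_test_auc, n_runs=10, test_fraction=0.25):
    """Average test AUC over n_runs, each holding out a different random 25% of records."""
    aucs = []
    for run in range(n_runs):
        rng = random.Random(run)  # a different, reproducible split for each run
        shuffled = list(records)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        aucs.append(fit_and_test_auc(train, test, treatment))
    return mean(aucs)
```

With five groups this assumed design gives 31 treatments per species, each summarised by a single mean AUC.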

Table 5.1 Description of each predictor group used in Maxent models.

Predictor group | Description | Number of variables
Raw vegetation | Area of each raw vegetation type, as in the original EN files |
Broad vegetation continuous | Area of each of the four broad categories described in Section 3.2.2, except heather, which is highly correlated with some of the others |
Broad vegetation categorical | Presence or absence of each of the broad categories |
Patchiness | Total length of edge between vegetation categories, and the total number of vegetation categories present, within each 1 km square |
Management | Area under different land uses (management for red grouse shooting, and government agri-environment schemes), and grazing level |

A number of models were run for each species using different settings. Feature selection (the shapes of response allowed: linear and, if there are sufficient samples, threshold, hinge, quadratic and product responses), the regularization multiplier (which limits over-fitting) and the convergence threshold (which determines when the iteration stops) can all be user-specified. For most species the recommended (default) settings consistently produced the best predictions, as judged by AUC. However, for species with fewer than 32 training examples, the regularization multiplier was varied for the “best” set of predictor variables identified using the recommended settings, in order to find the ideal regularization value. This is because AUC is especially sensitive to the regularization value below this sample size, and the optimal value varies from 0.1 to 2 (Phillips et al. 2004).
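The decision rule for the regularization multiplier can be sketched as a simple sweep. In this hypothetical sketch the candidate values are an assumed grid spanning the 0.1 to 2 range cited above, the default multiplier is taken as 1.0 (Maxent's default), and `mean_test_auc_for` stands in for repeated Maxent runs with the "best" predictor set that return a mean test AUC.

```python
# Assumed grid of candidate multipliers spanning the 0.1 to 2 range cited in the text
CANDIDATE_MULTIPLIERS = [0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0]
DEFAULT_MULTIPLIER = 1.0  # Maxent's default regularization multiplier

def choose_multiplier(n_train, mean_test_auc_for, candidates=CANDIDATE_MULTIPLIERS):
    """Keep the default for 32 or more training records; otherwise return the
    candidate multiplier with the highest mean test AUC.

    mean_test_auc_for is a hypothetical callable: multiplier -> mean test AUC.
    """
    if n_train >= 32:
        return DEFAULT_MULTIPLIER
    scores = {beta: mean_test_auc_for(beta) for beta in candidates}
    return max(scores, key=scores.get)
```

A finer grid, or a second sweep around the best coarse value, would be a natural refinement if AUC changes sharply between neighbouring candidates.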

Prediction maps were produced by transforming the raw Maxent predictions into logistic predictions. The raw output is a probability distribution over all grid cells, so the values sum to one; individual values are therefore typically very small, and their magnitude depends on the number of cells, which makes comparison between species difficult. If the raw Maxent prediction for cell x is p(x), the corresponding logistic value is c·p(x) / (1 + c·p(x)), where c is the exponential of the entropy of the raw distribution. The logistic predictions are suitable for comparison between species because they can be interpreted as a scale-independent estimate of the probability of occurrence.
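The raw-to-logistic transformation described above can be written directly. This is a minimal sketch of the stated formula, assuming the raw output is supplied as a list of cell values summing to one and that the entropy uses natural logarithms; `logistic_from_raw` is a hypothetical name, not part of the Maxent software.

```python
import math

def logistic_from_raw(raw):
    """Apply c*p / (1 + c*p) to each raw value p, with c = exp(entropy of the raw distribution)."""
    if not math.isclose(sum(raw), 1.0, rel_tol=1e-9):
        raise ValueError("raw output must sum to one")
    # Shannon entropy in nats; zero-probability cells contribute nothing
    entropy = -sum(p * math.log(p) for p in raw if p > 0)
    c = math.exp(entropy)
    return [c * p / (1 + c * p) for p in raw]
```

For a uniform raw distribution over n cells the entropy is ln n, so c = n and every cell receives a logistic value of 0.5; in general the transformation preserves the ranking of cells while removing the dependence on the number of cells.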

When interpreting logistic predictions it is important to distinguish between presence and abundance. If there are many individuals of species A in a 1 km square but only one individual of species B (perhaps a larger species with a larger home-range requirement), A is more likely to have been recorded there, even if conditions in the square are equally suitable for both. This could lead to a higher prediction for species A, because species B may not have been recorded in such areas. A given logistic value represents the same suitability for species A and B if the squares in which each species has actually been found are similarly suitable for each.

There are no established guidelines for choosing a threshold (as required for the maps in Figure 5.7). A higher threshold leads to a smaller proportion of the area being predicted as likely to contain the species (fractional predicted area) and a higher omission rate (false negatives). The extreme left of a ROC curve corresponds to a high omission rate, and the extreme right to a high commission (false positive) rate. The trade-off between omission and commission error depends on the goals of the researcher. Here the goals are to identify areas suitable for BAP species, and to identify management or vegetation variables that consistently lead to a higher probability of BAP species presence. For the latter goal the choice of threshold is less important, because a variable will act in the same direction regardless of the threshold chosen. For the former goal, a high commission rate is arguably less undesirable than a high omission rate, because a high omission rate might cause areas of potentially very high value for conserving BAP species to be assigned low values.

The compromise chosen will be the threshold that minimises the omission rate, subject to a commission rate deemed acceptable.
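One way to implement this compromise is sketched below. Because only presences are available, the sketch uses the fractional predicted area as a stand-in for the commission rate (an assumption), and the acceptable maximum area `max_area` is a hypothetical parameter. Candidate thresholds are the distinct predicted values plus one value just above the maximum, so at least one threshold is always feasible.

```python
def omission_rate(presence_preds, threshold):
    """Share of presence records predicted below the threshold (false negatives)."""
    return sum(p < threshold for p in presence_preds) / len(presence_preds)

def fractional_predicted_area(all_preds, threshold):
    """Share of all grid cells predicted at or above the threshold."""
    return sum(p >= threshold for p in all_preds) / len(all_preds)

def choose_threshold(all_preds, presence_preds, max_area):
    """Threshold with the lowest omission rate whose fractional predicted area
    does not exceed max_area; ties go to the higher (more conservative) threshold."""
    candidates = sorted(set(all_preds)) + [max(all_preds) + 1e-9]
    feasible = [t for t in candidates
                if fractional_predicted_area(all_preds, t) <= max_area]
    return min(feasible, key=lambda t: (omission_rate(presence_preds, t), -t))
```

Raising `max_area` tolerates more commission and can only lower (never raise) the achievable omission rate, which matches the preference stated above for avoiding omission.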

5.3.8 Analysis

The buffer and farm grids were created from the vegetation grid using MapInfo. The mwa grid was created from the vegetation grid in R 2.5.1 (R Development Core Team 2005) using the spdep library (Bivand 2006). MapInfo was also used to present the distribution maps. Statistical analysis was performed in R 2.5.1 using the nlme (Pinheiro et al. 2006) and lattice (Sarkar 2006) libraries.

5.4 Results
