Probability theory and problem solving
Example 3.4.3. A study was carried out on the mother's age at the time of her child's birth as a risk factor in the development of sudden death syndrome
4.9.1 The Library site sample
Due to the non-parametric and heterogeneous nature of the data, a form of regression analysis was conducted on intra-sample sex and age-at-death comparisons, both across and within the phases of the Library site. A generalised linear model (GLM) with a binomial distribution was fitted using the statistical software Genstat v.15 to assess how age and sex affected a range of variates; the same approach was used for the subsequent analyses of the Library site material described below. GLMs are commonly used to assess how factors (here, age and sex) affect variates (e.g. tooth data, frequencies of cribra orbitalia) in non-parametric data.
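As an illustration of what fitting a binomial GLM involves, the following is a minimal Python sketch using iteratively reweighted least squares with a logit link. It is not the Genstat procedure used in this research: the function names (`solve`, `fit_binomial_glm`) are hypothetical, the logit link is an assumption, and the counts are invented for illustration only.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_binomial_glm(X, successes, trials, iterations=25):
    """Fit a logit-link binomial GLM by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, y, n in zip(X, successes, trials):
            eta = sum(b * x for b, x in zip(beta, row))   # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))             # fitted proportion
            w = n * mu * (1.0 - mu)                       # IRLS weight
            z = eta + (y / n - mu) / (mu * (1.0 - mu))    # working response
            for i in range(p):
                xtwz[i] += row[i] * w * z
                for j in range(p):
                    xtwx[i][j] += row[i] * w * row[j]
        beta = solve(xtwx, xtwz)
    return beta

# Invented counts for illustration only (not data from the Library site):
# each design row is [intercept, male indicator];
# females: 30 carious of 200 teeth, males: 45 carious of 180 teeth.
design = [[1.0, 0.0], [1.0, 1.0]]
carious = [30, 45]
observed = [200, 180]
intercept, male_effect = fit_binomial_glm(design, carious, observed)
print(intercept, male_effect)
```

With a single two-level factor such as sex, the intercept is the log-odds for the reference group and the second coefficient is the log odds ratio between the groups. Additional factors (such as age group or phase) would enter as further indicator columns in the design matrix.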
GLM modelling was chosen because of the composition of this sample and the complexity that this type of modelling can accommodate. Whilst both χ² analysis (i.e. Pearson's chi-square test) and GLM could have been used to model these data, χ² analyses include only a sub-set of the data at a time, and thus do not give an overall statement of the significance of the interaction between terms (variables). In contrast, a GLM includes the full set of data, resulting in a smaller error and results that are more statistically robust. Given the complexities of the data taken from the Library site sample, a GLM was the most appropriate technique to assess this larger data set.
All data based on tooth, alveoli and element counts were modelled with Genstat estimating the dispersion. In all other cases (such as counts of individuals), the dispersion was fixed at one. In all modelling of pathological lesions, only females and males of determinable age were utilised; no adults of indeterminate age and/or sex were included in these comparisons.
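To make the distinction between an estimated and a fixed dispersion concrete, the sketch below computes one common dispersion estimate for a binomial model: the Pearson chi-square statistic divided by the residual degrees of freedom. This is an assumption for illustration; the estimator actually applied by Genstat in this research is not stated here, and the function name `pearson_dispersion` and the example counts are hypothetical.

```python
def pearson_dispersion(successes, trials, fitted, n_params):
    """Estimate dispersion as Pearson chi-square / residual degrees of freedom.

    successes: observed counts with the lesion per group
    trials:    number of teeth, alveoli or elements observed per group
    fitted:    fitted proportions from the binomial model, one per group
    n_params:  number of parameters estimated by the model
    """
    chi2 = 0.0
    for y, n, mu in zip(successes, trials, fitted):
        expected = n * mu
        variance = n * mu * (1.0 - mu)   # binomial variance when dispersion = 1
        chi2 += (y - expected) ** 2 / variance
    df = len(successes) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    return chi2 / df

# Invented example: three groups of 100 teeth and an intercept-only model,
# so every fitted proportion is the pooled proportion 60 / 300 = 0.2.
phi = pearson_dispersion([10, 20, 30], [100, 100, 100], [0.2, 0.2, 0.2], 1)
print(phi)
```

A value well above one suggests more variation than a pure binomial model allows, which is plausible for tooth and alveoli counts because observations from the same individual are not independent. Fixing the dispersion at one corresponds to assuming the plain binomial variance.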
This modelling was not applied to the individual count of caries or LEH, or to teeth with greater than two caries in Phases B and C, because the data for teeth with two or more caries in Phases B and C fell below the requirements of the model. In these cases, a direct comparison of frequencies by age and sex groups is presented without statistical modelling.
GLM modelling was applied to variates including the percentages for caries (by tooth count), ADP and AMTL (by alveoli count), cribra orbitalia and non-specific infection. Age and sex were utilised as factors and compared within the main phases (B and C) and across the sample as a whole, inclusive of all three phases. These main phases were also utilised as factors, and specific age and sex groups were compared to one another in this manner. Frequencies of pathological lesions for Phase A were not included within the thesis (except where Phase A individuals are included in the total frequencies for all phases of the Library site) due to the poor preservation of material from this phase, its skewed female ratio and the limited material available for analysis. In addition, as dating was not conducted in the course of this research and was instead taken from the initial analyses of the site, it was considered best practice to continue to separate the phases rather than combine them arbitrarily, at least until further and more stringent dating assessments can be conducted.
Tables of the frequencies of pathological lesions and the associated p-values are presented for each pathology in the relevant chapters. These figures are reproduced from the output of the modelling carried out in Genstat.
4.9.2 European comparison samples
Data from the comparison samples were separated and collated based on pathology. Variates available included: percentages of dental caries, ante-mortem tooth loss, ADP, linear enamel hypoplasia, cribra orbitalia, periosteal infection and mean stature. In most samples, adults of indeterminate age and/or sex were also included in the original authors' analyses and have been included in the same way here; hence, so have those from the Library site sample. Any exceptions are noted within the tables collating the frequencies of each pathology. All dentition used is adult permanent dentition only.
In the specific instance of the samples from the Icelandic sites of Haffjarðarey and Viðey, the preliminary nature of this report meant that pathological lesions were given as observable events in individuals and express a minimum number of observable individuals rather than the total number. In all other cases the percentages expressed are taken directly as published.
To overcome any issues of disparity across the data, a multivariate analysis was utilised in all instances of modelling regarding the comparison samples, excepting Chapter Ten. Specifically, data from several different variables (e.g. ADP, AMTL, caries) were collated and analysed using principal component analysis (PCA) in the statistical program JMP 11, in order to examine the differences and similarities between these samples by the factor of site/location. A multivariate statistical analysis was chosen for two reasons. Firstly, the focus of this thesis was to assess a number of pathological variables together rather than singly; these variables are assessed later, in Chapter Ten, against social and environmental variables. Secondly, univariate statistical analysis is less robust and commonly has a higher risk of errors compared to multivariate analyses, particularly with data sets that may have missing data (as discussed above).
Principal component analysis (PCA) is a commonly used technique in the analysis of multivariate data to simplify and visualise complex data sets (Jolliffe 2002). PCA is an exploratory analysis and, as such, was considered suitable for a comparison of the pathological lesions in these samples without consideration of other variables (to be examined in Chapter Ten). In addition, PCA is an appropriate technique where data sets across samples are heterogeneous, which was particularly important for examinations in this research (Jolliffe 2002). Not all of these sites had data available for all lesions, so a multivariate analysis, such as a PCA, provided a more reliable method for analysing similarities and differences in the absence of these data.
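The core calculation behind a PCA can be sketched briefly in Python. The example below standardises each variable, forms the correlation matrix and extracts the first principal component by power iteration. It is a simplified illustration rather than the JMP 11 procedure used in this research: it assumes complete data (no missing values), the function names are hypothetical, and the site values are invented.

```python
import math

def correlation_matrix(rows):
    """Return the correlation matrix of the columns of `rows` (complete data only)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(matrix, iterations=500):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(matrix)
    v = [1.0 + 0.1 * j for j in range(p)]   # arbitrary non-uniform starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Invented percentages for five hypothetical sites (rows);
# columns: caries, AMTL, cribra orbitalia.
sites = [
    [10.0, 12.0, 5.0],
    [14.0, 15.0, 9.0],
    [20.0, 22.0, 8.0],
    [25.0, 24.0, 14.0],
    [30.0, 33.0, 15.0],
]
corr = correlation_matrix(sites)
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of variables, so this
# ratio is the share of total variance carried by the first component.
print(eigenvalue / len(corr), loadings)
```

Working from the correlation matrix rather than the covariance matrix puts variables measured on different scales (percentages, stature in centimetres) on an equal footing, which is one reason it is a common choice for mixed variables such as these.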
Principal component analysis has not been extensively utilised in palaeopathological analyses, though Baxter (1994) has discussed this technique amongst other multivariate analyses as they apply to archaeology, and it has been used throughout bioarchaeological research in studies analysing metric variation. This technique has been used extensively within the biological sciences, particularly in genome-wide expression studies (Nakamura et al. 1988, Raychaudhuri et al. 2000, Ben-Hur and Guyon 2003, Wall et al. 2003, Ding and He 2004, Karasik et al. 2004, Price et al. 2006, Novembre and Stephens 2008, Reich et al. 2008, Ringnér 2008), which supports its suitability for application to these pathological data.
There are multiple steps to a PCA. Firstly, pathological variables were analysed using a correlation analysis. Correlations were tabulated and scatter plots utilised to check the validity of the resulting correlations as indicated by the statistical program JMP 11.
Analyses of relationships resulting in r values ≤ -0.5 or ≥ 0.5 were considered negatively or positively correlated, respectively. This criterion was chosen because, while these types of analyses are not common in bioarchaeology, in past analyses, particularly in relation to methodological techniques for examining stature and the dentition, deviations from zero of less than 0.5 were considered unimportant (Stern and Skobe 1985, Schmidt 2010), while those resulting in values above 0.5 were considered noteworthy (Spoctor and Manger 2007, Giurazza et al. 2013). Data that were strongly correlated to one another, and that were indicative of the same pathology (e.g. number of caries by tooth versus by individual), were removed from the PCA, as these may have influenced or skewed the analysis. Where a complete PCA was not utilised because the pathological lesions were correlated (as in Chapter Eight, where all observations related to non-specific infection), correlations were still conducted using the PCA procedure in order to ensure consistency of analysis throughout the study.
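The screening step described above can be sketched in Python as follows. The sketch treats the 0.5 cut-off as applying to the absolute value of Pearson's correlation coefficient r, which is an assumption about how the criterion was applied. The function names and the example values are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated(variables, threshold=0.5):
    """Return (name_a, name_b, r) for every pair of variables with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(variables.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Invented values for four hypothetical sites, for illustration only.
example = {
    "caries_by_tooth": [1.0, 2.0, 3.0, 4.0],
    "caries_by_individual": [2.0, 4.0, 6.0, 8.0],
    "cribra_orbitalia": [1.0, -1.0, 1.0, -1.0],
}
for name_a, name_b, r in flag_correlated(example):
    print(name_a, name_b, round(r, 3))
```

Pairs flagged in this way that measure the same pathology (such as caries by tooth and caries by individual) are candidates for dropping one variable before the PCA, in line with the procedure described in the text.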
In the case of Chapters Five to Eight, these variables were limited to pathological lesions and stature estimations only. After undertaking the correlations, the variables used in the PCA included: mean male stature, mean female stature, caries by tooth count, AMTL by tooth count, ADP by individual, cribra orbitalia by individual and LEH by tooth count. In Chapter Ten, additional environmental and social variables were also included, in order to gauge what relationships existed between these and the frequencies of pathological lesions in these samples.
Generalised linear modelling was also applied in Chapter Ten to compare social and environmental variables by pathological variable. This was undertaken according to the methods for modelling relationships within the Library site sample, as outlined in section 4.9.1. Further information on the statistical modelling applied to these variables, including the application of analyses of variance and linear regression, is presented in Chapter Ten.