3.4.1 C.A.B. Smith’s Mean Measure of Divergence (MMD)
The MMD is a distance measure that converts non-metric trait frequencies into a single numerical value: the more similar two groups are, the smaller the value. Smith’s formula was developed for Grewal (1962) to explore the biological divergence (due to accumulated mutations) that developed across generations of laboratory mice, using skeletal non-metric traits. By extension, the MMD can be used to estimate the biological distance between two or more groups. Smith’s MMD, as described by Grewal (1962) and later clarified by Harris and Sjøvold (2004), is:
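$$\mathrm{MMD} = \frac{\sum_{k=1}^{r} (\theta_{ik} - \theta_{jk})^2}{r} - \left(\frac{1}{n_i} + \frac{1}{n_j}\right), \tag{3.4}$$
where the difference between samples i and j for the arcsine-transformed frequencies of trait k, $\theta_{ik} - \theta_{jk}$, is calculated and squared so that positive and negative values do not cancel one another. The sum of the squared differences is divided by the number of traits used in the equation, r, to generate the average difference between samples i and j. A correction term, $(1/n_i + 1/n_j)$, where $n_i$ and $n_j$ are the sizes of samples i and j, is then subtracted from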
the average to correct for sampling fluctuations. Since Grewal (1962) the MMD has been used extensively with osteological and dental traits to explore biological relationships within and among populations (Berry and Berry, 1972; Berry, 1974; Buikstra, 1976; Donlon, 2000; Edgar, 2007; Greene, 1982; Hallgrímsson et al., 2004; Hanihara et al., 2003; Irish and Turner, 1990; Irish, 1997, 1998a, 1998b, 2010; Ossenberg et al., 2006; Sutter and Verano, 2007). Through its
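extensive use, some limitations have been identified and improved upon (Harris and Sjøvold, 2004). The corrected formula, published by Harris and Sjøvold (2004), is:
$$\mathrm{MMD} = \frac{1}{r}\sum_{k=1}^{r}\left\{(\theta_{ik} - \theta_{jk})^2 - \left[\frac{1}{n_{ik} + \frac{1}{2}} + \frac{1}{n_{jk} + \frac{1}{2}}\right]\right\}, \tag{3.5}$$
where $n_{ik}$ and $n_{jk}$ are the numbers of individuals scored for trait k in samples i and j.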
The correction term used in Formula 3.4 results in an overestimate of the true variance between samples, as noted by Green and Suchey (1976) and Green et al. (1979). Essentially, very high (>0.95) and very low (<0.10) trait frequencies affected the variance. A new correction term, highlighted with a bracket in Formula 3.5, has been suggested following Freeman and Tukey (1950). Formula 3.4 also assumes that all samples are complete and that sample sizes are identical. Since this is rarely the case, the correction term needed to be more robust to unequal sample sizes and missing data.
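The statistical significance of an MMD value can be determined by comparing it to its standard deviation, which is calculated as:
$$\mathrm{SD}(\mathrm{MMD}) = \sqrt{\frac{2}{r^2}\sum_{k=1}^{r}\left(\frac{1}{n_{ik} + \frac{1}{2}} + \frac{1}{n_{jk} + \frac{1}{2}}\right)^2}, \tag{3.6}$$
where $n_{ik}$ and $n_{jk}$ are the trait-specific sample sizes for samples i and j.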
If the MMD value is greater than twice its standard deviation, the null hypothesis (that the samples are identical) is rejected at the p = 0.025 level (Harris and Sjøvold, 2004). It is important to note that failure to reject the null hypothesis could also be due to small sample sizes, which inflate the variance.
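The corrected MMD, its standard deviation, and the two-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study: the function names and the (present, scored) input layout are hypothetical, and the Freeman and Tukey angular transformation is assumed as the arcsine transformation, since the text does not specify which variant was applied.

```python
import math

def angular_transform(k, n):
    """Freeman-Tukey angular transform of k trait-present out of n scored.
    Assumption: the text only says 'arcsine-transformed'."""
    return 0.5 * (math.asin(1 - 2 * k / (n + 1))
                  + math.asin(1 - 2 * (k + 1) / (n + 1)))

def mmd_with_sd(sample_i, sample_j):
    """Return (MMD, SD) for two samples.
    Each sample is a list of (k, n) pairs, one pair per trait."""
    r = len(sample_i)
    total = 0.0      # running sum of corrected squared differences
    var_sum = 0.0    # running sum of squared correction terms
    for (k_i, n_i), (k_j, n_j) in zip(sample_i, sample_j):
        correction = 1 / (n_i + 0.5) + 1 / (n_j + 0.5)
        diff = angular_transform(k_i, n_i) - angular_transform(k_j, n_j)
        total += diff ** 2 - correction
        var_sum += correction ** 2
    return total / r, math.sqrt(2 * var_sum) / r

def is_significant(mmd, sd):
    """Two-standard-deviation rule described in the text."""
    return mmd > 2 * sd

# Made-up counts: two traits, 50 individuals scored per trait.
group_a = [(5, 50), (45, 50)]
group_b = [(45, 50), (5, 50)]
mmd, sd = mmd_with_sd(group_a, group_b)
print(mmd, sd, is_significant(mmd, sd))
```

Note that identical samples yield a slightly negative MMD, because the correction term is subtracted from a squared difference of zero.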
Using the corrected derivation of Smith’s MMD, recent studies have generated biological
Schillaci et al., 2009). However, even with an improved correction term, limitations still exist with the MMD. Because the MMD is not a Euclidean distance, it does not account for trait correlation. Since many cranial non-metric traits are significantly correlated (Cheverud, 1979), Smith’s MMD is not appropriate for this study.
3.4.2 Balakrishnan and Sanghvi’s B²
Balakrishnan and Sanghvi’s B² (1968; Sanghvi and Balakrishnan, 1972) was one of the first Euclidean distance measures used to deal with categorical data such as cranial non-metric traits. Distances are calculated using a dispersion matrix:
$$B^2 = (\mathbf{p}_l - \mathbf{p}_m)'\,\mathbf{S}^{-1}\,(\mathbf{p}_l - \mathbf{p}_m), \tag{3.7}$$
where l and m index the two samples being compared, $\mathbf{p}_l$ is the vector whose ith element, $p_{li}$, is the ith trait in the lth sample, and $\mathbf{S}$ is the weighted variance-covariance (dispersion) matrix (Balakrishnan and Sanghvi 1968). The weighted variance-covariance matrix takes into account the correlation among traits, which the MMD does not. Sanghvi and Balakrishnan (1972) did show that the B² matrices correlated with those derived using the MMD.
3.4.3 Mahalanobis D² Distance Matrix
The Euclidean distance measure used in most recent studies is the Mahalanobis D². The generalized D² statistic was first published by Mahalanobis (1936) as a measure of divergence between two populations based on continuous data. The Mahalanobis D² was extended for use with non-metric traits by Konigsberg (1990; see also Williams-Blangero and Blangero, 1989). Categorical data, such as cranial non-metric traits, can be analyzed for biological distance by using a tetrachoric correlation matrix rather than the dispersion matrix utilized in Balakrishnan and Sanghvi’s B²
dichotomously, but have an underlying continuous distribution. The tetrachoric correlation is the statistical measure of variance in this study since cranial non-metric trait data is categorical. The Threshold Model assumes that all trait liabilities have a variance of 1.0, and therefore a variance-covariance matrix cannot be calculated. These correlations are calculated within each group, then pooled using sample size to find the weighted average correlation (Konigsberg, 1990:60). The formula used by Konigsberg (1990), and in this study, is:
$$D^2_{ij} = (\mathbf{z}_i - \mathbf{z}_j)'\,\mathbf{T}^{-1}\,(\mathbf{z}_i - \mathbf{z}_j), \tag{3.8}$$
where $\mathbf{z}_i$ holds the z-scores for the traits in population i, and $\mathbf{z}_j$ holds the z-scores for the same traits in population j. $\mathbf{T}^{-1}$ is the inverse of the pooled within-group tetrachoric correlation matrix between
all traits. The resulting distances are conservative in that they represent the minimum possible distance between groups (Blangero and Williams-Blangero, 1989). Like all distance measures described in this chapter, the Mahalanobis D² is sensitive to small sample sizes in that sample size affects calculation of the tetrachoric correlations (Konigsberg et al., 1993). A benefit of the Mahalanobis D² distance is that the significance of the individual distances can be assessed with an F-test (Droessler, 1981; Konigsberg et al., 1993).
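As a rough illustration of Formula 3.8, the sketch below computes a D² value from trait frequencies and a pooled tetrachoric correlation matrix. Several things here are assumptions rather than details from this study: the function names are hypothetical, the z-scores are obtained with a plain inverse-normal (probit) transform of the trait frequencies, and the pooled matrix T is supplied ready-made, since estimating tetrachoric correlations is a separate step not shown.

```python
from statistics import NormalDist
import numpy as np

def threshold_z(freqs):
    """Convert trait frequencies to z-scores with the inverse normal CDF.
    Assumption: plain probit transform; frequencies must lie strictly
    between 0 and 1, otherwise inv_cdf raises an error."""
    return np.array([NormalDist().inv_cdf(p) for p in freqs])

def mahalanobis_d2(freqs_i, freqs_j, T):
    """D^2 = (z_i - z_j)' T^{-1} (z_i - z_j) for two populations."""
    diff = threshold_z(freqs_i) - threshold_z(freqs_j)
    # Solve T x = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(np.asarray(T, dtype=float), diff))

# Made-up frequencies for two traits and a made-up pooled correlation of 0.5.
T = [[1.0, 0.5],
     [0.5, 1.0]]
print(mahalanobis_d2([0.30, 0.60], [0.45, 0.40], T))
```

With an identity matrix in place of T, the result reduces to the sum of squared z-score differences, which shows how the correlation matrix adjusts the distance for correlated traits.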
3.5 Wright’s FST
The F-statistic, or inbreeding coefficient, was described by Sewall Wright (1951). FST is
defined as the average inbreeding of a subpopulation relative to the whole population (Falconer
1989). In biological distance studies, FST is a measure of the biological differentiation of subpopulations. In other words, a relatively small FST value for subpopulations within a study indicates that those subpopulations were experiencing significant gene flow, thus increasing heterogeneity within groups and homogeneity between groups.
FST as derived from phenotypic data is an estimate of the real, or genetic, FST. If it is assumed that phenotypic and genetic variance-covariance matrices are proportional and the effective population sizes (Ne) are equal across groups, then the minimum FST (phenotypic) is proportional to the real FST (genetic) if the trait heritabilities are moderate to high (Konigsberg and Ousley 1995). Relethford et al. (1997) provide a method for calculating FST based on phenotypic data. The C matrix is first calculated from the distance matrix:
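$$\mathbf{C} = -0.5\,(\mathbf{I} - \mathbf{1}\mathbf{w}')\,\mathbf{D}^2\,(\mathbf{I} - \mathbf{w}\mathbf{1}'), \tag{3.9}$$
where $\mathbf{D}^2$ is the distance matrix, $\mathbf{w}$ is equal to a column vector of the proportion of Ne, $\mathbf{I}$ is the identity matrix with the same dimensions as the distance matrix, and $\mathbf{1}$ is a vector of 1’s equal in length to the number of subpopulations. Once the C matrix has been derived, minimum FST can be calculated from its diagonal elements, $\mathrm{diag}(\mathbf{C})$:
$$F_{ST} = \frac{\mathbf{w}'\,\mathrm{diag}(\mathbf{C})}{2t + \mathbf{w}'\,\mathrm{diag}(\mathbf{C})}, \tag{3.10}$$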
where t is the number of traits. If the effective population size is assumed to be equal for all samples in the study, then w is a column vector with each element equal to one over the number of samples. Under these assumptions, FST estimates provide a measure of within-group heterogeneity that biological distance does not explicitly offer. This strengthens interpretations of population histories by providing quantitative estimates of the evolutionary processes of gene flow and genetic drift.
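The two steps above (deriving C from the distance matrix, then minimum FST from the diagonal of C) can be sketched as follows. This is an illustrative sketch under the stated assumptions, not the software used in this study: the function name is hypothetical, the input is taken to be a symmetric matrix of squared distances, and equal weights are used when no effective population sizes are supplied.

```python
import numpy as np

def minimum_fst(D2, t, w=None):
    """Return (C, FST) from a g x g matrix of squared distances D2.
    t is the number of traits; w holds the relative effective population
    sizes and defaults to equal weights 1/g (the equal-Ne assumption)."""
    D2 = np.asarray(D2, dtype=float)
    g = D2.shape[0]
    w = np.full(g, 1.0 / g) if w is None else np.asarray(w, dtype=float)
    ones = np.ones(g)
    I = np.eye(g)
    # C = -0.5 (I - 1w') D^2 (I - w1')
    C = -0.5 * (I - np.outer(ones, w)) @ D2 @ (I - np.outer(w, ones))
    weighted_diag = float(w @ np.diag(C))   # w' diag(C)
    return C, weighted_diag / (2 * t + weighted_diag)

# Made-up example: two samples with a squared distance of 4.0 over 10 traits.
C, fst = minimum_fst([[0.0, 4.0], [4.0, 0.0]], t=10)
print(C)
print(fst)
```

In this two-sample example the diagonal elements of C each equal one quarter of the squared distance, so the weighted diagonal is 1.0 and the minimum FST is 1/21.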
Caution is warranted concerning the calculation of FST with respect to disparate and small sample sizes. In this case, the effects of genetic drift (isolation and founder effect) can influence the FST value, making its interpretation questionable (Jorde, 1980). It is also noted that
between-population variation (Roseman 2004; Roseman and Weaver 2004). Given that the samples for this study are restricted to the Andes, the effects of environment on the expression of
non-metric traits should be negligible.