Multiple correspondence analysis (MCA) is a method of data exploration which aims to identify relationships between a number of categorical variables, in a similar fashion to the way factor analysis (FA) or principal components analysis (PCA) deal with continuous variables. It can be viewed either as a generalisation of correspondence analysis (CA) or as a generalisation of PCA. The latter approach will be used for the description of MCA here, but a full review of all of these techniques can be found in Husson et al. (2011).

At a conceptual level, CA can be understood as the deconstruction of a chi-square analysis, followed by an orthogonal rotation or transformation in order to better represent the variance in the data. If two variables are considered, with 𝑛 and 𝑚 categories respectively, an 𝑛 × 𝑚 contingency table of the relationships between these categories can be created (as would be done when manually conducting a chi-square test). From this, the ‘row mass’ of each of the 𝑛 categories of the row variable can be estimated by dividing the marginal row frequency by the total number of observations. The same can be done for each of the 𝑚 categories of the column variable (to give the ‘column mass’). Under the assumption of independence between the two variables, the product of any row mass and any column mass gives the expected proportion for the cell at their intersection. In chi-square testing, this estimate is then multiplied by the total number of observations to give the expected cell count, and the chi-square statistic is calculated as the sum of the squared differences between the observed and expected cell counts, each divided by the corresponding expected count. If the same procedure is instead performed on the observed and expected proportions rather than the counts, ‘Pearson’s mean square contingency’, or 𝜙², is obtained (which is equal to the chi-square statistic divided by the total number of observations). This can be considered to be a measure of the overall intensity of the relationship between the two variables, or the ‘total inertia’ in the data. CA is based upon the identification of the contribution of each of the cells (i.e. particular combinations of row and column levels) to this total inertia.
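The relationship between the chi-square statistic and the total inertia can be illustrated with a minimal sketch. The contingency table below is hypothetical and is included only to show the arithmetic; the variable names are illustrative and are not taken from any particular software package.

```python
# Hypothetical 2 x 3 contingency table of counts (illustrative data only).
table = [
    [20, 30, 10],
    [10, 10, 20],
]

total = sum(sum(row) for row in table)  # total number of observations

# Row and column masses: marginal frequencies divided by the total.
row_mass = [sum(row) / total for row in table]
col_mass = [sum(col) / total for col in zip(*table)]

chi_square = 0.0   # computed on counts
phi_squared = 0.0  # the same sum computed on proportions (total inertia)
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected proportion under independence: row mass times column mass.
        expected_prop = row_mass[i] * col_mass[j]
        expected_count = expected_prop * total
        chi_square += (observed - expected_count) ** 2 / expected_count
        phi_squared += (observed / total - expected_prop) ** 2 / expected_prop

print("chi-square:", chi_square)
print("phi-squared (total inertia):", phi_squared)
print("chi-square / N:", chi_square / total)
```

The last two printed values should agree up to floating-point error, reflecting the fact that 𝜙² is the chi-square statistic divided by the total number of observations.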

Correspondence analysis is based upon the singular value decomposition of the matrix of standardised residuals (which are the differences between the observed and expected values for each cell in the table, and which give an indication of the magnitude and direction of each cell’s deviation from independence). One way to approach this is to consider row and column profiles, which are a method of normalising the data and can be useful for identifying the contribution of the variables under investigation to the total inertia. The ‘row profile’ for each row can be considered as a vector of the (conditional) frequencies of column membership for that row. If each of these 𝑛 vectors is plotted as a set of coordinates in 𝑚-dimensional space, a geometric interpretation of the relationship between the different rows can be developed (a ‘cloud’ of 𝑛 points). The vector of column masses represents the ‘average’ row profile, and therefore the point of origin of the cloud. The points (which each represent an individual row) are each weighted according to the row mass, meaning that rows containing a higher proportion of the total number of observations contribute more. The same approach can also be applied to the columns, in order to create a cloud of 𝑚 points in 𝑛-dimensional space.
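As a sketch of these steps, the following uses NumPy on a hypothetical contingency table. Dividing each residual by the square root of its expected proportion is one common standardisation and is assumed here; the scaling of the row coordinates (‘principal coordinates’) is likewise an assumption for illustration rather than the only possible convention.

```python
import numpy as np

# Hypothetical 2 x 3 contingency table (illustrative data only).
counts = np.array([[20, 30, 10],
                   [10, 10, 20]], dtype=float)

P = counts / counts.sum()   # observed proportions
r = P.sum(axis=1)           # row masses
c = P.sum(axis=0)           # column masses
expected = np.outer(r, c)   # expected proportions under independence

# Standardised residuals: deviation from independence for each cell.
S = (P - expected) / np.sqrt(expected)

# Singular value decomposition; squared singular values are the
# inertias captured by each successive dimension.
U, sing, Vt = np.linalg.svd(S, full_matrices=False)
total_inertia = (sing ** 2).sum()

# Row profiles: conditional frequencies of column membership per row.
row_profiles = P / r[:, None]

# The mass-weighted average of the row profiles is the column-mass vector,
# which is the origin of the row cloud.
average_profile = r @ row_profiles

# Row coordinates on the new axes (assumed principal-coordinate scaling).
row_coords = (U * sing) / np.sqrt(r)[:, None]

print("total inertia:", total_inertia)
print("average row profile:", average_profile)
print("row coordinates:\n", row_coords)
```

In this sketch the mass-weighted mean of the row coordinates lies at the origin, which corresponds to the average row profile being the centre of the cloud.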

The measures of departure from the independence model used in the creation of the row and column profile clouds are related to the chi-square statistic. This relationship becomes more apparent when estimates of the distance between each point (i.e. each row or column) and the cloud origin are made. In the case of the row profile cloud, this distance can be calculated as the sum of the squared differences between each row profile vector entry and the corresponding entry in the vector of column masses, each divided by the corresponding column mass. Since the vector of column masses represents the ‘expected’ row profile vector under an assumption of independence, this distance measure is known as the ‘𝜒² distance’ (𝑑²). Multiplication of the 𝜒² distance by the weight allocated to each point (the row mass) gives a measure of the inertia of the point. When these individual row point inertias are summed over all rows, the total inertia (𝜙²) is returned. As before, the same principle applies to the column profile cloud, which gives the same estimate of total inertia. The aim of MCA, as for related techniques such as PCA, is then to find the best way to represent the n-dimensional cloud of points in fewer than n dimensions whilst maintaining these distances between points. This is achieved by specifying the origin (the coordinates of the average row or column profile) as the centre of gravity (the ‘barycentre’) of the cloud, and creating a set of orthogonal axes around this which maximise the inertia captured in each successive dimension.
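The link between the 𝜒² distances and the total inertia can be checked numerically. The sketch below assumes the usual definition of the 𝜒² distance, in which each squared difference between a profile entry and the average profile entry is divided by that average profile entry; the data and the helper name `chi2_distances` are hypothetical.

```python
# Hypothetical 2 x 3 contingency table (illustrative data only).
table = [[20, 30, 10],
         [10, 10, 20]]

total = sum(map(sum, table))
P = [[x / total for x in row] for row in table]   # observed proportions
row_mass = [sum(row) for row in P]
col_mass = [sum(col) for col in zip(*P)]


def chi2_distances(props, point_mass, average_profile):
    """Squared chi-square distance of each profile from the average profile."""
    distances = []
    for row, mass in zip(props, point_mass):
        profile = [p / mass for p in row]  # conditional frequencies
        distances.append(
            sum((x - a) ** 2 / a for x, a in zip(profile, average_profile))
        )
    return distances


# Row cloud: the average profile is the vector of column masses.
row_d2 = chi2_distances(P, row_mass, col_mass)
# Column cloud: transpose the table; the average profile is the row masses.
P_t = [list(col) for col in zip(*P)]
col_d2 = chi2_distances(P_t, col_mass, row_mass)

# Inertia of each point = mass x squared distance; summing gives phi-squared.
row_inertia = sum(m * d for m, d in zip(row_mass, row_d2))
col_inertia = sum(m * d for m, d in zip(col_mass, col_d2))

print("total inertia from rows:", row_inertia)
print("total inertia from columns:", col_inertia)
```

Both sums should agree up to floating-point error, since the row and column clouds carry the same total inertia.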

MCA can be conducted in a similar way to CA, by creating the ‘Burt matrix’, which is a symmetric matrix representing all possible cross tabulations (i.e. contingency tables) for the variables under investigation, and analysing these separately. However, another way of conducting MCA is to apply the methodology described above to an indicator matrix (also known as the ‘complete disjunctive matrix’) of all individuals, which comprises the indicator matrices for all variables under investigation. Here, rows represent individuals, columns represent the levels of all variables under investigation, and each cell contains either a zero or a one, representing the absence or presence, respectively, of the factor level for the individual in question. The cloud of individuals can be developed and analysed as required, and a cloud of variable categories can also be created. This presents the locations of the barycentres of the individuals positive for each variable category. The barycentre of all categories within a particular variable will be equal to the point of origin of the axes.
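A minimal sketch of the indicator-matrix approach is given below, using NumPy. The data set (seven individuals, two categorical variables) is hypothetical, and the category coordinates use the same assumed principal-coordinate scaling as in ordinary CA. The barycentre within each variable is taken here to be weighted by the category masses.

```python
import numpy as np

# Hypothetical data: 7 individuals, 2 categorical variables.
data = [("yes", "low"), ("yes", "low"), ("yes", "mid"), ("no", "mid"),
        ("no", "high"), ("no", "high"), ("yes", "high")]

n_vars = len(data[0])
levels = [sorted({row[v] for row in data}) for v in range(n_vars)]
# One column per (variable index, level) pair.
columns = [(v, lev) for v in range(n_vars) for lev in levels[v]]

# Indicator (complete disjunctive) matrix: 1 if the individual has the level.
Z = np.array([[1.0 if row[v] == lev else 0.0 for v, lev in columns]
              for row in data])

# Burt matrix: all pairwise cross tabulations of the variable levels.
burt = Z.T @ Z

# Apply the CA steps to the indicator matrix.
P = Z / Z.sum()
r = P.sum(axis=1)   # masses of individuals
c = P.sum(axis=0)   # masses of categories
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sing, Vt = np.linalg.svd(S, full_matrices=False)

# Category coordinates (assumed principal-coordinate scaling).
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]

# Mass-weighted barycentre of the categories of each variable.
for v in range(n_vars):
    idx = [k for k, (var, _) in enumerate(columns) if var == v]
    print("variable", v, "barycentre:", c[idx] @ col_coords[idx])
```

Each printed barycentre should be a vector of values that are zero up to floating-point error, consistent with the categories of a variable being centred on the origin.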
