Identifying the members of a community through LH-PCR or 454 sequencing addresses only one question: “Who belongs to the community?” The significance of these identities is determined using multiple statistical tests and bioinformatic programs. Statistical analysis of 454 data is a field that is just beginning to develop. Some of the mathematical analyses and bioinformatic programs applied to the large volume of 454 sequencing data are based on methods previously used with LH, TRFLP, and other community profiling techniques, while others have been created anew.
1.9.1 Statistical analysis based on ecological indices. The earlier approaches
used to study microbial diversity and community dynamics include computing measures derived from ecological indices such as species richness and dominance or evenness indices (Hill et al., 2003). Traditional indices include the richness (S), the Shannon
information index (H), and the evenness (E) derived from it, and are defined as follows in Equations (1), (2) and (3), respectively:
$$S = \text{number of peaks in each sample}, \tag{1}$$
$$H = -\sum_{i=1}^{S} p_i \ln p_i, \tag{2}$$
where $p_i$ is the ratio of individual peak height to the sum total of the heights of all the peaks in the LH profile, and
$$E = \frac{H}{H_{\max}}, \tag{3}$$
where $H_{\max} = \ln(S)$. Note that the traditional diversity indices are based on the clear definition of an ecological description of an individual species, but these definitions have been modified for presumptive identification of LH/T-RFLP profiles by replacing the definition of an individual species with that of individual peaks in LH/T-RFLP profiles.
These calculations identify the richness (how many different types of organisms are present in a community), the evenness (how these organisms are distributed throughout the community), and the Shannon index (the diversity of the community based on the number of species and their abundance in the community). These indices provide the basic framework for understanding a particular niche.
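As a minimal sketch of how the three indices could be computed from a single LH profile, the following Python function takes a list of peak heights and returns S, H, and E. The function name and the example peak heights are hypothetical and serve only as illustration; treating zero-height entries as absent peaks and reporting E = 0 when fewer than two peaks are present are assumptions of this sketch.

```python
import math

def diversity_indices(peak_heights):
    """Return (S, H, E) for one LH profile given as a list of peak heights."""
    # Assumption: entries with height 0 are treated as absent peaks.
    heights = [h for h in peak_heights if h > 0]
    total = sum(heights)
    richness = len(heights)                                # S: number of peaks
    proportions = [h / total for h in heights]             # p_i: relative peak height
    shannon = -sum(p * math.log(p) for p in proportions)   # H = -sum(p_i ln p_i)
    # E = H / Hmax with Hmax = ln(S); undefined for S < 2, so 0.0 is reported there.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical profiles with the same richness but different evenness.
even_profile = [100, 100, 100, 100]
skewed_profile = [370, 10, 10, 10]
print(diversity_indices(even_profile))
print(diversity_indices(skewed_profile))
```

Both profiles have the same richness, but the skewed profile yields a lower H and E, which is the distinction the evenness index is meant to capture.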
1.9.2 Statistical analysis based on abundance models. Even with the
availability of the numerous diversity indices, analyzing microbial diversity and communities merely using ecological indices has its shortcomings (Mills et al., 2006). Although each index represents an attempt to distill diversity information into a single quantity, each one ends up measuring specific aspects of diversity. Diversity indices vary in their sensitivity to different abundance classes. Species abundance models are considered to be more sophisticated tools to investigate diversity because they examine the distribution of abundances in a population.
Statistical models used for species abundance of microbial communities include log series distribution, log-normal distribution, broken stick model, and overlapping niche model (Curtis et al., 2002; Hill et al., 2003). The most frequently used statistical model for species abundance distributions is the log-normal distribution. In log-normal communities, the null model for the bacterial species abundance is a lognormal distribution as follows:
$$S(R) = \frac{S_T}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{R^2}{2\sigma^2}\right),$$
where $S(R)$ is the number of species that contain $R$ individuals, $S_T$ is the total number of species in the community, and $\sigma^2$ is the variance of the distribution. The parameters $S_T$ and $\sigma^2$ can be estimated from a sample of measured species abundance data by using statistical techniques such as the method of moments or least squares analysis (Curtis et al., 2002). This distribution is seen when a species is persistent within a community
(Magurran & Henderson, 2003). The log series distribution applies when a species is not persistent in a community but occurs only occasionally (Magurran & Henderson, 2003). Occasionally, a species can transition between the two types of distribution (Magurran & Henderson, 2003). The broken stick model describes the relative abundance of species by the random breaking of a theoretical line that represents the resources of the environment. The length of each broken segment, or stick, represents the abundance of a particular species (MacArthur, 1957). This model can be rephrased as “a group of n species of equal competitive ability simultaneously occupying the total niche and jostling each other to determine niche boundaries” (Tokeshi, 1993). MacArthur also proposed the overlapping niche model, which refers to species that share part of a niche with another species (MacArthur, 1957). Each of these statistical models can be used to describe bacterial or fungal communities when species abundance is of interest.
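The broken stick model lends itself to a short numerical illustration. The sketch below, in Python, computes the expected relative abundance of the i-th most abundant of S species using the standard closed form (1/S) Σ_{k=i}^{S} 1/k, and compares it with a direct simulation that breaks a unit-length stick at S − 1 random points. The function names, the number of trials, and the random seed are illustrative assumptions rather than part of any published analysis.

```python
import random

def broken_stick_expected(n_species):
    """Expected relative abundances under the broken stick model, largest first."""
    # Expected share of the i-th most abundant species: (1/S) * sum_{k=i}^{S} 1/k
    return [
        sum(1.0 / k for k in range(i, n_species + 1)) / n_species
        for i in range(1, n_species + 1)
    ]

def broken_stick_simulated(n_species, trials=20000, seed=0):
    """Average ranked segment lengths from randomly breaking a unit stick."""
    rng = random.Random(seed)
    totals = [0.0] * n_species
    for _ in range(trials):
        # n_species - 1 random break points split the stick into n_species pieces.
        cuts = sorted(rng.random() for _ in range(n_species - 1))
        edges = [0.0] + cuts + [1.0]
        pieces = sorted((hi - lo for lo, hi in zip(edges, edges[1:])), reverse=True)
        for rank, piece in enumerate(pieces):
            totals[rank] += piece
    return [t / trials for t in totals]

print(broken_stick_expected(3))   # closed-form expectation
print(broken_stick_simulated(3))  # Monte Carlo estimate, close to the line above
```

Observed relative abundances, ranked from largest to smallest, can then be compared against the expected values to judge whether the broken stick model is a plausible description of a community.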
1.9.3 Comparative metagenomics. Ecological indices and abundance models
can be informative when studying one population at a time or comparing between two species. Comparing communities as a whole can also provide clues as to how communities change over time or how a community is affected by its environment or outside factors. For instance, by comparing all sequencing or LH data, it can be determined whether there are statistically significant differences between the bacterial or fungal community of a healthy sputum sample and that of a CF sample. In addition, the factors that cause these distinctions can be identified.
Comparative metagenomics can be studied using statistics or bioinformatics. Most statistical comparisons start with a distance matrix, which quantifies the similarity of one community to another. Jaccard’s coefficient, Hellinger distance, the Pearson chi-square test, Euclidean distance, and the Bray-Curtis/Sørensen coefficient are often used to determine the relationship between two samples or two populations (Beran, 1977; Bray & Curtis, 1957; De Leeuw & Heiser, 1982; Jaccard, 1901; Pearson, 1900). These measures use the presence/absence and/or abundance of all the organisms in a sample to determine the relatedness between two samples (Beran, 1977; Bray & Curtis, 1957; De Leeuw & Heiser, 1982; Jaccard, 1901; Pearson, 1900). The resulting distance/similarity matrices can then be further analyzed using multivariate statistics.
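To make the distance measures concrete, the following Python sketch computes four of them for two abundance profiles and assembles a pairwise distance matrix. The sample names and counts are hypothetical. The Hellinger distance is written in the form common in community ecology, without the 1/√2 normalizing factor used in some statistical texts, and assumes each profile has a nonzero total; both are assumptions of this sketch.

```python
import math

def jaccard_distance(a, b):
    """1 - (shared taxa / taxa present in either sample); presence/absence only."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    return 1.0 - len(present_a & present_b) / len(union) if union else 0.0

def bray_curtis(a, b):
    """Sum of absolute abundance differences divided by the total abundance."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

def euclidean(a, b):
    """Straight-line distance between the two abundance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hellinger(a, b):
    """Euclidean distance between square-root-transformed relative abundances."""
    sa, sb = sum(a), sum(b)  # assumed nonzero
    return math.sqrt(sum((math.sqrt(x / sa) - math.sqrt(y / sb)) ** 2
                         for x, y in zip(a, b)))

def distance_matrix(samples, metric):
    """Symmetric matrix of pairwise distances between all samples."""
    return [[metric(s, t) for t in samples] for s in samples]

# Hypothetical abundance profiles (same taxa, in the same order, for each sample).
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 0, 10]
print(jaccard_distance(sample_a, sample_b))
print(bray_curtis(sample_a, sample_b))
print(euclidean(sample_a, sample_b))
print(distance_matrix([sample_a, sample_b], hellinger))
```

Jaccard uses only presence/absence, whereas the other three also reflect abundance, so the choice of measure determines which aspect of community structure drives the comparison.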
Multivariate statistics can be used to identify what links two samples (either members of a sample or external environmental factors). These analyses can also be used to determine whether samples within a population are similar to one another (Rudi et al., 2007). An example of multivariate statistical analysis is the clustering of microbial communities in soil using the unweighted pair-group method with arithmetic averages (UPGMA) algorithm on data derived from a distance metric (such as the Jaccard, Hellinger, or Pearson distances) (Blackwood et al., 2003; Dunbar et al., 2000; Griffiths et al., 2000). Such methods have been used to support claims that certain relationships between communities can be discerned, that the groupings are natural, and that outliers can be identified (Clarke et al., 2006). Other statistics, such as principal component analysis and multidimensional scaling, will be discussed in detail in Chapters 2 and 3.
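The UPGMA procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the sample labels and distance values are hypothetical, labels are assumed to be unique, and each merge is reported at the average between-cluster distance (some tools plot dendrogram heights at half this value).

```python
def upgma(labels, dist):
    """Cluster with UPGMA; return merges as (cluster_a, cluster_b, distance)."""
    # Each active cluster is a tuple of its member labels.
    clusters = [(label,) for label in labels]
    # Distances between active clusters, keyed by the unordered pair.
    d = {
        frozenset((clusters[i], clusters[j])): dist[i][j]
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    }
    merges = []
    while len(clusters) > 1:
        closest = min(d, key=d.get)          # pair with the smallest distance
        a, b = sorted(closest)
        height = d.pop(closest)
        merged = a + b
        clusters.remove(a)
        clusters.remove(b)
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            # Size-weighted mean = arithmetic average over all member pairs.
            d[frozenset((merged, c))] = (len(a) * d_ac + len(b) * d_bc) / len(merged)
        clusters.append(merged)
        merges.append((a, b, height))
    return merges

# Hypothetical distance matrix for four soil samples.
labels = ["soil_1", "soil_2", "soil_3", "soil_4"]
dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
for a, b, height in upgma(labels, dist):
    print(a, b, round(height, 3))
```

In this example the samples pair off as (soil_1, soil_2) and (soil_3, soil_4) before the two pairs are joined, so the merge order itself shows which communities group together.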
Phylogenetics can also be used to compare samples and populations. The evolution and/or relatedness of a community can be determined using a phylogenetic tree of the organisms present in the sample. The bioinformatic program UniFrac can use phylogenetic trees to determine whether the microbial community in one sample is similar to that in another sample (Lozupone & Knight, 2005; Lozupone et al., 2006; Lozupone et al., 2007). If there are differences between the communities, then the program will determine whether a particular lineage of the tree is responsible for the difference. The UniFrac program can also cluster communities based on environmental factors (Lozupone & Knight, 2005; Lozupone et al., 2006; Lozupone et al., 2007). MEGAN 3 (metagenome analysis software) is another bioinformatic program that can compare communities by using the original sequencing data and identifying taxa (Huson et al., 2007). Both of these programs can handle large amounts of data, which is key when analyzing 454 sequencing reads. The output of both programs can be used in combination with statistics to identify correlations between communities.
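The idea behind the unweighted UniFrac measure (Lozupone & Knight, 2005) is that two communities are more distinct when a larger fraction of the branch length in their shared tree leads to descendants from only one of them. The toy Python sketch below illustrates that idea only; it is not the interface of the UniFrac program. The tree encoding (each node mapped to its parent and branch length), the node names, and the sample names are all hypothetical, and the two samples being compared are assumed to be different.

```python
def unweighted_unifrac(parent, leaf_samples, sample_a, sample_b):
    """Fraction of branch length leading to descendants from only one sample."""
    pair = {sample_a, sample_b}
    # For each branch (identified by its child node), record which of the
    # two samples occur among the leaves below it.
    below = {node: set() for node in parent}
    for leaf, samples in leaf_samples.items():
        hits = samples & pair
        node = leaf
        while node is not None and hits:
            below[node] |= hits
            node = parent[node][0]
    unique = shared = 0.0
    for node, (_, length) in parent.items():
        if len(below[node]) == 1:
            unique += length      # branch seen by one sample only
        elif len(below[node]) == 2:
            shared += length      # branch seen by both samples
    total = unique + shared
    return unique / total if total else 0.0

# Hypothetical tree: node -> (parent, branch length); the root has no parent.
parent = {
    "root": (None, 0.0),
    "X": ("root", 0.5), "Y": ("root", 0.5),
    "t1": ("X", 1.0), "t2": ("X", 1.0),
    "t3": ("Y", 1.0), "t4": ("Y", 1.0),
}
# Hypothetical record of which samples each leaf (taxon) was observed in.
leaf_samples = {
    "t1": {"A", "C"}, "t2": {"A", "B", "C"},
    "t3": {"B"}, "t4": {"B"},
}
print(unweighted_unifrac(parent, leaf_samples, "A", "B"))
print(unweighted_unifrac(parent, leaf_samples, "A", "C"))
```

Samples A and C contain the same taxa and share every branch, whereas most of the branch length separating A and B is unique to one of them, which is reflected in the larger value for that pair.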
With rigorous statistical analysis and bioinformatics, it is possible to differentiate between significant differences and random events, and thus to understand how a microbial community is affected by its members and environment.