The eQTL dataset that formed the basis for this computational and statistical study was generated before this thesis began, by Tim Aitman (MRC Clinical Sciences Centre, London; fat tissue), Norbert Hubner (Max-Delbruck-Center, Berlin; kidney and adrenal tissues) and Stuart Cook (MRC Clinical Sciences Centre, London; left ventricle). The process undertaken in each case is outlined below.
2.1.1 Generation of the Recombinant Inbred Strains
The RI strains were derived as described previously (Hubner et al., 2005; Pravenec et al., 1989). Thirty-six RI strains were bred by Michal Pravenec and Vladimir Kren at the Czech Academy of Sciences in Prague, starting from an initial cross of the BN.Lx/Cub and SHR/Ola inbred parental strains and followed by systematic sibling mating for at least 20 generations (Pravenec, 1996). Of these 36 strains, four went extinct, two were not available for generation of eQTL data, and one was subsequently removed from the analysis because of loss of homozygosity, possibly due to a breeding error. The remaining 29 strains were used in downstream analysis.
2.1.2 Generation of the eQTL Dataset
The eQTL dataset consists of thousands of expression traits that were measured in the RI strains, mapped to the genome by linkage analysis, and found to be statistically significant after correction for multiple testing. The procedures used to generate the dataset were carried out before this thesis began and are described in detail in (Hubner et al., 2005) and (Petretto et al., 2006a). They are summarized in Sections 2.1.2.1 to 2.1.2.3.
2.1.2.1 Microarray Expression Profiling
Measurements of mRNA abundance were obtained by microarray in four tissues (fat, kidney, adrenal and left ventricle), as described in (Hubner et al., 2005), from four biological replicates of each of the original 30 RI strains (as discussed above, one strain was subsequently removed from the analysis). Each of the 120 microarrays used per tissue (4 replicates × 30 RI strains) was processed and normalized separately (Petretto et al., 2006a). Outliers were removed using the Nalimov outlier test (p < 0.001), and the raw values were averaged across the four biological replicates of each strain using the Robust Multichip Average (RMA) algorithm to give expression summary values. The expression value used in downstream analysis is the anti-log of the RMA output values (Hubner et al., 2005). For the fat, kidney and adrenal tissues, RAE 230A Affymetrix GeneChips, which assay the expression levels of 15,923 transcripts across the genome, were used. For left ventricle, Rat230_2 Affymetrix GeneChips, which assay the expression levels of 31,099 transcripts, were used.
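The per-strain summary step described above can be illustrated with a minimal sketch. It is not the original processing pipeline and rests on several assumptions: the inputs are taken to be log2-scale RMA values for one probeset in one strain, the Nalimov statistic is used in its commonly stated form, the critical value is supplied by the caller (the tabulated values for p < 0.001 are not reproduced here), and averaging is assumed to happen on the log2 scale before the anti-log is taken. The function names `nalimov_statistic` and `strain_expression` are hypothetical.

```python
import math
import statistics


def nalimov_statistic(values, index):
    """Nalimov statistic for values[index] (commonly stated form).

    r = |x_i - mean| / s * sqrt(n / (n - 1)), with the mean and the sample
    standard deviation s computed over all n values, suspect value included.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return abs(values[index] - mean) / s * math.sqrt(n / (n - 1))


def strain_expression(log2_values, critical_value):
    """Hypothetical per-strain summary for one probeset.

    log2_values: log2-scale RMA values for the biological replicates.
    critical_value: Nalimov critical value for the chosen significance
    level and sample size, supplied by the caller (assumption).
    Returns the anti-log (2 ** x) of the mean of the retained replicates.
    """
    kept = list(log2_values)
    # Test only the most extreme replicate, and only if the spread is non-zero.
    if len(kept) >= 3 and statistics.stdev(kept) > 0:
        mean = statistics.mean(kept)
        suspect = max(range(len(kept)), key=lambda i: abs(kept[i] - mean))
        if nalimov_statistic(kept, suspect) > critical_value:
            del kept[suspect]
    # Average on the log2 scale, then return to the linear scale.
    return 2 ** statistics.mean(kept)
```

For example, `strain_expression([8.0, 8.1, 7.9, 12.0], critical_value=1.6)` drops the replicate at 12.0 under that purely illustrative threshold and averages the remaining three. Note that with four replicates the Nalimov statistic cannot exceed the square root of 3, so any real critical value for n = 4 must lie below that bound.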
2.1.2.2 Marker Genotyping
A total of 1,011 autosomal microsatellite markers were used to construct the genetic linkage map on which the eQTL dataset was subsequently based. The linkage map for the BXH/HXB RI strain panel was constructed as described in (Jirout et al., 2003; Pravenec, 1996). In summary, DNA was obtained from each strain and genotyped by PCR analysis and/or Southern blotting (Pravenec, 1996).
2.1.2.3 Linkage Analysis
eQTLs represent regions of the genome that are linked to an expression trait. The marker map was constructed using MapMaker (Lincoln et al., 1993), incorporating information from Ensembl (Hubbard et al., 2002) and BLASTN alignments in order to position the genotyped markers as accurately as possible on the genetic map. Once the map had been constructed, genome-wide linkage analysis of the expression traits was performed using QTL Reaper (Manly, 2006). This software, whose analysis is based on that implemented in the earlier Map Manager QTX system, also by Ken Manly (Manly et al., 2001), assesses linkage between each expression trait and each marker by regression, computing a likelihood ratio statistic (LRS) for each probeset-marker combination. Genome-wide statistical significance was determined empirically by permutation, as described in (Hubner et al., 2005). Quoted p-values of eQTL significance are empirical p-values obtained by comparing the observed LRS with the distribution of at least 1,000 null (permuted) LRS values for that probeset-marker combination.
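The regression LRS and its permutation p-value can be sketched as below. This is an illustration under stated assumptions, not QTL Reaper's code: each RI strain is assumed to be homozygous at every marker, with genotypes coded 0/1 (for example 0 = BN allele, 1 = SHR allele) and no missing data; errors are assumed normal, so the LRS comparing a one-mean model with a two-mean model is n ln(RSS0/RSS1); and the empirical p-value uses a (k + 1)/(N + 1) convention, which is a choice made here rather than something stated in the source. The conversion LOD = LRS / (2 ln 10) is the standard relationship between the two scales. All function names are hypothetical.

```python
import math
import random


def lrs(trait, genotype):
    """Likelihood ratio statistic for one trait-marker pair.

    trait: expression values, one per strain.
    genotype: 0/1 allele codes, one per strain (assumed complete).
    Compares a single-mean model (RSS0) with a per-genotype-mean model (RSS1).
    """
    n = len(trait)
    grand = sum(trait) / n
    rss0 = sum((y - grand) ** 2 for y in trait)
    rss1 = 0.0
    for allele in (0, 1):
        group = [y for y, g in zip(trait, genotype) if g == allele]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    if rss1 == 0:
        return math.inf  # perfect separation by genotype
    return n * math.log(rss0 / rss1)


def lod_from_lrs(value):
    """Convert an LRS to a LOD score: LOD = LRS / (2 ln 10)."""
    return value / (2 * math.log(10))


def permutation_p_value(trait, genotype, n_perm=1000, seed=0):
    """Empirical p-value from permuting trait values across strains."""
    rng = random.Random(seed)
    observed = lrs(trait, genotype)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any trait-genotype association
        if lrs(shuffled, genotype) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Permuting the trait values while keeping the genotypes fixed preserves the marker's allele frequencies and the trait's distribution, so the permuted LRS values approximate the null distribution for that specific probeset-marker pair.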
2.1.3 The eQTL Database
The eQTL data, including probeset and linkage mapping information, eQTL statistics (e.g. LOD score and heritability), probeset expression, and marker genotype data was stored in a MySQL database, held on the Codon server maintained by the Imperial College Centre for Bioinformatics. eQTL Explorer (Mueller et al., 2006) was used to produce visual representations of the cis- and trans-eQTL data held in the MySQL database.
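To make the kind of relational layout described above concrete, the following sketch uses Python's built-in `sqlite3` module as a stand-in for the MySQL server. The schema, the table and column names, and the toy rows are all hypothetical placeholders chosen for illustration; they are not the structure of the actual eQTL database and are not real data.

```python
import sqlite3

# Hypothetical schema for illustration only; SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE probeset (
    probeset_id  TEXT PRIMARY KEY,
    chromosome   TEXT,
    position_mb  REAL,
    heritability REAL
);
CREATE TABLE marker (
    marker_id   TEXT PRIMARY KEY,
    chromosome  TEXT,
    position_cm REAL
);
CREATE TABLE eqtl (
    probeset_id TEXT REFERENCES probeset(probeset_id),
    marker_id   TEXT REFERENCES marker(marker_id),
    tissue      TEXT,
    lod         REAL,
    p_value     REAL
);
"""


def build_toy_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO probeset VALUES (?, ?, ?, ?)",
                     [("ps_a", "1", 10.0, 0.8), ("ps_b", "2", 55.0, 0.4)])
    conn.executemany("INSERT INTO marker VALUES (?, ?, ?)",
                     [("m_1", "1", 12.0), ("m_2", "5", 30.0)])
    conn.executemany("INSERT INTO eqtl VALUES (?, ?, ?, ?, ?)",
                     [("ps_a", "m_1", "fat", 6.2, 0.001),
                      ("ps_b", "m_2", "fat", 3.1, 0.04),
                      ("ps_b", "m_2", "kidney", 2.0, 0.2)])
    return conn


def eqtls_for_tissue(conn, tissue, max_p):
    """Return eQTLs in one tissue with p <= max_p, strongest LOD first."""
    return conn.execute(
        """
        SELECT e.probeset_id, e.marker_id, e.lod, p.heritability
        FROM eqtl AS e
        JOIN probeset AS p ON p.probeset_id = e.probeset_id
        WHERE e.tissue = ? AND e.p_value <= ?
        ORDER BY e.lod DESC
        """,
        (tissue, max_p),
    ).fetchall()
```

Keeping probesets, markers and linkage results in separate tables means that per-probeset quantities such as heritability are stored once and joined to each eQTL on demand.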
2.1.4 Interpretation of eQTL Data
Several additional quantities, indirectly related to the eQTL mapping process summarized in Section 2.1, were recorded in the eQTL database and are used to inform downstream analysis of the data it contains:
2.1.4.1 Fold Change
The term ‘fold change’ refers to the ratio of transcript abundance (for example, as measured by a microarray) between two populations. For each probeset, the fold change between the parental strains was calculated by dividing the average expression across the biological replicates of SHR by that of BN. The fold change can be used as an indicator of the size of the eQTL effect (Petretto et al., 2006a).
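The parental fold change described above reduces to a ratio of replicate means, sketched below. The sketch assumes linear-scale (anti-logged) expression values for each replicate; the function names are hypothetical, and the log2 transform is an optional addition not mentioned in the source.

```python
import math
import statistics


def parental_fold_change(shr_values, bn_values):
    """SHR/BN fold change for one probeset.

    shr_values, bn_values: linear-scale expression values for the
    biological replicates of each parental strain (assumption).
    """
    return statistics.mean(shr_values) / statistics.mean(bn_values)


def log2_fold_change(fold_change):
    """log2 of the ratio: +1 means two-fold higher in SHR, -1 two-fold lower."""
    return math.log2(fold_change)
```

On the raw ratio scale a two-fold increase (2.0) and a two-fold decrease (0.5) are not equidistant from 1, whereas on the log2 scale they become +1 and -1, which is often more convenient when comparing effect sizes.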
2.1.4.2 Heritability Analysis
Heritability ($h^2$) refers to the proportion of the total variance of a trait in a population that can be attributed to genetic factors, as opposed to environmental factors (Visscher et al., 2008). In the RI strains, heritability ($h^2_{\mathrm{trait}}$) was calculated for each probeset as described in (Petretto et al., 2006a) using the following formula:

$$h^2_{\mathrm{trait}} = \frac{V_A}{V_A + V_R}$$

where $V_A$ represents the variance of the mean expression levels across the strains, and $V_R$ represents the mean of the variances within each strain.
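The variance components defined above can be computed as in the following sketch. It assumes the standard ratio form h² = V_A / (V_A + V_R) implied by those definitions, and it uses sample (n - 1) variances both across strain means and within strains; both are assumptions rather than confirmed details of Petretto et al. (2006a). The function name `ri_heritability` is hypothetical.

```python
import statistics


def ri_heritability(expression_by_strain):
    """Heritability of one probeset across RI strains.

    expression_by_strain: mapping strain name -> list of replicate values.
    V_A: sample variance of the strain means across strains.
    V_R: mean of the within-strain sample variances.
    Returns V_A / (V_A + V_R).
    """
    replicates = list(expression_by_strain.values())
    strain_means = [statistics.mean(r) for r in replicates]
    within_vars = [statistics.variance(r) for r in replicates]
    v_a = statistics.variance(strain_means)
    v_r = statistics.mean(within_vars)
    return v_a / (v_a + v_r)
```

For instance, two strains with replicates `[1.0, 2.0, 3.0]` and `[5.0, 6.0, 7.0]` give strain means 2 and 6 (V_A = 8) and within-strain variances of 1 (V_R = 1), so h² = 8/9. A trait whose strain means differ strongly relative to replicate scatter therefore scores close to 1.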