Understand the Soybeans Iron Deficiency Response
A Genomic Study of Iron Deficiency Chlorosis in Soybeans
Jamie A. O'Rourke, Rex T. Nelson, David Grant, Jeremy Schmutz, Jane Grimwood, Steven Cannon, Carroll P. Vance, Michelle A. Graham, and Randy C. Shoemaker
Abstract
Iron is an essential micronutrient for both plants and animals. Iron deficiency chlorosis (IDC) in soybeans, a major source of protein and edible oil for much of the world’s population, results in yield loss. The United States is the world’s largest
producer of soybeans, but reports of up to 23% yield loss due to IDC are common in the calcareous soils of the upper Midwest. The use of microarray technologies has allowed IDC research to move beyond quantitative trait locus (QTL) studies to the identification of specific candidate genes involved in the trait of interest. Near isogenic lines (NILs) (PI548553 and PI547430), developed for their differential iron response, were grown hydroponically in iron sufficient and iron limited conditions. Transcriptional profiles of the plants were analyzed and compared using the Affymetrix® GeneChip® Soybean Genome Array, which represents approximately 37,500 soybean EST transcripts. A comparison of iron efficient Clark plants (PI548553) grown under Fe-sufficient and Fe-limited conditions identified 835 candidate genes putatively involved in soybean's iron stress response. An identical comparison of iron inefficient IsoClark plants (PI547430)
identified 200 candidate genes. These same microarrays were also used to identify 211 single feature polymorphisms (SFPs) between the NILs. These SFPs represent a
potential source of genetic variation involved in the differential iron stress response elicited by the two lines. Many of these SFPs are located in genes with high homology to transcriptional regulators. Semi quantitative real time PCR analysis of the near isogenic lines confirmed the differential expression of candidate genes identified by microarray analyses. Sequences of the differentially expressed genes, SFPs, and sequences of markers known to lie within iron QTLs were aligned against the 7X build of the soybean whole genome sequence to identify regions of transcriptional significance. This analysis identified 58 genes differentially expressed in the microarray experiment with a genetic location within known QTLs in the Clark genotype and 21 in the IsoClark genotype. Additionally, 11 of the 211 SFPs aligned within the known QTL regions. A sliding window analysis of the microarray data and the 7X genome coupled with an iterative simulation model of the data showed the candidate genes exhibit clustering in the genome. Closely clustered genes in other species have been shown to be co-regulated. An analysis of promoter regions of differentially expressed genes identified 11 conserved motifs in promoter regions of 248 differentially expressed genes, representing 129
clusters identified earlier and confirming the cluster analysis results. These conserved motifs support the hypothesis that the differentially expressed genes are co-regulated. Additionally, the combined results of all analyses lead us to believe iron inefficiency in soybean is a result of a mutation in a transcription factor, which controls the expression of genes required in inducing an iron stress response.
Introduction
Iron is a critical micronutrient for both plant and animal nutrition, serving as an invaluable co-factor for a variety of cellular processes. Iron deficiency anemia is one of the leading nutritional disorders worldwide, affecting 43% of the population of
developing countries [1]. For most of the world’s population, legumes are a major source of dietary iron [1, 2]. Though iron composes 5% of the earth’s crust [3], it is largely unavailable to plants. Additionally, 30% of the world’s soils are classified as calcareous [4], with a pH greater than 7.5. Calcareous soils are especially prevalent in the upper Midwest of the US [5, 6] and have been shown to have a direct correlation with iron deficiency in soybeans. IDC in soybeans is characterized by interveinal chlorosis of the developing trifoliates [7] and an end of season yield loss in direct proportion to the severity of the chlorosis [8].
Plants have evolved two systems to take up iron from the soil. These systems are termed strategy I and II [9, 10]. Soybeans and other dicots utilize strategy I, in which the rhizosphere is acidified by the release of protons to produce a favorable environment for the release of iron from chelating agents in the soil. The Fe3+ ion is then reduced by a membrane bound reductase to the usable Fe2+ form and transported across the cell wall and plasma membrane into the cell by a specific transporter for distribution and use within the plant. The transport of the iron ion into the plant has been shown to be the rate-limiting step in IDC [11]. Graminaceous monocots utilize strategy II, whereby the roots release chelators called phytosiderophores to bind Fe3+ ions. Once bound, the entire complex is transported into the root where it is uncoupled. The Fe3+ ion is reduced to
The quantitative nature of IDC makes field studies problematic. Previous studies have identified Quantitative Trait Loci (QTLs) associated with IDC [5, 12]. Many of the same quantitative trait loci (QTLs) have been identified in both field and greenhouse studies, where plants are grown in a hydroponics system designed specifically to induce IDC [13]. Growing plants in a controlled greenhouse environment with regulated nutritional availability allows for reliable and replicable induction of iron deficiency stress. In addition, the advent of microarray technology now allows for the identification of individual genes whose expression levels are affected by iron availability [14, 15]. The availability of a whole-genome sequence assembly for the soybean genome has, for the first time, allowed us to genetically position differentially expressed genes induced by iron deficiency.
Genomic studies in many organisms have shown genes in close proximity to one another in the genome are often co-expressed. These co-expressed genes create clusters of expression neighborhoods [16]. A study in Arabidopsis showed clusters of up to 20 different genes were coordinately regulated, with a median cluster size of 100 kb [17]. In rice, approximately five percent of the genome has been associated with co-expressed gene clusters [18]. These clusters are conserved by natural selection [19]. Initially, co-expressed genes were thought to belong to similar biological pathways [17], but further studies have shown co-functionality to be a poor predictor of co-expression [20]. Instead, promoter analysis has found co-regulated genes are often regulated by common
transcription factors [16, 20, 21]. The co-expression of clustered genes may be partially regulated by the interaction of promoters and transcription factors [21]. Co-regulated genes often have common transcription factors [20] so an increase in the transcription
factor binding site due to a high prevalence of promoter regions would increase the likelihood of the transcription factor binding and aiding in the expression of the gene cluster.
Materials and Methods
Plant Growth and RNA Extractions
NILs with characteristic responses to limited iron conditions were developed by the USDA-ARS [22]. The iron efficient PI548533 (Clark) was crossed with iron inefficient T203 (PI54619). Seven repeated backcrosses to Clark yielded the iron inefficient line PI547430 (IsoClark). Both the iron efficient Clark and the iron inefficient IsoClark were germinated in sterile vermiculite and transferred to a DTPA buffered nutrient hydroponics system 7 days after planting. Each 10 L hydroponic unit
contained 2 mM MgSO4*7H2O, 3 mM Mg(NO3)2*6H2O, 2.5 mM KNO3, 1 mM
CaCl2*2H2O, 4.0 mM Ca(NO3)2*4H2O, 0.020 mM KH2PO4, 542.5 µM KOH, 217 µM
DTPA, 1.52 µM MnCl2*4H2O, 4.6 µM ZnSO4*7H2O, 2 µM CuSO4*5H2O, 0.20 µM
NaMoO4*2H2O, 1 µM CoSO4*7H2O, 1 µM NiSO4*6H2O, 10 µM H3BO3, and 20 mM
HCO3. A pH of 7.8 was maintained by the aeration of a 3% CO2: air mixture. A
supplemental nutrient solution containing 16 mM potassium phosphate, 0.287 mM boric acid and 355 mM ammonium nitrate was added daily to maintain proper plant nutrition. Both iron efficient and iron inefficient plants were grown in iron sufficient (100 µM Fe(NO3)3) and iron limiting (50 µM Fe(NO3)3) hydroponic conditions. Leaf tissue from
the 2nd trifoliate was collected 21 days after planting, or after 14 days in the hydroponics
be extracted. Three independent biological replicates were used as the experimental tissue. RNA extractions were performed using the Qiagen RNeasy Plant Mini Kit (catalog # 74904). RNA samples were submitted to the Iowa State University GeneChip® facility to be hybridized and scanned using the Soybean Affymetrix® GeneChip®. A model based expression index analysis (MBEI) [23] of the raw chip data identified perfect match probes with a two-fold or greater expression difference between the genotype and iron concentrations. An analysis of Clark plants grown in iron
sufficient and iron deficient conditions showed 835 transcripts differentially expressed at two-fold or greater. IsoClark plants grown in identical conditions showed 200 transcripts that met the criteria for differential expression.
Candidate Gene Annotation
The candidate genes were queried against the SoyBase Affymetrix® GeneChip® Soybean Genome Array Annotation page, publicly available at
http://www.soybase.org/AffyChip/. Here, researchers with the USDA-ARS have used
BLASTX and TBLASTX [24] to compare the sequences from which all Affymetrix probes were derived to the UniProt database and the Arabidopsis genome gene calls
(TAIR7, http://www.arabidopsis.org/). The top three UniProt BLAST hits and the
Arabidopsis best hit GO annotation are reported for each Affymetrix probe set. To assign a putative function and classification to the differentially expressed genes
(Supplementary Data Tables 1, 2, and 3) the three UniProt annotations were compared. If all three were identical that annotation was assigned to the gene. If the top three BLAST hits were not in concordance, that sequence was re-examined to determine if one of the annotations was more likely correct than the others. If no annotation could be
confidently identified by BLAST analysis with UniProt, the differentially expressed gene was annotated as an unknown. If the gene sequence for the Affymetrix® probe showed no sequence homology to any of the proteins in the UniProt database, the sequence was annotated as No UniProt Hit.
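The annotation rule described above can be sketched as a small decision function. This is an illustrative Python sketch, not the procedure actually used for the paper: the function name, the case-insensitive comparison, and the `NEEDS_MANUAL_REVIEW` label are assumptions, and the manual re-examination step (which can end in either a chosen annotation or "unknown") is left to a curator.

```python
def assign_annotation(top_hits):
    """Assign a putative annotation from up to three UniProt BLAST hit descriptions.

    Assumed conventions (hypothetical, for illustration only):
    - an empty list means the sequence had no UniProt hit;
    - hits are compared case-insensitively after trimming whitespace;
    - discordant hits are flagged for manual review rather than resolved here.
    """
    hits = [h.strip() for h in top_hits if h and h.strip()]
    if not hits:
        return "No UniProt Hit"
    if len({h.lower() for h in hits}) == 1:
        # All reported hits agree, so the shared annotation is assigned.
        return hits[0]
    # Hits disagree: a curator decides between one annotation and "unknown".
    return "NEEDS_MANUAL_REVIEW"
```

For example, three hits that all read "Ferritin" would be assigned "Ferritin", while a mix of "Ferritin" and "Kinase" would be routed to manual review.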
GO Slim Term Analysis
For expressed genes with homology to an Arabidopsis gene (E-value below 10e-6),
custom perl scripts were written to parse and tally each transcript GO slim ID for
biological process, molecular function, and cellular process. The same scripts were used to tally GO slim IDs for the entire chip. Differences between the expressed genes and the entire chip were compared using a Fisher exact test [25]. This test was performed to identify the GO slim terms within each of the three GO slim classifications that were over- or under-represented in the lists of differentially expressed genes in relation to their presence on the soybean Affymetrix® chip. A Bonferroni correction [26], using the number of identifiers present on the Affymetrix® chip, was applied to the two-tailed probability value (p-value) of each GO slim identifier. GO slim identifications with a p-value of less than or equal to 0.05 after the Bonferroni correction were considered statistically over- or under-represented in our list of differentially expressed genes. This correction is likely to underestimate the number of categories of genes either over- or under-represented on the lists of differentially expressed genes in comparison to their prevalence on the Affymetrix® chip.
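The test and correction described above can be illustrated with a short Python sketch. It is not the original perl code. It assumes the differentially expressed list is treated as a subset of the chip, so the 2x2 table compares the list against the remainder of the chip, and it defines the two-tailed p-value as the sum of all table probabilities no larger than that of the observed table. All function and parameter names are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def go_term_enrichment(k_list, n_list, k_chip, n_chip, n_tests):
    """Raw and Bonferroni-adjusted p-values for one GO slim term.

    k_list / n_list: genes with the term / all genes in the candidate list.
    k_chip / n_chip: genes with the term / all genes on the chip.
    n_tests: number of GO slim identifiers tested (assumed correction factor).
    """
    a = k_list
    b = n_list - k_list
    c = k_chip - k_list
    d = (n_chip - n_list) - c
    p = fisher_exact_two_tailed(a, b, c, d)
    return p, bonferroni(p, n_tests)
```

A term would then be called over- or under-represented when the adjusted value is at most 0.05; the direction follows from comparing `k_list / n_list` with `k_chip / n_chip`.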
Real Time PCR Confirmation
The differential expression observed in the microarray experiment to identify candidate genes was confirmed using semi quantitative Real Time Reverse Transcriptase
PCR (sqRT-PCR). Thirteen transcripts identified as differentially expressed in the microarray experiment were tested using sqRT-PCR (Table 3). Genes for sqRT-PCR confirmation were chosen based on differential expression levels in the microarray. We tested genes showing both extreme differential expression and those just exceeding the two-fold criterion. Primers were designed from the EST sequence used to construct the Affymetrix probe to produce 250 bp amplicons. The sqRT-PCR was conducted as described by the Stratagene protocol (Catalog #600532) using the Stratagene Brilliant qRT-PCR kit with 25 µL reactions. For each experimental reaction, 200 ng of total RNA
was added as initial template along with 125 mM MgCl2 and 100 nM forward and reverse
primers. Cycling parameters were as follows: 45 min at 42 °C for reverse transcription,
10 min at 95 °C to denature the StrataScript reverse transcriptase, 40 cycles of 30 sec at
95 °C, 1 min at the appropriate annealing temperature, and 30 sec at 72 °C. All sqRT-PCR reactions
were performed in the Stratagene Mx3000P followed by a dissociation curve, taking a
fluorescence reading at every degree between 55 °C and 95 °C to ensure only one PCR
product was amplified. As controls, a passive reference dye was added to each reaction to ensure the increase in fluorescence was due to an increase in amplicon and not an artifact of the PCR. Additionally, each sample was run in triplicate and normalized against tubulin amplification to ensure differential expression was not due to differing amounts of initial template RNA added to each sample.
To be considered differentially expressed, samples had to differ in cycle thresholds (Ct) by more than 1 cycle, which corresponds to the two-fold difference in gene transcripts between the NILs identified by the microarray experiment. The resulting
fold change of the sqRT-PCR was calculated from the differences in Ct using the 2^-ΔΔCt method [27].
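The relationship between a one-cycle Ct difference and a two-fold change can be made concrete with a minimal Python sketch of a ΔΔCt-style calculation. It assumes perfect amplification efficiency (a doubling per cycle) and normalization of each sample to a reference gene such as tubulin. The function and argument names are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_calib, ct_ref_calib):
    """Fold change of the test sample relative to the calibrator sample.

    Each Ct is first normalized to the reference gene in the same sample
    (delta Ct); the difference between samples (delta-delta Ct) is then
    converted to a fold change, assuming the product doubles every cycle.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_calib = ct_target_calib - ct_ref_calib
    ddct = delta_test - delta_calib
    return 2.0 ** (-ddct)


def is_differential(ct_target_test, ct_ref_test, ct_target_calib, ct_ref_calib):
    """Apply the criterion of a normalized Ct difference of more than 1 cycle."""
    ddct = (ct_target_test - ct_ref_test) - (ct_target_calib - ct_ref_calib)
    return abs(ddct) > 1.0
```

Under these assumptions, a target that crosses threshold one cycle earlier in the test sample (with equal reference Ct values) yields a fold change of 2, which is why the one-cycle cutoff mirrors the two-fold microarray criterion.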
SFP Identification and Association with known IDC QTLs on Soybean Genome
Single Feature Polymorphisms (SFPs) were identified following the protocol outlined by West et al. 2006 [28]. In brief, the microarray data from plants grown under iron sufficient conditions were transformed by robust multichip analysis (RMA) [29]. Custom perl scripts were used to examine each of the individual probes comprising a single perfect match probe set. These perl scripts assigned each perfect match probe an SFPdev score by subtracting the average hybridization signal from the other ten probes from the hybridization signal of the probe in question and dividing that by the
hybridization signal of the probe being examined ((hyb signal probe 1 – (hyb signal probe 1+ hyb signal probe 2 + hyb signal probe 3 + hyb signal probe 4 …hyb signal probe 10)/10) / hyb signal probe 1). SFPdev scores with an absolute value greater than or equal to two on all replicates indicated an SFP.
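A minimal Python sketch of the SFPdev calculation follows. It implements the prose description (each probe is compared with the mean of the other probes in its probe set), which is an assumption where the prose and the parenthetical formula differ. It also assumes strictly positive, RMA-transformed signals. Function names are hypothetical and this is not the original perl code.

```python
def sfpdev_scores(signals):
    """Return one SFPdev score per probe in a probe set.

    score_i = (signal_i - mean(signals of the other probes)) / signal_i
    """
    n = len(signals)
    if n < 2:
        raise ValueError("a probe set needs at least two probes")
    total = sum(signals)
    scores = []
    for s in signals:
        mean_others = (total - s) / (n - 1)
        scores.append((s - mean_others) / s)
    return scores


def is_sfp(scores_across_replicates, threshold=2.0):
    """Call an SFP only if |SFPdev| meets the threshold in every replicate."""
    return all(abs(score) >= threshold for score in scores_across_replicates)
```

A probe whose signal is far below that of its neighbours in one genotype, for instance 2.0 against neighbours of 8.0, scores -3.0 and would be flagged if that holds in all replicates.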
Statistical Modeling and Cluster Analysis
To determine if gene distribution along the assembled genome could be explained by random chance, a simulation program originally reported by Grant [30] was applied to a theoretical genome. A genome of 996,903,313 bp (the combined size of the 7x genome assembly which has been assigned to soybean molecular linkage groups) was partitioned into 1,000,000 bp, 100,000 bp, and 10,000 bp windows resulting in 953 bins, 9,530 bins and 95,300 bins respectively. The program positioned 760 or 200 genes depending on the genotype being simulated on the genome and determined the number of genes within the window. The simulation was repeated 1,000 times. The mean number of bins with 0
to 8 genes was calculated for the 1,000 repetitions. A standard deviation for each gene bin size was also calculated. To determine how this compared with our experimental data, the sequences assigned to MLGs were concatenated together and the sliding window analysis was performed to identify clusters. The difference between the
microarray data and the simulated data was calculated in terms of the number of simulated data standard deviations (SD). A difference greater than two SD was considered
statistically significant. The sign of the difference indicates whether there are more or fewer genes than expected.
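The simulation logic can be sketched in Python as follows. This is not the program reported by Grant [30]. It assumes genes are placed uniformly at random as single points, uses fixed non-overlapping bins rather than a true sliding window, and uses the sample standard deviation. All names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def bin_occupancy(positions, genome_size, window):
    """Count how many bins contain exactly k genes, for every observed k."""
    n_bins = -(-genome_size // window)  # ceiling division
    genes_per_bin = Counter(pos // window for pos in positions)
    occupancy = Counter(genes_per_bin.values())
    occupancy[0] = n_bins - len(genes_per_bin)  # bins with no genes
    return occupancy


def simulate(n_genes, genome_size, window, reps=1000, max_k=8, seed=0):
    """Mean and SD of the number of bins holding 0..max_k genes under random placement."""
    rng = random.Random(seed)
    counts = {k: [] for k in range(max_k + 1)}
    for _ in range(reps):
        positions = [rng.randrange(genome_size) for _ in range(n_genes)]
        occupancy = bin_occupancy(positions, genome_size, window)
        for k in counts:
            counts[k].append(occupancy.get(k, 0))
    return {k: (mean(v), stdev(v)) for k, v in counts.items()}


def sd_difference(observed, sim_mean, sim_sd):
    """Observed minus simulated mean, expressed in simulated SD units."""
    if sim_sd == 0:
        return float("nan")
    return (observed - sim_mean) / sim_sd
```

With the observed occupancy computed by `bin_occupancy` on the real gene positions, a value of `sd_difference` above 2 or below -2 for a given bin class would correspond to the significance criterion stated above.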
Promoter Identification and Analysis
The consensus sequences used by Affymetrix® to generate the probes on the Soybean GeneChip® identified as differentially expressed between Clark plants grown under iron sufficient and iron deficient conditions were queried against the 7X genome gene calls. The top hit for each differentially expressed gene was used as the gene call for the differentially expressed sequence on the Affymetrix® GeneChip®. Custom perl scripts identified the 500 bases upstream of the start codon for each gene from the 7X genome assembly. The reverse complement of each of the 500 bp promoter regions was also identified. The program MEME (Multiple Em for Motif Elicitation [31]) was run against the 500 base promoter regions of all IDC genes to identify short conserved sequences in the promoter regions of the differentially expressed genes using the -dna -mod anr -evt 1
options. Identified motifs with E-values < 1E-6 were then compared against a
modified TRANSFAC database using BLASTN [24] to determine if identified motifs contained any known transcription factor binding sites.
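The promoter extraction step can be illustrated with a short Python sketch. It is not the original perl code. It assumes the chromosome is available as an in-memory string, positions are 0-based plus-strand coordinates, and for minus-strand genes the given position is the rightmost base of the start codon. Function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start_codon_pos, strand="+", length=500):
    """Return up to `length` bases immediately upstream of a start codon.

    For "+" genes, start_codon_pos is the index of the first base of the codon.
    For "-" genes, it is the index of the rightmost plus-strand base of the
    codon; the flanking sequence is reverse complemented so the result always
    reads 5' to 3' on the coding strand.
    """
    if strand == "+":
        return chrom_seq[max(0, start_codon_pos - length):start_codon_pos]
    if strand == "-":
        flank = chrom_seq[start_codon_pos + 1:start_codon_pos + 1 + length]
        return reverse_complement(flank)
    raise ValueError("strand must be '+' or '-'")
```

Each extracted region, together with its reverse complement, could then be written to a FASTA file and passed to MEME.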
Results
Candidate Gene Identification and GO analysis
RNA from both iron efficient Clark and iron inefficient IsoClark grown under
iron limiting conditions (50 µM Fe(NO3)3) and iron sufficient conditions (100 µM
Fe(NO3)3) was submitted to the Iowa State GeneChip Facility for hybridization and
scanning using the Affymetrix® GeneChip® Soybean Genome Array. An MBEI analysis [23] of the data revealed that only 30 transcripts met or exceeded the two-fold
difference required to be considered differentially expressed between Clark and IsoClark genotypes grown under iron sufficient conditions (Supplementary Table 1). This result confirms the NILs probably differ by only a limited number of genes. In contrast, 835 transcripts were differentially expressed between Clark plants grown under iron sufficient and iron limiting conditions (Supplementary Table 2) and 200 transcripts differentially expressed between IsoClark plants grown under the same conditions (Supplementary Table 3).
GO slim categories that were either over-or under-represented in our lists of differentially expressed genes were identified for both the Clark and IsoClark
comparisons. Transcripts with GO slim classifications that are over or under represented on our list of differentially expressed genes should be representative of the processes and
pathways being induced or shut down under iron stress in both the iron efficient and iron
inefficient plants. The Clark genotype experiment had 488 out of 835 unique transcripts
with GO slim IDs. Of the corresponding GO slim IDs, 42 were either over or under represented in our list of differentially expressed genes (Table 1). These transcripts could be over-represented in our expression data based on comparison with the entire chip
(Table 1). The over and under represented GO slim categories could be further divided into 17 biological process IDs, 19 molecular function IDs, and 6 cellular component processes (Table 1). Of the 200 differentially expressed genes in the IsoClark genotype, 49 had corresponding Arabidopsis GO slim IDs. Of these, 11 were over or under
represented and fell into 5 molecular function categories, 1 cellular component category, and 5 biological process categories (Table 2).
Examining the GO terms associated with the candidate genes provides further insight into the disparity of the number of differentially expressed genes between genotypes. The IsoClark (inefficient) genotype does not appear to induce genes in response to the iron deprivation stress. The most prevalent GO term in all three classifications for both genotypes was ‘unknown function’ (Tables 1 and 2). However, the Clark (efficient) genotype also had a high proportion of GO terms (and thus,
transcripts) specifically related to iron availability and usage, i.e., ferric iron binding (GO:0008199), iron ion transport (GO:0006826), and iron ion homeostasis
(GO:0006879) that were over-represented on our lists of candidate genes. There were also a number of GO terms not specifically related to iron, but which are associated with a more general stress response (GO:0009611 – response to wounding, GO:00099