

MetQy can also be used to generate testable hypotheses of possible organism pairings through electrochemical means based on previous coculture studies (see Code 5.3). For example, Hill et al. (2017) developed a platform to coculture the methanotrophic bacterium Methylomicrobium alcaliphilum 20Z and the cyanobacterium Synechococcus sp. PCC 7002 in order to utilise the greenhouse gases CH4 and CO2 to produce microbial biomass. By searching the organism names across the anodic and cathodic genomes identified in this work, Methylomicrobium alcaliphilum (genome T01649) was found among the identified anodic organisms, and Synechococcus sp. PCC 7002 (genome T00664) was identified as a cathodic organism. Both genomes were annotated with c–type cytochrome biogenesis proteins (Ccm and Cyc for the methanotroph and the cyanobacterium, respectively), suggesting that both could carry out indirect electron transfer as proposed by Croese et al. (2011). Moreover, Methylomicrobium alcaliphilum was annotated with Fdh and Flg, again suggesting the possibility of indirect electron transfer, as well as the possibility of direct long–range electron transfer by means of a flagellum. Therefore, MetQy can easily facilitate the data mining of literature–based information across genes for over 5,000 genomes, allowing the generation of testable hypotheses.
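The name search described above amounts to a case-insensitive substring match over the lists of anodic and cathodic hits. The following is a minimal Python sketch, assuming the hits are available as (KEGG genome ID, organism name) pairs; only T01649, T00664 and their organism names come from the text, while the placeholder entries (`T_HYPO_*`) are hypothetical and not real KEGG IDs.

```python
# Minimal sketch: look up coculture members among anodic and cathodic hits.
# Hypothetical toy lists; only T01649, T00664 and their organism names come
# from the text. Placeholder IDs (T_HYPO_*) are not real KEGG genome IDs.

def search_names(hits, query):
    """Return (genome_id, name) pairs whose name contains `query`, ignoring case."""
    q = query.lower()
    return [(gid, name) for gid, name in hits if q in name.lower()]

anodic_hits = [
    ("T01649", "Methylomicrobium alcaliphilum 20Z"),
    ("T_HYPO_1", "Desulfovibrio vulgaris"),
]
cathodic_hits = [
    ("T00664", "Synechococcus sp. PCC 7002"),
    ("T_HYPO_2", "Methanococcus maripaludis"),
]

for query in ("Methylomicrobium alcaliphilum", "Synechococcus sp. PCC 7002"):
    print(query)
    print("  anodic:  ", search_names(anodic_hits, query))
    print("  cathodic:", search_names(cathodic_hits, query))
```

A substring match tolerates strain suffixes such as "20Z", at the cost of possibly matching several strains of the same species.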

5.3 Discussion

The aim of this chapter was to demonstrate how MetQy can be used to inform electrochemical experiments to further investigate the “syntrophy over wires” hypothesis presented in this work. The proposed mechanisms by which Dv and Mm perform EET, and thus interact with electrodes, were used to identify genomes annotated with genes that could potentially enable the same types of EET mechanisms. These comprised six and seven proteins for Dv and Mm, respectively, which were successfully mapped to KEGG orthologs (K numbers) (see Tables 5.1 and 5.2). This enabled the identification of 3,504 and 3,345 genomes that could also interact with electrodes through some or all of the mechanisms proposed for Dv and Mm, respectively, as summarised in Tables 5.3 and 5.4.
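The retrieval step can be sketched as a set intersection between each protein's subunit K numbers and each genome's annotated K numbers. This is a hypothetical Python illustration, not MetQy's implementation: the protein names, K-number labels (`K_A1`, …) and genomes are placeholders, and a protein counts as annotated if any of its subunit K numbers is present, as in this chapter's analysis.

```python
# Minimal sketch of the retrieval step under the liberal rule: a protein counts
# as annotated in a genome if any of its subunit K numbers is annotated.
# Protein names, K-number labels (K_A1, ...) and genomes are hypothetical.

protein_kos = {
    "ProtA": {"K_A1", "K_A2"},  # two-subunit protein
    "ProtB": {"K_B1"},          # single-subunit protein
}

genome_kos = {
    "G1": {"K_A1", "K_B1", "K_Z9"},
    "G2": {"K_A2"},
    "G3": {"K_Z9"},
}

def proteins_in_genome(kos, protein_kos):
    """Return the proteins with at least one subunit K number in `kos`."""
    return {p for p, subunits in protein_kos.items() if subunits & kos}

def retrieve(genome_kos, protein_kos):
    """Map every genome carrying at least one protein to the proteins found."""
    hits = {}
    for gid, kos in genome_kos.items():
        found = proteins_in_genome(kos, protein_kos)
        if found:
            hits[gid] = found
    return hits

hits = retrieve(genome_kos, protein_kos)
print({gid: sorted(p) for gid, p in hits.items()})
# {'G1': ['ProtA', 'ProtB'], 'G2': ['ProtA']}
```

Genomes annotated with only some of the proteins (G2 here) are still retrieved, which is what "some or all of the mechanisms" implies.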

The analysis performed in this chapter was liberal, as the annotation of a single protein subunit was considered sufficient to assume that the entire protein was annotated in the genomes identified. This was done to account for gene mis–annotations, either in the genome or in the mapping between the genes and the K numbers. It is reasonable to expect that the results could change significantly if a stricter approach were followed. Implementing a more stringent approach would require manual curation to define the K numbers that make up each protein.
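The difference between the liberal rule described above and a stricter alternative reduces to "any subunit" versus "all subunits". This Python sketch uses hypothetical subunit labels; a real strict analysis would first need the manually curated subunit lists mentioned above.

```python
# Minimal sketch: liberal rule (any subunit) versus strict rule (all subunits).
# Subunit K-number labels are hypothetical placeholders.

def protein_present(genome_kos, subunit_kos, strict=False):
    """Return True if the protein is considered annotated in the genome.

    liberal (strict=False): at least one subunit K number is annotated
    strict  (strict=True):  every subunit K number is annotated
    """
    if strict:
        return subunit_kos <= genome_kos  # subset test
    return bool(subunit_kos & genome_kos)  # non-empty intersection

subunits = {"K_S1", "K_S2", "K_S3"}  # hypothetical three-subunit protein
genome = {"K_S1", "K_S3", "K_X"}     # hypothetical genome lacking K_S2

print(protein_present(genome, subunits))               # True
print(protein_present(genome, subunits, strict=True))  # False
```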

The results of the analysis were successfully validated by confirming that Dv and Mm were cross–identified as expected, since both were annotated with fdhD. Mm was also retrieved because it was annotated with the anaerobic carbon–monoxide dehydrogenase iron–sulphur subunit (CooF, K00196). This was not expected since, to the author’s knowledge, there has been no report in the literature of Coo being annotated in Mm’s genome. However, the UniProt database (Bateman et al., 2017) contains an entry for Methanococcus maripaludis (Methanococcus deltae) with the predicted gene porF, whose protein name is “iron–sulphur protein” and which has been given the synonym “cooF”. Since the annotation has not been reviewed, it is not possible at this time to determine whether Coo is correctly annotated in Mm’s genome. Further research is required to validate this K number assignment to Mm’s genome.

Furthermore, the identification of known electroactive microorganisms was used as a means to validate the method. The same 11 Geobacter and 26 Shewanella genomes were identified by both the anodic and the cathodic protein searches (Tables 5.10 and 5.11). A literature search was carried out to determine which of these genomes have been shown to be capable of EET, and the relevant studies are listed in the latter table. Most of the Geobacter genomes listed have been used in EET applications, whereas efforts have mostly focused on S. oneidensis MR-1 rather than on other Shewanella species. Further research is needed to determine whether the genomes for which no studies were identified are capable of EET and, if so, by which mechanism(s).

During this particular analysis, over 3,000 genomes were retrieved as potential anodic and cathodic organisms, an impractical number for experimental validation. The selection of testable organisms should therefore be guided by the particular experimental question at hand. Here, several approaches were pursued as examples, each assuming a different research interest and using the information retrieved through this data mining exercise with MetQy. These yielded multiple subsets of organism pairs that could be used to inform experiments. The co–occurrence of proteins (Tables 5.5 and 5.6) and the taxonomic information contained within KEGG GENOME, as well as the relationship between the two (Tables 5.7 and 5.8), were used to filter the identified organisms.
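Protein co–occurrence tables of the kind referred to above can be built by counting how many genomes share each exact combination of proteins. In this Python sketch the protein names follow the text, but their assignment to genomes is hypothetical toy data.

```python
# Minimal sketch: count genomes sharing each exact combination of proteins.
# Protein names follow the text; the genome assignments are hypothetical.
from collections import Counter

genome_proteins = {
    "G1": {"Eha", "Ehb", "Fdh"},
    "G2": {"Eha", "Ehb", "Fdh"},
    "G3": {"Fdh"},
    "G4": {"Eha", "Fdh"},
}

def cooccurrence(genome_proteins):
    """Return a Counter keyed by frozenset of proteins (one key per combination)."""
    return Counter(frozenset(prots) for prots in genome_proteins.values())

table = cooccurrence(genome_proteins)
for combo, n_genomes in table.most_common():
    print(", ".join(sorted(combo)), "->", n_genomes, "genome(s)")
```

Using `frozenset` keys makes each combination order-independent, so {Eha, Fdh} and {Fdh, Eha} are counted together.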

The first approach consisted of retrieving the anodic and cathodic organisms annotated with all of Dv’s and Mm’s proteins, respectively. In this manner, the anodic organisms were reduced to 13 genomes, all from the genera Desulfovibrio and Pseudodesulfovibrio and including Dv, while the cathodic organisms were reduced to 9 genomes from the genera Methanobrevibacter, Methanocaldococcus, Methanococcus, Methanothermobacter and Methanotorris, including Mm. These resulted in 116 possible pairs of organisms that could be tested under the “syntrophy over wires” hypothesis.

The second approach assumed an interest in working with members of a particular phylum, Cyanobacteria. Sixteen genomes were identified as potential anodic organisms (Table 5.13). However, 103 genomes were identified as potential cathodic organisms (Table 5.14), which did not reduce the number of pairings to a testable set. It was proposed that either the potential mechanism by which EET would take place (indirect, or a combination of direct and indirect) or the taxonomy could be used to further select the cathodic organisms.
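Both approaches above can be sketched as a filter on protein sets, optionally restricted to a taxon, followed by enumeration of anodic × cathodic pairs. In this Python sketch all genome IDs, protein sets and taxa are hypothetical placeholders, and `filter_genomes` is an illustrative helper, not a MetQy function.

```python
# Minimal sketch: keep genomes annotated with all required proteins (optionally
# within one taxon), then enumerate candidate anodic x cathodic pairs.
# All genome IDs, protein sets and taxa below are hypothetical toy data.
from itertools import product

def filter_genomes(genome_proteins, required, taxonomy=None, taxon=None):
    """Return IDs of genomes carrying every protein in `required` (and in `taxon`, if given)."""
    selected = []
    for gid, prots in genome_proteins.items():
        if not required <= prots:
            continue
        if taxon is not None and taxonomy.get(gid) != taxon:
            continue
        selected.append(gid)
    return selected

anodic = {"A1": {"P1", "P2"}, "A2": {"P1"}, "A3": {"P1", "P2"}}
cathodic = {"C1": {"Q1", "Q2"}, "C2": {"Q1", "Q2"}}
taxa = {"A1": "Cyanobacteria", "A2": "Cyanobacteria", "A3": "Proteobacteria"}

# First approach: all reference proteins, then every anode-cathode combination.
anodes = filter_genomes(anodic, {"P1", "P2"})
cathodes = filter_genomes(cathodic, {"Q1", "Q2"})
pairs = list(product(anodes, cathodes))
print(len(pairs), pairs)

# Second approach: restrict candidates to a phylum of interest.
cyano_anodes = filter_genomes(anodic, {"P1"}, taxonomy=taxa, taxon="Cyanobacteria")
print(cyano_anodes)
```

A full Cartesian product is the simplest choice; excluding pairs that have already been tested would be an additional step not shown here.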

Finally, it was proposed that information on the biochemical processes of the genomes could also be used as a filtering method. This was achieved by using query_genomes_to_modules, a MetQy function, to obtain the module completeness fraction (mcf) matrix for the KEGG modules across the identified anodic and cathodic organisms. PCA was then performed on the mcf matrix to reduce the dimensionality of the data and to highlight similarities between genomes in the form of clusters when plotting the first two principal components (PCs). Clusters could indicate genomes that share similar biochemical process profiles. The protein co–occurrence was overlaid to further highlight similarities between genomes annotated with different subsets of the proteins and genomes annotated with all of them. The clusters containing the genomes annotated with all the anodic or cathodic proteins were therefore used to identify protein co–occurrence combinations leading to similar biochemical profiles. Genomes annotated with Hyn (in any protein co–occurrence) were found to share a profile similar to those annotated with Dv’s proteins, which led to the identification of 15 potential anodic organisms (Table 5.15). On the other hand, genomes annotated with Eha, Ehb or Frc/Fru had profiles similar to those of organisms annotated with Mm’s proteins. Since selecting genomes annotated with any of these three proteins led to the identification of over 100 genomes, the protein co–occurrence was restricted to genomes annotated with at least all three of these proteins, resulting in 30 potential cathodic organisms (Table 5.16). Interestingly, the protein co–occurrence for these genomes (Table 5.17) showed that more genomes carry six out of the seven proteins than carry fewer. This suggests that the annotation motif Eha, Ehb, Fdh, Frc/Fru, Vhc/Vhu (row 3 in Table 5.17) is frequent and often found together with additional annotations.
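The mcf-plus-PCA step can be illustrated with a simplified Python sketch. It rests on an assumption: each module is reduced to a list of steps, each step a set of alternative K numbers, and completeness is the fraction of steps with at least one annotated K number. MetQy parses full KEGG module definitions, so this is only an approximation; module names, K numbers and genomes are hypothetical, and PCA is computed via NumPy's SVD of the mean-centred matrix.

```python
# Minimal sketch: simplified module completeness fraction (mcf) matrix + PCA.
# Assumption: a module is a list of steps, each a set of alternative K numbers;
# a step is complete if any alternative is annotated. Toy data is hypothetical.
import numpy as np

modules = {
    "M_a": [{"K1"}, {"K2", "K3"}],
    "M_b": [{"K4"}, {"K5"}, {"K6"}],
}
genomes = {
    "G1": {"K1", "K2", "K4", "K5", "K6"},
    "G2": {"K1", "K3", "K4"},
    "G3": {"K6"},
}

def mcf_matrix(genomes, modules):
    """Rows: genomes; columns: modules; entries: fraction of steps satisfied."""
    rows = []
    for kos in genomes.values():
        rows.append([
            sum(bool(step & kos) for step in steps) / len(steps)
            for steps in modules.values()
        ])
    return np.array(rows)

def pca(X, n_components=2):
    """Project mean-centred data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]

X = mcf_matrix(genomes, modules)
scores, explained = pca(X)
print(X)
print(scores)
print(explained)
```

In practice the PC1/PC2 scores would be plotted and coloured by protein co–occurrence to spot clusters; plotting is omitted here.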

The generation of testable hypotheses based on literature findings was also demonstrated by determining whether the members of a published coculture (Hill et al., 2017) could be capable of EET, thereby proposing a new means of interaction between the two organisms. Preliminary experiments could be conducted to determine whether the organisms are capable of EET. If these were successful, an experiment could be designed around this proposed interaction to test the hypothesis by growing Methylomicrobium alcaliphilum 20Z and Synechococcus sp. PCC 7002 on separate electrodes.

On another note, MetQy’s ease of use was demonstrated by extracting information on over 5,000 genomes with a few lines of code, given a table containing the K numbers corresponding to the proteins of interest. The ‘traditional’ means of performing a similar data extraction would be a protein Basic Local Alignment Search Tool search (BLASTP; Altschul et al., 1997). This process would be time–consuming, as it would involve either manual searches through NCBI’s Web BLAST interface¹ or downloading BLAST+ and writing custom bash (command–line) scripts to automate the search across multiple genes. The advantage of BLASTP is that its data are up–to–date, whereas MetQy relies on in–built data. This limitation became apparent when K22516 (fdhA, formate dehydrogenase alpha subunit) was used to retrieve genomes. However, it can be addressed by having FTP access to KEGG, as MetQy’s functions were designed to accept up–to–date information on which to perform the data mining.

As with any bioinformatics tool, the accuracy of MetQy’s functions is determined by the quality of the data stored within KEGG. Redundancy was observed in the mapping of RefSeq or GenBank genes to K numbers. For instance, the genes for both Frc and Fru, as well as those for Vhc and Vhu, were found to be mapped to the same sets of K numbers. This was taken into account during the analysis, and Frc and Fru (as well as Vhc and Vhu) were considered as a single protein. As with the annotation of CooF in Mm’s genome, an inconsistency was observed in the mapping between RefSeq genes and K numbers: the genes were annotated as c–type cytochrome biogenesis genes, but the K numbers referred to heme transport proteins. Building databases such as the KEGG Orthology database used here requires ongoing work and curation. Users of tools that rely on KEGG data, such as MetQy, should keep these limitations in mind. The latter inconsistency has been reported to the KEGG maintainer, and the author hopes that the mapping will be re–evaluated. This also highlights the need for experimental validation of data retrieved from KEGG.
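Treating Frc/Fru and Vhc/Vhu as single proteins, as described above, can be automated by grouping proteins whose subunits map to an identical set of K numbers. In this Python sketch the protein names come from the text, but the K-number labels are hypothetical placeholders rather than the actual KEGG assignments.

```python
# Minimal sketch: merge proteins whose subunits map to the identical set of
# K numbers into one "A/B" entry. K-number labels are hypothetical.
from collections import defaultdict

protein_kos = {
    "Frc": {"K_F1", "K_F2"},
    "Fru": {"K_F1", "K_F2"},
    "Vhc": {"K_V1"},
    "Vhu": {"K_V1"},
    "Fdh": {"K_D1"},
}

def merge_redundant(protein_kos):
    """Group proteins with identical K-number sets and join their names with '/'."""
    groups = defaultdict(list)
    for protein, kos in protein_kos.items():
        groups[frozenset(kos)].append(protein)
    return {"/".join(names): set(kos) for kos, names in groups.items()}

merged = merge_redundant(protein_kos)
print(sorted(merged))  # ['Fdh', 'Frc/Fru', 'Vhc/Vhu']
```

Only exact matches are merged; proteins that share some but not all K numbers remain separate and would need manual review.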

The work presented in this chapter also demonstrated the flexibility of analysis that is possible through the use of MetQy. The anodic and cathodic protein searches yielded thousands of genomes that could be used in place of Dv and Mm to test the “syntrophy over wires” hypothesis. Owing to the large number of genomes, several filtering strategies were proposed to reduce the organisms to a testable number, made feasible by the information retrieved through MetQy functions. These included filtering by protein co–occurrence, by phylogenetic level, or by the similarity of the genomes’ biochemical processes (defined by KEGG modules) to those of Dv and Mm.
