In document Facultad de Humanidades (pages 49-52)

To increase pathway data coverage, the remaining proteins were manually annotated using the UniProt (www.uniprot.org) and KEGG websites (The UniProt Consortium, 2014). Although the KEGG pathway data had originally been obtained from a colleague, this data appeared to be incomplete, and many new annotations could be added from the KEGG database. In addition, the proteins added from the H37Ra, F11, KZN and CDC1551 strains did not yet have any pathway information and were therefore also manually annotated. A column was added to the matrix to record additional information such as gene name, function, Gene Ontology (GO) terms and other miscellaneous details. The following account describes the steps taken for each protein to manually add pathway, EC number and functional data.

2.4.1 UniProt Search

The first step towards adding pathway data and EC numbers was a search in UniProt. The UniProt entries provided the gene name, gene function and GO terms, and this information was copied into the ‘Additional Information’ column. In some cases, the ‘General Annotation’ section noted that the protein was a high-confidence drug target; the row for such a protein was highlighted yellow. When a protein had an EC number, that number was added to the matrix and the cell was coloured blue to indicate the source. The UniProt page also showed the ordered locus name, which could then be used for further queries.
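The bookkeeping in this step can be pictured as filling one row of the matrix from a UniProt entry. The sketch below is a minimal illustration, assuming a hypothetical `MatrixRow` record and an `entry` dict that stands in for the fields read off the UniProt page; it does not query UniProt, and the colours are stored as plain strings rather than spreadsheet formatting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatrixRow:
    """One row of the annotation matrix (hypothetical layout)."""
    protein: str
    pathway: Optional[str] = None
    ec_numbers: list = field(default_factory=list)
    ec_colour: Optional[str] = None      # "blue" marks a UniProt-derived EC number
    row_highlight: Optional[str] = None  # "yellow" marks a high-confidence drug target
    additional_info: str = ""


def apply_uniprot_entry(row: MatrixRow, entry: dict) -> Optional[str]:
    """Copy UniProt-derived fields into `row` and return the ordered locus name.

    `entry` is a hypothetical dict standing in for a UniProt record.
    """
    parts = [entry.get("gene_name", ""), entry.get("function", ""),
             "; ".join(entry.get("go_terms", []))]
    row.additional_info = " | ".join(p for p in parts if p)
    if "high-confidence drug target" in entry.get("general_annotation", ""):
        row.row_highlight = "yellow"
    if entry.get("ec_number"):
        row.ec_numbers = [entry["ec_number"]]
        row.ec_colour = "blue"
    # The ordered locus name is the key used for the subsequent KEGG search.
    return entry.get("ordered_locus")


# Usage with placeholder values (not real UniProt data).
row = MatrixRow("Q_HYPO1")
locus = apply_uniprot_entry(row, {
    "gene_name": "geneX",
    "function": "hypothetical kinase",
    "go_terms": ["GO:0016301"],
    "general_annotation": "Flagged as a high-confidence drug target.",
    "ec_number": "2.7.1.1",
    "ordered_locus": "Rv_HYPO1",
})
print(locus, row)
```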

2.4.2 KEGG Search

Using the ordered locus name found in UniProt, a KEGG search was performed for each protein without pathway data. Some proteins displayed both EC numbers and pathway membership information. In these cases, all pathways were added to the matrix, again duplicating the protein's row once for each pathway to which it belongs. For proteins that already had EC numbers from UniProt and for which the KEGG EC number was the same, nothing was changed. For proteins that had no EC number on UniProt, the KEGG EC number was added to the matrix and the cell was coloured green to signify the source. For proteins with different EC numbers on UniProt and KEGG, both numbers were added to the matrix, with the UniProt EC number listed first. For some proteins, KEGG displayed only pathway information or only EC numbers; in these cases, the available information was added to the matrix.
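The EC-number rules above reduce to a small merge function plus row duplication. A minimal sketch, assuming EC numbers arrive as strings (or `None` when absent) and that each matrix row is a plain dict; `merge_ec` and `expand_rows` are hypothetical names, and the source tag stands in for the blue/green colour coding.

```python
from typing import List, Optional, Tuple


def merge_ec(uniprot_ec: Optional[str], kegg_ec: Optional[str]) -> List[Tuple[str, str]]:
    """Combine EC numbers as (ec_number, source) pairs, UniProt first.

    - same EC on both sites: keep the UniProt entry unchanged
    - EC only on KEGG: add it, tagged as KEGG (green in the matrix)
    - different ECs: keep both, UniProt first
    """
    merged = []
    if uniprot_ec:
        merged.append((uniprot_ec, "UniProt"))
    if kegg_ec and kegg_ec != uniprot_ec:
        merged.append((kegg_ec, "KEGG"))
    return merged


def expand_rows(protein: str, ecs: List[Tuple[str, str]], pathways: List[str]) -> List[dict]:
    """Duplicate the protein's row once per pathway (a single row if there are none)."""
    targets = pathways if pathways else [None]
    return [{"protein": protein, "pathway": p, "ec": list(ecs)} for p in targets]


# Usage with placeholder values.
ecs = merge_ec("1.1.1.1", "1.1.1.2")
for r in expand_rows("Q_HYPO1", ecs, ["Pathway_A", "Pathway_B"]):
    print(r)
```

Tagging the source on each EC number, instead of colouring a cell, keeps the "UniProt first" ordering and the provenance in the data itself.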

2.4.3 KEGG BRITE Hierarchies

KEGG BRITE is a collection of functional hierarchies built from structured vocabularies and can be used to represent functional information (Kanehisa et al., 2008). For some proteins, KEGG provided no pathway information but had assigned BRITE terms. These BRITE terms are included in the reference hierarchy found on KEGG Orthology (KO) (http://www.genome.jp/kegg-bin/get_htext?ko00001.keg) and therefore have three levels, in the same manner as the pathway terms. Many proteins displayed only BRITE terms with no pathway information, and in these cases the BRITE terms were entered in the pathway cells for that protein. Examples of these BRITE terms include ‘Glycosyltransferases’, ‘Translation Factors’, ‘Transcription Machinery’ and ‘Peptidases’. This was done to enable functional classification of as many genes as possible and to reduce the number of unannotated genes requiring further investigation.
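The three-level hierarchy can also be read programmatically. The sketch below assumes a simplified htext-style layout in which each line starts with a level letter (A, B, C, or D for individual entries), loosely modelled on the ko00001 file; the sample lines and the identifier `K_HYPO1` are illustrative only, and `parse_htext` and `pathway_cells` are hypothetical helpers. A protein with KEGG pathways keeps them; otherwise its C-level BRITE terms fill the pathway cells.

```python
from typing import Dict, List, Tuple

# Illustrative sample in a simplified htext-style layout (not copied from KEGG).
SAMPLE_HTEXT = [
    "A09180 Brite Hierarchies",
    "B  09181 Protein families: metabolism",
    "C    01003 Glycosyltransferases [BR:ko01003]",
    "D      K_HYPO1  hypothetical glycosyltransferase",
]


def parse_htext(lines: List[str]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each D-level identifier to the (A, B, C) levels above it."""
    current = {"A": "", "B": "", "C": ""}
    hierarchy: Dict[str, List[Tuple[str, str, str]]] = {}
    for line in lines:
        if not line or line[0] not in "ABCD":
            continue  # skip blank lines and non-hierarchy lines
        level, text = line[0], line[1:].strip()
        if level == "D":
            fields = text.split()
            if fields:
                hierarchy.setdefault(fields[0], []).append(
                    (current["A"], current["B"], current["C"]))
        else:
            current[level] = text
            # A new higher level resets the levels beneath it.
            if level == "A":
                current["B"] = current["C"] = ""
            elif level == "B":
                current["C"] = ""
    return hierarchy


def pathway_cells(entry_id: str, kegg_pathways: List[str],
                  hierarchy: Dict[str, List[Tuple[str, str, str]]]) -> List[str]:
    """Use KEGG pathways when present; otherwise fall back to C-level BRITE terms."""
    if kegg_pathways:
        return list(kegg_pathways)
    return [c_level for (_a, _b, c_level) in hierarchy.get(entry_id, [])]


tree = parse_htext(SAMPLE_HTEXT)
print(pathway_cells("K_HYPO1", [], tree))
```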

2.4.4 Converting UniPathway Pathways Into KEGG Pathways

UniProt provided pathway information for some proteins that had no information on KEGG. These pathways were linked to UniPathway (http://www.grenoble.prabi.fr/obiwarehouse/unipathway), which does not use the same naming structure as KEGG. On the UniPathway website, some pathways show cross-references to KEGG or MetaCyc pathways. If a pathway was cross-referenced with KEGG pathways, those KEGG pathways were added to the matrix. For pathways without KEGG cross-references, the KEGG pathway could sometimes be inferred from name similarity or specific terms and was also added. For a few pathways, no appropriate KEGG match could be found, so these pathways were added to the matrix in the UniPathway naming structure. For example, protein O69670 showed no pathway membership in KEGG but was assigned to the ‘amino-acid biosynthesis; ergothioneine biosynthesis’ pathway in UniProt. The UniPathway entry for this pathway provided no KEGG mapping information (Figure 2.1), so the UniPathway terms were added to the matrix without conversion.
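The three-way decision (cross-reference, inferred match, or keep the UniPathway name) can be sketched as a fallback chain. This assumes the cross-references and KEGG pathway names are already in hand as plain Python data; `convert_unipathway` is a hypothetical name, and the name-similarity step uses `difflib.get_close_matches` with an arbitrary 0.8 cutoff as a crude stand-in for the manual judgement described above, so any inferred match would still need checking by eye.

```python
import difflib
from typing import Dict, List, Tuple


def convert_unipathway(term: str, kegg_xrefs: Dict[str, List[str]],
                       kegg_names: List[str], cutoff: float = 0.8) -> Tuple[List[str], str]:
    """Return (pathways, naming_system) for a UniPathway term."""
    # 1. Explicit KEGG cross-references listed on UniPathway.
    xrefs = kegg_xrefs.get(term, [])
    if xrefs:
        return list(xrefs), "KEGG"
    # 2. Inference by name similarity on the most specific part of the term.
    leaf = term.split(";")[-1].strip().lower()
    lowered = {name.lower(): name for name in kegg_names}
    close = difflib.get_close_matches(leaf, list(lowered), n=1, cutoff=cutoff)
    if close:
        return [lowered[close[0]]], "KEGG (inferred)"
    # 3. No match: keep the UniPathway naming structure unchanged.
    return [term], "UniPathway"


print(convert_unipathway("amino-acid biosynthesis; ergothioneine biosynthesis",
                         {}, ["Sulfur metabolism", "Histidine metabolism"]))
```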

Figure 2.1 Example of UniPathway pathway membership information. This image shows the pathway membership for a protein shown in UniPathway as acting within ergothioneine biosynthesis but for which no KEGG pathway membership is available (Morgat et al., 2012).

Even when no KEGG mapping data is provided, KEGG terms can sometimes be derived another way. The ‘Overview’ tab of these pathways contains a ‘Pathway hierarchy: IsA relationships’ dropdown menu, which opens a tree view of the pathway hierarchy showing both UniPathway and GO terms. In this tree view, some pathways are nested within pathways that correspond directly to KEGG pathway terms. For example, the tree view for the ‘ergothioneine biosynthesis’ pathway discussed above is shown in Figure 2.2.

Figure 2.2 Pathway membership tree view in UniPathway. These nested pathway trees could be used to locate equivalent KEGG pathways and assign annotations to certain genes (Morgat et al., 2012).

Here, ‘ergothioneine biosynthesis’ is nested within ‘sulfur metabolism’ and ‘histidine metabolism’, both of which are KEGG terms. These two pathways were therefore also added to the matrix for proteins in the ‘ergothioneine biosynthesis’ UniPathway pathway.
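Reading KEGG terms off the IsA tree amounts to climbing from a UniPathway term towards the root and stopping on each branch at the first node that is also a KEGG term. The sketch assumes the tree is available as a child-to-parents dict; `nearest_kegg_ancestors` is a hypothetical name, and the small example tree is simplified and hypothetical apart from the two parents named in the text.

```python
from typing import Dict, List, Set


def nearest_kegg_ancestors(term: str, parents: Dict[str, List[str]],
                           kegg_terms: Set[str]) -> List[str]:
    """Return the closest ancestors of `term` (one per branch) that are KEGG terms."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in kegg_terms:
            found.add(node)  # stop climbing this branch at the first KEGG term
        else:
            stack.extend(parents.get(node, []))
    return sorted(found)


# Simplified, partly hypothetical IsA tree.
tree = {
    "ergothioneine biosynthesis": ["sulfur metabolism", "histidine metabolism"],
    "sulfur metabolism": ["metabolism"],
    "histidine metabolism": ["amino-acid metabolism"],
}
kegg = {"sulfur metabolism", "histidine metabolism", "metabolism"}
print(nearest_kegg_ancestors("ergothioneine biosynthesis", tree, kegg))
```

Stopping at the nearest KEGG node keeps broad top-level categories from being added to every protein.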

2.4.5 Using EC Numbers to Find Additional Pathway Memberships

Many proteins were assigned EC numbers on UniProt and/or KEGG but did not show any pathway memberships. In these cases, the EC number was searched in KEGG Enzyme. The EC number's webpage links to all KEGG reactions associated with that EC number, and the page listing these reactions shows the pathway(s) in which each reaction is involved. All possible pathways were added to the matrix for that protein, even though some undoubtedly did not apply to Mtb; the aim was to include every possible pathway so as not to miss any potential memberships. Only a few pathways that could not apply to Mtb, such as ‘Photosynthesis’ or ‘Insect Hormone Biosynthesis’, were excluded. These memberships were later investigated further to determine the veracity of memberships derived through this method.
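The EC-to-pathway expansion is a two-hop lookup followed by an exclusion filter. A minimal sketch, assuming the EC-to-reaction and reaction-to-pathway links have already been collected into dicts; the function name and the reaction IDs (`R_HYPO1`, `R_HYPO2`) are hypothetical, and the example mappings are illustrative. Every returned pathway should be treated as provisional, since the text notes these memberships were verified later.

```python
from typing import Dict, List, Set

EXCLUDED: Set[str] = {"Photosynthesis", "Insect Hormone Biosynthesis"}


def pathways_from_ec(ec: str, ec_to_reactions: Dict[str, List[str]],
                     reaction_to_pathways: Dict[str, List[str]],
                     excluded: Set[str] = EXCLUDED) -> List[str]:
    """Collect every pathway reachable via the EC number's reactions, minus exclusions."""
    candidates: Set[str] = set()
    for reaction in ec_to_reactions.get(ec, []):
        candidates.update(reaction_to_pathways.get(reaction, []))
    return sorted(p for p in candidates if p not in excluded)


ec_links = {"1.1.1.1": ["R_HYPO1", "R_HYPO2"]}
rxn_links = {
    "R_HYPO1": ["Glycolysis / Gluconeogenesis", "Photosynthesis"],
    "R_HYPO2": ["Glycolysis / Gluconeogenesis", "Fatty acid degradation"],
}
print(pathways_from_ec("1.1.1.1", ec_links, rxn_links))
```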

2.4.6 Using GO Terms and Gene Names to Derive Additional Pathways

UniProt provides a list of applicable Gene Ontology (GO) terms for each protein. In some cases, the GO terms for molecular function or biological process directly signified certain pathway membership. For example, a GO term might be ‘histidine metabolism’ and that pathway would be added to the protein’s pathway memberships.

In other cases, the GO term signified a function that did not directly correspond to a KEGG pathway. In these instances, the reference hierarchy on KEGG Orthology (KO) was used to find proteins with the same function. By searching for keywords from the protein's function, the protein name or the GO terms, similar proteins could sometimes be found in the reference hierarchy. When matching proteins or orthology (KO) entries were found, the pathways to which they belonged were added to the matrix.
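The two routes described in this subsection (a GO term that names a KEGG pathway directly, or a keyword search through KO entries) can be sketched as follows. The KO entries are assumed to be (description, pathways) pairs extracted beforehand; the function name, sample description and pathway label are hypothetical, and a keyword hit is only a candidate that would still need manual confirmation.

```python
from typing import List, Set, Tuple


def pathways_from_go(go_terms: List[str], kegg_pathway_names: List[str],
                     ko_entries: List[Tuple[str, List[str]]],
                     keywords: List[str]) -> List[str]:
    """Direct GO-to-pathway matches first; otherwise keyword hits in KO descriptions."""
    by_lower = {name.lower(): name for name in kegg_pathway_names}
    direct = {by_lower[t.lower()] for t in go_terms if t.lower() in by_lower}
    if direct:
        return sorted(direct)
    candidates: Set[str] = set()
    for description, pathways in ko_entries:
        text = description.lower()
        if any(k.lower() in text for k in keywords):
            candidates.update(pathways)
    return sorted(candidates)


print(pathways_from_go(["histidine metabolism"], ["Histidine metabolism"], [], []))
print(pathways_from_go(["GO:HYPO"], [],
                       [("glycosyltransferase family 2 protein", ["Pathway_HYPO_A"])],
                       ["Glycosyltransferase"]))
```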

2.4.7 Additional Pathways from UniPathway

To check whether any of the still-uncharacterised proteins had UniPathway data that had been missed or could not be converted into KEGG pathway mapping, the complete list of all 424 proteins with pathway data on UniPathway was downloaded. This list was cross-referenced with all proteins without pathway data in the phylogenetic profile, and all proteins with UniPathway pathway membership but without KEGG pathway membership were highlighted. These proteins were then searched manually on the UniPathway website. Their pathways had no equivalent in the KEGG pathway mapping system, so they were recorded as they appear on the UniPathway website. Additional pathways added included ‘cell wall polysaccharide biosynthesis’, ‘coenzyme F0 biosynthesis’, ‘molybdopterin biosynthesis’, ‘trehalose degradation’, ‘lipoprotein biosynthesis’ and ‘mycolic acid biosynthesis’.
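The cross-referencing step is essentially a set intersection between the downloaded UniPathway list and the proteins still lacking pathway data. The sketch assumes the download is a plain-text file with one accession per line (the real file layout may differ) and uses hypothetical helper names; `Q_HYPO1` to `Q_HYPO3` are placeholder accessions.

```python
from typing import Iterable, List, Set


def load_accessions(text: str) -> Set[str]:
    """Parse a one-accession-per-line list, ignoring blank lines and '#' comments."""
    accessions: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            accessions.add(line)
    return accessions


def proteins_to_highlight(unipathway: Iterable[str], without_pathway: Iterable[str]) -> List[str]:
    """Proteins with UniPathway membership that still lack any pathway data."""
    return sorted(set(unipathway) & set(without_pathway))


downloaded = "# hypothetical export\nQ_HYPO1\nQ_HYPO2\n"
print(proteins_to_highlight(load_accessions(downloaded), {"Q_HYPO1", "Q_HYPO3"}))
```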

Because UniPathway membership data had been added only to proteins without KEGG pathway membership, the added pathways were incomplete relative to UniPathway. Therefore, all proteins belonging to each added UniPathway pathway were identified and added to the matrix, duplicating proteins that already had KEGG pathway data.
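Completing each added UniPathway pathway means adding a (protein, pathway) row for every member that is not already present, even when that protein already carries KEGG rows. A minimal sketch, assuming rows are dicts with `protein` and `pathway` keys and that pathway membership lists have been collected from UniPathway; the function name and accessions are hypothetical.

```python
from typing import Dict, List


def complete_pathways(rows: List[dict], members: Dict[str, List[str]],
                      added: List[str]) -> List[dict]:
    """Add a row for every member of each added UniPathway pathway not already listed."""
    present = {(r["protein"], r["pathway"]) for r in rows}
    completed = list(rows)
    for pathway in added:
        for protein in sorted(members.get(pathway, [])):
            if (protein, pathway) not in present:
                # This may duplicate a protein that already has KEGG rows.
                completed.append({"protein": protein, "pathway": pathway})
                present.add((protein, pathway))
    return completed


matrix = [
    {"protein": "Q_HYPO1", "pathway": "Histidine metabolism"},
    {"protein": "Q_HYPO2", "pathway": "mycolic acid biosynthesis"},
]
membership = {"mycolic acid biosynthesis": ["Q_HYPO1", "Q_HYPO2"]}
for r in complete_pathways(matrix, membership, ["mycolic acid biosynthesis"]):
    print(r)
```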
