EFECTO DEL SISTEMA DE LABOREO Y EL TIPO DE FERTILIZACIÓN SOBRE LA

In the previous section, we demonstrated that the HHsearch alignment strategy works better for comparative modelling. Here we assessed the different template selection strategies (i.e. HHsearch and FunFams). Again, this was done by building the models and assessing the model quality.

Figure 2.8 shows the proportion of good quality models built by FunFams and HHsearch. FunFams gave a higher proportion of good quality models than HHsearch for both close homologues (sequence identity ≥30%) and remote homologues (sequence identity <30%). The difference was statistically significant for models built for remote homologues (p-value <1E-19, Mann-Whitney U test).

CHAPTER 2. MODELLING PROTEIN MONOMERS 77

Figure 2.8: Proportion of good quality models built by FunFam and HHsearch.
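The proportions plotted in Figure 2.8 can be illustrated with a minimal Python sketch. It assumes each model is recorded as a (protocol, sequence identity, TM-score) tuple, uses the TM-score >0.5 definition of a good model given in the later figure captions, and splits close from remote homologues at 30% sequence identity. The function name, record layout and example values are hypothetical and are not taken from the benchmark.

```python
from collections import defaultdict

TM_GOOD = 0.5          # a model is "good" if its TM-score exceeds this value
CLOSE_IDENTITY = 30.0  # close homologues: sequence identity >= 30%


def proportion_good(models):
    """Return {(protocol, homologue_class): fraction of good models}.

    `models` is an iterable of (protocol, identity_percent, tm_score) tuples
    (a hypothetical record layout chosen for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [good, total]
    for protocol, identity, tm in models:
        cls = "close" if identity >= CLOSE_IDENTITY else "remote"
        tally = counts[(protocol, cls)]
        tally[0] += tm > TM_GOOD  # True counts as 1
        tally[1] += 1
    return {key: good / total for key, (good, total) in counts.items()}


# Made-up records for illustration only; these are not benchmark results.
example = [
    ("FunFam", 45.0, 0.82),
    ("FunFam", 22.0, 0.61),
    ("FunFam", 18.0, 0.43),
    ("HHsearch", 45.0, 0.80),
    ("HHsearch", 22.0, 0.38),
    ("HHsearch", 18.0, 0.41),
]
result = proportion_good(example)
```

A significance test such as the Mann-Whitney U test reported above would be run separately on the underlying scores; it is not reproduced in this sketch.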

2.3.4.1 Close homologues with sequence identity ≥50%

Figure 2.9 shows that both template selection strategies gave similar numbers of good quality models. This result is not surprising as the FunFam and HHsearch protocols either selected the same template or another closely related structural template. At this level of sequence homology, sequences tend to share high structural similarity, so choosing an alternative close homologue as a template is unlikely to affect the quality of the models built.

Figure 2.9: Number of models built from templates selected by the FunFam and HHsearch protocols for homologues with sequence identity ≥50%. Good models are defined as models with TM-score >0.5 when compared to the native structure.

2.3.4.2 Close homologues with sequence identity 30%-50%

Figure 2.10 compares the performance of the FunFam protocol with that of the HHsearch protocol for homologues in the sequence identity range 30%-50%. When comparing the two protocols, the models for each query target were assigned to one of three categories: (1) models that were produced from the same template; (2) models that were generated from different templates; (3) extra models that could only be built by one of the methods.
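The three categories can be sketched in Python as follows. This is only an illustration under stated assumptions: each protocol's output is represented as a dictionary mapping a query identifier to the selected template identifier, and the "extra" category is split by protocol so that the origin of each extra model is kept. The function name and all identifiers are hypothetical.

```python
def categorise(funfam, hhsearch):
    """Assign each query to a comparison category.

    `funfam` and `hhsearch` map query_id -> template_id for the queries
    each protocol could model (a hypothetical representation).
    """
    categories = {
        "same_template": [],
        "different_template": [],
        "funfam_only": [],    # extra models built only by FunFam
        "hhsearch_only": [],  # extra models built only by HHsearch
    }
    for query in sorted(set(funfam) | set(hhsearch)):
        if query not in hhsearch:
            categories["funfam_only"].append(query)
        elif query not in funfam:
            categories["hhsearch_only"].append(query)
        elif funfam[query] == hhsearch[query]:
            categories["same_template"].append(query)
        else:
            categories["different_template"].append(query)
    return categories


# Made-up identifiers for illustration only.
funfam_hits = {"q1": "tA", "q2": "tB", "q3": "tC"}
hhsearch_hits = {"q1": "tA", "q2": "tX", "q4": "tD"}
cats = categorise(funfam_hits, hhsearch_hits)
```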

Overall, the performance of the FunFam protocol is comparable to that of the HHsearch protocol. The HHsearch protocol gave 4 more good quality models than the FunFam protocol. Each method identified some query targets that the other failed to identify: FunFams built an extra 46 models and HHsearch built an extra 593 models. Of these extra models, 84.8% of the FunFam models and 44.3% of the HHsearch models are of good quality. FunFams identifies fewer targets because the protocol only allows models to be built if there is a highly confident match.

Figure 2.10: Number of models built by the FunFam and HHsearch protocols for homologues with sequence identity 30%-50%. Good models are defined as models with TM-score >0.5 when compared to the native structure.

2.3.4.3 Remote homologues with sequence identity <30%

Figure 2.11 shows the quality of models built by the FunFam protocol and the HHsearch protocol for the remote homologues in the query dataset. For the common models, FunFam gave slightly more good models than the HHsearch protocol.

HHsearch managed to identify 220 templates not selected by FunFams. However, 56% of the models built are low quality models. We observed a similar phenomenon with close homologues with sequence identity 30%-50%. The HHsearch protocol tends to model more targets than the FunFam protocol, but about half of the models built are of low quality. The FunFam protocol gave fewer models but a higher proportion of good quality models.

Using this benchmark dataset, it appears that FunFams do not identify any additional templates compared to the HHsearch strategy.

Figure 2.11: Number of models built by the FunFam and HHsearch protocols for remote homologues. Good models are defined as models with TM-score >0.5 when compared to the native structure.

Figure 2.12 shows the distribution of model quality for models of the same queries built using different templates. Overall, FunFam and HHsearch models are comparable. However, there are slight differences depending on the method used to assess model quality. FunFam models score slightly better with GDT-HA, whereas HHsearch models score slightly better with the TM-score. TM-score is a global structural comparison score that accounts for all the residues of the modelled protein, whereas GDT-HA uses distance cut-offs and focuses on the fractions of the structure that are correctly modelled.

Therefore, HHsearch models have better global similarity with the native structure, whereas FunFam models tend to have higher local agreement with the native structure. Good local similarity is important when modelling enzymes or protein complexes, where a better representation of functional sites is crucial.
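The difference between the two scores can be illustrated with a simplified Python sketch. It assumes that the per-residue distances (in Å) between model and native structure are already available from a single fixed superposition, whereas the standard TM-score and GDT programs search over many superpositions. It uses the published TM-score length normalisation d0 = 1.24 (L - 15)^(1/3) - 1.8 and the GDT-HA cut-offs of 0.5, 1, 2 and 4 Å, and it further assumes a target length well above 21 residues so that d0 is positive. The function names and example distances are hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no superposition search).

    Every aligned residue contributes a smooth, distance-weighted term.
    Assumes target_length is well above 21 so that d0 is positive.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt_ha(distances, target_length):
    """GDT-HA (0-100) for one fixed superposition.

    Mean fraction of residues within 0.5, 1, 2 and 4 Angstrom of the native.
    """
    cutoffs = (0.5, 1.0, 2.0, 4.0)
    fractions = [sum(d <= c for d in distances) / target_length for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up distances: half the residues modelled very accurately, half at 3 Angstrom.
example_distances = [0.4] * 50 + [3.0] * 50
tm = tm_score(example_distances, 100)
gdt = gdt_ha(example_distances, 100)
```

In this sketch the residues at 3 Å still add a substantial term to the TM-score but are only counted at the 4 Å cut-off in GDT-HA, which is one way the two measures can rank the same pair of models differently.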

Figure 2.12: Distribution of model quality scores of common remote models (using different templates) built by the FunFam and the HHsearch protocols. More similar structures give higher GDT-HA and TM-score values.

2.3.4.4 Which protocol selects a higher proportion of good templates than the other protocol?

We carried out an analysis to determine how often the FunFam protocol or the HHsearch protocol selected a better template than the other. To identify which protocol selects the better template, we structurally compared each selected template with the query structure. We compared 5,977 close and 146 remote cases where the protocols selected different structural templates and found that ∼80% of the chosen FunFam and HHsearch templates had an nRMSD score below 3Å when compared to the query structure.

We subtracted the nRMSD value of the HHsearch template from the nRMSD value of the FunFam template (∆nRMSD) to determine which protocol selects the better template. Table 2.1 shows the distribution of the nRMSD difference between the selected templates. We observed that in 67% of the cases FunFam and HHsearch select structurally similar templates. There is a slight tendency for the FunFam protocol to select better structural templates than the HHsearch protocol. This is statistically significant (p-value < 2E-12, Wilcoxon signed-rank test).

Table 2.1: How often do the two protocols select a better template?

∆nRMSD range                                      Proportion of cases
∆nRMSD ≥ 1Å (HHsearch selects better templates)   13%
-1Å < ∆nRMSD < 1Å (Similar templates)             67%
∆nRMSD ≤ -1Å (FunFam selects better templates)    20%
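The binning used in Table 2.1 can be sketched in Python as follows. This assumes ∆nRMSD is the FunFam template nRMSD minus the HHsearch template nRMSD, so that a positive value means the HHsearch template is closer to the query structure. The function names and input values are hypothetical and do not reproduce the benchmark data.

```python
from collections import Counter


def classify_delta(nrmsd_funfam, nrmsd_hhsearch, threshold=1.0):
    """Bin one query by the difference in template nRMSD (Angstrom)."""
    delta = nrmsd_funfam - nrmsd_hhsearch
    if delta >= threshold:
        return "HHsearch better"  # FunFam template is at least 1 A further away
    if delta <= -threshold:
        return "FunFam better"    # HHsearch template is at least 1 A further away
    return "similar"


def summarise(pairs):
    """Return the fraction of cases in each bin for (funfam, hhsearch) nRMSD pairs."""
    counts = Counter(classify_delta(f, h) for f, h in pairs)
    return {label: n / len(pairs) for label, n in counts.items()}


# Made-up nRMSD pairs for illustration only.
example_pairs = [(2.5, 1.0), (1.2, 1.5), (1.0, 2.4), (0.8, 0.9)]
summary = summarise(example_pairs)
```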
