In the previous section, we demonstrated that the HHsearch alignment strategy works better for comparative modelling. Here we assessed the different template selection strategies (i.e. HHsearch and FunFams). Again, this was done by building the models and assessing the model quality.
Figure 2.8 shows the proportion of good quality models built by FunFams and HHsearch. FunFams gave a higher proportion of good quality models than HHsearch for both close (sequence identity ≥30%) and remote homologues (sequence identity <30%). The difference was statistically significant for models built for remote homologues (p-value <1E-19, Mann-Whitney U test).
CHAPTER 2. MODELLING PROTEIN MONOMERS 77
Figure 2.8: Proportion of good quality models built by FunFam and HHsearch.
2.3.4.1 Close homologues with sequence identity ≥50%
Figure 2.9 shows that both template selection strategies gave similar numbers of good quality models. This result is not surprising as the FunFam and HHsearch protocols either selected the same template or another closely related structural template. At this level of sequence homology, sequences tend to share high structural similarity, so choosing an alternative close homologue as a template is unlikely to affect the quality of the models built.
Figure 2.9: Number of models built from templates selected by the FunFam and HHsearch protocols for homologues with sequence identity ≥50%. Good models are defined as models with TM-score >0.5 when compared to the native structure.
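The good-model criterion used throughout this section (TM-score >0.5 against the native structure) and the three sequence identity ranges can be expressed as a short sketch. The function names, the input layout (pairs of percentage identity and TM-score) and the example values are assumptions made for illustration, and treating exactly 50% identity as belonging to the ≥50% range is also an assumption; none of this is the code used to produce the figures.

```python
from collections import defaultdict

TM_GOOD = 0.5  # threshold from the text: TM-score > 0.5 counts as a good model


def identity_bin(seq_id):
    """Map a percentage sequence identity to the ranges used in this section."""
    if seq_id >= 50:
        return ">=50%"
    if seq_id >= 30:
        return "30%-50%"
    return "<30%"


def good_model_proportions(models):
    """models: iterable of (sequence_identity_percent, tm_score) pairs.

    Returns {range_label: fraction of models in that range with TM-score > 0.5}.
    """
    totals = defaultdict(int)
    good = defaultdict(int)
    for seq_id, tm in models:
        label = identity_bin(seq_id)
        totals[label] += 1
        if tm > TM_GOOD:
            good[label] += 1
    return {label: good[label] / totals[label] for label in totals}


# Made-up values, for illustration only (not results from this chapter).
example = [(72.0, 0.91), (55.0, 0.48), (41.0, 0.66),
           (35.0, 0.52), (22.0, 0.31), (18.0, 0.57)]
proportions = good_model_proportions(example)
```

Running the same function once per protocol would give the per-range proportions that a comparison such as Figure 2.8 summarises.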
2.3.4.2 Close homologues with sequence identity 30%-50%
Figure 2.10 compares the performance of the FunFam protocol with that of the HHsearch protocol for homologues in the sequence identity range 30%-50%. When comparing target selection and alignment protocols, the models for each query target were assigned to one of the following three categories: (1) models that were produced from the same template; (2) models that were generated from different templates; (3) extra models that could only be built by one of the two methods.
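The three categories above can be sketched with set operations on the query identifiers. The function name, the dictionary layout (query id mapped to the template id chosen by each protocol) and the identifiers in the example are hypothetical; the third category is split by protocol so that the extra models of each method can be counted separately.

```python
def categorise(funfam, hhsearch):
    """funfam, hhsearch: dicts mapping query id -> template id chosen by each protocol.

    Returns a dict of category name -> sorted list of query ids.
    """
    shared = funfam.keys() & hhsearch.keys()  # queries modelled by both protocols
    return {
        # (1) both protocols built a model from the same template
        "same_template": sorted(q for q in shared if funfam[q] == hhsearch[q]),
        # (2) both protocols built a model, but from different templates
        "different_template": sorted(q for q in shared if funfam[q] != hhsearch[q]),
        # (3) extra models that only one protocol could build
        "funfam_only": sorted(funfam.keys() - hhsearch.keys()),
        "hhsearch_only": sorted(hhsearch.keys() - funfam.keys()),
    }


# Hypothetical identifiers, for illustration only.
funfam = {"q1": "t1", "q2": "t2", "q3": "t5"}
hhsearch = {"q1": "t1", "q2": "t9", "q4": "t7"}
categories = categorise(funfam, hhsearch)
```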
Overall, the performance of the FunFam protocol is comparable to that of the HHsearch protocol. The HHsearch protocol gave 4 more good quality models than the FunFam protocol. Each method modelled some query targets that the other failed to identify: FunFams built an extra 46 models and HHsearch built an extra 593 models. Of these extra models, 84.8% of the FunFam models and 44.3% of the HHsearch models are of good quality. FunFams identifies fewer targets because the protocol only allows models to be built if there is a highly confident match.
Figure 2.10: Number of models built by the FunFam and HHsearch protocols for homologues with sequence identity 30%-50%. Good models are defined as models with TM-score >0.5 when compared to the native structure.
2.3.4.3 Remote homologues with sequence identity <30%
Figure 2.11 shows the quality of models built by the FunFam protocol and the HHsearch protocol for the remote homologues in the query dataset. For the common models, FunFam gave slightly more good models than the HHsearch protocol.
HHsearch managed to identify 220 templates not selected by FunFams. However, 56% of the models built from them are of low quality. We observed a similar phenomenon for close homologues with sequence identity 30%-50%. The HHsearch protocol tends to model more targets than the FunFam protocol, but about half of the models built are of low quality. The FunFam protocol gave fewer models but a higher proportion of good quality models.
Using this benchmark dataset, it appears that FunFams do not identify any additional templates compared to the HHsearch strategy.
Figure 2.11: Number of models built by the FunFam and HHsearch protocols for remote homologues. Good models are defined as models with TM-score >0.5 when compared to the native structure.
Figure 2.12 shows the distribution of model quality for models built for the same queries using different templates. Overall, FunFams and HHsearch models are comparable. However, there are slight differences depending on the method used to assess model quality. FunFam models score slightly better with GDT-HA, whereas HHsearch models score slightly better with the TM-score. TM-score is a global structural comparison score that accounts for all the residues of the modelled protein, whereas GDT-HA uses distance cut-offs and focuses on the fractions of the structure that are correctly modelled.
Therefore, HHsearch models have better global similarity with the native structure, and FunFams models tend to have higher local agreement with the native structure. Local similarity is important when modelling enzymes or protein complexes, where a better representation of functional sites is crucial.
Figure 2.12: Distribution of model quality scores of common remote models (using different templates) built by the FunFam and the HHsearch protocols. More similar structures give higher GDT-HA and TM-scores.
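To make the difference between the two scores concrete, the sketch below computes both from a list of per-residue Cα distances between a model and the native structure. It uses the standard definitions, TM-score = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8, and GDT-HA as the mean percentage of residues within 0.5, 1, 2 and 4 Å. Two simplifying assumptions apply: the distances are taken from a single fixed superposition, whereas the actual scoring programs search over superpositions to maximise each score, and the lower bound of 0.5 Å on d0 for short targets is a guard chosen for this sketch. The function names and the two example models are made up for illustration.

```python
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # distance cut-offs in Angstrom


def tm_d0(target_length):
    """Length-dependent distance scale used by the TM-score."""
    if target_length <= 15:
        return 0.5  # guard for very short targets (assumption of this sketch)
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition; distances are in Angstrom."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt_ha(distances, target_length):
    """GDT-HA (0-100) for one fixed superposition."""
    within = sum(sum(1 for d in distances if d <= c) for c in GDT_HA_CUTOFFS)
    return 100.0 * within / (len(GDT_HA_CUTOFFS) * target_length)


# Two made-up 100-residue models:
# model A is moderately displaced everywhere, model B is very accurate for
# 80 residues and badly wrong for the remaining 20.
model_a = [1.5] * 100
model_b = [0.3] * 80 + [10.0] * 20

tm_a, gdt_a = tm_score(model_a, 100), gdt_ha(model_a, 100)
tm_b, gdt_b = tm_score(model_b, 100), gdt_ha(model_b, 100)
```

In this toy case model A has the higher TM-score while model B has the higher GDT-HA, which illustrates how the two measures can rank the same pair of models differently.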
2.3.4.4 Which protocol selects a higher proportion of good templates?
We carried out an analysis to determine how often the FunFam protocol or the HHsearch protocol selected a better template than the other. To identify which protocol selects the better template, we structurally compared each selected template against the query structure. We compared 5,977 close and 146 remote cases where the protocols selected different structural templates and found that ∼80% of the chosen FunFam and HHsearch templates had an nRMSD score below 3 Å when compared to the query structure.
To determine which protocol selects the better template, we subtracted the nRMSD value of the HHsearch template from the nRMSD value of the FunFam template (∆nRMSD). Table 2.1 shows the nRMSD difference of the templates selected. We observed that in 67% of the cases FunFam and HHsearch select structurally similar templates. There is a slight tendency for the FunFam protocol to select better structural templates than the HHsearch protocol. This is statistically significant (p-value < 2E-12, Wilcoxon signed-rank test).
Table 2.1: How often do the two protocols select a better template?
∆nRMSD range          Interpretation                         Cases
∆nRMSD ≥ 1Å           HHsearch selects better templates      13%
-1Å < ∆nRMSD < 1Å     Similar templates                      67%
∆nRMSD ≤ -1Å          FunFam selects better templates        20%
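The binning behind Table 2.1 can be sketched as follows, taking ∆nRMSD as the FunFam template nRMSD minus the HHsearch template nRMSD, so that a negative value means the FunFam template is structurally closer to the query. The function names, labels and example values are assumptions made for illustration and are not data from the benchmark.

```python
from collections import Counter


def classify_delta(nrmsd_funfam, nrmsd_hhsearch, threshold=1.0):
    """Assign one query to a Table 2.1 row from its two template nRMSD values (Angstrom)."""
    delta = nrmsd_funfam - nrmsd_hhsearch
    if delta >= threshold:
        return "HHsearch better"  # FunFam template is at least 1 A worse
    if delta <= -threshold:
        return "FunFam better"    # FunFam template is at least 1 A better
    return "similar"


def summarise(pairs):
    """pairs: iterable of (nrmsd_funfam, nrmsd_hhsearch). Returns row -> fraction of cases."""
    counts = Counter(classify_delta(f, h) for f, h in pairs)
    total = sum(counts.values())
    return {row: n / total for row, n in counts.items()}


# Made-up nRMSD pairs, for illustration only.
pairs = [(1.2, 2.9), (2.0, 2.3), (3.5, 1.9), (0.8, 2.1)]
summary = summarise(pairs)
```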