
The analysis of the hybrid KB/MM potentials focused on generating the potentials of mean force (PMFs) for the KB portion of the hybrid potential, with special attention paid to the pairwise energy curves and to the performance of the resulting potentials. Several factors affecting the generation of the KB potentials were explored:

1. The effect of the counting scheme on the potentials, especially at critical low distances.

2. The size of the structural database used (either Top500 or Top8000) in the generation of the potentials, which affects the smoothness of the energy curves.

3. The strictness of the starting database: eliminating all structures with clashes to remove energetic artifacts from the energy curves.

4. The number of atom types used in the generation of the potentials, identifying and combining similar atom types to improve the statistical representation of those atom types in the potential.
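The generation step that these factors act on can be sketched as a distance-binned inverse Boltzmann calculation. This is a minimal illustration, not the KB_0.1 code: the bin width, distance range, kT value, energy cap and the pooled reference state are all assumptions, and `count_contacts` and `pmf` are hypothetical names. The `init_count` argument stands in for the counting scheme (factor 1); changing or filtering the input contacts corresponds to factors 2 to 4.

```python
import math
from collections import defaultdict

# Assumed parameters (illustrative, not taken from KB_0.1):
KT = 0.593          # kcal/mol, roughly kT at 298 K
BIN_WIDTH = 0.5     # Å per distance bin
N_BINS = 16         # bins covering 0 to 8 Å
MAX_PENALTY = 10.0  # kcal/mol, cap for empty or very sparse bins


def count_contacts(contacts, init_count=0):
    """Histogram (type_a, type_b, distance) contacts per unordered atom-type pair.

    init_count is the value every bin starts at: 0 or 1 in the two counting
    schemes compared in this section.
    """
    counts = defaultdict(lambda: [init_count] * N_BINS)
    for a, b, d in contacts:
        k = int(d / BIN_WIDTH)
        if 0 <= k < N_BINS:
            counts[tuple(sorted((a, b)))][k] += 1
    return dict(counts)


def pmf(counts):
    """Inverse Boltzmann: E_ab(r) = -kT ln[f_ab(r) / f_ref(r)].

    f_ab is the normalized histogram of one type pair; f_ref pools the
    counts of all pairs (an assumed "averaged" reference state).
    """
    n_ref = [sum(c[k] for c in counts.values()) for k in range(N_BINS)]
    total_ref = sum(n_ref)
    energies = {}
    for pair, c in counts.items():
        total = sum(c)
        curve = []
        for k in range(N_BINS):
            if c[k] == 0:
                curve.append(MAX_PENALTY)  # never observed: maximal repulsion
            else:
                ratio = (c[k] / total) / (n_ref[k] / total_ref)
                curve.append(min(-KT * math.log(ratio), MAX_PENALTY))
        energies[pair] = curve
    return energies


# Toy data: (type_a, type_b, distance in Å)
contacts = [("C", "C", 3.8), ("C", "C", 3.9), ("C", "O", 3.2), ("O", "O", 5.0)]
zero_init = pmf(count_contacts(contacts, init_count=0))
one_init = pmf(count_contacts(contacts, init_count=1))
print(zero_init[("C", "C")][0], one_init[("C", "C")][0])
```

In this toy example, zero initialization gives the never-observed C-C bin at 0 to 0.5 Å the maximum penalty, whereas initializing every bin to one gives that same bin an energy close to zero, i.e. almost no repulsion at a distance where atoms should not be found. This illustrates why the counting scheme can matter most at low distances.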

To evaluate performance, all generated potentials were applied in structural refinement against two datasets: a decoy dataset generated using quasi-elastic normal mode perturbation, and a CASP dataset collated from the regular target submissions for CASPs 8-13. Every potential was evaluated against two criteria:

1. Refinement should not significantly perturb the native.

2. Refinement should move models closer to the native.
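The two criteria can be made concrete with a small scoring helper. The section does not state which structural similarity measure was used, so this sketch assumes RMSD to the native in Å and a hypothetical 1.0 Å tolerance for "not significantly perturbing" a native; `evaluate_refinement` is an illustrative name.

```python
from statistics import mean


def evaluate_refinement(native_drift, rmsd_before, rmsd_after, max_native_drift=1.0):
    """Score one potential against the two refinement criteria.

    native_drift: RMSD (Å) between each native and its refined version.
    rmsd_before, rmsd_after: RMSD (Å) of each model to its native before
        and after refinement.
    max_native_drift: assumed tolerance for "not significantly perturbed".
    """
    improvement = [b - a for b, a in zip(rmsd_before, rmsd_after)]
    return {
        # Criterion 1: natives stay close to where they started.
        "native_preserved": mean(native_drift) <= max_native_drift,
        # Criterion 2: models move closer to the native on average.
        "models_improved": mean(improvement) > 0,
        "mean_improvement": mean(improvement),
        "fraction_improved": sum(d > 0 for d in improvement) / len(improvement),
    }


result = evaluate_refinement(native_drift=[0.2, 0.4],
                             rmsd_before=[3.0, 4.0],
                             rmsd_after=[2.5, 3.8])
print(result)
```

Reporting the fraction of improved models alongside the mean guards against a potential that helps a few models a lot while degrading most of them.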

5.1.1 Results and Discussion

A very modest improvement in potential performance was achieved by altering the contact counting scheme in the statistics-gathering phase so that all PMF bins are initialized to zero rather than one. Combining similar atom types within the potentials generated from the Top500 databases produced a more significant improvement. Combining atom types for potentials generated from the Top8000 databases, on the other hand, did not improve performance. Increasing the size of the starting database (generating potentials from the Top8000 database) resulted in potentials that were more volatile and performed worse in refinement: these potentials significantly altered natives and led to a net degradation of the models in the CASP dataset. Finally, removing all structures with clashes from the databases gave mixed results. For the smaller Top500 database, potentials generated from the clash-free subset performed slightly worse than potentials generated from the full database. For the larger Top8000 database, removing clashes slightly improved the performance of those potentials.
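Building a clash-free subset amounts to dropping every structure that contains at least one clash. The section does not give the clash criterion that was used, so this sketch substitutes a deliberately crude, hypothetical one: any pair of heavy atoms from residues at least two positions apart that lie closer than 2.5 Å, with SG-SG pairs exempted because they may be disulfide bonds. `has_clash` and `clash_free_subset` are illustrative names.

```python
import math
from itertools import combinations

CLASH_CUTOFF = 2.5  # Å; hypothetical heavy-atom cutoff, not the criterion used in this work


def has_clash(atoms):
    """Return True if any non-bonded heavy-atom pair is closer than CLASH_CUTOFF.

    atoms: list of (residue_index, atom_name, (x, y, z)).
    """
    for (ri, ni, xi), (rj, nj, xj) in combinations(atoms, 2):
        if abs(ri - rj) < 2:
            continue  # same or adjacent residue: may be covalently bonded
        if ni == "SG" and nj == "SG":
            continue  # SG-SG pairs may be disulfide bonds (~2.05 Å), not clashes
        if math.dist(xi, xj) < CLASH_CUTOFF:
            return True
    return False


def clash_free_subset(structures):
    """structures: {structure_id: atoms}. Return the ids kept after filtering."""
    return [sid for sid, atoms in structures.items() if not has_clash(atoms)]


structures = {
    "clash": [(1, "CA", (0.0, 0.0, 0.0)), (5, "CA", (1.0, 0.0, 0.0))],
    "clean": [(1, "CA", (0.0, 0.0, 0.0)), (5, "CA", (4.0, 0.0, 0.0))],
    "disulfide": [(1, "SG", (0.0, 0.0, 0.0)), (9, "SG", (2.05, 0.0, 0.0))],
}
print(clash_free_subset(structures))
```

Because a single clash removes a whole structure, even a strict filter like this shrinks the database, which is the trade-off discussed below.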

When considering the implications of these results, it is important to note that the energy curves are rough both within KB_0.1 [20] (the original potential this work is based on) and within the PMFs generated in this work from the Top500 database (which differ from KB_0.1 only in the counting scheme); see Figure 3.15 for an example. This roughness could indicate that these potentials capture features of the interactions that are key to refinement performance, or that a larger statistical database is needed to smooth out some of these artifacts. Most likely both are true. In either case, the roughness of these curves appears to be what prevents refinement from making large changes to structures. The curves of the potentials generated from the Top8000 database are much smoother (Figure 3.15), but those potentials significantly perturb the natives and perform worse overall. Removing all clashes (and the energetic artifacts they cause) was expected to improve overall performance. So why did it not do so for the Top500 potentials?

This may be because removing 51 structures from the database to eliminate all clashes reduced the statistical robustness of the dataset. That would imply that the Top500 database is either just the right size or could be expanded to include more structures. Potentials generated from 500 clash-free structures should be tested, which would separate the effect of removing clashes from the effect of shrinking the database.
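The link between database size and curve roughness can be illustrated with synthetic data. This sketch draws distances from a flat distribution, so the true energy curve (in kT units) is a straight line and any roughness in the sampled curve is pure counting noise. The `roughness` measure (the sum of squared second differences) is a hypothetical choice, not one used in this work, and the sketch only illustrates the statistical explanation, not the possibility that some roughness reflects real features of the interactions.

```python
import math
import random


def roughness(curve):
    """Sum of squared second differences: 0 for a straight line, large for jagged curves."""
    return sum((curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
               for i in range(1, len(curve) - 1))


def sampled_energy_curve(n_samples, n_bins=16, bin_width=0.5, seed=0):
    """-ln of a histogram of synthetic distances drawn uniformly over the bin range.

    The true distribution is flat, so every wiggle in the returned curve is
    counting noise. Bins start at 1 so that no bin is empty.
    """
    rng = random.Random(seed)
    counts = [1] * n_bins
    for _ in range(n_samples):
        k = int(rng.uniform(0.0, n_bins * bin_width) / bin_width)
        if k < n_bins:  # uniform() can occasionally return the upper end point
            counts[k] += 1
    total = sum(counts)
    return [-math.log(c / total) for c in counts]


small_db = roughness(sampled_energy_curve(80))
large_db = roughness(sampled_energy_curve(200_000))
print(f"80 samples: {small_db:.3f}  200000 samples: {large_db:.3f}")
```

With these settings the sparsely sampled curve is far rougher than the densely sampled one, mirroring how removing structures from a database thins every bin of every atom-type pair.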

Why did combining atom types within the Top500 potentials improve performance? Combining similar atom types pools their counts, which improves the statistical representation of the combined types. The resulting potentials gave refinement more freedom to move structures and performed better. This suggests that the Top500 database should perhaps be expanded to improve statistics, and that the ideal database size for generating potentials of mean force may lie somewhere between the 500 structures in Top500 and the 7957 structures in the Top8000 database.
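Combining atom types can be sketched as relabelling each type through a mapping and summing the per-bin counts of every pair that lands on the same combined pair. The type names below (`ALA_N`, `BB_N`, `C_HPHOB` and so on) are hypothetical and do not reproduce the 124-type scheme of this work; they only mimic its two common kinds of combination, backbone atoms of the same element and hydrophobic carbons.

```python
def merge_atom_types(counts, type_map):
    """Pool per-bin contact counts after mapping atom types onto combined types.

    counts: {(type_a, type_b): [count per distance bin]} with the original types.
    type_map: {original_type: combined_type}; unmapped types keep their name.
    """
    merged = {}
    for (a, b), bins in counts.items():
        key = tuple(sorted((type_map.get(a, a), type_map.get(b, b))))
        pooled = merged.setdefault(key, [0] * len(bins))
        for k, n in enumerate(bins):
            pooled[k] += n
    return merged


# Hypothetical type names: backbone N atoms and hydrophobic side-chain carbons.
counts = {
    ("ALA_N", "LEU_CD1"): [1, 2],
    ("GLY_N", "LEU_CD1"): [0, 3],
    ("ALA_N", "VAL_CG1"): [4, 0],
}
type_map = {"ALA_N": "BB_N", "GLY_N": "BB_N",
            "LEU_CD1": "C_HPHOB", "VAL_CG1": "C_HPHOB"}
print(merge_atom_types(counts, type_map))
```

Here three sparse histograms collapse into a single, better-populated one, which is the statistical benefit described above; the PMF is then computed from the pooled counts as usual.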

The best-performing potential generated in this work is based on the Top500 database (including structures with clashes), with statistical counts initialized to zero, and contains 124 atom types, with common combinations including backbone atoms of the same element and carbons from hydrophobic residues (Figure 3.8).

Moving forward, clash-free databases with sizes between 500 and 8000 structures should be tested, and atom type combinations for these potentials should continue to be determined and tested. Since combining atom types did not improve the performance of potentials generated from the Top8000 database, there may be a database size beyond which combining atom types no longer helps, and this point may coincide with an optimal statistical database size. Another avenue for improvement may be the use of evolutionary data in generating potentials. With large databases of known protein families (SCOP2 [28] and CATH [29]), it may be possible to generate specialized potentials for individual protein folds. If the homologous family of a structure can be identified through structural or sequence analysis, a potential could be generated from, or seeded with, homologous structures; such a potential may better capture the patterns within the fold and allow better refinement of that structure.
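One simple way to "seed" a potential with homologous structures is to add the family's pair counts, up-weighted, to the counts from the general database before computing the PMF. This is a hypothetical sketch of that idea, not a method from this work: `seeded_counts` and its `family_weight` parameter are invented for illustration, and identifying the family itself (for example through SCOP2 or CATH) is outside its scope.

```python
def seeded_counts(global_counts, family_counts, family_weight=5.0):
    """Combine database-wide and family-specific pair counts before computing a PMF.

    Both inputs map (type_a, type_b) -> [count per distance bin] with the same
    binning. family_weight (hypothetical) sets how strongly the homologous
    family's statistics dominate the database-wide background.
    """
    combined = {}
    for pair in set(global_counts) | set(family_counts):
        g = global_counts.get(pair)
        f = family_counts.get(pair)
        n_bins = len(g) if g is not None else len(f)
        g = g if g is not None else [0] * n_bins
        f = f if f is not None else [0] * n_bins
        combined[pair] = [gk + family_weight * fk for gk, fk in zip(g, f)]
    return combined


global_counts = {("C", "O"): [1, 2]}
family_counts = {("C", "O"): [1, 0], ("N", "O"): [0, 1]}
print(seeded_counts(global_counts, family_counts, family_weight=2.0))
```

Keeping the database-wide counts as a background means that pairs rarely seen in a small family still receive reasonable statistics, while a large `family_weight` pushes the potential toward the fold-specific patterns.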
