To our knowledge, cheminformatics-like descriptors have not been previously applied to evaluate protein packing in relation to either protein folding or protein interactions. Without previous results or an established benchmark to compare against, we opted to forgo descriptor analysis on protein folding and instead focus on descriptors for protein interactions. We calculated the novel SNAPP descriptors for docking decoys in the Dockground decoy dataset [70], which contains 61 different protein complexes, each with one native complex, one to twelve native-like complexes, and one hundred decoy poses. We define the target interface as the interface between the chains given by the dataset, and we tested to see if the SNAPP descriptors could discriminate between native-like and decoy complexes.
lected all interfacial simplices, which we define as simplices that contain vertices from both chains at the target interface. Descriptors were calculated for each simplex individually and then aggregated to describe the interface as a whole, with the aggregation depending on the trait in question. For instance, the volume of an interface was calculated by adding the volumes of all participating simplices, whereas the surface area of an interface included only those triangular faces external to the interface, and tetrahedrality descriptors were averaged across all interfacial simplices. Unfortunately, we found very little correlation across the entire dataset between any single or paired descriptor and the RMSD of the complex, although the descriptor-RMSD correlation varied from complex to complex. When we plotted the RMSD-descriptor points, we found that the native and native-like interfaces clustered somewhere along the y-axis while the decoys showed a U-shaped RMSD-descriptor relationship (Figure 2.5), which is to say no monotonic correlation at all. For more than 60% of the protein complexes in the dataset, most of the native-like complexes were easily identifiable using one or more of the descriptors; however, the native complex often ended up buried beneath the native-like complexes and two or three high-RMSD decoys.
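The aggregation rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names, the input layout (a list of 3D points, a parallel list of chain labels, and a list of four-index simplices from a tessellation), and the particular tetrahedrality formula (a normalized sum of squared edge-length differences, which is zero for a regular tetrahedron) are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def tetra_volume(a, b, c, d):
    """Volume of a tetrahedron from the scalar triple product."""
    return abs(_dot(_sub(b, a), _cross(_sub(c, a), _sub(d, a)))) / 6.0


def triangle_area(a, b, c):
    """Area of a triangle as half the norm of the edge cross product."""
    n = _cross(_sub(b, a), _sub(c, a))
    return 0.5 * math.sqrt(_dot(n, n))


def tetrahedrality(a, b, c, d):
    """Assumed definition: 0 for a regular tetrahedron, larger when distorted."""
    edges = [math.dist(p, q) for p, q in combinations((a, b, c, d), 2)]
    mean_len = sum(edges) / len(edges)
    spread = sum((li - lj) ** 2 for li, lj in combinations(edges, 2))
    return spread / (15.0 * mean_len ** 2)


def interface_descriptors(points, chains, simplices):
    """Aggregate per-simplex values over simplices that span more than one chain."""
    interfacial = [s for s in simplices if len({chains[i] for i in s}) > 1]
    if not interfacial:
        return None

    # Volume: sum over all interfacial simplices.
    volume = sum(tetra_volume(*(points[i] for i in s)) for s in interfacial)

    # Surface area: only faces that belong to exactly one interfacial simplex,
    # i.e. faces on the outside of the interfacial region.
    face_counts = Counter(
        frozenset(face) for s in interfacial for face in combinations(s, 3)
    )
    surface_area = sum(
        triangle_area(*(points[i] for i in face))
        for face, count in face_counts.items()
        if count == 1
    )

    # Tetrahedrality: averaged across interfacial simplices.
    mean_t = sum(
        tetrahedrality(*(points[i] for i in s)) for s in interfacial
    ) / len(interfacial)

    return {
        "n_simplices": len(interfacial),
        "volume": volume,
        "surface_area": surface_area,
        "mean_tetrahedrality": mean_t,
    }
```

Counting each triangular face across the interfacial simplices is one simple way to separate shared (internal) faces from external ones; a face that appears in two interfacial simplices is internal and is left out of the surface area.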
The small range of descriptor values displayed by native-like complexes suggested a potential problem with discrimination of high-resolution structures, and the small number of decoys for each complex limited our ability to evaluate the descriptors. As an additional test of decoy discrimination, we decided to evaluate our descriptors against a random dataset. Using our POPP docking algorithm, described in Chapter 4.1.1, we compiled a series of 6,000 randomly generated docking poses based on the native structure for phospholipase A2 in complex with a synthetic pentapeptide (PDB code 1TKJ). We calculated descriptors for each pose and checked for a correlation with RMSD (Figure 2.6). Unfortunately, we found even less correlation between the descriptors and RMSD when the RMSD range was lowered. Although many of the docking poses are native-like, none of the descriptors were sensitive enough to identify the native pose, and only a few of the descriptors were able to identify native-like poses.
Figure 2.5: RMSD versus SNAPP descriptors for porcine kallikrein A bound to bovine trypsin inhibitor (PDB code 2KAI). The red horizontal line in each graph shows the value of a descriptor for the native complex relative to the others. Despite the lack of a correlation with RMSD, three of the descriptors (the Randić and Wiener indices and the interfacial surface area) were able to discriminate between most of the native-like and the decoy poses.
Although the SNAPP descriptors were unable to efficiently differentiate between decoys for similar structures with similar RMSD, we hypothesized that the SNAPP descriptors could be used to differentiate between complexes with different structural interfaces. It is known that many interfaces have conserved structure and sequence to ensure functional domains remain intact [71, 1, 72, 73]. To test whether SNAPP descriptors could be used to identify functionally distinct groups of proteins, we generated descriptor fingerprints for each of the native complexes in the Dockground decoy dataset. The fingerprints were clustered using the dendrogram function in MATLAB (Figure 2.7), and we were able to identify several subgroups of functionally related proteins within the dendrogram clusters.
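As an illustration of the fingerprint clustering step, the following Python sketch groups descriptor fingerprints by agglomerative hierarchical clustering. It is a stand-in for the MATLAB routines used here, not a reproduction of them: the z-score standardization, the Euclidean distance, the average-linkage rule, and the function names are all assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each descriptor column so no single descriptor dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def cluster_fingerprints(fingerprints, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Returns a list of clusters, each a sorted list of row indices.
    """
    data = standardize(fingerprints)
    clusters = [[i] for i in range(len(data))]

    def average_distance(a, b):
        return mean(math.dist(data[i], data[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        p, q = min(
            pairs, key=lambda pq: average_distance(clusters[pq[0]], clusters[pq[1]])
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    return [sorted(c) for c in clusters]
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, would give the information needed to draw a dendrogram; this sketch only returns the final grouping.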
Figure 2.6: The distribution of descriptor values for 6,000 decoy poses for phospholipase A2 in complex with a synthetic pentapeptide (PDB code 1TKJ), ranked by RMSD. The left side of each graph also shows the RMSD for each pose as a green line, and the descriptor value for the native complex is given as a red line. On the right, the linear fit is given as a red line.
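The descriptor-RMSD comparison can be illustrated with a short Python sketch that computes a Pearson correlation coefficient and a least-squares line for one descriptor. The function names and the toy values are hypothetical and are not taken from the decoy data; the second example shows why a U-shaped relationship produces a linear correlation of zero even though the points are clearly structured.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy values only: a perfectly linear descriptor and a U-shaped one.
rmsd = [-2.0, -1.0, 0.0, 1.0, 2.0]          # centered for a symmetric example
linear_descriptor = [2.0 * x + 1.0 for x in rmsd]
u_shaped_descriptor = [x * x for x in rmsd]
```

For the toy data, `pearson_r(rmsd, linear_descriptor)` is 1 up to rounding, while `pearson_r(rmsd, u_shaped_descriptor)` is 0, so a linear fit alone cannot distinguish a U-shaped trend from noise.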
Overall, the current protein descriptors were only able to weakly discriminate between docking decoys, but our results suggest that they may be able to differentiate between functionally related proteins. However, as the implementation suggests, the use of the interfacial descriptors requires a three-dimensional structure of both the protein and the ligand in question. Although the descriptors are potentially useful for studies where the interaction is already known, we decided to instead focus on generating a set of SNAPP potentials that could be used to evaluate proteins and protein interactions without a priori knowledge of the interface. We propose that, in future work, the protein descriptors could be further developed for use as a refinement step during protein-protein or protein-peptide docking.
Figure 2.7: The dendrogram and heat map of the SNAPP descriptors for the Dockground decoy dataset.