When proteins bind to one another, a number of possible structural changes can occur. These can be global motions, such as hinge bending, in which domains connected by a flexible region move rigidly relative to one another, or shear motion, in which the interdigitated sidechains of two packed structural elements move parallel along with those elements and repack themselves. Localised rearrangements are more frequent; flexible loops can change their conformation, and sidechains can switch rotamer, or move to a non-rotameric conformation. Accounting for the conformational changes which occur as proteins bind to one another remains a problem in the field of protein-protein docking, and has been the subject of a number of recent reviews (May and Zacharias, 2005; Bonvin, 2006; Gray, 2006; Andrusier et al., 2008; Moreira et al., 2010; Zacharias, 2010a; Bastard et al., 2011).
The simplest common method of accounting for conformational changes is to use ‘soft’ potentials, which tolerate clashes and some interpenetration of surfaces (Palma et al., 2000; Fernández-Recio et al., 2003; Jiang and Kim, 1991; Zacharias, 2003; Katchalski-Katzir et al., 1992; Gabb et al., 1997; Mandell et al., 2001; Chen and Weng, 2002; Eisenstein and Katchalski-Katzir, 2004; Gardiner et al., 2001; Gray et al., 2003; Schneidman-Duhovny et al., 2004). Indeed, in the earliest pioneering studies, either a reduced model (Wodak and Janin, 1978) or a softened Van der Waals term, in which the repulsive 1/r^12 term is replaced by a 1/r^8 term, was used (Levinthal et al., 1975). Soft potentials also reduce the ruggedness, and thus the multi-modality, of the energy landscape, which aids optimisation by reducing the number of local minima in which the algorithm can become trapped.
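The effect of softening the repulsive term can be illustrated with a minimal sketch. It assumes a generalised n-6 Lennard-Jones form normalised so that the well depth and the position of the minimum are the same for both exponents; the function name, parameters and normalisation are illustrative and are not taken from any of the cited programs.

```python
def lj_energy(r, epsilon=1.0, r_min=1.0, rep_exp=12):
    """Generalised n-6 Lennard-Jones energy (hypothetical helper).

    The coefficients are chosen so that, for any repulsive exponent n > 6,
    the minimum lies at r_min and has depth -epsilon. This normalisation
    is an assumption made for the comparison, not a published force field.
    """
    n, m = rep_exp, 6
    x = r_min / r
    return epsilon * (m / (n - m) * x**n - n / (n - m) * x**m)


# Compare the standard 12-6 form with a softened 8-6 form for a mild clash.
r_clash = 0.8  # 80% of the optimal separation
hard = lj_energy(r_clash, rep_exp=12)
soft = lj_energy(r_clash, rep_exp=8)
print(f"12-6 penalty: {hard:.2f}, 8-6 penalty: {soft:.2f}")
```

With the same well depth, the 8-6 form penalises the overlapping pair less heavily, which is the sense in which a soft potential tolerates some interpenetration.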
Soft potentials and coarse-graining are simple methods of soft-docking; however, three other approaches stand out as particularly noteworthy.
One of these is employed in the GRAMM-X approach (Tovchigrechko and Vakser, 2006). Here, the degree of coarse-graining and smoothing of the potential can be varied, and thus a finer grain can be used when docking high-resolution structures or those deemed to be rigid, whilst a coarser grain can be used for lower resolution structures, homology models or flexible proteins, or for studying low-resolution recognition factors. Another interesting form of coarse-graining is the potential used in the SMOOTHDOCK algorithm (Camacho and Vajda, 2001; Camacho and Gatchell, 2003), where electrostatics and desolvation initially dominate the energy function, and the weight of the Van der Waals interaction is slowly increased as the algorithm proceeds. Thus, the ruggedness of the energy landscape increases as the search is focussed. Finally, the semi-definite programming-based underestimation approach developed by the Vajda lab uses the convex global underestimation method which was originally developed for protein folding (Paschalidis et al., 2007; Shen et al., 2008). In this model, the local energy minima are fitted to a quadratic function, and further sampling is biased towards the minimum of this function.
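The gradual reweighting of the Van der Waals term described for SMOOTHDOCK can be sketched as a simple schedule. The linear ramp, the function names and the default weights below are assumptions chosen for illustration; the schedule actually used by SMOOTHDOCK is not specified here.

```python
def vdw_weight(step, n_steps, w_start=0.0, w_end=1.0):
    """Hypothetical linear ramp of the Van der Waals weight over a search."""
    if n_steps <= 1:
        return w_end
    frac = step / (n_steps - 1)
    return w_start + frac * (w_end - w_start)


def total_energy(e_elec, e_desolv, e_vdw, step, n_steps):
    """Electrostatics and desolvation at full weight, Van der Waals ramped in."""
    return e_elec + e_desolv + vdw_weight(step, n_steps) * e_vdw


# Early in the search a clash (positive e_vdw) is ignored; later it counts fully.
early = total_energy(-2.0, -1.0, 5.0, step=0, n_steps=11)
late = total_energy(-2.0, -1.0, 5.0, step=10, n_steps=11)
```

Because the clash term contributes nothing at the start and its full value at the end, the same pose is scored on a smooth landscape early on and on a rugged one once the search has narrowed.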
The most basic method of refining structures generated with soft potentials is energy minimisation in all degrees of freedom. For instance, following a systematic rigid-body search, Li et al. (2003) performed a series of three minimisations, first with just Van der Waals, then with Van der Waals and uncharged polar groups, and finally with Van der Waals and full electrostatics. This method, however, can only deal with clashes and very minor changes.
A more advanced method of including flexibility is ensemble docking. In this approach, ensembles of structures are docked together. The ensembles can be generated in a number of ways, and there are a number of different ensemble-based approaches. In the cross-docking method, molecular dynamics is performed on the ligand and receptor, the trajectories are clustered and the clusters are rigidly docked pairwise to one another (Smith et al., 2005; Grunberg et al., 2004; Krol et al., 2007a). A similar approach is that of Mustard and Ritchie (2005), in which the program CONCOORD (de Groot et al., 1997) was used to generate an ensemble of structures using pseudo-NMR restraints. Principal component analysis was then used to generate ‘eigenstructures’, which were subsequently cross-docked.
The other main ensemble-based docking method is the mean field approach, in which the whole ensemble is docked simultaneously (Koehl and Delarue, 1994). Take, for instance, two interacting side chains. Initially, all members of each side chain's ensemble are weighted equally. Each member of the first side-chain ensemble feels the weighted average energy of the second side chain, and the weights of the first ensemble are adjusted so that they follow the Boltzmann distribution. The weights of the second side chain are then adjusted so as to follow a Boltzmann distribution in the mean field created by the first side chain. Next, the weights of the first side chain are adjusted again, and the process is repeated iteratively until self-consistency is achieved. This model can be extended without loss of generality. This approach has been used to model side chains in a number of programs, including MultiDock and ATTRACT (Jackson et al., 1998; Zacharias, 2003; Koehl and Delarue, 1994; Mendes et al., 1999), as well as to model loops in RosettaDock, MC2 and ATTRACT amongst others (Bastard et al., 2003; Loriot et al., 2011; Chaudhury and Gray, 2008; Bastard et al., 2006). Of course, the ability to accurately model loops and side chains is predicated upon the native conformation residing within the initial ensemble. This initial ensemble can be derived from loop or rotamer databases (Oliva et al., 1997; Michalsky et al., 2003; Wang and Dunbrack, 2003), from molecular dynamics or Monte Carlo simulations, CONCOORD, NMR ensembles, normal mode analysis or from known homologues (Demerdash et al., 2010).
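The self-consistent iteration for two interacting side chains can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise energy matrix, the value of kT, the convergence tolerance and all function names are hypothetical, and no self-energy or backbone term is included.

```python
import math


def boltzmann(energies, kT):
    """Normalised Boltzmann weights (shifted by the minimum for stability)."""
    e_min = min(energies)
    w = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(w)
    return [x / z for x in w]


def mean_field(E, kT=0.6, tol=1e-8, max_iter=1000):
    """Self-consistent mean field for two side chains.

    E[i][j] is an assumed pairwise energy between conformer i of the first
    side chain and conformer j of the second.
    """
    n_a, n_b = len(E), len(E[0])
    p_a = [1.0 / n_a] * n_a  # all conformers weighted equally at the start
    p_b = [1.0 / n_b] * n_b
    for _ in range(max_iter):
        # Each conformer of A feels the weighted average energy of B.
        mf_a = [sum(p_b[j] * E[i][j] for j in range(n_b)) for i in range(n_a)]
        new_a = boltzmann(mf_a, kT)
        # B is then updated in the mean field created by the new weights of A.
        mf_b = [sum(new_a[i] * E[i][j] for i in range(n_a)) for j in range(n_b)]
        new_b = boltzmann(mf_b, kT)
        change = max(abs(x - y) for x, y in zip(p_a + p_b, new_a + new_b))
        p_a, p_b = new_a, new_b
        if change < tol:  # self-consistency reached
            break
    return p_a, p_b


# Toy energies: conformer 0 of each side chain forms a favourable contact.
E = [[-5.0, 1.0],
     [1.0, 0.0]]
p_a, p_b = mean_field(E)
```

In this toy case the weights concentrate on the conformer pair with the favourable contact; with more side chains, the mean field felt by each one would simply sum over all of its neighbours.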
Mean field modelling is commonly applied during the docking procedure; however, loops are sometimes ignored during the docking itself and rebuilt in the post-processing stage (Wang et al., 2007a; Soto et al., 2008).
Others have developed methods for modelling hinge motions, which are particularly suited to multi-domain proteins. Usually this is done by locating hinge regions either manually or automatically (Emekli et al., 2008), docking both sides of the hinge independently and then reassembling the complex (Ben-Zeev et al., 2005; Schneidman-Duhovny et al., 2005a, 2007; Sandak et al., 1998b,a; Cheng et al., 2008; Karaca and Bonvin, 2011). The approach detailed in Karaca and Bonvin (2011), which has been implemented in the HADDOCK suite, is of particular interest as it allows the simultaneous modelling of hinge-bending, side-chain and backbone motions. Other interesting approaches to hinge modelling have been outlined by Wang et al. (2007a) and Zhao et al. (2006), in which flexibility is handled by novel data structures, although it remains to be seen whether these approaches can consistently model hinge motion efficiently and accurately.
Another method to account for flexibility is Monte Carlo sampling, in which backbone or side-chain conformational changes are proposed and either accepted or rejected depending on the energy of the new conformation. For instance, side-chain Monte Carlo sampling has been implemented in ICM-DISCO (Abagyan and Totrov, 1994; Fernández-Recio et al., 2003) and RosettaDock (Gray et al., 2003). Side-chain rotamer prediction has also been tackled using molecular dynamics (Camacho and Gatchell, 2003; de Vries et al., 2007), genetic algorithms (Tuffery et al., 1991) and neural networks (Hwang and Liao, 1995). Another approach uses graph-theoretical models in which each side chain is represented by a node, and those with an interacting rotamer pair are connected by edges. The graph can be decomposed such that the optimal set of rotamers is derived by combining the optimised rotamer combinations corresponding to sub-graphs (Krivov et al., 2009).
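The propose-and-accept cycle of Monte Carlo sampling can be sketched in a few lines. The sketch assumes the Metropolis acceptance criterion, which is one common choice rather than necessarily the rule used by each of the programs cited above; the one-dimensional torsion-angle energy and the move generator are toy stand-ins.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Always accept downhill moves; accept uphill ones with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def monte_carlo(state, energy_fn, propose_fn, n_steps, kT, seed=0):
    """Generic Metropolis loop that returns the lowest-energy state visited."""
    rng = random.Random(seed)
    e = energy_fn(state)
    best, e_best = state, e
    for _ in range(n_steps):
        candidate = propose_fn(state, rng)
        e_cand = energy_fn(candidate)
        if metropolis_accept(e, e_cand, kT, rng):
            state, e = candidate, e_cand
            if e < e_best:
                best, e_best = state, e
    return best, e_best


def toy_energy(chi):
    """Hypothetical energy with a single well at a torsion angle of 60 degrees."""
    return (chi - 60.0) ** 2


def toy_propose(chi, rng):
    """Hypothetical move: perturb the torsion angle by up to 10 degrees."""
    return chi + rng.uniform(-10.0, 10.0)


best_chi, best_e = monte_carlo(180.0, toy_energy, toy_propose, n_steps=2000, kT=1.0)
```

In a docking setting the state would be a full set of side-chain or backbone conformations and the move set would draw from a rotamer library or torsion perturbations, but the acceptance logic would be unchanged.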
Monte Carlo sampling can also be used to model the conformational changes of loops. Fixed-end-moves, the rotation of a number of atoms around two fixed points (Betancourt, 2005), have been observed in crystal structures (Davis et al., 2006) and can capture some known protein motions (Friedland et al., 2009). They have been used to model backbone motions in the RosettaDock program as part of the Monte Carlo move set (Fleishman et al., 2010; Lauck et al., 2010). Modelling backbone flexibility has also been done by varying φ and ψ torsion angles, either using Monte Carlo (Wang et al., 2007a) or by simulated annealing molecular dynamics (de Vries et al., 2007).
The final approach to modelling conformational changes that occur upon binding is to use normal modes. As many protein motions can be approximated using a small number of low frequency modes (see section 1.4.2.3), using a linear combination of normal modes is a promising approach for modelling conformational change. Aside from the approach analysed and implemented during the course of the PhD, and presented in later chapters, two other groups have dynamically adjusted movements along normal coordinates as part of a flexible docking strategy. In the ATTRACT program, the 5 lowest frequency non-trivial normal modes are used as degrees of freedom, along with the translation and orientation (May and Zacharias, 2008a). Quasi-Newton minimisation is performed from multiple starting positions, the proteins are represented using a coarse-grained model and the energy is calculated using a soft Van der Waals potential and electrostatics. The other approach is that used in the FiberDock refinement protocol (Mashiach et al., 2010). In this protocol, the side chains of rigidly soft-docked poses, derived from another method, are optimised using a linear programming routine (Kingsford et al., 2005). Rigid-body minimisation is then performed, followed by minimisation in normal mode space. In this step, the overlap between pre-calculated normal modes and the forces acting upon the binding partners is used to select 10 modes, and Monte Carlo sampling is undertaken in this normal mode space. Finally, after another round of rigid-body minimisation, the lowest energy solutions are returned.
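The two ingredients of normal-mode-based flexibility described above, selecting modes by their overlap with the forces and displacing the structure along a linear combination of modes, can be sketched as follows. The overlap is assumed here to be the absolute normalised dot product between a mode and the force vector; the flattened coordinate layout and all function names are illustrative and are not taken from ATTRACT or FiberDock.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlap(mode, force):
    """Absolute cosine between a mode and the force vector (assumed definition)."""
    denom = math.sqrt(dot(mode, mode) * dot(force, force))
    return abs(dot(mode, force)) / denom if denom > 0 else 0.0


def select_modes(modes, force, k):
    """Indices of the k modes with the largest overlap with the forces."""
    ranked = sorted(range(len(modes)),
                    key=lambda i: overlap(modes[i], force),
                    reverse=True)
    return ranked[:k]


def displace(coords, modes, amplitudes):
    """New coordinates: coords plus a linear combination of the given modes."""
    new = list(coords)
    for mode, amp in zip(modes, amplitudes):
        for i, comp in enumerate(mode):
            new[i] += amp * comp
    return new


# Toy system: one atom (three coordinates) and three orthonormal "modes".
modes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
force = [0.1, 2.0, -1.0]
chosen = select_modes(modes, force, k=2)
moved = displace([0, 0, 0], [modes[chosen[0]]], [0.5])
```

The amplitudes passed to `displace` are the extra degrees of freedom: a minimiser or a Monte Carlo sampler would vary them alongside the rigid-body translation and orientation.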