CHAPTER 1. Computational studies of protein folding and aggregation
Early models used in protein folding studies were very simple and could not resolve atomic-level events in the folding process. Protein folding was traditionally studied with residue-level models [6], in which accuracy is sacrificed for a reduced level of representation. One example is the Gō model, whose potential function is based on knowledge of the native structure of the protein. Typically, the potential is a sum of two-body terms, each equal to -1 if a native contact is formed and 0 otherwise. In other words, the Gō model considers only interactions between residues (beads on the lattice) that are in contact in the native state; non-native contacts do not contribute. This interaction model is commonly used in thermodynamic and kinetic studies of protein folding to model amino acid interactions [7, 8].
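Written out, the Gō potential described above takes the following form. Here the notation is introduced only for illustration: $\Delta^{\mathrm{N}}_{ij} = 1$ if residues $i$ and $j$ form a contact in the native structure (0 otherwise), and $\Delta_{ij}(\{\mathbf{r}\}) = 1$ if they occupy adjacent lattice sites in the conformation $\{\mathbf{r}\}$ (0 otherwise); bonded neighbours ($j = i + 1$) are excluded from the sum.

$$E(\{\mathbf{r}\}) = -\sum_{i<j-1} \Delta^{\mathrm{N}}_{ij}\,\Delta_{ij}(\{\mathbf{r}\})$$

For instance, for a chain whose native structure contains two contacts, $E = -2$ in the native state, $E = -1$ when only one of those contacts is formed, and $E = 0$ in any conformation that forms only non-native contacts.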
Over time, more physical models, typically based on polymer-like lattice chains, were developed and applied to study the static and dynamic properties of protein conformational space [9, 10]. Computer simulations of simplified lattice and off-lattice models greatly enhanced our understanding of various aspects of the protein folding problem. The simplicity and computational efficiency of these models made it possible to simulate thousands of folding and unfolding events. Although these models do not represent the full complexity of real proteins, they capture a core aspect of the physical folding problem: the ability to find the lowest-energy state without an exhaustive conformational search. Using these methods, a detailed statistical description of the folding process in protein models could be obtained, as reviewed by Mirny and Shakhnovich [6].
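To illustrate how such lattice simulations generate folding events, the following is a minimal sketch of Metropolis Monte Carlo for a Gō-type chain on a two-dimensional square lattice. The move set (end moves and corner flips), the reduced temperature units (k_B = 1), the hypothetical six-residue native structure and all parameter values are illustrative assumptions, not taken from the studies cited above.

```python
import math
import random
from itertools import combinations
from typing import Optional

Coord = tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # unit moves on the square lattice


def contacts(conf: list[Coord]) -> set[tuple[int, int]]:
    """Non-bonded residue pairs (i, j), j > i + 1, that sit on adjacent lattice sites."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(conf), 2)
        if j - i > 1 and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
    }


def go_energy(conf: list[Coord], native: set[tuple[int, int]]) -> int:
    """Go-type energy: -1 for every native contact present, 0 for everything else."""
    return -len(contacts(conf) & native)


def propose(conf: list[Coord], rng: random.Random) -> Optional[list[Coord]]:
    """Pick a residue at random and try an end move or a corner flip; None if blocked."""
    n = len(conf)
    i = rng.randrange(n)
    if i in (0, n - 1):
        # End move: place the terminal residue on a random site next to its bonded neighbour.
        anchor = conf[1] if i == 0 else conf[n - 2]
        dx, dy = rng.choice(STEPS)
        new = (anchor[0] + dx, anchor[1] + dy)
    else:
        # Corner flip: move residue i to the opposite corner of the square it spans.
        prev, cur, nxt = conf[i - 1], conf[i], conf[i + 1]
        new = (prev[0] + nxt[0] - cur[0], prev[1] + nxt[1] - cur[1])
    if new in set(conf):  # enforces self-avoidance; also rejects "moves" that stay in place
        return None
    trial = list(conf)
    trial[i] = new
    return trial


def metropolis(conf, native, temperature, steps, seed=0):
    """Metropolis Monte Carlo in reduced units (k_B = 1); returns the final state and energy."""
    rng = random.Random(seed)
    energy = go_energy(conf, native)
    for _ in range(steps):
        trial = propose(conf, rng)
        if trial is None:
            continue
        delta = go_energy(trial, native) - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            conf, energy = trial, energy + delta
    return conf, energy


NATIVE = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 2)]  # hypothetical 6-residue native fold
NATIVE_CONTACTS = contacts(NATIVE)  # {(0, 3), (2, 5)}
extended = [(i, 0) for i in range(len(NATIVE))]
final_conf, final_energy = metropolis(extended, NATIVE_CONTACTS, temperature=0.3, steps=20_000, seed=1)
```

Richer move sets (for example crankshaft or pivot moves) are normally needed to sample longer chains efficiently; the two moves here only keep the example short.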
One of the early proposed mechanisms for protein folding was the nucleation-condensation model [11-13], which was tested in the context of lattice models [13]. The nucleation-condensation model postulates that a small number of residues, referred to as the folding nucleus, must form their native contacts for the folding reaction to proceed to the native state. Building on this model and its analogy to first-order phase transitions, the concepts of nucleation and the free energy landscape have driven much of the recent progress in understanding protein folding.
Several other models emerged in the attempt to elucidate the protein folding mechanism. One is the pathway model, which assumes that a protein folds or unfolds along a single route. Following Levinthal's suggestion that a folding pathway allows proteins to fold on a realistic timescale, this model satisfies the time constraint. In the pathway model, intermediate states (I) are classified, relative to the unfolded (U) and folded (N) states, as either on-pathway [14]:
$$\mathrm{U} \rightleftharpoons \mathrm{I} \rightleftharpoons \mathrm{N}$$
or off-pathway [14]:
$$\mathrm{I} \rightleftharpoons \mathrm{U} \rightleftharpoons \mathrm{N}$$
with off-pathway intermediates believed to take no part in the productive folding process. It soon became evident, however, that a protein can fold through many different pathways, each with its own set of transition states. Proteins are now generally thought to have funnelled energy landscapes, which allow them to reach the native state through a stochastic process in which the free energy decreases spontaneously. On such a landscape, the unfolded state, the native state and any intermediates correspond to local minima, while transition states correspond to saddle points. The concept of the “folding funnel” energy landscape was introduced by Leopold et al. [15], who investigated the folding of two different 27-unit chains using lattice models and found that one was able to fold while the other was not. They suggested that “convergent kinetic pathways or folding funnels guide folding to a unique, stable native conformation”, and that kinetic trapping prevented the second chain from folding. Four generalised types of funnel energy surfaces that represent possible folding mechanisms of proteins are illustrated in Figure 1.7 (adapted from reference [16]): fast folding (a smooth funnel), kinetic trapping (a moat with a single well, or a rugged landscape with many wells), and slow random searching (a golf course).
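The difference between the on- and off-pathway schemes introduced above can be illustrated with first-order rate equations. The sketch below integrates the master equation dP/dt for each three-state scheme with a simple forward-Euler step. The rate constants are hypothetical, in arbitrary inverse-time units, and were chosen so that both schemes share the same equilibrium populations (U : I : N = 1 : 10 : 5000).

```python
def relax(rates, p0, t_end, dt=1e-3):
    """Integrate first-order kinetics dP/dt with forward Euler.

    rates maps (source_state, target_state) to a rate constant; p0 maps states to
    initial populations. Every flux leaves one state and enters another, so the
    total population is conserved.
    """
    p = dict(p0)
    for _ in range(round(t_end / dt)):
        change = {state: 0.0 for state in p}
        for (src, dst), k in rates.items():
            flux = k * p[src] * dt
            change[src] -= flux
            change[dst] += flux
        for state in p:
            p[state] += change[state]
    return p


# Hypothetical rate constants (arbitrary units).
ON_PATHWAY = {("U", "I"): 10.0, ("I", "U"): 1.0, ("I", "N"): 5.0, ("N", "I"): 0.01}
OFF_PATHWAY = {("U", "I"): 10.0, ("I", "U"): 1.0, ("U", "N"): 5.0, ("N", "U"): 0.001}
START = {"U": 1.0, "I": 0.0, "N": 0.0}  # all molecules unfolded at t = 0

on_early = relax(ON_PATHWAY, START, t_end=1.0)
off_early = relax(OFF_PATHWAY, START, t_end=1.0)
```

In the off-pathway scheme, molecules that enter I must return to U before they can fold, so N is populated more slowly even though both schemes relax to the same equilibrium.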
Figure 1.7 Cartoon representation of the different energy landscapes leading from a denatured conformation to the native conformation N, adapted from reference [16]. (a) A smooth energy landscape for a fast-folding protein; (b) a moat landscape, in which folding must pass through an obligatory intermediate; (c) a rugged energy landscape with kinetic traps; and (d) a golf-course energy landscape, in which folding is dominated by a diffusional conformational search.
Some of the earliest computational investigations of protein folding in atomic detail applied energy minimisation to protein structures [17, 18]. These were followed by molecular dynamics (MD) simulations [19, 20], which remain in use today. All-atom protein models with explicit and implicit solvent enabled the investigation of the folding thermodynamics and unfolding dynamics of small proteins [21-23]. However, owing to the complexity and high dimensionality of protein conformational space, all-atom MD simulations are severely limited in the time and length scales they can access (discussed in Chapter 2).
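At the core of an MD simulation is the numerical integration of Newton's equations of motion. As a rough illustration of the timescale problem: with a commonly used time step of 2 fs, reaching 1 ms of simulated time would require 5 × 10^11 integration steps. The sketch below shows the velocity Verlet integrator, one standard MD scheme, applied to a one-dimensional harmonic oscillator in reduced units; it is a toy model, not a protein force field.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equations with the velocity Verlet scheme; returns final (x, v)."""
    f = force(x)
    for _ in range(steps):
        v += 0.5 * dt * f / mass  # half-step velocity update
        x += dt * v               # full-step position update
        f = force(x)              # force at the new position
        v += 0.5 * dt * f / mass  # second half-step velocity update
    return x, v


K_SPRING = 1.0  # spring constant, reduced units
MASS = 1.0
x_final, v_final = velocity_verlet(1.0, 0.0, lambda pos: -K_SPRING * pos, MASS, dt=0.01, steps=10_000)
# Total energy should stay close to its initial value of 0.5.
total_energy = 0.5 * MASS * v_final**2 + 0.5 * K_SPRING * x_final**2
```

Because velocity Verlet is time-reversible and symplectic, its energy error stays bounded rather than drifting steadily, which is one reason it is widely used in MD.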
Novel simulation techniques that improve conformational sampling efficiency have been developed, including biased sampling of the free energy surface and non-equilibrium unfolding simulations. Generalised-ensemble sampling techniques, which run parallel simulations of a molecular system coupled through a Monte Carlo protocol [24], have been applied successfully to investigate protein folding. Replica Exchange Monte Carlo (REMC) [25], classical Replica Exchange Molecular Dynamics (REMD) [26] and multiplexed REMD [27] have all been used to investigate the folding dynamics of small proteins. The methodology behind some of the more widely used enhanced sampling methods, such as Umbrella Sampling, REMD and the recently developed Bias-Exchange Metadynamics, is discussed in Chapter 2. Excellent reviews of the need for improved conformational sampling and of the latest associated methods have been published by the groups of Shea and van Gunsteren [28, 29].
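The Monte Carlo protocol that couples replicas in replica exchange can be sketched as follows. Swapping the configurations of two replicas at inverse temperatures β_i and β_j, with potential energies E_i and E_j, is accepted with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, which preserves the Boltzmann distribution at every temperature. The function names, the alternating neighbour-pair scheme and the kcal/mol energy units are illustrative assumptions, not the protocol of any particular simulation package.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of exchanging configurations held at temperatures t_i and t_j."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def exchange_neighbours(energies, temps, rng, offset=0):
    """One round of swap attempts between neighbouring temperatures.

    energies[k] is the potential energy of the configuration currently held at temps[k]
    (temperatures sorted ascending). Disjoint pairs (k, k + 1) are tried for
    k = offset, offset + 2, ... Returns order, where order[k] is the index of the
    configuration now assigned to temps[k].
    """
    order = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        if rng.random() < swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1]):
            order[k], order[k + 1] = order[k + 1], order[k]
    return order


rng = random.Random(0)
new_order = exchange_neighbours([-120.0, -110.0, -95.0, -90.0], [300.0, 320.0, 341.0, 364.0], rng)
```

In a full REMD run, each replica would be propagated by MD between exchange rounds, and the offset would typically alternate between 0 and 1 so that every neighbouring pair is attempted.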
Alternatively, conformational sampling can be improved by running a large number of short simulations. Worldwide distributed computing networks such as Folding@Home harness many processors in a highly heterogeneous, loosely coupled distributed computing paradigm to increase the throughput of MD simulations. Using this approach, Pande and co-workers accumulated hundreds of microseconds of atomistic MD, and the folding mechanisms and folding rates of several fast-folding proteins and polymers were determined in good agreement with experimental data [30].
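The reasoning behind running many short trajectories can be made concrete for a protein with single-exponential (two-state) kinetics. If N independent trajectories of length t_sim are started from the unfolded state and n of them fold, the maximum-likelihood estimate of the folding rate is n divided by the total simulated time accumulated before folding or termination, which reduces to roughly n/(N·t_sim) when t_sim is much shorter than the folding time. The sketch assumes exponentially distributed folding times and uses hypothetical values; it is not the analysis used in [30].

```python
import random


def estimate_folding_rate(fold_times: list[float], t_sim: float) -> float:
    """Maximum-likelihood rate for exponential folding times observed in runs of length t_sim.

    Runs that have not folded by t_sim are right-censored: they contribute simulated
    time to the denominator but no folding event to the numerator.
    """
    n_folded = sum(1 for t in fold_times if t <= t_sim)
    total_time = sum(min(t, t_sim) for t in fold_times)
    return n_folded / total_time


rng = random.Random(42)
K_TRUE = 0.1      # hypothetical folding rate, per microsecond (mean folding time 10 us)
T_SIM = 0.5       # hypothetical length of each short trajectory, microseconds
N_RUNS = 100_000  # hypothetical number of independent trajectories
# Folding time of each trajectory; the estimator only "sees" min(t, T_SIM).
times = [rng.expovariate(K_TRUE) for _ in range(N_RUNS)]
k_hat = estimate_folding_rate(times, T_SIM)
```

Each trajectory covers only 5% of the mean folding time, yet the pooled estimate recovers the rate because roughly 5% of the runs fold within their short window.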
Multi-scale modelling approaches have also been used to study protein folding pathways, combining the efficient conformational sampling of coarse-grained models with the accuracy of all-atom models. In this approach, simulations are performed iteratively, with conversion back and forth between high- and low-resolution protein models. Feig et al. developed a multi-scale modelling tool set called MMTSB [31], which couples a simplified, lattice-based, low-resolution protein model with the Monte Carlo simulation engine MONSSTER [32], and uses the MD packages AMBER [33] or CHARMM [34] for all-atom simulations. Ding et al. reconstructed the transition state ensemble of the src-SH3 protein domain through multi-scale simulations [35].
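The high-to-low-resolution half of such an inter-conversion can be as simple as collapsing each residue's atoms onto a single bead. The sketch below assumes an unweighted centroid per residue and a plain list of (residue_id, coordinates) pairs as input; it is not the mapping used by MMTSB, and the reverse step (rebuilding atomic detail from beads) is considerably harder and is not shown.

```python
from collections import defaultdict

Atom = tuple[int, tuple[float, float, float]]  # (residue_id, (x, y, z))


def coarse_grain(atoms: list[Atom]) -> dict[int, tuple[float, ...]]:
    """Map an all-atom structure to one bead per residue at the residue's unweighted centroid."""
    groups = defaultdict(list)
    for residue_id, xyz in atoms:
        groups[residue_id].append(xyz)
    return {
        residue_id: tuple(sum(axis) / len(coords) for axis in zip(*coords))
        for residue_id, coords in groups.items()
    }


# Hypothetical two-residue fragment with two atoms per residue.
fragment = [(1, (0.0, 0.0, 0.0)), (1, (2.0, 0.0, 0.0)), (2, (4.0, 4.0, 0.0)), (2, (4.0, 6.0, 0.0))]
beads = coarse_grain(fragment)
```

A mass-weighted centre or a single representative atom (such as Cα) per residue are common alternatives to the plain centroid.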
Protein folding studies can also be facilitated by sampling conformations near the native state. Several native-state sampling algorithms, such as normal mode analysis (NMA) and the structure-based algorithm COREX [36, 37], have been used successfully to study plasticity [38], cooperative interactions [39] and allostery [40] in proteins. Native-state ensemble techniques take protein flexibility into account, which is important for biological activity and crucial in structure-based drug design.
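Normal mode analysis diagonalises a matrix of second derivatives of the energy around the native structure. A simple coarse-grained variant, the Gaussian network model (GNM), replaces the force field with identical springs between Cα atoms closer than a cutoff (commonly around 7 Å) and diagonalises the resulting Kirchhoff (connectivity) matrix; the smallest non-zero eigenvalues correspond to the slowest, largest-amplitude collective motions. The sketch uses a hand-written Jacobi eigenvalue routine to stay within the standard library (numpy.linalg.eigh would normally be used instead), and the straight five-residue chain is a hypothetical test structure.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff matrix: -1 for residue pairs within the cutoff, contact count on the diagonal."""
    n = len(coords)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                g[i][j] = g[j][i] = -1.0
                g[i][i] += 1.0
                g[j][j] += 1.0
    return g


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues (ascending) of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                rot = [[float(i == j) for j in range(n)] for i in range(n)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(rot_t, matmul(a, rot))
    return sorted(a[i][i] for i in range(n))


chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]  # hypothetical Cα trace, 3.8 Å spacing
eigs = jacobi_eigenvalues(kirchhoff(chain))
```

The single zero eigenvalue corresponds to a uniform displacement of the whole chain and carries no internal motion; GNM analyses therefore start from the second-smallest eigenvalue.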
In the last five years, several web-based tools for analysing protein folding dynamics have been developed. The Fold-Rate server (http://psfs.cbrc.jp/fold-rate/) [41] predicts protein folding rates from the amino acid sequence. The Parasol folding server (http://parasol.tamu.edu/groups/amatogroup/goldingserver) [42] predicts protein folding pathways using motion planning techniques based on “probabilistic roadmaps”. The iFold server (http://ifold.dokhlab.org) performs discrete molecular dynamics (DMD) simulations of proteins using simplified two-bead-per-residue models [43]. DMD simulations solve ballistic equations of motion, approximating the inter-particle interaction potentials by square wells. DMD approaches [44-46] with simplified structural models of proteins have been used extensively to investigate the general principles of protein folding and unfolding [47-51].
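Because DMD potentials are step functions, particles move ballistically between discrete events at which a pair separation crosses a discontinuity, and the simulation advances from one event to the next rather than in fixed time steps. The sketch below shows two ingredients for a single pair under an assumed square-well potential: the time to the next event, found by solving |r + v t| = d for the relevant distance d, and the velocity update for an elastic hard-core collision between equal masses. The function names and parameters are illustrative; the energy bookkeeping at the well edge (capture, escape or reflection) and the event queue of a full DMD engine are omitted.

```python
import math
from typing import Optional

Vec = tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def next_event(r: Vec, v: Vec, sigma: float, well_radius: float) -> Optional[tuple[float, str]]:
    """Time until the pair separation |r + v t| next crosses a square-well discontinuity.

    r = r_j - r_i and v = v_j - v_i are the relative position and velocity; sigma is the
    hard-core diameter and well_radius (> sigma) is the outer edge of the attractive well.
    """
    r2, v2, b = dot(r, r), dot(v, v), dot(r, v)
    if v2 == 0.0:
        return None  # no relative motion, no event
    if r2 > well_radius ** 2:
        # Outside the well: an event occurs only if the pair approaches closely enough.
        disc = b * b - v2 * (r2 - well_radius ** 2)
        if b < 0.0 and disc >= 0.0:
            return (-b - math.sqrt(disc)) / v2, "enter_well"
        return None
    if b < 0.0:
        # Inside the well and approaching: check for a hard-core collision first.
        disc = b * b - v2 * (r2 - sigma ** 2)
        if disc >= 0.0:
            return (-b - math.sqrt(disc)) / v2, "core_collision"
    # Otherwise the pair eventually reaches the outer edge of the well.
    disc = b * b - v2 * (r2 - well_radius ** 2)
    return (-b + math.sqrt(disc)) / v2, "leave_well"


def hard_core_collision(r: Vec, vi: Vec, vj: Vec) -> tuple[Vec, Vec]:
    """Elastic collision of two equal-mass particles in contact (r = r_j - r_i)."""
    rel_v = tuple(vj_k - vi_k for vi_k, vj_k in zip(vi, vj))
    scale = dot(r, rel_v) / dot(r, r)  # exchanges the velocity components along r
    new_vi = tuple(v_k + scale * r_k for v_k, r_k in zip(vi, r))
    new_vj = tuple(v_k - scale * r_k for v_k, r_k in zip(vj, r))
    return new_vi, new_vj


# Two beads approaching head-on; they first reach the edge of the attractive well.
event = next_event((3.0, 0.0, 0.0), (-2.0, 0.0, 0.0), sigma=1.0, well_radius=1.5)
```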
Experimental techniques have also advanced, including protein engineering, nuclear magnetic resonance (NMR), mass spectrometry, hydrogen exchange, fluorescence resonance energy transfer (FRET) and atomic force microscopy (AFM). These techniques make it possible to obtain detailed information about the different conformations that occur during folding [52]. At the same time, computational methods have been developed to help interpret experimental data, using simulations to obtain structural information about the states populated during folding [53]. The synergy between experimental and theoretical methods is growing as the timescales and resolutions at which they operate converge.
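One common way to connect simulations with FRET measurements is to compute the transfer efficiency E = 1/[1 + (r/R_0)^6] for each simulated donor-acceptor distance r, where R_0 is the Förster radius of the dye pair, and then average over trajectory frames. Because E depends non-linearly on r, this average generally differs from the efficiency at the mean distance. The sketch assumes point-like dyes and a fixed R_0 (with the orientation factor absorbed into R_0); the R_0 value and the distances are hypothetical.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster transfer efficiency for donor-acceptor distance r and Förster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def ensemble_efficiency(distances: list[float], r0: float) -> float:
    """Average efficiency over simulation frames (not the efficiency of the mean distance)."""
    return sum(fret_efficiency(r, r0) for r in distances) / len(distances)


R0 = 5.0  # hypothetical Förster radius, nm
frames = [3.0, 4.5, 5.0, 6.2, 7.5]  # hypothetical donor-acceptor distances (nm) from a trajectory
mean_distance = sum(frames) / len(frames)
e_ensemble = ensemble_efficiency(frames, R0)
e_at_mean = fret_efficiency(mean_distance, R0)
```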