www.bsc.es
Simulation environment for Life Sciences
BSC – 14th, 15th March 2022
Objective
Overview of molecular simulation technologies used in Life Sciences and their specific adaptation to
HPC environments.
Acknowledgements
OUTLINE
• Macromolecular dynamics. A key issue for understanding molecular recognition
– Why dynamic properties?
– The concept of “ensemble”
• Molecular simulations
– Levels of representation. Atomistic vs. Coarse-grained
– Molecular dynamics algorithm(s)
– System preparation (+ Hands on)
– Trajectory Analysis (+ Hands on)
• Simulation & HPC
– Algorithmic improvements
– Ensembles and replicated simulations
– Simulation Databases
– Data management
Objectives and outline
14 March
09.00 - 10.30 Welcome & Introduction (JLG)
10.30 - 11.00 Break
11.00 - 11.45 Atomistic MD (JLG)
11.45 - 12.30 Improvements & HPC (JLG)
14.00 - 15.15 Simulation Setup (JLG)
15.15 - 15.30 Software installation (AH)
15.30 - 16.00 Break
16.00 - 18.00 Setup Hands On (AH)
15 March
09.00 - 10.30 Simulation DBs and Data Mgt. (PA)
10.30 - 11.00 Break
11.00 - 11.45 Application Examples (MW)
11.45 - 12.30 CG simulations (PD)
12.30 - 14.00 Break
14.00 - 16.00 Trajectory visualization and analysis (AH)
16.00 - 16.30 Break
16.30 - 18.00 Report writing
Course Materials
• https://inbi-login.bsc.es/www/patc/
• Software to be installed locally:
– Linux OS, VMD, Python, conda
– VM available
• Accounts on Minotauro (mt1.bsc.es) to execute simulations
– 64 Bull Blade B505 nodes (62 + 2 login), each with:
• 2 Intel E5649 (6-core/2.53GHz/12MB cache), 24GB RAM, 2 NVIDIA M2090, 1x SSD 250GB
– nct0XXX (XXX = 178 .. 195), pass: 4FSCn5.XXX
– Access via ssh (sftp, scp)
MACROMOLECULAR DYNAMICS. A KEY ISSUE FOR MOLECULAR RECOGNITION
DNA sequence Protein sequence
ATP (Mg) - ACV
Structural rearrangement is necessary
for enzyme (protein) function
Macromolecules are dynamic entities
• Molecular recognition requires structural
adjustment
Binding modes
of rofecoxib and celecoxib to cyclooxygenase-2
Soliva R. et al. J Med Chem. 2003, 46 (8), pp 1372–1382
Dynamic properties. Why?
• Docking experiments are very sensitive to receptor structure
Acetylcholinesterase
RMSd docking solutions
[Figure: RMSd of docking solutions plotted against Comp 1 and Comp 2 (VDW); Component 2 panels A to H on a -1 to 1 scale; PDB entries 1dbs, 1byi, 1phb, 1bty, 9xia, 1xih]
Carlson H.A. & McCammon J.A. (2000) Mol. Pharmacol. 57, 213-218
The concept of ensemble
• “Ensemble”: set of structures that represents ALL possible microscopic states of the system (or a significant sample of them)
• Thermodynamic properties can be deduced from the average of “ensemble”
properties.
Experimental ensembles?
Available structural information is “static”
– X-Ray: Macromolecules must have unique conformations to crystallize. Mobile regions do not have enough electron density to be detected.
• “Experimental” flexibility is given by B-Factors
– NMR: Measured “in solution”; however, the conformation cannot vary too much, otherwise not enough restraints can be derived from the experiment. Mobile regions are less well defined.
ATOM   2537  CA  GLY B  27      54.322  -6.951   4.465  1.00 48.11           C
ATOM   2538  C   GLY B  27      53.220  -7.408   5.430  1.00 23.01           C
ATOM   2539  O   GLY B  27      52.901  -6.745   6.433  1.00 17.59           O
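The B-factor mentioned above is the last numeric field of each ATOM record (columns 61-66 of the fixed-column PDB layout). A minimal sketch of reading it, assuming the three records from the slide are laid out in standard PDB columns; the helper name `parse_atom` is hypothetical:

```python
# Three ATOM records from the slide, written in fixed-column PDB layout
# (the column alignment is an assumption; the slide text was flattened).
PDB_LINES = [
    "ATOM   2537  CA  GLY B  27      54.322  -6.951   4.465  1.00 48.11           C",
    "ATOM   2538  C   GLY B  27      53.220  -7.408   5.430  1.00 23.01           C",
    "ATOM   2539  O   GLY B  27      52.901  -6.745   6.433  1.00 17.59           O",
]


def parse_atom(line):
    """Slice one ATOM record by its fixed column positions (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }


atoms = [parse_atom(line) for line in PDB_LINES]

# Average B-factor of the residue: a crude per-residue flexibility indicator.
mean_b = sum(a["bfactor"] for a in atoms) / len(atoms)
print(f"GLY B 27 mean B-factor: {mean_b:.2f}")
```

Slicing by column rather than splitting on whitespace matters for real files, where adjacent fields can touch.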
Experimental ensembles
X-Ray: 1CM8 and other protein kinases
NMR: 1A03. Ca2+ binding protein
Theoretical ensembles
• Atomistic Molecular Dynamics is the most widely used theoretical technique to account for dynamic properties of macromolecules
– Also Monte Carlo, Normal mode analysis, Discrete dynamics, …
• Analysis of a single system over time is equivalent to the analysis of many copies of the same system (ergodic principle)
• Simple theoretical background.
• Fast to calculate even for big systems
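The ergodic principle above can be written as an equality between the ensemble average of a property A and its time average along one trajectory. In practice the time average is estimated from M saved frames; the symbols A, T, M and t_k are notation introduced here for illustration:

```latex
\langle A \rangle_{\text{ensemble}}
  = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} A(t)\, dt
  \approx \frac{1}{M} \sum_{k=1}^{M} A(t_k)
```

The approximation only holds if the trajectory is long enough to visit the relevant states.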
Theoretical ensembles (MD and others)
Integration step: 1 fs (10^-15 s) → 1 ms = 10^12 (one European billion) integration steps
Equivalent to following the evolution from Neanderthal to H. sapiens with one photo every second
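The arithmetic behind this slide can be checked in a few lines. This is only a sketch: the year length (365.25 days) is an assumption used to turn "one photo per second" into a time span.

```python
# Work in integer femtoseconds to avoid floating-point rounding.
STEP_FS = 1                      # integration step: 1 fs
TARGET_FS = 10**12               # 1 ms expressed in fs (1e-3 s / 1e-15 s)

n_steps = TARGET_FS // STEP_FS   # number of integration steps for 1 ms

# One photo per second for n_steps photos, expressed in years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = n_steps / SECONDS_PER_YEAR

print(f"steps for 1 ms: {n_steps:.0e}")
print(f"one photo per second for that many photos: {years:,.0f} years")
```

The result is a few tens of thousands of years, which is where the comparison with human evolution comes from.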
$f_i = -\frac{\partial E}{\partial r_i}$
$a_i = \frac{f_i}{m_i}$
$v_i = \int a_i \, dt$
$r_i = \int v_i \, dt$
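The chain force → acceleration → velocity → position can be sketched numerically. The slide does not name an integrator; velocity Verlet is assumed here because it is a common choice, and the 1D harmonic oscillator (arbitrary units) is a toy stand-in for a real force field:

```python
import math


def velocity_verlet(r, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    f = force(r)                              # f = -dE/dr at the start
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity, a = f/m
        r = r + dt * v_half                   # full-step position update
        f = force(r)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # finish the velocity update
    return r, v


# Toy system (assumption): E(r) = 0.5 * k * r**2, so f = -k * r.
K, MASS = 1.0, 1.0


def harmonic_force(r):
    return -K * r


def total_energy(r, v):
    return 0.5 * MASS * v**2 + 0.5 * K * r**2


r_end, v_end = velocity_verlet(1.0, 0.0, harmonic_force, MASS, dt=0.01, n_steps=1000)

# For r(0)=1, v(0)=0 the analytic solution is r(t) = cos(t); here t = 10.
print(f"numerical r = {r_end:.5f}, analytic r = {math.cos(10.0):.5f}")
print(f"energy drift = {total_energy(r_end, v_end) - total_energy(1.0, 0.0):.2e}")
```

In a real MD code `r`, `v` and `f` are arrays over all atoms and the force evaluation dominates the cost.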
Current limits
• Standard simulations are in the ns-µs timescale with 100,000 to 500,000 atoms
• World Records:
– HIV capsid: 64 M atoms, 1.2 µs
– BPTI: 1 ms free simulation
Applications
• Structure optimization
– Refinement of X-Ray and NMR data.
• Conformational transitions and folding
• Flexibility studies
• Dynamics and function
– Allostery, induced fit
• Generation of theoretical ensembles
– Statistical thermodynamics
Predicted vs X-Ray Structures
[Figure: RMSd (Å) vs. time (ps, 0 to 1000) for the whole protein and for loop 214-232]
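The RMSd plotted here is the root-mean-square deviation between the atoms of a simulation frame and a reference structure. A minimal sketch, assuming both coordinate sets are already superimposed (no fitting is done) and using invented three-atom coordinates:

```python
import math


def rmsd(coords_a, coords_b):
    """RMSd between two equally sized lists of (x, y, z) tuples, without fitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Illustrative coordinates in Å (not taken from any real structure).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

print(f"RMSd = {rmsd(reference, frame):.2f} Å")
```

Computing this per frame against the X-Ray structure, for all atoms or for a selected loop only, gives curves like the ones in the figure.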
Conformational changes
TK - ATP (Mg) - ACV
[Figures: RMSd (Å) vs. time (ps, 0 to 2000) for ACV and Mg; two distance (Å) vs. time (ps, 0 to 2000) plots; water molecules (Wat) labelled in the structure]
Some phenomena can only be understood from dynamics!
HBPG Aciclovir
Thymidine kinase catalysis
Statistics on MD ensembles
[Figure: distance (Å) vs. time (ps, 0 to 2000), with State A and State C marked]
$\Delta G_{A \rightarrow B} = -RT \ln \frac{N_B}{N_A}$
Statistics replaces rational thinking!!
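The free energy difference between two states follows from their populations in the ensemble, ΔG(A→B) = -RT ln(N_B/N_A). A minimal sketch, assuming frames are assigned to a state by a distance cutoff; the 4.0 Å threshold and the distance values are invented for illustration:

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def delta_g(n_a, n_b, temperature=300.0):
    """Free energy of going from state A to state B, from frame counts."""
    return -R_KCAL * temperature * math.log(n_b / n_a)


# Hypothetical distance trace (Å), one value per saved frame.
distances = [2.9, 3.1, 3.0, 6.2, 3.2, 2.8, 6.0, 3.1]
THRESHOLD = 4.0  # assumed cutoff separating state A (short) from state B (long)

n_a = sum(1 for d in distances if d < THRESHOLD)
n_b = len(distances) - n_a

print(f"N_A = {n_a}, N_B = {n_b}, dG(A->B) = {delta_g(n_a, n_b):.2f} kcal/mol")
```

A positive value means state B is less populated than state A. The estimate is only meaningful if both states are visited many times in the trajectory.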
How does the enzyme control ligand access to the heme site?
Long branch (20Å)
B: Ile19, Ala24, Ile25, Val28, Val29, Phe32
E: Phe62, Ala63, Leu66
E-helix B-helix
H-helix G-helix
Short branch (10Å)
0.1 µs MD
AMBER 99 force field
Octahedral box (8500 TIP3P waters)
[Figure: CMIP energy isocontour and axis of the tunnel in the closed and open states, with Phe62 shown in its closed and open positions]
Free energy profile for NO diffusion along the main channel (steered MD simulations)
[Figure: free energy (kcal/mol) vs. Fe-N(NO) distance (Å) for the closed and open states; heme position marked]
Molecular dynamics and HPC
• Present-day biology requires “everything” to go high-throughput
– Genome/Proteome-wide studies
– Plethora of genomic information available
– Towards a “Dynamic” PDB
• Biological scales are already there…
Molecular dynamics and HPC. The bad news…
• MD algorithms are extremely simple. There are several ways to parallelize them, but the trivial ones do not scale beyond 8-16 threads
• Present supercomputers have several thousands of cores available
– The first computer without GPUs on the Top500 list has 7,630,848 cores (Fugaku, Japan); the second has 10,649,600 cores (Sunway TaihuLight, China).
– With GPUs, the top system has 2,414,592 cores (Summit, USA) (www.top500.org).
– Applications for CPU time must justify efficient use of a large number of cores.
– No present strategy for MD optimization scales in practical use to thousands of cores!!
• Rough rule: no fewer than 100 atoms per core.
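The rough rule translates directly into an upper bound on the number of cores a single simulation can use. A minimal sketch applied to the standard system sizes quoted earlier in these slides; the function name is hypothetical:

```python
def max_useful_cores(n_atoms, min_atoms_per_core=100):
    """Upper bound on cores for one simulation under the atoms-per-core rule."""
    return n_atoms // min_atoms_per_core


for n_atoms in (100_000, 500_000):
    print(f"{n_atoms:>7,} atoms -> at most {max_useful_cores(n_atoms):,} cores")
```

Even the larger standard system cannot fill more than a few thousand cores on its own, far below the core counts of the Top500 machines listed above.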
Strategies
Increase granularity
• Coarse-grained simulations
Improve parallelization strategies
• Domain decomposition
• Accelerators (GPUs)
Replicated simulations
• Reduce process intercommunication.
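Replicated simulations are independent of each other, so nothing is exchanged until the results are collected. A minimal sketch of that pattern, using a 1D random walk as a toy stand-in for one MD run and a thread pool as a stand-in for separate nodes (both are assumptions made for illustration):

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_replica(seed, n_steps=1000):
    """Toy stand-in for one independent simulation: a seeded 1D random walk."""
    rng = random.Random(seed)        # each replica owns its random stream
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
    return x


seeds = list(range(8))               # one seed per replica

# Replicas never talk to each other; only the final values are gathered.
with ThreadPoolExecutor(max_workers=4) as pool:
    finals = list(pool.map(run_replica, seeds))

mean_square = sum(x * x for x in finals) / len(finals)
print(f"{len(finals)} replicas, mean squared displacement = {mean_square:.1f}")
```

Because the only shared step is the final gather, this pattern scales with the number of replicas rather than with the size of a single system.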
Massively parallel execution with biobb and PyCOMPSs
• Pyruvate kinase: MD of ~400,000 atoms
• 200 annotated mutations from UniProt
• ~40,000 cores in BSC MareNostrum
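The figures on this slide can be combined into a quick load estimate. This is a sketch only: it assumes one simulation per mutation and an even split of the cores, neither of which is stated on the slide.

```python
N_ATOMS = 400_000       # approximate system size from the slide
N_MUTATIONS = 200       # annotated mutations, assumed one simulation each
TOTAL_CORES = 40_000    # approximate core count from the slide

cores_per_simulation = TOTAL_CORES // N_MUTATIONS
atoms_per_core = N_ATOMS / cores_per_simulation

print(f"cores per simulation: {cores_per_simulation}")
print(f"atoms per core:       {atoms_per_core:.0f}")
```

Under these assumptions each simulation stays well above 100 atoms per core, while the whole set still occupies tens of thousands of cores.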