2.1.1 Data Procurement
Information on the proteins of interest was collected from online resources and databases: protein descriptions and domain information from canSAR 2.0160, protein association networks from
STRING v9.1161, phylogenetic trees from the Structural Genomics Consortium website219, and
information on known small-molecule inhibitors or binders from ChEMBL159.
The complete amino acid sequences of targets were retrieved, in FASTA format, from the UniProt protein sequence database.157
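As an illustration of handling the retrieved sequences, the following is a minimal sketch of a FASTA reader. The function name `parse_fasta` and the example record are hypothetical; the record is not a real UniProt entry and the code is not part of the original workflow.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header line -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record for illustration only (not a real UniProt entry).
example = """>sp|P00000|EXAMPLE_HUMAN Example protein OS=Homo sapiens
MSTKRPLGSE
VVAAQ
"""
records = parse_fasta(example)
```

Each header is kept whole as the dictionary key, so the accession and organism fields remain available for later filtering.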
2.1.2 Homology Searches
Utilising the amino acid sequences of targets, the NCBI BLAST180 program was used to search for:
Human homologues (Database: UniProtKB/Swiss-Prot, Organism: Homo sapiens (taxid: 9606))
Orthologues (Database: UniProtKB/Swiss-Prot)
Relevant crystal structures available in the PDB (Database: Protein Data Bank)
BlastP is commonly used to compare a protein query against a protein sequence database. In this work, however, DELTA-BLAST was used; it searches a database of pre-constructed position-specific scoring matrices (PSSMs) before searching the protein database, in order to improve homology detection.183 The complete FASTA sequences of selected hits were downloaded from the DELTA-BLAST search results.
2.1.3 Sequence Alignments
EMBL-EBI Clustal Omega181,182 was used to generate pairwise and multiple sequence alignments
of protein sequences, using amino acid sequences downloaded from the UniProt protein sequence database.157
2.1.4 Conservation Scoring
The Scorecons server220 was used to rate the conservation at each amino acid site in pairwise and multiple sequence alignments of:
Human SMC proteins and Rad50
KAT2A and KAT2B HAT domains
GNAT and MYST HAT domains
KAT2A and KAT2B bromodomains
Type I and type II bromodomains
All human bromodomains
Scorecons provides a measure of the site-specific conservation, using a sum-of-pairs scoring system to assign scores ranging from 0 to 1, which depend on the amino-acid frequency and relative stereochemical properties of substituted amino acids.220
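The idea of a sum-of-pairs score can be sketched as follows. This is a simplified illustration, not the Scorecons implementation: the function name, the toy alignment, and the identity-only similarity (1 for identical residues, 0 otherwise, gaps never matching) are assumptions, and the stereochemical similarity of substituted residues described above is deliberately ignored.

```python
from itertools import combinations


def sum_of_pairs_conservation(aligned_seqs, similarity=None):
    """Return one score per alignment column, from 0 (no matching pairs) to 1."""
    if similarity is None:
        # Assumed toy similarity: identical non-gap residues score 1, all else 0.
        def similarity(a, b):
            return 1.0 if a == b and a != "-" else 0.0

    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    pairs = list(combinations(range(len(aligned_seqs)), 2))
    if not pairs:
        raise ValueError("at least two sequences are required")

    scores = []
    for col in range(length):
        # Sum the similarity over every pair of sequences at this column,
        # then normalise by the number of pairs so the score lies in [0, 1].
        total = sum(
            similarity(aligned_seqs[i][col], aligned_seqs[j][col])
            for i, j in pairs
        )
        scores.append(total / len(pairs))
    return scores


# Hypothetical three-sequence alignment for illustration only.
alignment = ["MKV-L", "MRVAL", "MKIAL"]
scores = sum_of_pairs_conservation(alignment)
```

A substitution-matrix-based `similarity` function could be passed in place of the identity default to reward conservative substitutions, which is closer in spirit to the scoring described above.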
2.2 Homology Modelling
2.2.1 Model Building
Modelling was performed using the homology modelling programs SWISS-MODEL179 and
MODELLER 9.12177, which rely on protein sequence alignment, and PHYRE 2.0 (Protein
Homology/analogy Recognition Engine)221, which uses protein threading.
SWISS-MODEL179 was used to build one model, Table 2.1.
Table 2.1. Models of Target Homo sapiens Protein Domains Constructed Using SWISS-MODEL.179
Target        Template Organism    Template Protein & Domain  PDB ID   Resolution
SMC5/6 hinge  Thermotoga maritima  SMC hinge                  1GXJ222  2.00 Å
MODELLER 9.12177 was used to build seven models, Table 2.2.
Table 2.2. Models of Target Homo sapiens Proteins or Protein Domains Constructed Using MODELLER 9.12.177
Target        Template Organism          Template Protein & Domain  PDB ID     Resolution
SMC5/6 hinge  Schizosaccharomyces pombe  SMC5/6 hinge               5MG8223    2.75 Å
SMC5/6 head   Pyrococcus furiosus        SMC ATPase                 1XEX224    2.50 Å
NSE1          Homo sapiens               NSE1                       3NW0 A225  2.92 Å
NSE2          Saccharomyces cerevisiae   Mms21                      3HTK C226  2.31 Å
NSE3          Homo sapiens               MAGEA4                     2WA0227    2.30 Å
SMC1/3 hinge  Mus musculus               SMC1/3 hinge               2WD5228    2.70 Å
SMC1/3 head   Pyrococcus furiosus        SMC ATPase                 1XEX224    2.50 Å
Models were constructed by satisfaction of spatial restraints, using the ‘automodel’ class. For each target, five models were generated from a Python script and assessed by DOPE score189 and GA341 score; the ‘best’ model was selected as that with the lowest DOPE score, i.e. that with the lowest associated energy. Table 2.3 to Table 2.9 show the log file summaries for all the models built. The ‘best’ model, which was further analysed, is highlighted.
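A script of the kind described might look like the following sketch, which follows the standard MODELLER 9.x ‘automodel’ usage. The function names, the alignment file and the template and target codes passed to `build_models` are hypothetical, and the assumption that the template PDB file sits in the working directory is ours; the selection step is demonstrated on the values reported in Table 2.3.

```python
def build_models(alnfile, template_code, target_code, n_models=5):
    """Build n_models comparative models and return MODELLER's output records."""
    # Imported lazily so the selection helper below works without MODELLER.
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    # Assumption: the template PDB file is in the current directory.
    env.io.atom_files_directory = ["."]
    a = automodel(
        env,
        alnfile=alnfile,            # hypothetical PIR alignment file
        knowns=template_code,       # template code as named in the alignment
        sequence=target_code,       # target code as named in the alignment
        assess_methods=(assess.DOPE, assess.GA341),
    )
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    # Keep only the models that were built without failure.
    return [m for m in a.outputs if m["failure"] is None]


def lowest_dope(outputs):
    """Select the record with the lowest (most negative) DOPE score."""
    return min(outputs, key=lambda m: m["DOPE score"])


# Values copied from Table 2.3 (SMC5-SMC6 hinge models).
table_2_3 = [
    {"name": "HSMCAB_hinge.001.pdb", "molpdf": 4286.1, "DOPE score": -38403.9},
    {"name": "HSMCAB_hinge.002.pdb", "molpdf": 4023.1, "DOPE score": -38437.8},
    {"name": "HSMCAB_hinge.003.pdb", "molpdf": 4131.2, "DOPE score": -38639.1},
    {"name": "HSMCAB_hinge.004.pdb", "molpdf": 4217.1, "DOPE score": -38916.5},
    {"name": "HSMCAB_hinge.005.pdb", "molpdf": 4008.4, "DOPE score": -38514.0},
]
best = lowest_dope(table_2_3)
```

Applied to the Table 2.3 values, the selection returns HSMCAB_hinge.004.pdb, which is the hinge model carried forward to minimisation in Table 2.11.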
Table 2.3. MODELLER 9.12177 Log File Summary for SMC5-SMC6 Hinge Models.
Filename              molpdf  DOPE Score  GA341 Score
HSMCAB_hinge.001.pdb  4286.1  -38403.9    1.000
HSMCAB_hinge.002.pdb  4023.1  -38437.8    1.000
HSMCAB_hinge.003.pdb  4131.2  -38639.1    1.000
HSMCAB_hinge.004.pdb  4217.1  -38916.5    1.000
HSMCAB_hinge.005.pdb  4008.4  -38514.0    1.000
Table 2.4. MODELLER 9.12177 Log File Summary for SMC5-SMC6 Head Models.
Filename             molpdf  DOPE Score  GA341 Score
HSMC56_head.001.pdb  4875.5  -66034.6    0.921
HSMC56_head.002.pdb  4729.7  -66159.1    0.904
HSMC56_head.003.pdb  5003.4  -65486.4    0.850
HSMC56_head.004.pdb  5396.0  -64320.7    0.769
HSMC56_head.005.pdb  4935.7  -65919.4    0.917
Table 2.5. MODELLER 9.12177 Log File Summary for NSE1 Models.
Filename             molpdf  DOPE Score  GA341 Score
HNSE1.B99990001.pdb  1584.1  -27186.2    1.000
HNSE1.B99990002.pdb  1488.4  -27065.1    1.000
HNSE1.B99990003.pdb  1408.9  -27149.6    1.000
HNSE1.B99990004.pdb  1430.3  -27134.1    1.000
HNSE1.B99990005.pdb  1529.6  -27031.7    1.000
Table 2.6. MODELLER 9.12177 Log File Summary for NSE2 Models.
Filename             molpdf  DOPE Score  GA341 Score
HNSE2.B99990001.pdb  1360.3  -16265.0    0.448
HNSE2.B99990002.pdb  1470.3  -16566.1    0.845
HNSE2.B99990003.pdb  1371.4  -16093.6    0.522
HNSE2.B99990004.pdb  1315.5  -16638.1    0.462
HNSE2.B99990005.pdb  1401.7  -16193.6    0.485
Table 2.7. MODELLER 9.12177 Log File Summary for NSE3 Models.
Filename             molpdf  DOPE Score  GA341 Score
HNSE3.B99990001.pdb  1596.2  -26809.6    1.000
HNSE3.B99990002.pdb  1503.4  -26783.0    1.000
HNSE3.B99990003.pdb  1569.1  -26738.9    1.000
HNSE3.B99990004.pdb  1565.5  -26527.8    1.000
HNSE3.B99990005.pdb  1589.6  -26791.4    1.000
Table 2.8. MODELLER 9.12177 Log File Summary for SMC1-SMC3 Hinge Models.
Filename                    molpdf  DOPE Score  GA341 Score
HSMCAB_hinge.B99990001.pdb  2628.6  -43777.5    1.000
HSMCAB_hinge.B99990002.pdb  2564.9  -44103.1    1.000
HSMCAB_hinge.B99990003.pdb  2606.3  -43488.2    1.000
HSMCAB_hinge.B99990004.pdb  2658.5  -43959.9    1.000
HSMCAB_hinge.B99990005.pdb  2583.3  -43485.1    1.000
Table 2.9. MODELLER 9.12177 Log File Summary for SMC1-SMC3 Head Models.
Filename                   molpdf  DOPE Score  GA341 Score
HSMC13_head.B99990001.pdb  4302.1  -77329.9    0.997
HSMC13_head.B99990002.pdb  4004.0  -77646.0    1.000
HSMC13_head.B99990003.pdb  4277.9  -78393.1    1.000
HSMC13_head.B99990004.pdb  4326.0  -77139.1    0.999
HSMC13_head.B99990005.pdb  4146.5  -77750.7    0.987
Phyre 2.0221 was used to build one model, Table 2.10. A model confidence of 100% was quoted.
Table 2.10. Models of Target Homo sapiens Protein Domains Constructed Using Phyre 2.0.221
Target     Template Organism        Template Protein & Domain  PDB ID   Resolution
SMC5 head  Deinococcus radiodurans  RecN                       4AD8229  4.00 Å
2.2.2 Model Preparation and Minimisation
Preparation and minimisation of protein models were performed using Discovery Studio 4.0.186
The Automatic Preparation function was used to prepare, clean and protonate each model, which was then minimised using the default parameters. Table 2.11 lists the CHARMM (Chemistry at HARvard Macromolecular Mechanics) energies calculated for each model at minimisation.230,231
Table 2.11. CHARMM Energies Calculated by Discovery Studio 4.0186 at Model Minimisation.
Model Filename              Modelling Software  CHARMM Energy
HSMCAB_hinge.004.pdb        MODELLER            -23880.0
HSMC56_head.001.pdb         MODELLER            -40835.0
HSMC56_head.002.pdb         MODELLER            -41080.4
HSMC56_head.003.pdb         MODELLER            -40831.4
HSMC56_head.004.pdb         MODELLER            -40564.7
HSMC56_head.005.pdb         MODELLER            -40858.1
HSMC5ab_head_4AD8.pdb       PHYRE 2.0           -18068.4
HNSE1.B99990001.pdb         MODELLER            -16412.2
HNSE2.B99990004.pdb         MODELLER            -15264.1
HNSE3.B99990001.pdb         MODELLER            -16664.6
HSMCAB_hinge.B99990002.pdb  MODELLER            -29243.0
HSMC13_head.B99990003.pdb   MODELLER            -46803.4