Docking energies for all inhibitors were calculated using MOE software (version 2012.10, Chemical Computing Group Inc., Montreal, Canada). The docking scores of the inhibitors were then used as additional molecular descriptors by adding the score columns to the dataset.
The X-ray structure of mouse P-gp was obtained from the Protein Data Bank (PDB code 3G60) [http://www.rcsb.org]. This structure was chosen because a previous docking investigation showed better-scoring poses with the mouse 3G60 structure than with the other two mouse P-gp structures (PDB codes 3G61 and 3G5U) or the human homology model of P-gp (Löschmann et al., 2013). It should be noted that this structure of mouse P-gp was co-crystallised with a ligand, and the complex had two stereoisomers of cyclic hexapeptide inhibitors, cyclic-tris-(R)-valineselenazole (QZ59-RRR) and cyclic-tris-(S)-valineselenazole (QZ59-SSS), in the active site (Aller et al., 2009). Before docking, the protein was protonated and protonatable residues were titrated using the default parameters of the software. Molecular structures of the ligands (P-gp inhibitors) were optimised after atomic charge calculation using SCF optimisation (AM1 Hamiltonian). In the protein-ligand docking, the default parameters of the software were used for ligand interactions: an energy cut-off of -0.5 kcal/mol for H-bond and ionic interactions and a maximum distance of 4.5 Å for non-bonded interactions. In the MOE Dock panel, the placement method was Triangle Matcher, London dG was used as both the first and the second scoring function, and the refinement method was Forcefield. Finally, the 30 best-scoring poses, the mean energies and the backbone root mean square deviation (RMSD) were retained. The binding site was defined in MOE using the co-crystallised ligand QZ59-RRR.
Preparation of compounds for docking
Before docking, the SDF file was imported into MOE, a suite of applications that can be used to manipulate and analyse a collection of compounds. For docking to work efficiently, each structure must be in a form suitable for docking into the protein. The software’s ‘Wash’ application was therefore used to clean the structures and neutralise the protonation state of each compound, leaving each structure in its least charge-bearing state. The potential energy of the structures was then lowered using the software’s ‘Energy minimize’ function. The compounds in the database were then ready for computation, and molecular descriptors were calculated.
Validation of docking experiment
The published X-ray crystal structures (Aller et al., 2009; Gutmann et al., 2010) were used to validate the docking model: the geometry of the docked Abcb1a/QZ59-RRR structure was compared with that of the Abcb1a/QZ59-RRR complex from X-ray crystallography by measuring the root-mean-square deviation (RMSD) between them.
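The RMSD comparison described above can be sketched as follows. This is a minimal illustration, not the MOE calculation: it assumes both poses list the same atoms in the same order and in the same receptor frame (no superposition or symmetry correction), and the coordinates are hypothetical values rather than atoms from 3G60.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered 3D coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical ligand atom coordinates in angstroms (illustration only).
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
# The same atoms, displaced by 1 angstrom along z in the docked pose.
docked_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

print(rmsd(crystal_pose, docked_pose))  # 1.0
```

A low RMSD between the re-docked and crystallographic ligand poses indicates that the docking protocol reproduces the experimental binding mode.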
5.2.3. Model Development and Validation
Development of models for P-gp
To perform the QSAR analyses, the P-gp inhibitors were divided into training and validation sets. The inhibitors were ordered by ascending Ki value, and then, from every five compounds, four were allocated to the training set and one, chosen at random, to the validation set. This ensured similar Ki ranges for the two sets. As a result, the training set consisted of 176 compounds and the external validation set of 43 compounds.
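The ranked allocation described above can be sketched as follows. The function name, the random seed and the placeholder Ki values are hypothetical. Sending an incomplete final block entirely to the training set is an assumption, although with 219 compounds it gives the same 176/43 split as reported.

```python
import random


def split_by_ranked_blocks(compounds, block_size=5, seed=0):
    """Split (name, ki) tuples into training and validation lists.

    Compounds are sorted by Ki; one randomly chosen member of every full
    block of `block_size` goes to validation, the rest go to training.
    """
    rng = random.Random(seed)
    ranked = sorted(compounds, key=lambda c: c[1])
    training, validation = [], []
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        # Assumption: an incomplete final block is kept entirely for training.
        held_out = rng.randrange(block_size) if len(block) == block_size else None
        for i, compound in enumerate(block):
            (validation if i == held_out else training).append(compound)
    return training, validation


# Placeholder data: 219 compounds with made-up Ki values.
compounds = [(f"cpd{i}", float(i + 1)) for i in range(219)]
training, validation = split_by_ranked_blocks(compounds)
print(len(training), len(validation))  # 176 43
```

Because one compound is drawn from each consecutive block of the ranked list, the validation set spans the same Ki range as the training set.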
In this study, QSARs were established to relate the P-gp binding effect of compounds (log Ki) to the molecular descriptors and P-gp docking scores.
section 3.1. Before building the models, the molecular descriptors were checked to find and discard those columns containing more than 98% constant values or more than 10% missing values. The total number of molecular descriptors used in all statistical analyses was 388.
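The descriptor screening described above can be sketched as follows. This is an illustration only: "constant values" is interpreted here as the share of compounds having the single most frequent value, missing entries are represented by `None`, and the descriptor names and data are hypothetical.

```python
from collections import Counter


def keep_descriptor(values, max_constant_fraction=0.98, max_missing_fraction=0.10):
    """Return True if a descriptor column passes both screening thresholds."""
    n = len(values)
    if n == 0:
        return False
    present = [v for v in values if v is not None]
    if (n - len(present)) / n > max_missing_fraction:
        return False  # more than 10% missing values
    if not present:
        return False
    # Assumption: "constant" is judged by the most frequent value's share of all rows.
    most_common_count = Counter(present).most_common(1)[0][1]
    return most_common_count / n <= max_constant_fraction


# Hypothetical descriptor table for 100 compounds.
descriptors = {
    "varied": [float(i) for i in range(100)],
    "near_constant": [0.0] * 99 + [1.0],                     # 99% identical
    "sparse": [None] * 11 + [float(i) for i in range(89)],   # 11% missing
}
kept = [name for name, column in descriptors.items() if keep_descriptor(column)]
print(kept)  # ['varied']
```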
STATISTICA Data Miner version 11 was used for the statistical analysis. The statistical methods consisted of decision tree and ensemble methods, namely Classification and Regression Tree (C&RT), Chi-square Automatic Interaction Detector (CHAID), Boosted Trees (BT) and Random Forest (RF). A Multivariate Adaptive Regression Splines (MARS) model was also developed. These methods are explained in Chapter 3. Log Ki was the dependent variable, and the predictors were selected by the embedded feature selection methods in C&RT, CHAID, BT and RF from all the molecular descriptors and docking scores available for the inhibitors and substrates. In the C&RT analysis, several stopping criteria were examined, including the default settings in STATISTICA. The default stopping criteria were a minimum of 24 cases to allow further splitting and a maximum of 100 nodes. V values of 10 or 7 were used in the V-fold cross-validation. In the CHAID analysis, the STATISTICA default stopping criteria were used: a minimum of 22 cases for splitting, a maximum of 1000 nodes, a probability for splitting of 0.05 and a probability for merging of 0.05. In the BT analysis, the default values for learning rate, number of additive terms, random test data proportion and subsample proportion were 0.1, 200, 0.2 and 0.5, respectively. Subsample proportions of 0.45, 0.50, 0.55 and 0.60 were also examined in combination with learning rates of 0.03, 0.05, 0.08 and 0.10. In the RF analysis, subsample proportions of 0.45, 0.50, 0.55 and 0.60 were examined. The random test data proportion was 0.3 for the internal validation and the number of trees was 100. The default settings were used for the stopping conditions: minimum number of cases, maximum number of levels, minimum number in child node and maximum number of nodes of 5, 10, 5 and 100, respectively.
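The boosted trees settings examined above can be enumerated as a simple grid. This sketch assumes that every subsample proportion was paired with every learning rate and that the other two settings stayed at their defaults; the text does not state either point explicitly. The dictionary keys are hypothetical labels, not STATISTICA option names.

```python
from itertools import product

subsample_proportions = [0.45, 0.50, 0.55, 0.60]
learning_rates = [0.03, 0.05, 0.08, 0.10]

# Assumption: a full factorial grid, with the remaining settings at the defaults.
bt_grid = [
    {
        "learning_rate": lr,
        "subsample_proportion": sp,
        "n_additive_terms": 200,        # default reported in the text
        "random_test_proportion": 0.2,  # default reported in the text
    }
    for sp, lr in product(subsample_proportions, learning_rates)
]

print(len(bt_grid))  # 16
```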
For the development of the MARS model, several pre-processing feature selection techniques were examined: the Chi-square method implemented in STATISTICA v11 (StatSoft Ltd.) and described by Hill and Lewicki (2006), stepwise regression analysis, and variable importance ranks from the random forest and boosted trees analyses. The Chi-square-based feature selection in STATISTICA picks a subset of descriptors from the descriptor pool without assuming that the relationships between the predictors and the dependent variable are linear or even monotone. In this feature selection, the range of each continuous variable was divided into 10 intervals. The best variables picked by the STATISTICA feature selection, the best descriptors selected by stepwise regression analysis, the top 5, 10, 15, 20 and 25 descriptors picked by RF, and the top 5, 10 and 15 descriptors picked by BT were examined in separate MARS analyses, and the resulting models were compared. In the MARS analysis, the default model specifications for maximum number of basis functions, degree of interactions, penalty and threshold were 21, 1, 2 and 0.0005, respectively.
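To illustrate what a degree of interactions of 1 means, the sketch below evaluates an additive MARS-type model built from hinge basis functions of the form max(0, x - t) and max(0, t - x). The descriptor names, knots and coefficients are hypothetical and do not come from the fitted models; the code shows only how a prediction is assembled, not how STATISTICA selects the terms.

```python
def hinge(x, knot, direction):
    """Hinge basis: direction=+1 gives max(0, x - knot); -1 gives max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def mars_predict(sample, intercept, terms):
    """Evaluate an additive MARS model (no products of basis functions).

    terms: list of (coefficient, descriptor_name, knot, direction) tuples.
    """
    return intercept + sum(
        coef * hinge(sample[name], knot, direction)
        for coef, name, knot, direction in terms
    )


# Hypothetical model with two basis functions (illustration only).
terms = [
    (0.5, "logP", 2.0, +1),              # 0.5 * max(0, logP - 2.0)
    (-0.25, "docking_score", -10.0, -1), # -0.25 * max(0, -10.0 - docking_score)
]
sample = {"logP": 4.0, "docking_score": -12.0}
print(mars_predict(sample, 1.0, terms))  # 1.5
```

With a degree of interactions of 1, each basis function depends on a single descriptor, so the model is a sum of piecewise-linear contributions; the maximum number of basis functions caps how many such terms can enter.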
The best model from each analytical method was selected based on the performance indicators for the internal validation set.
Development of models for biliary excretion incorporating predicted P-gp activity
The selected P-gp dissociation constant (Ki) models above were used to predict the log Ki values for the compounds in the biliary excretion dataset (n = 217). QSAR models were developed for biliary excretion using the dataset and methods explained in Chapter 4. In addition to the molecular descriptors, the P-gp effects predicted by the selected models from section 5.2.3 were used as independent variables in the analyses. Besides stepwise regression analysis, C&RT, boosted trees and random forest, two further methods, CHAID and MARS, were used to develop QSARs for biliary excretion, following the procedure explained above for the P-gp models. In some C&RT models, the predicted Ki effects were incorporated manually when they were not picked automatically by the C&RT feature selection.