Two types of models can be generated in a QSRR study. The first is a global model, where a single model is built for the retention prediction of all compounds. The second is a local model, where a new model is derived for each new compound for which retention is to be predicted. The global modelling approach is popular in QSRR modelling because of its simplicity, but its major drawback is that the accuracy of prediction is generally low. Local modelling, in which each compound has its own specific model, generally provides higher prediction accuracy.
For both modelling approaches, a suitable training set is critical, as it plays a major role in the prediction performance of the constructed QSRR model. Compounds in training sets can be selected either randomly or using targeted strategies to identify the best training set. The latter approach can be described as database “filtering”, and this term is used throughout this thesis to describe the process of identifying the most appropriate compounds to be used in the training set. For example, filtering can be performed using the concept of structural similarity between the target analyte and the database compounds, based on the premise that a training set comprising structurally similar compounds would give a more accurate QSRR model than a training set of randomly selected compounds. Other parameters which can form the basis of filtering methods include analyte physico-chemical parameters (for example, log D, which reflects hydrophobicity), the nature of the compounds (acids, bases, and neutrals), or the retention of the compounds (retention time, retention factor). A more complex option is to filter on the second dominant interaction, after hydrophobicity, between the compounds and the stationary phase. Regardless of which filtering method is applied, the ultimate goal is to find the training set which provides the highest prediction accuracy.
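The idea of similarity-based filtering can be illustrated with a minimal sketch. It assumes that each compound is represented by a set of fingerprint bit indices and that similarity is measured with the Tanimoto coefficient; the fingerprints, the compound names, the function names and the threshold of 0.5 are all hypothetical choices made for illustration and are not taken from this work.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_training_set(target, database, threshold=0.5):
    """Keep database compounds whose similarity to the target is at least
    `threshold`, sorted from most to least similar.

    `threshold` is an illustrative assumption, not a recommended value.
    """
    scored = [(name, tanimoto(target, fp)) for name, fp in database.items()]
    selected = [(name, s) for name, s in scored if s >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each compound is a set of "on" bit positions.
database = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({1, 2, 3, 9}),
    "compound_C": frozenset({7, 8, 9}),
}
target = frozenset({1, 2, 3, 5})

# Local training set for this one target analyte.
training_set = filter_training_set(target, database, threshold=0.5)
for name, similarity in training_set:
    print(f"{name}: {similarity:.2f}")
```

In a local modelling workflow, a selection of this kind would be repeated for every target analyte, so that each analyte obtains its own training set and its own model.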
2.3.6 Model validation
In QSRR modelling, a training set is used to build QSRR models from the most informative molecular descriptors, as selected by the GA, and a test set is needed for validation [39-42]. In addition, a separate external set is required to evaluate the predictive ability of the constructed QSRR models [26, 43]. For this purpose, the measured chromatographic retention data of the test compounds were extracted and compared with the predicted retention data calculated from the derived QSRR models. To generate test sets, a D-optimal algorithm was employed to split the compounds in the dataset into a training set and a test set.
In this work, the coefficient of determination ($R^2$), the slope of the regression with no forced intercept, the mean absolute error (MAE) and the root-mean-square error of prediction (RMSEP) were utilised to evaluate model fitness and the predictive ability of the constructed QSRR models, with the requirement for the slope to be within the range of 0.85 to 1.15 [26, 34, 35]. The percentage root-mean-square error of prediction (%RMSEP) of retention time for the test set was measured to externally validate the accuracy of GA-PLS models generated from the training set.
MAE was defined as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{2.1}$$
where $y_i$ and $\hat{y}_i$ are, respectively, the experimental and predicted values of the response for the i-th compound in the dataset, and $n$ is the number of compounds.
RMSEP was defined as:
$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{i(\mathrm{exp})} - y_{i(\mathrm{pred})}\right)^{2}}{n}} \tag{2.2}$$
where $y_{i(\mathrm{exp})}$ and $y_{i(\mathrm{pred})}$ are, respectively, the experimental and predicted values of the response for the i-th compound in the dataset, and $n$ is the number of compounds.
%RMSEP was defined as:
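$$\%\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(\dfrac{y_{i(\mathrm{exp})} - y_{i(\mathrm{pred})}}{y_{i(\mathrm{exp})}}\right)^{2}}{n}} \times 100 \tag{2.3}$$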
where $y_{i(\mathrm{exp})}$ and $y_{i(\mathrm{pred})}$ are, respectively, the experimental and predicted retention times for the i-th compound, and $n$ is the number of compounds.
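As an illustration, Equations 2.1 to 2.3 can be written as a short Python sketch. The function names and the retention times below are hypothetical and serve only to show how the three error measures are computed; they are not data from this work.

```python
import math


def mae(y_exp, y_pred):
    """Mean absolute error (Equation 2.1)."""
    n = len(y_exp)
    return sum(abs(e - p) for e, p in zip(y_exp, y_pred)) / n


def rmsep(y_exp, y_pred):
    """Root-mean-square error of prediction (Equation 2.2)."""
    n = len(y_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n)


def rmsep_percent(y_exp, y_pred):
    """Percentage RMSEP (Equation 2.3): each error is scaled by the
    experimental value before squaring, so y_exp must not contain zeros."""
    n = len(y_exp)
    rel_sq = sum(((e - p) / e) ** 2 for e, p in zip(y_exp, y_pred))
    return 100.0 * math.sqrt(rel_sq / n)


# Hypothetical experimental and predicted retention times (min).
y_exp = [2.0, 4.0, 5.0, 10.0]
y_pred = [2.2, 3.6, 5.5, 9.0]

print(f"MAE    = {mae(y_exp, y_pred):.3f}")
print(f"RMSEP  = {rmsep(y_exp, y_pred):.3f}")
print(f"%RMSEP = {rmsep_percent(y_exp, y_pred):.1f}")
```

In this example every prediction deviates from its experimental value by 10%, so %RMSEP is 10 even though the absolute errors range from 0.2 to 1.0 min. This shows why the relative measure is convenient when retention times span a wide range.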
In Chapter 4, several filtering approaches were employed to generate training sets for the construction of QSRR models, and the predictive ability of the resulting models was evaluated by inspecting the Regression Error Characteristic (REC) curves, obtained by plotting the prediction error range against the percentage of data points predicted within that range [44]. The overall performance of these models was further compared using the sum of ranking differences (SRD) approach, in which the parameters for each model were compared to a series of reference values and each model was ranked according to the size of the difference between its parameters and the reference values [45, 46]. The rankings were also compared to a confidence interval generated using randomly ranked numbers [45, 46]. More detail can be found in Chapter 4.
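The two comparison tools mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in Chapter 4: the REC curve uses the absolute error as the error measure, the SRD value is the plain sum of absolute rank differences with tied values given their average rank, and the comparison against randomly ranked numbers is omitted. All function names and data are hypothetical.

```python
def rec_curve(y_exp, y_pred):
    """Points (error tolerance, % of compounds predicted within it) of an REC curve."""
    errors = sorted(abs(e - p) for e, p in zip(y_exp, y_pred))
    n = len(errors)
    points = []
    for i, err in enumerate(errors, start=1):
        if points and points[-1][0] == err:
            points[-1] = (err, 100.0 * i / n)  # merge equal tolerances
        else:
            points.append((err, 100.0 * i / n))
    return points


def ranks(values):
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def srd(model_values, reference_values):
    """Sum of absolute differences between model ranks and reference ranks."""
    return sum(abs(a - b) for a, b in zip(ranks(model_values), ranks(reference_values)))


# Hypothetical retention times (min) for the REC curve.
y_exp = [2.0, 4.0, 5.0, 10.0]
y_pred = [2.2, 3.6, 5.5, 9.0]
for tolerance, percent in rec_curve(y_exp, y_pred):
    print(f"error <= {tolerance:.2f} min: {percent:.0f}% of compounds")

# Hypothetical values for two models against one set of reference values.
reference = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 2.1, 2.9, 4.2]  # same ordering as the reference
model_b = [2.5, 1.0, 4.0, 3.0]  # two pairs swapped
print("SRD model A:", srd(model_a, reference))
print("SRD model B:", srd(model_b, reference))
```

A smaller SRD indicates a ranking closer to that of the reference values, and an REC curve that rises more steeply indicates that a larger share of compounds is predicted within a small error.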
2.4 References
1. Wilson, N., M. Nelson, J. Dolan, L. Snyder, R. Wolcott, and P. Carr, Column selectivity in reversed-phase liquid chromatography: I. A general quantitative relationship. Journal of Chromatography A, 2002. 961(2): p. 171-193.
2. Tan, L.C., P.W. Carr, and M. Abraham, Study of retention in reversed-phase liquid chromatography using linear solvation energy relationships I. The stationary phase. Journal of Chromatography A, 1996. 752: p. 1-18.
3. University of Minnesota - Boswell Research Group, http://www.hplccolumns.org/database/index.php.
4. Hall, L.M., D.W. Hill, L.C. Menikarachchi, M.-H. Chen, L.H. Hall, and D.F. Grant, Optimizing artificial neural network models for metabolomics and systems biology: an example using HPLC retention index data. Bioanalysis, 2015. 7(8): p. 939-955.
5. MarvinSketch. ChemAxon, (2016), chemaxon.com.
6. Halgren, T.A., Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. Journal of Computational Chemistry, 1996. 17(5‐6): p. 490-519.
7. Halgren, T.A., Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. Journal of Computational Chemistry, 1996. 17(5‐6): p. 520-552.
8. Halgren, T.A., Merck molecular force field. III. Molecular geometries and vibrational frequencies for MMFF94. Journal of Computational Chemistry, 1996. 17(5‐6): p. 553-586.
9. Halgren, T.A., Merck molecular force field. V. Extension of MMFF94 using experimental data, additional computational data, and empirical rules. Journal of Computational Chemistry, 1996. 17(5‐6): p. 616-641.
10. Halgren, T.A. and R.B. Nachbar, Merck molecular force field. IV. Conformational energies and geometries for MMFF94. Journal of Computational Chemistry, 1996. 17(5‐6): p. 587-615.
11. Vainio, M.J. and M.S. Johnson, Generating conformer ensembles using a multiobjective genetic algorithm. Journal of Chemical Information and Modeling, 2007. 47(6): p. 2462-2474.
12. MOPAC (2012). Stewart Computational Chemistry, Colorado Springs: CO, USA, OpenMOPAC.net.
13. Becke, A.D., A new mixing of Hartree–Fock and local density‐functional theories. Journal of Chemical Physics, 1993. 98(2): p. 1372-1377.
14. Hammer, B., L.B. Hansen, and J.K. Nørskov, Improved adsorption energetics within density-functional theory using revised Perdew-Burke-Ernzerhof functionals. Physical Review B, 1999. 59(11): p. 7413.
15. Yang, W. and P.W. Ayers, Density-functional theory, in Computational Medicinal Chemistry for Drug Discovery. 2003, CRC Press. p. 103-132.
16. Talete srl, Dragon 6.0 for Windows (Software for Molecular Descriptor Calculations); http://www.talete.mi.it/Talete, Milano, Italy.
17. Matlab, The Mathworks Inc., Natick, MA, USA, 2013.
18. Katkova, E.V., I.V. Oferkin, and V.B. Sulimov, Application of the PM7 quantum chemical semi-empirical method to the development of new urokinase inhibitors. Vychisl. Metody Programm, 2014. 15(2): p. 258-273.
19. Lee, C., W. Yang, and R.G. Parr, Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Physical Review B, 1988. 37(2): p. 785.
20. Stephens, P., F. Devlin, C. Chabalowski, and M.J. Frisch, Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. The Journal of Physical Chemistry, 1994. 98(45): p. 11623-11627.
21. Frisch, M.J., J.A. Pople, and J.S. Binkley, Self‐consistent molecular orbital methods 25. Supplementary functions for Gaussian basis sets. Journal of Chemical Physics, 1984. 80(7): p. 3265-3269.
22. Tomasi, J., B. Mennucci, and R. Cammi, Quantum mechanical continuum solvation models. Chemical Reviews, 2005. 105(8): p. 2999-3094.
23. Tyteca, E., S.H. Park, R.A. Shellie, P.R. Haddad, and G. Desmet, Computer-assisted multi-segment gradient optimization in ion chromatography. Journal of Chromatography A, 2015. 1381: p. 101-109.
24. Tyteca, E., M. Talebi, R. Amos, S.H. Park, M. Taraji, Y. Wen, R. Szucs, C.A. Pohl, J.W. Dolan, and P.R. Haddad, Towards a chromatographic similarity index to establish localized quantitative structure-retention models for retention prediction: use of retention factor ratio. Journal of Chromatography A, 2017. 1486: p. 50-58.
25. Talebi, M., S.H. Park, M. Taraji, Y. Wen, R.I. Amos, P.R. Haddad, R. Shellie, R. Szucs, C. Pohl, and J.W. Dolan, Retention time prediction based on molecular structure in pharmaceutical method development: A perspective. LCGC North America, 2016. 34(8): p. 550-558.
26. Wen, Y., M. Talebi, R.I. Amos, R. Szucs, J.W. Dolan, C.A. Pohl, and P.R. Haddad, Retention prediction in reversed phase high performance liquid chromatography using quantitative structure-retention relationships applied to the Hydrophobic Subtraction Model. Journal of Chromatography A, 2018. 1541: p. 1-11.
27. Clementi, S., G. Cruciani, P. Fifi, D. Riganelli, R. Valigi, and G. Musumarra, A new set of principal properties for heteroaromatics obtained by GRID. Molecular Informatics, 1996. 15(2): p. 108-120.
28. Cruciani, G., M. Pastor, and S. Clementi, Handling information from 3D grid maps for QSAR studies, in Molecular modeling and prediction of bioactivity. 2000, Springer. p. 73-81.
29. Cruciani, G., P. Crivori, P.-A. Carrupt, and B. Testa, Molecular fields in quantitative structure–permeation relationships: the VolSurf approach. Journal of Molecular Structure: THEOCHEM, 2000. 503(1): p. 17-30.
30. Cruciani, G., M. Pastor, and W. Guba, VolSurf: a new tool for the pharmacokinetic optimization of lead compounds. European Journal of Pharmaceutical Sciences, 2000. 11: p. S29-S39.
31. Holland, J.H., Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. 1992, MIT Press, Cambridge, MA.
32. Leardi, R., Application of genetic algorithm-PLS for feature selection in spectral data sets. Journal of Chemometrics, 2000. 14(5-6): p. 643-655.
33. Leardi, R. and A.L. Gonzalez, Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chemometrics and Intelligent Laboratory Systems, 1998. 41(2): p. 195-207.
34. Taraji, M., P.R. Haddad, R.I. Amos, M. Talebi, R. Szucs, J.W. Dolan, and C.A. Pohl, Prediction of retention in hydrophilic interaction liquid chromatography using solute molecular descriptors based on chemical structures. Journal of Chromatography A, 2017. 1486: p. 59-67.
35. Park, S.H., P.R. Haddad, M. Talebi, E. Tyteca, R.I. Amos, R. Szucs, J.W. Dolan, and C.A. Pohl, Retention prediction of low molecular weight anions in ion chromatography based on quantitative structure-retention relationships applied to the linear solvent strength model. Journal of Chromatography A, 2017. 1486: p. 68-75.
36. Talebi, M., G. Schuster, R.A. Shellie, R. Szucs, and P.R. Haddad, Performance comparison of partial least squares-related variable selection methods for quantitative structure retention relationships modelling of retention times in reversed-phase liquid chromatography. Journal of Chromatography A, 2015. 1424: p. 69-76.
37. Varmuza, K., P. Filzmoser, and M. Dehmer, Multivariate linear QSPR/QSAR models: Rigorous evaluation of variable selection for PLS. Computational and Structural Biotechnology Journal, 2013. 5(6): p. e201302007.
38. Varmuza, K. and P. Filzmoser, Introduction to multivariate statistical analysis in chemometrics. 2016: CRC Press.
39. Ghasemi, J. and S. Saaidpour, QSRR prediction of the chromatographic retention behavior of painkiller drugs. Journal of Chromatographic Science, 2009. 47(2): p. 156-163.
40. Goryński, K., B. Bojko, A. Nowaczyk, A. Buciński, J. Pawliszyn, and R. Kaliszan, Quantitative structure–retention relationships models for prediction of high performance liquid chromatography retention time of small molecules: endogenous metabolites and banned compounds. Analytica Chimica Acta, 2013. 797: p. 13-19.
41. Héberger, K., Quantitative structure–(chromatographic) retention relationships. Journal of Chromatography A, 2007. 1158(1-2): p. 273-305.
42. Žuvela, P., J.J. Liu, K. Macur, and T. Baczek, Molecular descriptor subset selection in theoretical peptide quantitative structure–retention relationship model development using nature-inspired optimization algorithms. Analytical Chemistry, 2015. 87(19): p. 9876-9883.
43. Taraji, M., P.R. Haddad, R.I. Amos, M. Talebi, R. Szucs, J.W. Dolan, and C.A. Pohl, Use of dual-filtering to create training sets leading to improved accuracy in quantitative structure-retention relationships modelling for hydrophilic interaction liquid chromatographic systems. Journal of Chromatography A, 2017. 1507: p. 53-62.
44. Bi, J. and K.P. Bennett, Regression error characteristic curves, in Twentieth International Conference on Machine Learning (ICML-2003). 2003, Washington DC. p. 43-50.
45. Héberger, K., Sum of ranking differences compares methods or models fairly. Trends in Analytical Chemistry, 2010. 29(1): p. 101-109.
46. Héberger, K. and K. Kollár‐Hunek, Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers. Journal of Chemometrics, 2011. 25(4): p. 151-158.