Analytical method development (MD) is a key element of any pharmaceutical development program, but it is often a time-consuming and labour-intensive process [1-4]. The workflow of systematic chromatographic method development comprises two phases: scoping and optimisation [5-7]. As the first phase of method development, scoping involves the selection of the preferred chromatographic technique, the stationary phase, and the broad composition of the mobile phase. The subsequent optimisation phase then fine-tunes the selected chromatographic conditions by implementing experimental design approaches [5, 8]. An HPLC analysis method is developed to identify, quantify or purify compounds of interest, and successful MD therefore requires the experience and expertise of the chromatographer [9-11].

Computer-aided MD has been studied intensively because it can accelerate the MD process significantly if sufficiently accurate retention models exist [12, 13]. A wide variety of in silico tools are now used to speed up chromatographic MD by predicting retention from chemical structures [14, 15]. Commercial software packages such as Drylab (Molnár-Institute for Applied Chromatography, Berlin, Germany), ChromSword (ChromSword, Riga, Latvia) and ACD/ChromGenius (ACD/Labs, Toronto, Canada) have been utilised in chromatographic method development, optimisation, and validation [14, 15]. These packages rely on experimental retention data and on calculations for the chemical structures of compounds to predict retention under the same chromatographic conditions that were used for the compounds in the embedded database. In addition, the systematic experimentation undertaken when using these packages allows the implementation of Quality-by-Design (QbD) practices: it provides tools to improve the robustness of the chosen chromatographic method and reduces labour and solvent consumption by limiting the required number of experiments [16, 17].

For the scoping phase of computer-aided MD, a prediction error of up to 10% can be tolerated for the purpose of choosing broad method parameters [4, 18, 19], including the most suitable chromatographic technique (such as reversed-phase liquid chromatography [RPLC] [20, 21], hydrophilic interaction liquid chromatography [HILIC] [22], ion chromatography [IC] [8] or supercritical fluid chromatography [SFC] [23]), the stationary phase and the mobile phase. For optimisation, by contrast, prediction errors as low as one or two percent are needed to find the optimal conditions for the separation of given compounds [24]. Quantitative Structure-Retention Relationship (QSRR) methodology has the potential to speed up the scoping phase, as exploratory experimentation can be replaced by retention prediction based solely on the chemical structures of molecules. The best starting point for the optimisation phase of MD can then be selected after comparing retention predictions across a broad range of chromatographic techniques for a group of compounds [5, 25, 26]. The optimisation phase will always involve detailed experiments to measure retention accurately [27, 28].

QSRR provides a tool to generate more extensive information on retention phenomena, including mechanism investigation, retention prediction and method development in chromatography [29, 30]. A QSRR model is usually created from a set of descriptors, either experimentally determined or theoretically computed from a symbolic representation of the molecules using commercial software tools [5, 30]. Some simple QSRR models use a limited number of pre-selected physico-chemical descriptors. Examples can be found in the ChromSword software, which employs descriptors of the molecular volume and the energy of interaction with water [11, 31]. Another software tool, ACD/ChromGenius, uses more parameters as descriptors to build QSRR models: log P, the log of the compound distribution coefficient (log D), polar surface area, molecular volume, molecular weight, molar refractivity and the number of hydrogen bond donor and acceptor sites on the molecule [5, 32].
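The error tolerances quoted earlier for the two MD phases (up to 10% for scoping, one or two percent for optimisation) can be made concrete with a small check. The following Python sketch is illustrative only: the function names, the choice of a relative error against the measured value, and the 2% default for optimisation are assumptions made here, not definitions taken from the cited work.

```python
def percent_error(predicted: float, measured: float) -> float:
    """Absolute prediction error as a percentage of the measured retention value."""
    if measured <= 0:
        raise ValueError("measured retention must be positive")
    return abs(predicted - measured) / measured * 100.0


def suitable_phase(predicted: float, measured: float,
                   scoping_tol: float = 10.0,
                   optimisation_tol: float = 2.0) -> str:
    """Classify a prediction by the strictest MD phase whose tolerance it meets.

    The default tolerances mirror the figures quoted in the text; the exact
    cut-offs are assumptions for illustration.
    """
    err = percent_error(predicted, measured)
    if err <= optimisation_tol:
        return "optimisation"
    if err <= scoping_tol:
        return "scoping"
    return "neither"
```

Under these assumptions, a predicted retention of 5.4 against a measured 5.0 (an error of about 8%) would be acceptable for scoping but not for optimisation.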

As an alternative, a large pool of molecular descriptors can be generated using commercial software, such as Dragon [5, 8, 22, 33]. However, this should be followed by an appropriate variable selection strategy to extract the most relevant and informative descriptors for subsequent use in QSRR modelling. A QSRR model with too few descriptors may be under-fitted and hence insufficiently predictive, whereas a model with too many descriptors carries an increased risk of over-fitting and introduces noise [1, 5]. Given the large number of descriptors generated, a suitable variable selection method, such as a genetic algorithm (GA), which is often combined with multiple linear regression (MLR) or partial least squares regression (PLS), becomes necessary to exclude noise from the model and reduce the risk of over-fitting and chance correlation [12, 15]. PLS regression is particularly useful in the presence of co-linear, redundant and noisy variables, and in handling databases with a high number of variables relative to the number of sample compounds [5, 34]. It has been shown that the performance of PLS modelling can be improved significantly by applying a suitable feature selection method [34, 35]. As reported by Žuvela et al. [36], a combination of GA and PLS was the best of the optimisation algorithms compared for selecting the most important and relevant descriptors, in terms of computational cost, accuracy and robustness of the constructed QSRR models.

QSRR models can be built using either the whole dataset (all compounds except the target compound) or a group of compounds from that dataset as the training set [18]. The use of the whole dataset as the training set is popular in QSRR modelling, but the drawback of this approach is that, most of the time, the accuracy of prediction is unsatisfactory [18, 37]. Using a specific training set of compounds can improve prediction accuracy, and such training sets are often formed on the basis of similarity. It has been shown that smaller training sets of compounds more similar to the target compound lead to greater prediction accuracy [18, 38, 39]. Compounds similar to the target can be selected using a criterion such as the similarity of chemical structures between molecules, the proximity of physico-chemical properties, or the retention parameters of the compounds of interest.

In the present study, the ratio of retention factors (the k-ratio) was used as a chromatographic similarity filter to yield training sets for the construction of QSRR models. Furthermore, the Tanimoto approach, which can be seen as the gold standard in computing fingerprint-based similarity, was also investigated and compared as a filter for building training sets for QSRR modelling. In addition, in order to find a chromatographic similarity index comparable with the k-ratio filter, the representative molecular descriptors log D and log P were also explored as filters for the training sets. Finally, the effectiveness of a dual filter that uses Tanimoto or log D as the primary filter and the k-ratio as the secondary filter was also evaluated.
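The filtering idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: fingerprints are represented as sets of on-bit indices, the k-ratio is taken here as the larger retention factor divided by the smaller, a reference retention factor for the target is assumed to be available, and the thresholds `min_tanimoto` and `max_k_ratio` are hypothetical values chosen only for the example.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention here
    return len(fp_a & fp_b) / union


def k_ratio(k_a: float, k_b: float) -> float:
    """Ratio of the larger to the smaller retention factor (assumed definition, >= 1)."""
    if k_a <= 0 or k_b <= 0:
        raise ValueError("retention factors must be positive")
    return max(k_a, k_b) / min(k_a, k_b)


def dual_filter(target: dict, candidates: list,
                min_tanimoto: float = 0.5, max_k_ratio: float = 2.0) -> list:
    """Select training compounds: Tanimoto as primary filter, k-ratio as secondary.

    Each compound is a dict with hypothetical keys "name", "fp" and "k".
    Both thresholds are illustrative assumptions.
    """
    # Primary filter: keep structurally similar compounds.
    primary = [c for c in candidates
               if tanimoto(target["fp"], c["fp"]) >= min_tanimoto]
    # Secondary filter: keep those with a similar retention factor.
    return [c["name"] for c in primary
            if k_ratio(target["k"], c["k"]) <= max_k_ratio]


# Made-up compounds, for illustration only.
target = {"name": "T", "fp": {1, 2, 3, 4}, "k": 2.0}
database = [
    {"name": "A", "fp": {1, 2, 3, 5}, "k": 3.0},  # similar structure and retention
    {"name": "B", "fp": {1, 2, 3, 4}, "k": 9.0},  # similar structure, different retention
    {"name": "C", "fp": {7, 8}, "k": 2.1},        # dissimilar structure
]
training_set = dual_filter(target, database)
```

With these made-up values only compound A passes both filters. A log D or log P filter could be substituted for the primary step in the same way, by comparing absolute differences in the descriptor value against a chosen threshold.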