WEKA - DATA MINING TOOLS

Once the wavefunction has been obtained, it is possible, in principle, to calculate any observable property, $O$, of a molecule by using the correct operator, $\hat{O}$, as the solution to an eigenvalue problem:

$$\hat{O}\Psi = O\Psi$$

where the property is given by the expectation value of the operator, $\hat{O}$. One such property is the energy of the molecule, obtained from the Hamiltonian operator as already shown.

In the case of one-electron properties, such as the electrostatic potential, the expectation value of the one-electron operator can be calculated from the density matrix, $P_{\mu\nu}$, and the easily calculated one-electron integrals over the basis functions, $\phi_\mu$:

$$\langle \hat{O} \rangle = \sum_{\mu}\sum_{\nu} P_{\mu\nu} \langle \phi_\nu | \hat{O} | \phi_\mu \rangle \qquad (2.41)$$

which are calculated as part of the SCF calculation. For this reason many one-electron properties are calculated by default in the ab initio and semi-empirical packages.
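As an illustration of how such a property is assembled once the SCF quantities are available, the sum over density-matrix elements and one-electron integrals can be written as a short routine. This is a minimal sketch, assuming the usual form $\langle \hat{O} \rangle = \sum_{\mu\nu} P_{\mu\nu} O_{\nu\mu}$; the function name and the 2x2 matrices are hypothetical values chosen only for illustration and do not come from any real calculation.

```python
def one_electron_expectation(P, O):
    """Return sum over mu, nu of P[mu][nu] * O[nu][mu], i.e. trace(P O)."""
    n = len(P)
    return sum(P[mu][nu] * O[nu][mu] for mu in range(n) for nu in range(n))


# Hypothetical density matrix and one-electron integral matrix
# (illustrative numbers only, not taken from a real SCF run).
P = [[1.2, 0.4],
     [0.4, 0.8]]
O = [[-1.0, -0.5],
     [-0.5, -0.7]]

value = one_electron_expectation(P, O)
print(value)
```

Because both matrices already exist at the end of the SCF procedure, the cost of the property is a single double sum over basis functions.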

2.11.1 Geometry Optimisation.

The energy of a particular molecular conformation is a two-electron property of the wavefunction, the operator being the Hamiltonian operator. Although the minimum-energy conformation is related to it, it is not an observable, as there is no operator which will give that conformation directly from the wavefunction. However, the minimum-energy conformation can be obtained from a consideration of the forces acting on an isolated molecule. If the forces acting on the molecule are reduced to zero, that molecule will be in a minimum-energy conformation. The method of Pulay [66] allows the forces exerted on the molecule to be determined analytically; they are given by the negative of the first derivative of the energy expression with respect to the nuclear coordinates. The geometry is altered, using one of several minimisation techniques, to reduce these derivatives, leading to a stationary point on the potential energy surface. In order to determine the nature of this stationary point, the second derivatives are also required: for a minimum, all eigenvalues of the second-derivative matrix must be positive, but for a transition state exactly one will be negative, which can be utilised in locating such structures.
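The second-derivative test described above can be sketched for a two-coordinate model surface. The example below is an assumption-laden illustration, not the procedure of any particular package: it treats a hypothetical 2x2 symmetric matrix of second derivatives, obtains its two eigenvalues in closed form, and counts the negative ones. The function names and the tolerance are invented for the sketch.

```python
import math


def hessian_eigenvalues_2x2(h):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]] in closed form."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return (mean - radius, mean + radius)


def classify_stationary_point(h, tol=1e-8):
    """Classify a stationary point from the signs of the Hessian eigenvalues."""
    eigenvalues = hessian_eigenvalues_2x2(h)
    if any(abs(e) <= tol for e in eigenvalues):
        return "undetermined"          # a zero eigenvalue needs further analysis
    negatives = sum(1 for e in eigenvalues if e < 0.0)
    if negatives == 0:
        return "minimum"
    if negatives == 1:
        return "transition state"
    return "maximum"


# E = x^2 + y^2 at the origin: both curvatures positive.
print(classify_stationary_point([[2.0, 0.0], [0.0, 2.0]]))
# E = x^2 - y^2 at the origin: one negative curvature.
print(classify_stationary_point([[2.0, 0.0], [0.0, -2.0]]))
# Positive diagonal elements but a large coupling term: eigenvalues are -1 and 3.
print(classify_stationary_point([[1.0, 2.0], [2.0, 1.0]]))
```

The last case shows why the eigenvalues, rather than the individual diagonal second derivatives, decide the nature of the stationary point.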

The gradient methods fall into two basic classes, the Steepest Descent method and the Conjugate Gradient method, both of which use a first-order Taylor expansion of the energy term:

$$E(x + \Delta x) = E(x) + g \cdot \Delta x \qquad (2.42)$$

Steepest descent methods take advantage of the fact that the greatest reduction in $E$ ($\Delta E$) is obtained when $\Delta x$ is in the direction of $-g$ (the negative of the gradient), so $\Delta x$ is obtained thus:

$$x_k = x_{k-1} + \lambda_k s_k \qquad (2.43)$$

where $s_k$ is the unit vector along $-g$ evaluated at $x_{k-1}$ and $\lambda_k$ is the step size, which is set initially and then reduced if, at any stage, the energy increases. This method produces a rapid reduction in the value of a large gradient, but converges very poorly close to the minimum. The Conjugate Gradient method converges much more rapidly as it is able to locate the minimum energy along a given displacement vector; thus, for an n-dimensional quadratic surface, it will converge in a maximum of n steps.
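The steepest descent procedure just described (step along the negative gradient, reduce the step size whenever the energy rises) can be sketched as follows. This is a minimal illustration under stated assumptions: the model energy function, the halving factor, the starting step and the convergence threshold are all hypothetical choices, and the step is taken along the unit vector of the negative gradient.

```python
import math


def steepest_descent(energy, gradient, x0, step=0.5, tol=1e-6, max_iter=10000):
    """Minimise `energy` by stepping along the unit negative-gradient direction."""
    x = list(x0)
    e = energy(x)
    for iteration in range(max_iter):
        g = gradient(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm < tol:                      # forces are effectively zero
            return x, e, iteration
        s = [-gi / g_norm for gi in g]        # unit vector along -g
        trial = [xi + step * si for xi, si in zip(x, s)]
        e_trial = energy(trial)
        if e_trial >= e:
            step *= 0.5                       # energy did not fall: reject, shrink step
        else:
            x, e = trial, e_trial             # energy fell: accept the new geometry
    return x, e, max_iter


# Hypothetical two-coordinate model surface with a narrow valley along x.
def model_energy(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def model_gradient(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_min, e_min, n_iter = steepest_descent(model_energy, model_gradient, [1.0, 1.0])
print(x_min, e_min, n_iter)
```

On a surface of this kind the first few steps remove most of the energy, after which the search zigzags across the valley, which is the poor convergence behaviour noted above.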

The BERNY [67] algorithm of Schlegel has been used as the default in the GAUSSIAN [56] package. First derivatives are calculated directly and, though use is also made of the second-derivative (Hessian) matrix, only an initial estimate of it is required. The Hessian can then be updated at each step, in which case the method behaves as a conjugate gradient method, but if the magnitudes of the gradients are large it may be preferable not to update the Hessian until the gradients have dropped below a suitable threshold, in which case the method acts as a steepest descent minimisation.

2.11.2 Electrostatic Potential Maps.

The electrostatic potential at a point is a physical observable, measurable by electron scattering experiments, and is defined as the force exerted on a unit positive charge at that point [68]. As such it is a one-electron property and so can be calculated from the wavefunction.

The electron density is given in eqn. 2.29, which, when combined with the nuclear charge, gives the net charge on atom $a$ as:

$$q_a = Z_a - \int \rho(r)\, dr \qquad (2.44)$$

So, by definition, the electrostatic potential (ESP) at point $r$ is:

$$V(r) = \sum_a \frac{Z_a}{|r - r_a|} - \int \frac{\rho(r')}{|r - r'|}\, dr' \qquad (2.45)$$

The magnitude of the ESP at a given point clearly depends on its position in relation to the molecule in question, thus it is useful to calculate the ESP over a plane containing the molecule (particularly if the molecule is planar) or over the Connolly surface [69,70] of the molecule. When the ESP is plotted onto the surface, features of the ESP can be associated with specific regions of the molecule, while plotting the ESP onto a plane gives an indication of the direction of forces acting on an approaching molecule.

The calculation of the ESP from the wavefunction is a very time-consuming process, as the wavefunction must be calculated at each point for which the ESP is to be plotted. If this ESP is plotted onto a Connolly surface of the molecule, a point density between 1 and 5 points Å⁻² may be required, though rather less if the ESP is plotted over a grid. Thus, for a large system, it is necessary to find an approximate means of calculating the ESP. To this end, studies have been carried out calculating the ESP with the atoms represented as point charges, with little apparent effect on the ESP outside the Van der Waals radius [71]. Thus this method has been used to save on computer time. In this case:

$$V(r) = \sum_a \frac{q_a}{|r - r_a|}$$

where $q_a$ takes into account the electron density around atom $a$.
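The point-charge approximation reduces the ESP at any point to a Coulomb sum over the atomic charges, $V(r) = \sum_a q_a / |r - r_a|$ in atomic units. The sketch below assumes that form; the two-atom charge set and its coordinates are hypothetical values used only to show the arithmetic.

```python
import math


def point_charge_esp(point, charges, positions):
    """Electrostatic potential at `point` from point charges (atomic units)."""
    potential = 0.0
    for q, position in zip(charges, positions):
        potential += q / math.dist(point, position)
    return potential


# Hypothetical diatomic: +0.4 at the origin, -0.4 at x = 2.0.
charges = [0.4, -0.4]
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

v_left = point_charge_esp((-2.0, 0.0, 0.0), charges, positions)   # beyond the positive end
v_mid = point_charge_esp((1.0, 0.0, 0.0), charges, positions)     # midway between the atoms
v_right = point_charge_esp((4.0, 0.0, 0.0), charges, positions)   # beyond the negative end
print(v_left, v_mid, v_right)
```

Each evaluation costs one distance per atom, which is why the approximation is attractive when the ESP is needed at many surface or grid points.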

A pictorial representation of the Molecular Electrostatic Potential (MEP) can yield considerable qualitative information about the reactivity of that molecule. As the MEP represents the force on a unit positive point charge, regions of negative ESP are likely to be subject to electrophilic attack, while regions of positive ESP are susceptible to nucleophilic attack. The MEP can also give insights into the binding of small molecules to their receptors, in that regions of appreciable ESP will bind to complementary regions within the receptor binding site. As ligand-receptor binding is generally a non-bonding process, such electrostatic interactions are the main energetic contribution to binding. Thus, if the structure of the binding site on the receptor is known, it should be possible to design ligands of high affinity; conversely, a possible structure of a binding site can be built up from comparisons of the ESP of ligands of known affinity. Whether it is obtained from the wavefunction or from point charges, the ESP does show varying degrees of basis set dependency, so any attempts to draw qualitative conclusions from the MEP must be made with caution.

2.11.3 Similarity Indices.

The idea of a quantitative measure of similarity was developed by Carbo [72] and expressed in terms of electron density overlap between two molecules A and B, such that:

$$R_{AB} = \frac{\int \rho_A \rho_B \, dv}{\left(\int \rho_A^2 \, dv\right)^{1/2} \left(\int \rho_B^2 \, dv\right)^{1/2}}$$

where $\rho_A$ and $\rho_B$ are the electron densities of molecules A and B. This gives a range of similarity from 0 to 1 (dissimilar to identical), though it only compares the shape of the charge distribution, not its magnitude. Thus if $\rho_A = n\rho_B$ the Carbo similarity index remains unity. The magnitude of the electron density is taken into account in the alternative description of Hodgkin [73]:

$$H_{AB} = \frac{2\int \rho_A \rho_B \, dv}{\int \rho_A^2 \, dv + \int \rho_B^2 \, dv} \qquad (2.48)$$

This index also ranges from zero (dissimilar) to unity (identical), but for $\rho_A = n\rho_B$, $H_{AB} = 2n / (1 + n^2)$.
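The different behaviour of the two indices under a uniform scaling of the density can be checked numerically. The sketch below assumes both densities are sampled on the same uniform grid, so that the volume element cancels and the integrals become plain sums; the sample values and the function names are hypothetical.

```python
import math


def carbo_index(rho_a, rho_b):
    """Carbo index: overlap divided by the geometric mean of the self-overlaps."""
    overlap = sum(a * b for a, b in zip(rho_a, rho_b))
    self_a = sum(a * a for a in rho_a)
    self_b = sum(b * b for b in rho_b)
    return overlap / math.sqrt(self_a * self_b)


def hodgkin_index(rho_a, rho_b):
    """Hodgkin index: twice the overlap divided by the sum of the self-overlaps."""
    overlap = sum(a * b for a, b in zip(rho_a, rho_b))
    self_a = sum(a * a for a in rho_a)
    self_b = sum(b * b for b in rho_b)
    return 2.0 * overlap / (self_a + self_b)


# Hypothetical density samples; rho_a is rho_b scaled by n = 3.
rho_b = [0.1, 0.4, 0.9, 0.4, 0.1]
n = 3.0
rho_a = [n * value for value in rho_b]

carbo = carbo_index(rho_a, rho_b)      # insensitive to the scaling
hodgkin = hodgkin_index(rho_a, rho_b)  # equals 2n / (1 + n^2)
print(carbo, hodgkin)
```

With n = 3 the Carbo index stays at unity while the Hodgkin index falls to 2n / (1 + n²) = 0.6, reflecting the difference in magnitude.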

Calculation of similarity indices is not restricted to electron density, but can also be performed using the electrostatic potential or electrostatic field of the molecules. The only difference in calculating similarities from these measures rather than electron density is that they are signed so the similarity index runs from -1 to +1, a negative index indicating that the magnitude of the field or potential is similar, but that the sign is reversed. In practice, both the field and potential are calculated using the point charge approximation described in section 2.11.2, in which case the field is given by:

$$F(r) = \sum_a \frac{q_a (r - r_a)}{|r - r_a|^3} \qquad (2.49)$$

The quantities can then be calculated over a grid, though it is necessary to exclude the volume occupied by the molecules themselves, as the approximation is not valid inside the Van der Waals radius. It is then a simple matter to sum the similarities over the grid points to provide a global value for the similarity index.
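A grid-based similarity calculation of this kind can be sketched as follows, using the point-charge potential and a Carbo-type index. Everything specific here is an assumption made for illustration: the charges and coordinates, the 9 x 9 x 9 grid, and the use of a single exclusion radius for every atom in place of individual Van der Waals radii.

```python
import math
from itertools import product


def esp(point, atoms):
    """Point-charge potential; `atoms` is a list of (charge, position) pairs."""
    return sum(q / math.dist(point, position) for q, position in atoms)


def outside_all_atoms(point, atoms, radius):
    """True if `point` lies outside the exclusion sphere of every atom."""
    return all(math.dist(point, position) > radius for _, position in atoms)


def esp_similarity(mol_a, mol_b, grid, radius):
    """Carbo-type index of the two potentials over the allowed grid points."""
    points = [p for p in grid
              if outside_all_atoms(p, mol_a, radius)
              and outside_all_atoms(p, mol_b, radius)]
    v_a = [esp(p, mol_a) for p in points]
    v_b = [esp(p, mol_b) for p in points]
    overlap = sum(a * b for a, b in zip(v_a, v_b))
    norm = math.sqrt(sum(a * a for a in v_a) * sum(b * b for b in v_b))
    return overlap / norm


# Hypothetical molecules: B has the same geometry as A with every charge reversed.
mol_a = [(0.4, (0.0, 0.0, 0.0)), (-0.4, (1.5, 0.0, 0.0))]
mol_b = [(-q, position) for q, position in mol_a]

axis = [-4.0 + i for i in range(9)]          # grid coordinates from -4 to 4
grid = list(product(axis, repeat=3))

same = esp_similarity(mol_a, mol_a, grid, radius=1.5)
reversed_sign = esp_similarity(mol_a, mol_b, grid, radius=1.5)
print(same, reversed_sign)
```

The second result illustrates the signed range of the index: a potential of identical magnitude but opposite sign gives -1 rather than 0.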

2.11.4 Atomic Charges.

Atomic charge is not a physical observable, even though its effects can be seen in the electrostatic potential and in the susceptibility of parts of a molecule to nucleophilic or electrophilic attack. The atomic charges can be calculated to reproduce the molecular electrostatic potential at a specified surface, usually a Connolly surface set at a multiple of the Van der Waals radius, or can be calculated via Population Analysis. Calculating atomic charges from the electrostatic potential, ESP, should give the most accurate charges, but is computationally expensive. The method requires that the ESP be calculated at many points, and the charges are obtained by a least-squares fit of the point-charge ESP to the wavefunction ESP. The charges so obtained are largely independent of the positioning of the surface points, provided they are outside the Van der Waals radius and that there are enough of them. The semi-empirical wavefunction used in the MNDO-like methods is not suitable for directly calculating the ESP, but by the technique of deorthogonalising the orbitals, ESPs close to those of ab initio calculations can be obtained [74].
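The least-squares step can be illustrated for the simplest case of two atomic charges, where the normal equations form a 2x2 system that can be solved directly. This is a sketch under stated assumptions rather than a production fitting scheme: the reference potential is generated from known hypothetical charges instead of a wavefunction, the sample points are arbitrary, and no constraint on the total molecular charge is applied.

```python
import math


def fit_two_charges(points, v_ref, pos1, pos2):
    """Least-squares fit of q1, q2 so that q1/|r-r1| + q2/|r-r2| matches v_ref."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for point, v in zip(points, v_ref):
        f1 = 1.0 / math.dist(point, pos1)
        f2 = 1.0 / math.dist(point, pos2)
        a11 += f1 * f1
        a12 += f1 * f2
        a22 += f2 * f2
        b1 += f1 * v
        b2 += f2 * v
    det = a11 * a22 - a12 * a12          # determinant of the normal-equation matrix
    q1 = (b1 * a22 - a12 * b2) / det
    q2 = (a11 * b2 - a12 * b1) / det
    return q1, q2


# Hypothetical atom positions and sample points outside the molecule.
pos1, pos2 = (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)
points = [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, -3.0, 0.0),
          (0.0, 0.0, 3.0), (1.0, 3.0, 0.0), (2.0, -3.0, 1.0), (-2.0, 2.0, 2.0)]

# Stand-in for the wavefunction ESP: potential of known charges +0.35 / -0.35.
v_ref = [0.35 / math.dist(p, pos1) - 0.35 / math.dist(p, pos2) for p in points]

q1, q2 = fit_two_charges(points, v_ref, pos1, pos2)
print(q1, q2)
```

Since the reference data are exactly representable by two point charges, the fit recovers them; with a real wavefunction ESP the residual of the fit measures how well the point-charge model performs.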

Clearly this method is of no use if the reason for using point charges is to reduce the time required for calculation of the MEP. In these circumstances, Mulliken Population Analysis [75] gives a very rapid means of calculating atomic charges. There is no unique definition of how many electrons are associated with each atom in a molecule, but the total number of electrons can be obtained by integrating the total charge density (eqn. 2.30), i.e.

$$N = 2\sum_{i}^{N/2} \int |\psi_i(r)|^2 \, dr \qquad (2.50)$$

which separates the total number of electrons into groups of two per molecular orbital. By substituting in the LCAO basis, the total electrons can be expressed in terms of the Density and Overlap Matrices:
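In the usual closed-shell notation this expression is $N = \sum_\mu \sum_\nu P_{\mu\nu} S_{\mu\nu}$, and the Mulliken scheme assigns to each atom the terms whose first index belongs to a basis function centred on that atom. The sketch below assumes that standard form. The example is a hypothetical minimal-basis homonuclear diatomic with one doubly occupied bonding orbital; the overlap value 0.6593 is an illustrative number, and the function and variable names are invented for the sketch.

```python
def mulliken_charges(P, S, basis_to_atom, nuclear_charges):
    """Return (total electron count, atomic charges) from density and overlap matrices."""
    n = len(P)
    populations = [0.0] * len(nuclear_charges)
    for mu in range(n):
        # Gross population of basis function mu: sum over nu of P[mu][nu] * S[mu][nu].
        gross = sum(P[mu][nu] * S[mu][nu] for nu in range(n))
        populations[basis_to_atom[mu]] += gross
    total_electrons = sum(populations)
    charges = [z - pop for z, pop in zip(nuclear_charges, populations)]
    return total_electrons, charges


# Hypothetical two-function basis, one function per atom.
s = 0.6593                                   # illustrative overlap integral
c_squared = 1.0 / (2.0 * (1.0 + s))          # squared coefficient of the normalised bonding MO
P = [[2.0 * c_squared, 2.0 * c_squared],
     [2.0 * c_squared, 2.0 * c_squared]]     # two electrons in the bonding MO
S = [[1.0, s],
     [s, 1.0]]

total_electrons, charges = mulliken_charges(P, S, basis_to_atom=[0, 1],
                                            nuclear_charges=[1.0, 1.0])
print(total_electrons, charges)
```

The overlap population is divided equally between the two centres by construction, which is the arbitrary but inexpensive choice that makes the method so rapid.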

In document Análisis de la Interpretabilidad de la Salud Ósea con Árboles de Decisión Difusa (pages 36-43)