4.1. Analysis of results
4.1.3. Student practices regarding the concept of peacebuilding
In this section, the need for exact maximum likelihood in software is discussed. We have done many experiments with approximate likelihood methods, and found them to be very good in some ways and very bad in others. There are, of course, things to be desired in exact maximum likelihood, most notably speed of computations. There is also the problem of near-singular matrices at extreme points of a log-likelihood surface, which we will address next.
5.2.1. Near-Singular Matrices
Beran [1994], page 108, notes that besides requiring a large amount of computing time, exact maximum likelihood can be burdened by ill-conditioned matrices that are almost singular. Increasingly powerful (and precise) computers and specialized algorithms mitigate these effects. As part of his example, Beran states that the correlation matrix for an fgn process with H = 0.9 and n = 100 (call this matrix D) has a determinant of approximately 5 × 10^-39; also, the largest eigenvalue divided by the smallest eigenvalue is approximately 222. We calculated the same results.
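As an illustration of how such a determinant can be obtained without forming the matrix, the following is a minimal sketch in Python. It assumes the standard fgn autocorrelation ρ(k) = (|k+1|^2H − 2|k|^2H + |k−1|^2H)/2 and uses the Durbin-Levinson recursion, under which the determinant of the n × n Toeplitz matrix is the product of the one-step prediction error variances. The function names `fgn_acf` and `toeplitz_logdet` are hypothetical and are not part of ltsa. The logarithm is accumulated so that very small determinants do not underflow.

```python
import math


def fgn_acf(H, n):
    """Autocorrelations rho(0), ..., rho(n-1) of fractional Gaussian noise."""
    p = 2.0 * H
    return [0.5 * (abs(k + 1) ** p - 2.0 * abs(k) ** p + abs(k - 1) ** p)
            for k in range(n)]


def toeplitz_logdet(r):
    """Log-determinant of the symmetric Toeplitz matrix with first row r.

    Uses the Durbin-Levinson recursion: det = v_0 * v_1 * ... * v_{n-1},
    where v_t is the one-step prediction error variance at step t.
    Assumes the matrix is positive definite (every v_t > 0).
    """
    v = r[0]
    logdet = math.log(v)
    phi = []  # phi[j] holds the coefficient phi_{t-1, j+1}
    for t in range(1, len(r)):
        # Reflection (partial autocorrelation) coefficient phi_{t,t}.
        k = (r[t] - sum(phi[j] * r[t - 1 - j] for j in range(t - 1))) / v
        # Update the prediction coefficients to order t.
        phi = [phi[j] - k * phi[t - 2 - j] for j in range(t - 1)] + [k]
        v *= 1.0 - k * k
        logdet += math.log(v)
    return logdet


# The case discussed in the text: H = 0.9, n = 100.
log10_det_D = toeplitz_logdet(fgn_acf(0.9, 100)) / math.log(10.0)
print(f"log10 det(D) = {log10_det_D:.2f}")
```

Working on the log scale is a design choice made for this sketch: a determinant that a direct computation reports as 0 can still have a finite, representable logarithm.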
In Beran [1994], the ratio of eigenvalues was used to calculate the condition number of the matrix. Recall that the condition number of a matrix is defined with respect to a norm: we use the ℓ2-norm, as was done implicitly in Beran. Trefethen and Bau [1997] note that for this norm, the ratio of the largest to the smallest singular value of a matrix gives the condition number; for the symmetric positive definite Toeplitz matrices considered here, this is the same as the ratio of eigenvalues. It should also be noted that one generally looks at the condition number of the matrix relative to the size of the matrix; however, this point will be ignored. If the condition number of a matrix A is τ, one can expect to "lose" approximately log10(τ) significant digits when inverting the matrix; see, e.g., Cheney and Kincaid [2007]. For example, a perfectly well-conditioned matrix has τ = 1 and loses no digits. For most machines today, the machine precision is approximately 2.22 × 10^-16. For the example in Beran, τ = 222 and log10(τ) = 2.35. The matrix D was inverted with the ltsa function TrenchInverse to get E ≈ D^-1. The maximum absolute difference of DE − I_200 was about 2.11 × 10^-15, so the rule of thumb overestimates the number of digits lost, up to the machine precision in subtracting the elements of the matrices.
As a test, the correlation matrix of an fgn process with H = 0.99 and n = 5000, call it F, was computed. The determinant reported by R was 0, with τ = 245908.3, so log10(τ) ≈ 5.39. The inverse of this matrix was computed using the ltsa function TrenchInverse; when it was multiplied by the original matrix, the maximum absolute value of the product minus the identity matrix was within 2.58 × 10^-13. Notice that F is much larger than D and has a value of H much closer to the stationarity boundary, and yet it loses only about three more digits in estimated precision and about two more in actual precision.
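The arithmetic behind the rule of thumb for D and F can be laid out as in the short sketch below. It takes the condition numbers and observed residuals exactly as reported in the text and compares them with a rough predicted error of τ times machine epsilon. Treating τ·ε as the predicted error is our reading of "losing log10(τ) digits", and the helper names are hypothetical.

```python
import math
import sys


def digits_lost(tau):
    """Rule-of-thumb number of significant digits lost when inverting."""
    return math.log10(tau)


def predicted_error(tau, eps=sys.float_info.epsilon):
    """Rough error level implied by the rule of thumb: tau * machine epsilon."""
    return tau * eps


# (condition number, observed max |M M^-1 - I|) as reported in the text.
cases = {
    "D (fgn, H = 0.9, n = 100)": (222.0, 2.11e-15),
    "F (fgn, H = 0.99, n = 5000)": (245908.3, 2.58e-13),
}

for name, (tau, observed) in cases.items():
    print(f"{name}: digits lost ~ {digits_lost(tau):.2f}, "
          f"predicted error ~ {predicted_error(tau):.2e}, "
          f"observed {observed:.2e}")
```

In both cases the predicted error is larger than the observed residual, which is the sense in which the rule of thumb overestimates the loss.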
The covariance and correlation matrices for the fd case with n = 5000 and d = 0.49 were also calculated. The determinant of the covariance matrix was approximately 143. The determinant of the correlation matrix was reported as 0. Both matrices were inverted using the TrenchInverse algorithm of ltsa. With the correct scaling factor to account for the division of the theoretical autocovariance function by its first element, the two inverses had a maximum difference reported as 0. However, this test is misleading, as the covariance and correlation matrices have the same condition number, with log10(τ) ≈ 5.13. This again overestimates the loss in significant digits, since when both matrices are multiplied by their inverses and subtracted from the identity, the maximum absolute error is about 2.6 × 10^-13 for the correlation matrix and 1.46 × 10^-13 for the covariance matrix. The differences here are likely due to underflow.
The inverse and determinant needed for the log-likelihood are computed relatively efficiently using the Durbin-Levinson or Trench algorithm, which for the most part withstand poorly conditioned matrices. However, not all likelihood values are necessarily computable. For example, when an fd series was generated as above with d_f set to 0.49, the log-likelihood at the generating d_f was not computable by either of the ltsa functions DLLoglikelihood and TrenchLoglikelihood for one test series. This particular series will be called M. However, the effect of this is small. The exact algorithm had no problem finding the MLE for the data: we believe that for any given data set, the non-computable regions are very small. We have experimented with grids around this and a few other non-computable points, and found this to be true.
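To make the role of the Durbin-Levinson recursion in the likelihood concrete, here is a minimal Python sketch of the exact Gaussian log-likelihood for a zero-mean series. It is a generic textbook form and not the ltsa implementation: the function name `dl_loglikelihood` is hypothetical, and the output is not claimed to match what DLLoglikelihood returns. Returning NaN when a prediction error variance is not positive is one assumed way a value can become non-computable; the text does not say why the evaluation failed for series M.

```python
import math


def dl_loglikelihood(r, z):
    """Exact Gaussian log-likelihood of a zero-mean series z.

    r holds the autocovariances r[0], ..., r[n-1] with n = len(z).
    Returns NaN if a prediction error variance fails to be positive,
    i.e. the covariance matrix is numerically not positive definite.
    """
    n = len(z)
    v = r[0]
    if not v > 0.0:
        return float("nan")
    phi = []                    # phi[j] holds phi_{t, j+1}
    logdet = math.log(v)        # running log-determinant
    quad = z[0] * z[0] / v      # running quadratic form z' Sigma^-1 z
    for t in range(1, n):
        k = (r[t] - sum(phi[j] * r[t - 1 - j] for j in range(t - 1))) / v
        phi = [phi[j] - k * phi[t - 2 - j] for j in range(t - 1)] + [k]
        v *= 1.0 - k * k
        if not v > 0.0:
            return float("nan")
        # One-step prediction of z[t] from z[t-1], ..., z[0].
        e = z[t] - sum(phi[j] * z[t - 1 - j] for j in range(t))
        logdet += math.log(v)
        quad += e * e / v
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


# White noise check: with r = [1, 0, 0] the result is the i.i.d. N(0, 1)
# log-density of the series.
print(dl_loglikelihood([1.0, 0.0, 0.0], [1.0, 2.0, 3.0]))
```

A grid search of the kind described above could call such a function at each candidate parameter value and skip the points where NaN is returned.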
All of this addresses ill-conditioned matrices. However, while exact maximum likelihood may lose a few digits, it is as exact as it can be up to machine precision, while approximate maximum likelihood is by definition not exact. It is our belief that it is better to lose a few digits of precision than not to be exact. Thus the only advantage that approximate ML has over exact ML is speed. While this advantage can be considerable, we will show that approximate maximum likelihood may be seriously flawed.