estimator.
Lee & Campbell (1985) proved that there exists an optimal value k for the ordinary ridge estimator but that this optimal k is not unique. Furthermore, no closed-form algebraic expression for k is available in general; see also Nordberg (1982). Consequently, they devised an iterative algorithm to estimate k empirically. However, the convergence of their iterative procedure has not been established. In the GLM setting, the optimal k seems impossible to derive due to the presence of the extra order terms in the $\mathrm{MSE}[\hat\beta(k)]$ expression. This difficulty is apparent when one attempts to solve for k by setting (3.15) to zero. Hence, in practice, the optimal ridge estimator for GLMs can only be estimated approximately via some 'adaptive' rules for selecting k.
Our proposed method of finding a suitable k is derived from Empirical Bayes type arguments similar to those used by Lawless & Wang (1976). The resulting adaptive ridge estimators are promising because their ordinary ridge versions "have performed well in previous simulations and do not require iterative solutions" (Hoerl, Schuenemeyer & Hoerl, 1986). The arguments are as follows.
Consider the Bayesian interpretation of the general ridge estimator (Section 3.4). Let $P^T(X^T\hat{W}X)P = \hat{E} = \mathrm{diag}\{\hat{e}_j\}$, where $P$ is an orthogonal matrix and $\hat{e}_1 \le \cdots \le \hat{e}_p$ denote the ordered eigenvalues of $X^T\hat{W}X$. Let $\hat\alpha = P^T\hat\beta$ be the MLE of $\alpha = P^T\beta$. For large n, we have $\hat\alpha \mid \alpha \sim N(\alpha, \hat{E}^{-1})$. An immediate consequence of Theorem 3.3 is that if one adopts a $N\{0, k^{-1}I\}$ prior distribution for $\alpha$, then the (asymptotic) Bayes estimator for $\alpha$ is $(\hat{E}+kI)^{-1}\hat{E}\hat\alpha$. Assuming $\alpha_j \sim N\{0, k^{-1}\}$, $1 \le j \le p$, a simple and intuitive estimate of the prior variance, $k^{-1}$, would be $\sum_{j}\hat\alpha_j^2/p = \hat\beta^T\hat\beta/p$. The resulting value for k is hereafter referred to as $k_a = p/\hat\beta^T\hat\beta$.
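On the other hand, since $\hat\alpha_j \mid \alpha_j \sim N\{\alpha_j, \hat{e}_j^{-1}\}$ asymptotically, taking the expectation of $\hat\alpha_j^2$ gives:
$$E[\hat\alpha_j^2] = E_\alpha E[\hat\alpha_j^2 \mid \alpha_j] \approx E_\alpha[\alpha_j^2 + \hat{e}_j^{-1}] = k^{-1} + \hat{e}_j^{-1}.$$
Unconditionally, $E[\hat{e}_j\hat\alpha_j^2] \approx \hat{e}_j k^{-1} + 1$ and
$$E\Big[\sum_{j=1}^{p} \hat{e}_j\hat\alpha_j^2\Big] \approx k^{-1}\sum_{j=1}^{p} \hat{e}_j + p.$$
This yields $k \approx \{\sum_j \hat{e}_j\}/\{E[\sum_j \hat{e}_j\hat\alpha_j^2] - p\}$. An adaptive choice for k would be
$$k_c = \sum_{j=1}^{p} \hat{e}_j \Big/ \sum_{j=1}^{p} \hat{e}_j\hat\alpha_j^2 = \mathrm{tr}\{X^T\hat{W}X\}/\{\hat\beta^T X^T\hat{W}X\hat\beta\}.$$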
Note that p is not subtracted from the denominator since the prior variance $k^{-1}$ is assumed to be reasonably large. In doing this, we arrive at a conservative (smaller) estimate of k, so that the resulting adaptive ridge estimator, $\hat\beta(k_c) = (X^T\hat{W}X + k_cI)^{-1}X^T\hat{W}X\hat\beta$, is not too far from $\hat\beta$. Finally, we should bear in mind that the above derivations are based on heuristic arguments only; the optimal choice of k is still unresolved at this stage.
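The two adaptive rules above, $p/\hat\beta^T\hat\beta$ and $\mathrm{tr}\{X^TWX\}/\{\hat\beta^TX^TWX\hat\beta\}$, can be illustrated numerically. The following is only a minimal sketch for binary logistic regression, under several assumptions that are not part of the text: the model has no intercept, W is the logistic weight matrix evaluated at the MLE, the MLE is obtained by plain Newton-Raphson, and the names `logistic_mle`, `adaptive_ridge`, `k_norm` and `k_trace` are hypothetical.

```python
import numpy as np


def logistic_mle(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a logistic regression (ASSUMPTION: no intercept).

    Returns the MLE and the information matrix X^T W X evaluated at the MLE.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)                      # diagonal of W
        info = X.T @ (w[:, None] * X)            # X^T W X
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    return beta, X.T @ (w[:, None] * X)


def adaptive_ridge(beta_hat, info):
    """Compute the two adaptive k values and the corresponding ridge estimates."""
    p = beta_hat.size
    k_norm = p / (beta_hat @ beta_hat)                       # p / (b^T b)
    k_trace = np.trace(info) / (beta_hat @ info @ beta_hat)  # tr(X^T W X) / (b^T X^T W X b)

    def ridge(k):
        # (X^T W X + k I)^{-1} X^T W X beta_hat
        return np.linalg.solve(info + k * np.eye(p), info @ beta_hat)

    return k_norm, k_trace, ridge(k_norm), ridge(k_trace)


# Illustrative data only (hypothetical design and coefficients).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -0.5, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_hat, info = logistic_mle(X, y)
k_norm, k_trace, b_norm, b_trace = adaptive_ridge(beta_hat, info)
print("k values:", k_norm, k_trace)
print("MLE:", beta_hat)
print("ridge estimates:", b_norm, b_trace)
```

Neither rule requires an iterative search over k; each is a single closed-form function of the MLE and the information matrix, and both ridge estimates are shrunk towards the origin relative to the MLE.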
4.2. MONTE CARLO STUDY
4.2.1 Purpose and Scope
The theoretical results of the ridge existence theorems are justified conditional on a nonstochastic k. But in reality, k has to be estimated empirically according to adaptive rules such as those proposed in the last section. It may then be argued (e.g. Draper & Van Nostrand, 1979) that such an adaptive ridge estimator would no longer guarantee a reduction in MSE, as k is now stochastic. The main difficulty with adaptive ridge estimators is the mathematical intractability of their sampling properties. Comparisons among these estimators (in terms of MSE, say) must invariably rely on simulation results. The purpose of our Monte Carlo study is therefore to determine the properties of the proposed adaptive ridge estimators under a variety of experimental conditions. Due to the cost and time involved in running a comprehensive simulation for the entire class of GLMs, we have decided to limit our scope to a detailed investigation of the special case of binary logistic regression. Hopefully, some general pattern of performance of the estimators will emerge from our study when considered together with the corresponding findings in the ordinary ridge regression literature.
The model used in the simulation is:
$$P\{y_i = 1\} = \mu_i(\beta) = \{1 + \exp[-x_i^T\beta]\}^{-1}, \quad i = 1, \ldots, n, \qquad (4.1)$$
with $y_1, \ldots, y_n$ being independent binary (0,1) responses at $x_1, \ldots, x_n$ respectively. This logistic regression model is widely used in the analysis of dichotomous response data (Cox, 1970). A convenient measure of performance for any estimator $\tilde\beta$ is $\mathrm{MSE}[\tilde\beta] = E(\tilde\beta - \beta)^T(\tilde\beta - \beta)$. However, $\mathrm{MSE}[\hat\beta]$ is infinite for the above model (4.1): there are exactly $2^n$ possible samples, and hence $2^n$ possible values for $\hat\beta$; thus, $P\{\hat\beta_j = \infty \text{ for some } j\} > P\{y_1 = \cdots = y_n = 0\} > 0$ (see Silvapulle, 1981). If the expectation in the MSE expression is taken conditional on "the MLE is finite", then the MSEs of $\hat\beta$ and $\hat\beta(k)$ are finite. We believe that such a conditional MSE is a reasonable criterion for comparing $\hat\beta$ with other estimators of $\beta$. Hereafter, all discussion in this section should be interpreted as conditional on $\hat\beta$ being finite. In addition, the adaptive ridge estimator $\hat\beta(k)$ without a subscript for k will refer to either $k_a$ or $k_c$.
4.2.2 Design Aspects
A careful simulation design is important since the interpretation of results and the conclusions which follow depend to a large extent on the particular setting of that experiment. The design used here is determined in part by findings of simulations investigating alternative estimators in ordinary linear regression. We suspect that the MSE of $\hat\beta(k)$ could depend on n, p, the degree of collinearity in X, and the direction and length of $\beta$. To evaluate the effects of the direction of $\beta$ on the MSE, two separate designs are considered. In the first design, $\beta$ is generated as a Uniform random point on the surface of a sphere with radius r, for each replication. In the second design, two fixed directions of $\beta$ are investigated for each X. Thus, the MSEs estimated from the former provide an average over all possible directions, while those corresponding to the latter are specific to the two fixed directions of $\beta$. (The two directions are chosen so that one direction is likely to favour ridge type estimators, and the other is likely to be unfavourable.) Details of these two designs are given below.
Design 1
(i) For p = 3:
(a) Generate the original explanatory variables $\mathcal{X}$ as $x_j = (1-a^2)^{1/2}z_j + az_4$, j = 1,2,3, where $z_1, \ldots, z_4$ [...] and 0 < a < 1.
(b) Standardize the $\mathcal{X}$ matrix to X so that $X^TX$ is in correlation form.
(c) Generate N = 500 replicates of $\beta$, each as a random point on the surface of a sphere with radius r, the distribution being Uniform (see Section 4.1 in Lawless & Wang (1976)).
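Steps (a) to (c) can be sketched in code as follows. This is an illustration under stated assumptions rather than the exact procedure used in the study: the variates $z_1, \ldots, z_4$ are taken to be independent standard normal columns (the text does not specify this here), the uniform points on the sphere are obtained by normalizing Gaussian vectors (a standard technique, not necessarily the one in Lawless & Wang (1976)), and the function names `design1_x` and `beta_on_sphere` are hypothetical.

```python
import numpy as np


def design1_x(n, a, rng):
    """Steps (a) and (b): collinear explanatory variables in correlation form.

    ASSUMPTION: z_1, ..., z_4 are independent standard normal columns.
    """
    z = rng.standard_normal((n, 4))
    # x_j = (1 - a^2)^{1/2} z_j + a z_4, j = 1, 2, 3
    raw = np.sqrt(1.0 - a**2) * z[:, :3] + a * z[:, [3]]
    # Center each column and scale it to unit length, so X^T X is a correlation matrix.
    centered = raw - raw.mean(axis=0)
    return centered / np.sqrt((centered**2).sum(axis=0))


def beta_on_sphere(n_rep, p, r, rng):
    """Step (c): n_rep points uniformly distributed on the sphere of radius r.

    ASSUMPTION: uniformity is obtained by normalizing standard normal vectors.
    """
    g = rng.standard_normal((n_rep, p))
    return r * g / np.linalg.norm(g, axis=1, keepdims=True)


# Illustrative values only; n, a and r here are hypothetical settings.
rng = np.random.default_rng(1)
X = design1_x(n=50, a=0.9, rng=rng)
betas = beta_on_sphere(n_rep=500, p=3, r=2.0, rng=rng)
print(np.round(X.T @ X, 3))   # unit diagonal, large off-diagonal correlations
print(betas.shape)
```

Values of a close to 1 make the three columns share most of their variation through the common variate, which is how the degree of collinearity in X is controlled in this kind of design.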