Since this dataset was obtained by observation beginning at a fixed arbitrary time, the value $\nu_{i,t}$ is unknown for all $t < t_{i,(1)}$. This left-censoring should be uninformative, so we set $Y_i(t) = 0$ for all such $t$ and begin the analysis at the first time where $\sum_i Y_i(t) > 1$.

8.4 Fitting

The “partial posterior probability” in ψ is the product of the marginalized Gaussian process prior, π, and the Cox partial likelihood L.

For a first pass at estimation, we consider the “log-partial posterior probability” in $\phi$,

$$\log L = l + \log \pi = \sum_t \Big[ \phi(t - \tau_{i,t}) - \log \sum_{j \in R(t)} \psi(t - \tau_{j,t}) \Big] - \frac{1}{2} \phi^\dagger K^{-1} \phi.$$

We apply the Newton-Raphson method to find the maximum partial a posteriori estimate. The gradient with respect to $\phi$ is

$$\nabla(l + \log \pi) = \sum_t \left[ e_{\tau_{i(t),t}} - \frac{\sum_j \psi(t - \tau_{j,t})\, e_{\tau_{j,t}}}{\sum_j \psi(t - \tau_{j,t})} \right] - K^{-1}\phi.$$

Since these quantities mostly involve the total hazard at time $t$, suitable notation simplifies them considerably. Introducing the notation

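$$s_t = \sum_{\cdot} \psi(t - \tau_{\cdot,t}),$$

$$s_{t;i} = \sum_{\cdot} \psi(t - \tau_{\cdot,t})\, e_{\tau_i},$$

$$s_{t;i,j} = \sum_{\cdot} \psi(t - \tau_{\cdot,t})\, e_{\tau_i} e_{\tau_j},$$

this may be rewritten as

$$\nabla(l + \log \pi) = \sum_t \left[ e_{\tau_{i(t),t}} - \frac{s_{t;(j,t)}}{s_t} \right] - K^{-1}\phi,$$

with Hessian

$$\nabla\nabla(l + \log \pi) = -\left[ \sum_t \frac{s_t\, s_{t;(i,t),(j,t)} - s_{t;(i,t)}\, s_{t;(j,t)}}{s_t^2} \right] - K^{-1}.$$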

The Hessian is negative-definite (see below), so Newton-Raphson optimization is guaranteed to converge to the maximum a posteriori estimate. The step size is adjusted dynamically, and iteration stops when the relative improvement in the partial posterior probability falls below $10^{-8}$.
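The fitting procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used here: gap times are assumed to be discretized into bins so that $\phi$ is a finite vector, each failure time is represented by a hypothetical pair `(failed_bin, risk_bins)`, and step-halving stands in for the unspecified dynamic step-size rule. The function names are illustrative.

```python
import numpy as np

def log_partial_posterior(phi, events, K_inv):
    """l + log(pi), up to a constant, for binned phi (assumed discretization)."""
    value = -0.5 * phi @ K_inv @ phi
    for failed_bin, risk_bins in events:
        # phi at the failing unit's bin minus log of the total hazard s_t
        value += phi[failed_bin] - np.log(np.sum(np.exp(phi[risk_bins])))
    return value

def gradient_and_hessian(phi, events, K_inv):
    n = phi.size
    grad = -(K_inv @ phi)
    hess = -K_inv
    for failed_bin, risk_bins in events:
        # Share of the total hazard carried by each bin: s_{t;i} / s_t
        w = np.bincount(risk_bins, weights=np.exp(phi[risk_bins]), minlength=n)
        p = w / w.sum()
        grad[failed_bin] += 1.0
        grad -= p
        # (s_t s_{t;i,j} - s_{t;i} s_{t;j}) / s_t^2 in matrix form
        hess = hess - (np.diag(p) - np.outer(p, p))
    return grad, hess

def newton_map(events, K, tol=1e-8, max_iter=100):
    """Damped Newton-Raphson; step-halving is an assumed step-size rule."""
    K_inv = np.linalg.inv(K)
    phi = np.zeros(K.shape[0])
    value = log_partial_posterior(phi, events, K_inv)
    for _ in range(max_iter):
        grad, hess = gradient_and_hessian(phi, events, K_inv)
        step = np.linalg.solve(hess, -grad)  # Newton direction
        scale = 1.0
        new_value = log_partial_posterior(phi + scale * step, events, K_inv)
        while new_value < value and scale > 1e-10:
            scale *= 0.5
            new_value = log_partial_posterior(phi + scale * step, events, K_inv)
        if new_value < value:
            break  # no improving step found; keep the current phi
        improvement = new_value - value
        phi, value = phi + scale * step, new_value
        if improvement <= tol * abs(value):
            break  # relative improvement below tolerance
    return phi
```

Because the likelihood part of the Hessian is a sum of negated covariance-like matrices and $K^{-1}$ is positive-definite, the linear solve in each iteration is well posed.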

8.5 Model Selection

A fully Bayesian analysis would introduce a hyperprior and sample the joint posterior in $a^2$, $b$, and $\phi$. Instead, we use model selection, or empirical Bayes, in which the hyperparameters are selected by using the fitted model above to evaluate a criterion in terms of the hyperparameters. With fixed $a^2$

Conceptually, a criterion in $\phi$, $C(\phi)$, is used to define a criterion in the hyperparameters, $C_2(a^2, b)$, as

$$C_2(a^2, b) = C(\hat\phi(a^2, b)),$$

which is maximized in terms of the hyperparameters. Note, however, that $\hat\phi$ is an estimate and thus depends on the sample, $D$, as well. This will be important, so denote this explicitly as $\hat\phi_D(a^2, b)$. In addition, the criterion $C$ is estimated, typically on a subset of data not used in the estimation of $\phi$, so likewise denote the criterion as

$$C_2(a^2, b, D, D') = C_{D'}(\hat\phi_D(a^2, b)).$$

That is, the data $D'$ is used to validate the estimate $\hat\phi_D(a^2, b)$.

The procedure is fixed by selecting a function $C$ and an algorithm that uses the data to search over values of the hyperparameters. How the data are used in evaluating $C$ will be important, so to make this explicit, call the subset of data used in evaluation $D$ and denote the criterion function as $C(\phi, D)$. In this case, $C(\phi, D)$ is the partial likelihood of $\phi$ in the selected data $D$. The parameters are selected by defining a grid of values $(a^2_i, b_j)$ and, for each point in the grid, estimating the value $C_2(a^2_i, b_j)$; the $(a^2, b)$ corresponding to the maximum $\hat{C}_2$ is then selected.
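The grid search just described can be sketched as below. The function name `grid_search` and the callable `criterion`, which is assumed to return an estimate of $C_2$ for a given $(a^2, b)$, are hypothetical placeholders and not part of the original analysis.

```python
import numpy as np

def grid_search(a2_grid, b_grid, criterion):
    """Evaluate criterion(a2, b) on the full grid and return the maximizer.

    `criterion` is a hypothetical callable standing in for the estimated C_2.
    """
    scores = np.array([[criterion(a2, b) for b in b_grid] for a2 in a2_grid])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return a2_grid[i], b_grid[j], scores
```

Returning the full score table alongside the maximizer makes it easy to inspect how flat the criterion is near the selected point.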

8.5.1 Cross-Validation

A standard approach is to divide the data repeatedly into a training set, $D$, and a disjoint validation set, $D'$.

Typically, $D$ and $D'$ partition the full dataset, in which case the method is called cross-validation, with each sampled partition called a “fold.” The number of folds is typically limited by computational considerations; the special case of $N$-fold cross-validation exhausts the data, in the sense of validating the model on every datapoint, by iteratively assigning every singleton set to $D'$. This special case is termed “Leave-One-Out Cross-Validation,” or LOOCV, and is often considered ideal.

Applying this method to longitudinal data requires a specific definition of how the data are divided. The units of partitioning are taken to be the individual units under observation, meaning that the units are not further subdivided by time. This is at least somewhat reasonable, since the estimand is a function of the gap time, which cannot easily be localized in time. Recall, however, that the partial likelihood requires at least $N = 2$ units to be informative. In light of this, the “ideal” is to conduct “Leave-Two-Out Cross-Validation,” L2OCV, by selecting for each fold one of the $\binom{N}{2}$ possible pairs of feeders as the validation set $D'$.

The sampling of $C_2$ is implemented by taking the mean of the partial likelihood sampled over all $N(N-1)/2$ folds. The $(a^2_i, b_j)$ achieving the greatest mean is selected as the model.

Using all folds is tractable on our current dataset; if necessary, however, the hold-out pairs may be randomly sampled.
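A minimal sketch of the leave-two-out scheme follows, with optional random sampling of the hold-out pairs. The callables `fit` (training units to a fitted model) and `criterion` (fitted model and validation pair to a score, here the partial likelihood) are hypothetical placeholders, as are the function and parameter names.

```python
from itertools import combinations
import random

def leave_two_out_folds(units, max_folds=None, seed=0):
    """Yield (training, validation) splits, one per held-out pair of units."""
    pairs = list(combinations(units, 2))  # all N(N-1)/2 pairs
    if max_folds is not None and max_folds < len(pairs):
        # Optionally subsample the hold-out pairs when all folds are too costly
        pairs = random.Random(seed).sample(pairs, max_folds)
    for held_out in pairs:
        training = [u for u in units if u not in held_out]
        yield training, list(held_out)

def l2ocv_score(units, fit, criterion, max_folds=None, seed=0):
    """Mean validation criterion over the folds (hypothetical fit/criterion)."""
    scores = [
        criterion(fit(training), validation)
        for training, validation in leave_two_out_folds(units, max_folds, seed)
    ]
    return sum(scores) / len(scores)
```

Evaluating `l2ocv_score` once per hyperparameter pair, with `fit` closed over that pair, gives the mean that is compared across the grid.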

8.6 Baseline Estimation

Under this model, the cumulative hazard rate is identified as

$$\Lambda_i(t) = \int_0^t \lambda_0(u)\, \psi(\nu_i(u))\, Y_i(u)\, du.$$

Using the first order equation

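obtain

$$\lambda_0(u) \approx \frac{I_{\sum_i Y_i(u) > 0}}{\sum_i Y_i(u)} \sum_i \frac{dN_i(u)}{\psi(\nu_i(u))},$$

giving an immediate non-parametric point estimate for $\Lambda_0$ as

$$\hat\Lambda_0(t) = \int_0^t \frac{I_{\sum_i Y_i(u) > 0}}{\sum_i Y_i(u)} \sum_i \frac{dN_i(u)}{\psi(\nu_i(u))}.$$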

Kernel methods can be applied to estimate λ0(t) from this estimate.
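As an illustration, the point estimate of $\Lambda_0$ reduces to a cumulative sum over observed failures. The sketch below assumes untied failure times, so each jump of $N_i$ is a single failure, and takes as hypothetical inputs the number of units at risk and the fitted $\psi$ of the failing unit at each failure time.

```python
import numpy as np

def baseline_cumulative_hazard(event_times, n_at_risk, psi_at_failure):
    """Step-function estimate of Lambda_0 evaluated at each failure time.

    Assumes no tied failure times; inputs are aligned per failure.
    """
    order = np.argsort(event_times)
    times = np.asarray(event_times, dtype=float)[order]
    at_risk = np.asarray(n_at_risk, dtype=float)[order]
    psi = np.asarray(psi_at_failure, dtype=float)[order]
    # Indicator I{sum_i Y_i(u) > 0}: an empty risk set contributes nothing
    increments = np.where(at_risk > 0, 1.0 / (np.maximum(at_risk, 1.0) * psi), 0.0)
    return times, np.cumsum(increments)
```

With `psi_at_failure` set to all ones, the increments are $1/\sum_i Y_i(u)$ and the result coincides with the direct Nelson-Aalen estimate.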

Note that this estimator adjusts for infant mortality in units and thus gives an estimate of $\hat\Lambda_0(t)$ as a pure exogenous effect. The effect of this is uncl

Specifically, in the presence of significant infant mortality, we would expect this $\hat\lambda_0(t)$ to be attenuated compared to the direct Nelson-Aalen estimate.

8.7 Application

We apply this method to the Long Island, Queens subnetwork of the New York City electric grid system.

The failure gap times are reduced to percentiles, and the Gaussian process is applied across the mean time of each bin. Figures 8.1 and 8.2, which show the unsmoothed and smoothed versions respectively, illustrate the value of smoothing in this problem.
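The binning step can be sketched as follows. The bin count, the function names, and in particular the Ornstein-Uhlenbeck parametrization $a^2 \exp(-b\,|t - t'|)$ are assumptions made for illustration; the text does not state the exact form used.

```python
import numpy as np

def percentile_bins(gap_times, n_bins=100):
    """Assign each gap time to a percentile bin and return the bin mean times."""
    gap_times = np.asarray(gap_times, dtype=float)
    edges = np.percentile(gap_times, np.linspace(0, 100, n_bins + 1))
    # Use interior edges only, so that indices fall in 0 .. n_bins - 1
    idx = np.searchsorted(edges[1:-1], gap_times, side="right")
    centers = np.array([gap_times[idx == k].mean() for k in range(n_bins)])
    return idx, centers

def ou_kernel(centers, a2, b):
    """Assumed Ornstein-Uhlenbeck covariance a2 * exp(-b * |t - t'|)."""
    lags = np.abs(centers[:, None] - centers[None, :])
    return a2 * np.exp(-b * lags)
```

Heavily tied gap times can leave a percentile bin empty, in which case its mean is undefined; this sketch does not handle that case.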

[Figure: two panels, “unsmoothed estimate of \psi(t)” (x-axis: seconds since treatment; y-axis: rate) and “unsmoothed estimate of \psi(t), 5 days = 4.3e5 seconds” (x-axis: days since treatment; y-axis: rate)]

Figure 8.1: Unsmoothed Cox Estimator of Subsequent Effect of Failure

[Figure: two panels, “smoothed estimate of \psi(t)” (x-axis: seconds since treatment; y-axis: rate) and “smoothed estimate of \psi(t), 5 days = 4.3e5 seconds” (x-axis: days since treatment; y-axis: rate)]

Figure 8.2: Mean Subsequent Effect of Failure according to Radial Basis Function-smoothed Model with fitted Ornstein-Uhlenbeck Prior
