
3.2 Model definition

3.2.3 The transmissibility surface transmission model

The basic transmission model Eq 3.27 accounts for short-range predictors of transmission, and treats long-distance jumps as a stochastic process with constant rate across all locations.

However, it overlooks mid-scale variation in transmission strength. To adjust for this variation, one can allow the transmissibility $\beta_d$ to depend on time and geographic location. However, fitting an independent value for the transmissibility at each time and location would require fitting far more parameters than there are available data points and, even if it were possible, would yield an over-fit model from which underlying patterns would be difficult to identify.

These issues may be avoided by imposing a correlation structure on the transmissibility values. In particular, a time-specific adjustment $\xi_t^T$ and a space-specific adjustment $\xi_i^S$ are incorporated into the model's transmissibility term, yielding a new model of the form

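$$\lambda_i(t) = \beta_0 + \beta_d \, \mathrm{Exp}\!\left[\xi_t^T + \xi_i^S\right] N_i^{\mu} \, \frac{\sum_{j \in \Lambda_t} n_{j,t}^{\theta} \, \kappa(d_{i,j})}{\sum_{j \neq i} \kappa(d_{i,j})} \qquad (3.45)$$

where

$$\boldsymbol{\xi}^T = f_T(\mathbf{t}) + \boldsymbol{\varepsilon}^T, \qquad \boldsymbol{\xi}^S = f_S(\mathbf{x}) + \boldsymbol{\varepsilon}^S$$

and

$$f_T \sim \mathcal{GP}(\mu_T(\cdot), k_T(\cdot,\cdot)), \qquad f_S \sim \mathcal{GP}(\mu_S(\cdot), k_S(\cdot,\cdot)).$$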

That is, the temporal and spatial adjustments to transmissibility, $\boldsymbol{\xi}^T = (\xi_1^T, \ldots, \xi_{t_{\max}}^T)$ and $\boldsymbol{\xi}^S = (\xi_1^S, \ldots, \xi_n^S)$, are described by the Gaussian processes $f_T$ and $f_S$. Here, $\mathbf{t}$ is an array of the time intervals during which the epidemic occurs, and $\mathbf{x}$ is an array of the ZIP coordinates.

The vectors $\boldsymbol{\varepsilon}^T$ and $\boldsymbol{\varepsilon}^S$ are additive noise terms, whose elements $\varepsilon_t^T$ and $\varepsilon_i^S$ are independent, identically distributed Normal random variables with mean 0 and variances $\sigma_T^2$ and $\sigma_S^2$, respectively; that is,

$$\varepsilon_t^T \sim N(0, \sigma_T^2) \qquad (3.46)$$

and

$$\varepsilon_i^S \sim N(0, \sigma_S^2) \qquad (3.47)$$

for all $t$ and $i$. The Gaussian processes $f_T$ and $f_S$ are defined by the temporal and spatial mean functions $\mu_T$ and $\mu_S$ and the temporal and spatial covariance functions $k_T$ and $k_S$. The transmissibility terms $\xi_t^T$ and $\xi_i^S$ enter the transmission model via an exponential function to ensure that the full transmissibility term remains positive.

Specifying the model in this way allows the modeller to impose a correlation relationship between the temporal and spatial transmissibility values respectively, preventing the problems associated with overfitting. The new terms may be interpreted as a "global" transmissibility adjustment $\boldsymbol{\xi}^T$ that varies over time equally across all locations, plus a "local" transmissibility adjustment $\boldsymbol{\xi}^S$ that accounts for additional spatial variation. Fixing $\mu_T(t) = \mu_S(x) \equiv 0$ and $k_T(t,t') = k_S(x,x') \equiv 1$ for all $t, x, t', x'$ yields the original best transmission model, Eq 3.37. In this case, the adjustments $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$ are identically equal to zero.
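The role of the exponential can be illustrated with a minimal sketch. The function name and the numerical value of $\beta_d$ below are purely illustrative assumptions, not quantities taken from the fitted model; the sketch only shows that the adjusted transmissibility stays positive for any real-valued adjustments, that zero adjustments leave $\beta_d$ unchanged, and that the temporal and spatial adjustments act as separate multiplicative factors.

```python
import math

def adjusted_transmissibility(beta_d, xi_t, xi_s):
    """Return beta_d * Exp[xi_t + xi_s], the adjusted transmissibility term."""
    return beta_d * math.exp(xi_t + xi_s)

beta_d = 0.8  # illustrative value only, not an estimate from the text

# Both adjustments zero: the unadjusted transmissibility is recovered.
unadjusted = adjusted_transmissibility(beta_d, 0.0, 0.0)

# Strongly negative adjustments shrink the term but never make it negative.
suppressed = adjusted_transmissibility(beta_d, -3.0, -2.0)

# Exp[xi_t + xi_s] = Exp[xi_t] * Exp[xi_s]: two multiplicative factors.
combined = adjusted_transmissibility(beta_d, 0.4, -0.1)
factored = beta_d * math.exp(0.4) * math.exp(-0.1)

print(unadjusted, suppressed, combined, factored)
```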

Other choices of covariance function allow for more flexible transmissibility surfaces.

The task of choosing a prior form for the covariance function can be simplified by considering what sort of predictor the Gaussian process might be replacing. Broad-scale weather changes and geographic and temporal differences in human behaviour might all explain the additional variation in transmissibility. It is reasonable to assume that these predictors might vary somewhat smoothly in space and time. A squared-exponential kernel, then, for its smoothness and analytical simplicity, might be a natural choice. One might seek a temporal process with length scale of approximately one month (eight half-weeks), since a shorter length scale would risk reintroducing the problem of overfitting, while a larger one might not be flexible enough to detect important structure. For the spatial process, a length scale of approximately $3\rho$ is reasonable. At this distance, the distance kernel $\kappa$ has dropped by 95%, so the Gaussian process would capture mid-scale variability, picking up where the distance kernel leaves off.

According to the MLE parameter values from Table 3.4, a distance of 3ρ is approximately 200 km.

For the following model fits, $\boldsymbol{\xi}^T$ is assumed to follow a Gaussian process prior with mean function $\mu_T = 0$ and squared-exponential (SE) covariance function $k_T$ with length scale $l = 8$ half-weeks.

The spatial transmissibility adjustment $\boldsymbol{\xi}^S$ is assumed to follow a Gaussian process prior with mean function $\mu_S = 0$ and SE covariance function $k_S$ with length scale $l = 200$ km. Spatial length scales of $l = 100$ km and $l = 500$ km were also tested, as well as a rational quadratic (RQ) covariance function with length scale $l = 200$ km.
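As a rough illustration of these covariance choices, the sketch below writes out the standard unit-variance forms of the SE and RQ kernels as functions of the separation $r$ between two points. The unit signal variance, the RQ shape parameter `alpha`, and the helper names are assumptions made for illustration; the parameterisation used in §3.1.3 may differ.

```python
import math

def se_kernel(r, length_scale):
    """Squared-exponential covariance at separation r (unit variance assumed)."""
    return math.exp(-0.5 * (r / length_scale) ** 2)

def rq_kernel(r, length_scale, alpha=1.0):
    """Rational quadratic covariance; alpha is an assumed shape parameter."""
    return (1.0 + r ** 2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)

def cov_matrix(points, kernel, length_scale):
    """Covariance matrix for one-dimensional points (e.g. half-week indices)."""
    return [[kernel(abs(a - b), length_scale) for b in points] for a in points]

# Temporal prior: 40 half-week intervals, SE kernel, length scale 8 half-weeks.
half_weeks = list(range(40))
K_T = cov_matrix(half_weeks, se_kernel, 8.0)

# Spatial correlation at a few distances (km) for the length scales tested.
for dist_km in (100.0, 200.0, 500.0):
    row = [se_kernel(dist_km, l) for l in (100.0, 200.0, 500.0)]
    row.append(rq_kernel(dist_km, 200.0))
    print(dist_km, [round(v, 3) for v in row])
```

Under these assumed forms, the correlation between two points one length scale apart is $e^{-1/2}$ for the SE kernel, and the RQ kernel decays more slowly than the SE kernel at large separations.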

Parameter estimation

Posterior distributions for $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$ are estimated using a Metropolis-Hastings algorithm. In each iteration of the algorithm, proposed values for a random subset of $\boldsymbol{\xi}^T$ are drawn from a multivariate normal distribution specified by Eq 3.20-3.22, where $\mathbf{x}_*$ are the proposal time points and $\mathbf{x}$ are the other time points, with values $\mathbf{y}$. The model's likelihood is evaluated with these new proposed values. The proposal is accepted with probability equal to the ratio of the new likelihood to the most recently accepted likelihood, capped at 1 (for the first iteration, the comparison is with the original model's likelihood, with no $\boldsymbol{\xi}^T$ or $\boldsymbol{\xi}^S$). This is repeated for mutually exclusive random subsets of $\boldsymbol{\xi}^T$ until a new value has been proposed and either accepted or rejected for each element of $\boldsymbol{\xi}^T$. The same procedure is applied to update $\boldsymbol{\xi}^S$. This constitutes a single iteration of the algorithm. The algorithm is run four times for 10,000 iterations each.
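The update scheme described above can be sketched as follows, under several stated assumptions. The likelihood `toy_log_lik` is a hypothetical stand-in for the epidemic model's likelihood, the block size, noise variance, and number of time points are arbitrary, and the conditional Gaussian formulas are the standard ones, assumed here to correspond to Eq 3.20-3.22. Because each block is proposed from its conditional prior given the remaining elements, the acceptance ratio reduces to the likelihood ratio.

```python
import math
import random

def noisy_se_cov(points, length_scale, noise_var):
    """Covariance of xi = f + eps: SE kernel plus noise variance on the diagonal."""
    return [[math.exp(-0.5 * ((a - b) / length_scale) ** 2) + (noise_var if i == j else 0.0)
             for j, b in enumerate(points)] for i, a in enumerate(points)]

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    L = [[0.0] * len(A) for _ in A]
    for i in range(len(A)):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def conditional(K, prop, rest, y_rest):
    """Mean and covariance of xi[prop] given xi[rest] = y_rest (zero prior mean)."""
    L = cholesky([[K[a][b] for b in rest] for a in rest])
    sols = [chol_solve(L, [K[o][p] for o in rest]) for p in prop]
    mean = [sum(s * y for s, y in zip(sol, y_rest)) for sol in sols]
    cov = [[K[p][q] - sum(K[p][o] * sol_q[k] for k, o in enumerate(rest))
            for q, sol_q in zip(prop, sols)] for p in prop]
    return mean, cov

def draw_mvn(mean, cov, rng):
    """One draw from N(mean, cov) using the Cholesky factor of cov."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

def mh_sweep(xi, K, log_lik, block_size, rng):
    """One iteration: each element is proposed once, in disjoint random blocks."""
    idx = list(range(len(xi)))
    rng.shuffle(idx)
    ll_old, n_accepted = log_lik(xi), 0
    for start in range(0, len(idx), block_size):
        prop = idx[start:start + block_size]
        rest = [i for i in idx if i not in prop]
        mean, cov = conditional(K, prop, rest, [xi[i] for i in rest])
        cand = list(xi)
        for i, v in zip(prop, draw_mvn(mean, cov, rng)):
            cand[i] = v
        ll_new = log_lik(cand)
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, ll_new - ll_old)):
            xi, ll_old, n_accepted = cand, ll_new, n_accepted + 1
    return xi, n_accepted

# Toy run: 12 time points, length scale 8, assumed noise variance 0.05.
rng = random.Random(1)
times = list(range(12))
K = noisy_se_cov(times, 8.0, 0.05)
obs = [0.3 * math.sin(t / 3.0) for t in times]  # synthetic data, illustration only

def toy_log_lik(xi):
    """Hypothetical Gaussian log-likelihood standing in for the epidemic model."""
    return -0.5 * sum((o - x) ** 2 for o, x in zip(obs, xi)) / 0.1

xi, samples, accepted = [0.0] * len(times), [], 0  # start with no adjustment
for _ in range(500):
    xi, n_acc = mh_sweep(xi, K, toy_log_lik, 4, rng)
    samples.append(xi)
    accepted += n_acc
```

Working with the log-likelihood difference rather than the raw ratio avoids numerical underflow when the likelihood is very small, which is a common design choice rather than something the text specifies.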

Following Gelman et al. (2013) [86], the first 5,000 iterations of each run are discarded to avoid effects from the burn-in period. To assess convergence, the Gelman-Rubin statistic is calculated for each location and each half-week [86]. The Gelman-Rubin statistic is a ratio of the between-chain to the within-chain variance that approaches 1 from above as the number of iterations approaches infinity. The above procedure yields a Gelman-Rubin statistic below 2 for all chains, and below 1.1 for all $\boldsymbol{\xi}^S$ chains and 37 of the 40 $\boldsymbol{\xi}^T$ chains. This suggests that the chains have converged. The ordered Gelman-Rubin statistics for the $\boldsymbol{\xi}^S$ and $\boldsymbol{\xi}^T$ chains are depicted in Fig 3.9.
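For a single scalar quantity (one $\xi_i^S$ or one $\xi_t^T$), the convergence check can be sketched as below. This uses a common form of the Gelman-Rubin statistic without chain splitting; the exact variant used for the results above is not stated, so the function and the synthetic chains are illustrative assumptions only.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Gelman-Rubin statistic for one scalar parameter.

    chains: list of equal-length lists of post-burn-in draws, one per run.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between_over_n = statistics.variance(chain_means)                    # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return math.sqrt(var_plus / within)

# Synthetic example: four runs of 10,000 draws, first 5,000 discarded as burn-in.
rng = random.Random(0)
runs = [[rng.gauss(0.0, 1.0) for _ in range(10_000)] for _ in range(4)]
kept = [run[5_000:] for run in runs]
r_hat_mixed = gelman_rubin(kept)

# Chains stuck around different values give a statistic well above 1.1.
shifted = [[v + k for v in chain] for k, chain in enumerate(kept)]
r_hat_stuck = gelman_rubin(shifted)

print(r_hat_mixed, r_hat_stuck)
```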

[Figure 3.9: two panels plotting the Gelman-Rubin statistic (vertical axis, 1.0 to 2.0) against the ordered $\xi_i^S$ chains (horizontal axis, 0 to 800) and the ordered $\xi_i^T$ chains (horizontal axis, 0 to 40).]

Fig. 3.9 Gelman-Rubin statistics for the spatial ($\boldsymbol{\xi}^S$) and temporal ($\boldsymbol{\xi}^T$) transmissibility surface Markov chains, produced using the Metropolis-Hastings algorithm described in §3.2.3. The Gelman-Rubin statistic for all spatial and temporal chains is below 2, and is below 1.1 for all spatial chains and all but three of the temporal chains. This provides good evidence that the chains have converged.

The last 5,000 iterations for each of the four runs are combined, yielding 20,000 posterior draws of $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$. Fig 3.10 depicts the mean exponentiated $\boldsymbol{\xi}^T$ values with ±2 standard deviations. The exponentiated $\boldsymbol{\xi}^T$ may be interpreted as a temporally-varying multiplicative factor for the transmissibility term $\beta_d$. There is evidence of a drop in transmissibility in August, followed by a rise from September to mid-October and a return to average by the end of the outbreak. The curve is jagged, reflecting the breakpoint onset detection method's tendency to place epidemic onset times preferentially on half weeks (see §2.3.2); there are local peaks in $\boldsymbol{\xi}^T$ on half weeks and dips on whole weeks. Despite this small-scale bias, there is still a clear trend in the large-scale structure. Fig 3.11 depicts the mean exponentiated $\boldsymbol{\xi}^S$ values geographically, which may be interpreted as a spatially-varying multiplicative factor for $\beta_d$. There is evidence of higher-than-average transmissibility in the southeast and lower-than-average transmissibility in the mid-Atlantic region where the epidemic wave slowed.

[Figure 3.10: $\mathrm{Exp}(\xi_t)$ (vertical axis, ticks at 0.5, 1, 2 and 4) plotted against time (horizontal axis, August to December).]

Fig. 3.10 Mean exponentiated temporal transmissibility values, $\mathrm{Exp}[\boldsymbol{\xi}^T]$ (black line), with ±2 standard deviations (grey band). Values above 1 indicate higher-than-average transmissibility, and values lower than 1 indicate lower-than-average transmissibility. The temporal transmissibility adjustment begins slightly above 1, before dipping in mid-August through the beginning of September. It then rises until mid-October, and finally decreases again back to 1 at the end of the epidemic. The locally jagged pattern is an artefact of the breakpoint onset detection method's preference to place epidemic onsets near half-week values (see §2.3.2); this makes transmission appear to be stronger on half weeks vs. full weeks.

For comparison, the transmissibility adjustments $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$ are re-estimated using (1) an SE covariance function with spatial length scale of 100 km and temporal length scale of 8 half-weeks, (2) an SE covariance function with spatial length scale of 500 km and temporal length scale of 8 half-weeks, and (3) an RQ covariance function with spatial length scale of 200 km and temporal length scale of 8 half-weeks. The temporal length scale is kept at 8 half-weeks in all scenarios, since a shorter length scale risks over-fitting, while a longer length scale would make the process too inflexible to show significant variation over the 14-week outbreak. The posterior estimates of $\mathrm{Exp}[\boldsymbol{\xi}^T]$ under each scenario are depicted in Fig 3.12, and the posterior estimates of $\mathrm{Exp}[\boldsymbol{\xi}^S]$ under each scenario are depicted in Fig 3.13. The overall shape of $\mathrm{Exp}[\boldsymbol{\xi}^T]$ under all three scenarios is similar to the shape obtained using an SE covariance function with spatial length scale of 200 km and temporal length scale of 8 half-weeks (Fig 3.10). All have a dip in transmissibility in September and a rise in transmissibility in late October. This is perhaps unsurprising, since the temporal length scale for all three scenarios is the same, but it does show that inference of the temporal process is fairly robust to changes in the spatial length scale and the covariance function. The spatial transmissibility surfaces depicted in Fig 3.13 also reveal roughly similar patterns to the transmissibility surface in Fig 3.11, with elevated transmissibility in the southeastern US.

[Figure 3.11: map of ZIPs with a colour scale for $\mathrm{Exp}[\xi^S]$, ticks at 0.25, 0.5, 1, 2 and 4.]

Fig. 3.11 Map of the mean exponentiated geographic transmissibility values, $\mathrm{Exp}[\boldsymbol{\xi}^S]$. Discs represent ZIPs, and are coloured according to the ZIP's estimated mean $\mathrm{Exp}[\xi_i^S]$ value. Values higher than 1 (red) indicate higher-than-average transmissibility, and values lower than 1 (blue) indicate lower-than-average transmissibility. The transmissibility adjustment is highest in the southeast, while there is a clear band of low transmissibility in Missouri, Illinois, and Kentucky, where the epidemic wave slowed. There is also evidence of higher-than-average transmissibility in the central valley of California, where a second epidemic wave appears to have been sparked.

Under the first scenario, with spatial length scale of 100 km, the transmissibility surface is more locally variable than the surfaces produced using longer spatial length scales. This is to be expected, since a length scale of 100 km yields a very flexible transmissibility surface.

The transmissibility surface produced under the second scenario, with spatial length scale of 500 km, is less locally variable than the transmissibility surfaces produced using shorter length scales. The overall range of transmissibility values is also smaller, with $\mathrm{Exp}[\boldsymbol{\xi}^S]$ ranging from 0.3 to 3.5, rather than from about 0.2 to over 6 for the SE scenarios with shorter length scale. This reduced variability is due to the surface's higher rigidity. The transmissibility surface produced under the third scenario, with spatial length scale of 200 km and an RQ covariance function, is similar to the transmissibility surface produced using an SE covariance function and the same spatial length scale (Fig 3.11). This suggests that the estimated transmissibility surface is somewhat robust to the choice of covariance function, though it should be noted that both covariance functions belong to the same general class, yielding a surface with infinite mean-square differentiability everywhere (see §3.1.3).

We proceed using the mean posterior estimates for $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$ obtained using an SE covariance function with spatial length scale of 200 km and temporal length scale of 8 half-weeks. Fig 3.14 depicts the difference between the actual and expected outbreak onset times by location using the new model, Eq 3.45, with these posterior mean estimates substituted in.

Comparing with Fig 3.7 indicates that including $\boldsymbol{\xi}^T$ and $\boldsymbol{\xi}^S$ resolves many of the systematic discrepancies. Fig 3.15 depicts the expected and true cumulative number of locations infected over time under the new model. The shapes of the curves now match closely. The gap between the curves points to a remaining model mis-specification. In general, there are always a few more true outbreaks than expected, likely due to mid-range jumps of infection that the model cannot reliably predict. There appear to be about 50 such jumps at any given time; adding 50 to the expected cumulative onset curve makes the two curves match almost perfectly, except at the very beginning and very end of the epidemic. The discrepancy may be due to a mis-specification in the shape of the transmission kernel, and suggests that exploring more flexible kernel forms may be warranted.