This chapter’s primary focus is on a dataset that captures weekly influenza-like illness (ILI) incidence for 12 age groups in 884 locations across the US between 2001 and 2009. The data are available from medical claims records maintained in the private sector. To my knowledge, they provide the finest geographic resolution of ILI data ever considered for studying the spatial spread of influenza in the United States. Viboud et al. (2014) [240], Gog et al. (2014) [91], and Charu et al. (2017) [48] use the same ILI data aggregated to a coarser geographic resolution, with the number of geographic locations ranging between 200 and 400. The only other study of which I am aware that considers a comparable number of locations in the US is provided by Yang et al. (2015) [258], who estimate epidemiological parameters for 10 influenza seasons using ILI data from 115 US cities. A number of surprising epidemiological characteristics of the 2009 A/H1N1pdm influenza pandemic in the United States have already been pointed out in other studies, including the unusually early timing of the autumn wave [122], unusually high rates of morbidity and mortality among young adults [122, 198], and an unusually slow and coherent geographic wave of transmission that seems to have been seeded, unexpectedly, in the southeastern US [91]. The fine-scale data considered in this thesis provide an opportunity to study this unconventional outbreak in close detail.
A number of other studies characterise the spread of influenza at the country scale using spatially-resolved ILI data. In addition to the studies just mentioned by Gog et al. (2014) [91], Charu et al. (2017) [48], and Yang et al. (2015) [258], Chowell et al. (2011a) [50] study the geographic transmission of influenza in Peru across 134 provinces using a combination of ILI and laboratory-confirmed data. Chowell et al. (2011b) [49] use a combination of ILI and mortality data to describe the geographic transmission of the 2009 A/H1N1pdm pandemic in Mexico. Smieszek et al. (2011) [218] use sentinel ILI data from Switzerland to model the geographic transmission of the 2003-04 influenza outbreak in that country. Paget et al. (2007) [184] use sentinel ILI data and virological data aggregated to the country level to describe the transmission of eight seasonal influenza outbreaks across Europe. All of these studies find significant differences in outbreak timing by geographic location, motivating the study of the geographic transmission of influenza outbreaks using ILI data.
[Figure 2.22: ten panels labelled Reg.1 to Reg.10; x-axis: year, 2002 to 2010; y-axis: weekly number of laboratory-confirmed cases; legend: A/H1N1pdm, A/H1, A/H3, A/Unsubtyped, B]
Fig. 2.22 Weekly number of laboratory-confirmed cases of each antigenic subtype collected by the CDC between 2001 and 2010. The blue shaded region corresponds to the autumn wave of the 2009 A/H1N1pdm influenza pandemic, and the grey shaded areas correspond to the 2003-04 and 2007-08 seasonal outbreaks, to be discussed in Chapter 6. The autumn wave of the 2009 pandemic was dominated by the A/H1N1pdm strain.
Electronic medical claims records offer a promising source of disease surveillance data, especially in the context of influenza [240, 260]. In particular, they appear to provide more finely-resolved information about the geography and age structure of influenza outbreaks than traditional surveillance methods can, while improving upon the accuracy of social media- and search query-based ILI estimates [182, 240]. However, the electronic medical record data stream still carries a number of limitations. As mentioned in §2.1.2, differing local incentives and coding practices can reduce the reliability of medical claims data [248].
Conflicting incentives can be problematic at the overall health-system level, too; in the United States, the ownership of health-related data in the private sector can drive up the cost of access, making electronic medical records, while easy to collect in theory, sometimes very difficult to obtain. In addition, privacy concerns rightly place a limit on the resolution with which medical claims data can be reported, so that some degree of aggregation will always be necessary. Not all aggregation strategies are the same, however, and more research is required to understand how to achieve aggregation with a proper balance between privacy, ease of coding, epidemiological relevance, and pertinence for intervention strategies.
Aside from the particular difficulties associated with electronic medical claims data, ILI itself is an imperfect measurement of influenza incidence. ILI incidence is normally reported as a proportion of physician visits due to influenza-like illnesses, which may not correspond to the per capita incidence of ILI [258]. This makes it difficult to estimate population-level influenza intensity from ILI data. Indeed, Viboud et al. (2014) [240] find that, while outbreak timing in the IMS-ILI data correlates highly with outbreak timing in the CDC’s ILI and virologic data, correlations in outbreak intensity between the datasets, measured as total additional ILI after subtracting out a sinusoidal baseline, are significantly lower. This provides a rationale for focusing on outbreak timing when considering the IMS-ILI data, and motivates this chapter’s focus on developing a robust algorithm to detect epidemic onset times from ILI incidence time series.
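To make the intensity measure concrete, the following is a minimal sketch of "total additional ILI after subtracting out a sinusoidal baseline". It assumes a Serfling-style baseline (intercept, linear trend, and one annual harmonic) fitted by least squares to non-epidemic weeks; the exact baseline specification used by Viboud et al. (2014) is not given here, so the function names, the 52.18-week period, and the clipping of negative excess to zero are all illustrative assumptions.

```python
import numpy as np

def fit_sinusoidal_baseline(weeks, ili, baseline_mask, period=52.18):
    """Fit intercept + linear trend + annual harmonic to non-epidemic weeks.

    ASSUMPTION: a Serfling-style form; the source only says 'sinusoidal baseline'.
    Returns the fitted baseline evaluated at every week.
    """
    def design(t):
        w = 2.0 * np.pi * t / period
        return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])

    t = np.asarray(weeks, dtype=float)
    y = np.asarray(ili, dtype=float)
    mask = np.asarray(baseline_mask, dtype=bool)
    # Least-squares fit on baseline (non-epidemic) weeks only.
    coef, *_ = np.linalg.lstsq(design(t[mask]), y[mask], rcond=None)
    return design(t) @ coef

def excess_ili(ili, baseline, season_mask):
    """Sum of ILI above the baseline over the weeks flagged in season_mask.

    ASSUMPTION: weeks below the baseline contribute zero rather than negative excess.
    """
    diff = np.asarray(ili, dtype=float) - np.asarray(baseline, dtype=float)
    return float(np.clip(diff, 0.0, None)[np.asarray(season_mask, dtype=bool)].sum())
```

Comparing two surveillance systems would then amount to computing `excess_ili` per season in each dataset and correlating the resulting values, whereas the timing comparison needs only the week of onset or peak and is therefore insensitive to how each system scales its ILI proportion.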
The breakpoint method, originally introduced by Charu et al. (2017) [48] and presented in updated form here, appears to offer a robust means of identifying outbreak onset times from noisy, potentially autocorrelated time series of ILI incidence. A major advantage of the breakpoint method is that it avoids a need to define baseline and threshold levels of ILI activity, which generally must be done in an ad hoc way [248]. The adjustments to the breakpoint method introduced in this chapter include (1) fitting the breakpoint regression to a fixed number of time series points for all locations, to ensure that onset uncertainties are comparable between locations, (2) introducing a strategy to measure onset uncertainty using the likelihood profile of the breakpoint estimate, (3) using that onset uncertainty estimate as a criterion for accepting or rejecting a time series from analysis, rather than simply rejecting the 20% of locations with the smallest differences between maximum and minimum ILI intensity, and (4) introducing a strategy for identifying accurate onset times from time series with multiple incidence peaks. This chapter also presents the first systematic evaluation of the breakpoint method’s performance. According to this analysis, the breakpoint method performs best when the ILI time series has a clear, sudden rise in incidence at the outbreak onset time, though its performance is still good in noisier settings. Most ZIPs in the US have a sharp increase in ILI activity at the beginning of the autumn wave of the 2009 A/H1N1pdm pandemic, so the breakpoint method should give fairly accurate onset estimates for that epidemic. The breakpoint method can also generally detect epidemic onset times with higher precision and accuracy than an optimised threshold method, especially when autocorrelation between subsequent incidence values is high. Interpolating the breakpoint onset time estimates to the nearest half-week introduces a bias, with relatively more onsets estimated to occur on half weeks than on whole weeks. However, the overall trajectory of the autumn wave of the 2009 A/H1N1pdm influenza pandemic that becomes visible when mapping the breakpoint onsets (Fig. 2.8) matches well with the patterns observed by Gog et al. (2014) [91] and Charu et al. (2017) [48], the only other studies of which I am aware that provide detailed pictures of the geographic transmission of the autumn 2009 pandemic wave in the US.
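The core idea of a breakpoint onset estimate with a likelihood-profile uncertainty can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this chapter: it assumes a flat pre-onset level followed by a linear rise, independent Gaussian errors, a half-week grid of candidate breakpoints, and a profile log-likelihood drop of 1.92 (the usual 95% cutoff for one parameter) to define the uncertainty interval. The function name and all parameter names are hypothetical.

```python
import numpy as np

def breakpoint_onset(t, y, step=0.5, drop=1.92):
    """Grid-search a single breakpoint and report a profile-likelihood width.

    ASSUMED model: y = a + b * max(0, t - tau) + Gaussian noise.
    Returns (tau_hat, width), where width is the span of candidate breakpoints
    whose profile log-likelihood lies within `drop` of the maximum.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    # Candidate breakpoints on a half-week grid, kept away from the window edges.
    taus = np.arange(t[0] + 1.0, t[-1] - 1.0 + step / 2.0, step)
    loglik = np.empty(len(taus))
    for i, tau in enumerate(taus):
        X = np.column_stack([np.ones(n), np.maximum(0.0, t - tau)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        # Gaussian log-likelihood with the variance profiled out (up to a constant);
        # the floor on rss avoids log(0) for noise-free input.
        loglik[i] = -0.5 * n * np.log(max(rss, 1e-12) / n)
    best = int(np.argmax(loglik))
    inside = taus[loglik >= loglik[best] - drop]
    # Conservative width: span from the earliest to the latest accepted candidate.
    return float(taus[best]), float(inside.max() - inside.min())
```

Under these assumptions, a series would be accepted for analysis when the returned width falls below some chosen tolerance, and passing the same number of points `len(t)` for every location keeps the widths comparable between locations.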
The breakpoint method was not tested for its ability to detect outbreak onset times in real time for an ongoing outbreak. Since the breakpoint method relies on the full epidemic time series prior to and including the peak, it is possible that when presented with less data, such as before the epidemic has peaked, a threshold-based method might perform equally well or better. The breakpoint method might be prone to detecting frequent spurious onsets due to stochastic rises in ILI that yield ‘false’ peaks. Also, to use the breakpoint method in real time, one would have to decide upon an acceptable width for the onset likelihood profile, below which an onset would be identified. This seems to simply push the task of defining a threshold to a higher level of abstraction, which may not ultimately be helpful. It is therefore unlikely that the breakpoint method will contribute to real-time outbreak onset detection, but further work might still be warranted in this area.
Statistical analysis of the breakpoint outbreak onset times offers insight into the geographic transmission of the autumn 2009 A/H1N1pdm influenza pandemic wave in the US. In general, and in agreement with findings from Gog et al. (2014) [91] and Charu et al. (2017) [48], the outbreak featured a major geographic transmission wave that spread from the southeastern US. Wave-like geographic transmission patterns for influenza at the continent and country scales have been reported in a few other studies as well: Paget et al. (2007) [184], for example, find evidence of west-to-east and south-to-north spread of influenza across Europe in four seasons between 1999 and 2007, and Smieszek et al. (2011) [218] report a north-easterly spread of influenza across Switzerland in 2003. As may be expected from the relatively high incidence of influenza in children [81, 179, 245], outbreak onset times from the autumn of 2009 in the US are generally first detectable in school-aged children, with 10-19 year-olds leading the estimated outbreak onset times in more ZIPs than any other age group. At the full US scale, it appears that the overall geographic transmission pattern of the autumn 2009 pandemic wave was fairly consistent between age groups, with all age groups showing evidence of a transmission wave spreading from the southeastern US. The geography and timing of the start of the pandemic wave may be associated with the relatively early start of the school term in the southeastern US. This would agree with a number of previous studies that have identified schools as key sites of transmission during influenza outbreaks [117, 199, 255]. However, the opening of schools cannot fully explain the onward spread of the autumn 2009 pandemic wave, since the transmission wave lagged well behind the start of the school term in many ZIPs, especially in the northern US.
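The "leading age group" statistic mentioned above can be illustrated with a short sketch. The input layout (a dictionary of onset weeks per location and age group, with `None` for series rejected from analysis), the function name, and the equal splitting of ties are assumptions made for illustration; they are not taken from the analysis in this chapter.

```python
from collections import Counter

def leading_age_groups(onsets):
    """Count, per age group, the locations in which it has the earliest onset.

    onsets: dict mapping location -> dict mapping age group -> onset week,
            with None marking a rejected (unusable) time series.
    ASSUMPTION: ties for the earliest onset are split equally between groups.
    """
    leaders = Counter()
    for by_age in onsets.values():
        valid = {group: week for group, week in by_age.items() if week is not None}
        if not valid:
            continue  # no usable onset estimate in this location
        earliest = min(valid.values())
        tied = [group for group, week in valid.items() if week == earliest]
        for group in tied:
            leaders[group] += 1 / len(tied)
    return leaders
```

Calling `leading_age_groups(onsets).most_common(1)` on such a dictionary returns the age group that leads in the largest number of locations.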
The themes uncovered by these statistical analyses constitute major areas of focus for the rest of this thesis: Chapter 3 examines potential contributors to the geographic transmission of the autumn 2009 A/H1N1pdm outbreak in the US, including the role of schools; Chapter 4 considers the geographic establishment sites of the outbreak; and Chapter 5 takes a closer look at how different age groups may have contributed to both sparking and sustaining transmission.