As mentioned above, itinerary disruptions account for almost 50% of passenger delays, yet only 3.3% of passengers have their itineraries disrupted. Disruptions are so costly because of the delay associated with re-accommodation, which averages almost 7.5 hours per disrupted passenger. The delays to disrupted passengers also vary significantly and systematically; for example, disruption delays are much lower for morning travelers, as indicated in Table 3-8. To better understand this variability, in this section we develop a linear regression model to predict the delay associated with an itinerary disruption. We subsequently use this model in Chapter 4 to estimate the delay cost coefficients associated with itinerary disruptions.
In our regression model, the dependent variable is the minutes of delay associated with an itinerary disruption. The observations we use to train the model are the disrupted passengers and their estimated delays based on the passenger delay calculator described in Section 3.3.1. To better represent the full range of variability, we estimate the disruption delays using a re-accommodation delay limit of 24 hours for both daytime and evening passengers. Additionally, we filter out disrupted itineraries whose passengers we are unable to re-accommodate within the 24-hour limit. Thus, of the approximately 16.0 million disrupted passengers, we exclude the 8.0% who receive the default delay, resulting in 14.7 million observations (see Table 3-1 for details). Often, multiple passengers travel on the same disrupted itinerary and are assigned the same re-accommodation alternative, so we can reduce the computational complexity of training the model by grouping these passengers into a single observation weighted by the number of passengers. After grouping, we are left with 5.3 million distinct observations.
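Grouping passengers who share an itinerary and re-accommodation alternative into one weighted observation does not change the fitted coefficients: weighted least squares on the grouped rows, with each row weighted by its passenger count, gives the same estimates as ordinary least squares on one row per passenger. A minimal NumPy sketch on synthetic data (the data, sizes, and the function name `wls` are assumptions for illustration only, not the thesis data or code):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X^T W X) beta = X^T W y."""
    Xw = X * w[:, None]  # scale row i of X by its weight w_i
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(0)

# Hypothetical grouped data: 50 distinct observations, each shared by 1-20 passengers.
n_groups = 50
X = np.column_stack([np.ones(n_groups), rng.uniform(0, 10, n_groups)])
y = 300.0 + 25.0 * X[:, 1] + rng.normal(0, 30, n_groups)
counts = rng.integers(1, 21, n_groups)

# Fit once on the grouped rows, weighting each row by its passenger count...
beta_grouped = wls(X, y, counts.astype(float))

# ...and once on the fully expanded data (one row per passenger), unweighted.
X_full = np.repeat(X, counts, axis=0)
y_full = np.repeat(y, counts)
beta_full, *_ = np.linalg.lstsq(X_full, y_full, rcond=None)

print(beta_grouped, beta_full)  # equal up to floating-point error
```

Only the point estimates coincide automatically; for standard errors and t-values, we assume the weights are treated as frequency weights, so that the residual degrees of freedom reflect the total passenger count rather than the number of grouped rows.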
As inputs into the model, we use the following raw features:
• $x_i^{\text{hour}}$ = estimated hour of disruption for observation $i$: either the planned departure hour of the first canceled flight or, in the case of a missed connection, the planned departure hour of the second flight;
• $x_i^{\text{0-stop}}$ = average number of daily non-stop alternatives for the month from the point of disruption to the intended destination for observation $i$, using one of the planned itinerary carriers or related sub-contracted carriers;
• $x_i^{\text{0-LF}}$ = average load factor for the month on non-stop flights from the point of disruption to the intended destination for observation $i$, using one of the planned itinerary carriers or related sub-contracted carriers;
• $x_i^{\text{1-stop}}$ = average number of daily one-stop alternatives for the month from the point of disruption to the intended destination for observation $i$, using one or more of the planned itinerary carriers or related sub-contracted carriers;
• $x_i^{\text{canc}}$ = 1 if the disruption for observation $i$ is due to a flight cancellation, and 0 otherwise (i.e., for a missed connection); and
• $x_i^{\text{stops}}$ = number of planned stops remaining at the time of disruption for observation $i$ (i.e., either 0 or 1).
Based on these features, we estimate the disruption delay $d(\mathbf{x}_i)$ using the regression function in equation (3-3), where $\mathcal{I}(\cdot)$ denotes the indicator function for the expression in its argument.
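$$
\begin{aligned}
d(\mathbf{x}_i) = {} & \beta_0 + \beta^{\text{canc}} x_i^{\text{canc}} + \sum_{h=6}^{22} \beta_h^{\text{hour}}\, \mathcal{I}(x_i^{\text{hour}} = h) + \beta_{22}^{\text{hour}}\, \mathcal{I}(x_i^{\text{hour}} = 23) \\
& + \beta^{\text{dawn}} \left(5 - x_i^{\text{hour}}\right) \mathcal{I}(x_i^{\text{hour}} < 5) + \beta^{\text{0-stop}} x_i^{\text{0-stop}} \left(1 - x_i^{\text{0-LF}}\right) \\
& + \beta^{\text{1-stop}} x_i^{\text{1-stop}} + \beta^{\text{stops}} x_i^{\text{stops}} + \beta^{\text{stops(1)}} x_i^{\text{1-stop}} x_i^{\text{stops}}
\end{aligned}
\tag{3-3}
$$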
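The mapping from the raw features to the regressors of equation (3-3) can be sketched in Python as follows. The class, field, and dictionary key names are hypothetical labels introduced here for illustration, not names from the original analysis; the sketch simply encodes the hour dummies (with 11:00pm sharing the 10:00pm coefficient), the pre-dawn ramp, the empty non-stop term, and the interaction.

```python
from dataclasses import dataclass

@dataclass
class Disruption:
    # Hypothetical field names for the raw features of one observation.
    hour: int             # x_hour: hour of disruption, 0-23
    nonstop_freq: float   # x_0-stop: avg daily non-stop alternatives
    nonstop_lf: float     # x_0-LF: avg non-stop load factor, in [0, 1]
    onestop_freq: float   # x_1-stop: avg daily one-stop alternatives
    cancelled: bool       # x_canc: True if caused by a cancellation
    stops_left: int       # x_stops: planned stops remaining (0 or 1)

def design_row(d: Disruption) -> dict[str, float]:
    """Map one observation to the 24 regressors of equation (3-3)."""
    row = {"const": 1.0, "canc": float(d.cancelled)}
    # Hour dummies for 6..22; hour 23 shares the hour-22 coefficient.
    for h in range(6, 23):
        row[f"hour_{h}"] = 0.0
    if 6 <= d.hour <= 22:
        row[f"hour_{d.hour}"] = 1.0
    elif d.hour == 23:
        row["hour_22"] = 1.0
    # Pre-dawn ramp: hours remaining until 5:00am (5am itself is the baseline).
    row["dawn"] = float(5 - d.hour) if d.hour < 5 else 0.0
    # Empty non-stop flights per day: frequency times (1 - load factor).
    row["0stop"] = float(d.nonstop_freq * (1.0 - d.nonstop_lf))
    row["1stop"] = float(d.onestop_freq)
    row["stops"] = float(d.stops_left)
    # Interaction: one-stop alternatives when disrupted before the first flight.
    row["stops_1stop"] = float(d.onestop_freq * d.stops_left)
    return row

row = design_row(Disruption(hour=23, nonstop_freq=4, nonstop_lf=0.75,
                            onestop_freq=10, cancelled=True, stops_left=1))
print(row["hour_22"], row["0stop"], row["stops_1stop"])  # 1.0 1.0 10.0
```

The 24 keys correspond one-to-one with the 24 parameters reported in Table 3-4.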
For additional context, we briefly describe each parameter in the regression function:
• $\beta_0$ – the baseline delay for a disruption during the 5:00am hour;
• $\beta^{\text{canc}}$ – the impact of flight cancellations on disruption delays (as compared to missed connections);
• $\beta_h^{\text{hour}}$ – the additional disruption delay, relative to the 5:00am baseline, associated with each possible hour of disruption $h$ between 6:00am and 11:59pm; disruptions between 10:00pm and 11:59pm are grouped together (sharing $\beta_{22}^{\text{hour}}$) due to limited data;
• $\beta^{\text{dawn}}$ – the additional disruption delay per hour remaining between the hour of disruption and 5:00am for pre-dawn disruptions (i.e., midnight through 4:59am);
• $\beta^{\text{0-stop}}$ – the change in disruption delay per daily empty non-stop alternative (i.e., the daily non-stop frequency multiplied by one minus the load factor, which is the number of full planes' worth of seats available on a daily basis);
• $\beta^{\text{1-stop}}$ – the change in disruption delay per daily one-stop alternative (ignoring load factors);
• $\beta^{\text{stops}}$ – the impact of disruptions that occur prior to the first flight in a one-stop itinerary (i.e., the impact of the first flight in a one-stop itinerary being canceled); and
• $\beta^{\text{stops(1)}}$ – the additional change in disruption delay per daily one-stop alternative when a one-stop itinerary is disrupted prior to the first flight.
In Table 3-4, we list the estimated parameter values of the regression function, along with their standard errors and t-values. Each parameter is significantly different from 0 at the 99.9% confidence level (under a classical t-test, the probability of exceeding the magnitude of any of the t-values is below $10^{-15}$). The overall model has an adjusted $R^2$ of 0.2752.
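The significance claim can be checked directly from the t-values in Table 3-4. With millions of observations, the t distribution is essentially the standard normal, so the two-sided p-value is approximately erfc(|t|/√2). A minimal Python sketch under that normal approximation (the function name is illustrative):

```python
import math

def two_sided_p(t: float) -> float:
    """Two-sided p-value P(|Z| > |t|) under the normal approximation,
    which is adequate when the residual degrees of freedom are huge."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# t-values from Table 3-4, in table order.
T_VALUES = [136.79, 526.23, 8.89, 17.47, 26.86, 33.27, 46.29, 54.09,
            69.52, 102.19, 110.79, 121.49, 133.26, 149.72, 171.13, 185.06,
            189.69, 196.36, 165.77, 48.05, -884.33, -114.28, -106.21, -141.38]

# The largest p-value comes from the smallest |t| (the 6:00am coefficient).
p_max = max(two_sided_p(t) for t in T_VALUES)
print(f"largest p-value ~ {p_max:.1e}")
```

Even the weakest coefficient (t = 8.89) yields a p-value several orders of magnitude below the $10^{-15}$ bound; for the largest t-values, `math.erfc` simply underflows to 0.0.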
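Parameter                      Estimate   Std Error     t-value
$\beta_0$                        364.80        2.67      136.79
$\beta^{\text{canc}}$            184.90        0.35      526.23
$\beta_6^{\text{hour}}$           24.37        2.74        8.89
$\beta_7^{\text{hour}}$           48.09        2.75       17.47
$\beta_8^{\text{hour}}$           73.55        2.74       26.86
$\beta_9^{\text{hour}}$           90.56        2.72       33.27
$\beta_{10}^{\text{hour}}$       126.40        2.73       46.29
$\beta_{11}^{\text{hour}}$       147.10        2.72       54.09
$\beta_{12}^{\text{hour}}$       189.30        2.72       69.52
$\beta_{13}^{\text{hour}}$       277.10        2.71      102.19
$\beta_{14}^{\text{hour}}$       300.60        2.71      110.79
$\beta_{15}^{\text{hour}}$       329.20        2.71      121.49
$\beta_{16}^{\text{hour}}$       360.20        2.70      133.26
$\beta_{17}^{\text{hour}}$       404.00        2.70      149.72
$\beta_{18}^{\text{hour}}$       463.00        2.71      171.13
$\beta_{19}^{\text{hour}}$       500.30        2.70      185.06
$\beta_{20}^{\text{hour}}$       515.70        2.72      189.69
$\beta_{21}^{\text{hour}}$       536.10        2.73      196.36
$\beta_{22}^{\text{hour}}$       464.90        2.81      165.77
$\beta^{\text{dawn}}$             58.95        1.23       48.05
$\beta^{\text{0-stop}}$         -115.50        0.13     -884.33
$\beta^{\text{1-stop}}$           -1.09        0.01     -114.28
$\beta^{\text{stops}}$           -73.14        0.69     -106.21
$\beta^{\text{stops(1)}}$         -6.57        0.05     -141.38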
Table 3-4: Estimated disruption delay regression function parameters with standard errors and t-values
Based on the parameter estimates, we find that as the hour of disruption becomes later, the average disruption delay increases until it reaches a maximum during the 9:00pm hour. Beyond 9:00pm, the average disruption delay decreases, reaching its minimum during the 5:00am hour. That $\beta^{\text{dawn}}$ is close to 60 minutes is not surprising: re-accommodation alternatives are rarely available between midnight and 5:00am, so each additional hour before 5:00am adds roughly one more hour of waiting. The availability of non-stop alternatives (both flights and seats) is the most beneficial factor for reducing passenger delays. The availability of one-stop alternatives is also beneficial, especially when the disruption occurs prior to the first flight in a planned one-stop itinerary. The estimates for $\beta^{\text{canc}}$ and $\beta^{\text{stops}}$ indicate that disruption delays are lowest for missed connections, followed by first-flight cancellations in a one-stop itinerary, and highest for last-flight cancellations (either for a non-stop itinerary or for the second flight in a one-stop itinerary). This ordering is consistent with the number of disrupted passengers competing for seats on re-accommodation alternatives. With a missed connection, typically only a few passengers are disrupted, making it easier to find seats to re-accommodate them. When the first flight in a one-stop itinerary is canceled, the disrupted passengers can often be re-accommodated through a different connecting airport, avoiding competition for seats with the non-stop passengers. Finally, a cancellation of the last flight in a passenger's itinerary leads to the highest disruption delays, because there are typically many passengers competing for the same seats on non-stop re-accommodation alternatives.
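To make this interpretation concrete, the following Python sketch evaluates equation (3-3) with the point estimates from Table 3-4 and checks two of the claims: the hour-of-day component peaks in the 9:00pm hour and falls to the 5:00am baseline, and the disruption types order as missed connection < first-flight cancellation < last-flight cancellation. The function names and the example feature values are illustrative assumptions, not data from the study.

```python
# Point estimates from Table 3-4 (minutes, or minutes per unit of each feature).
BETA_0, BETA_CANC, BETA_DAWN = 364.80, 184.90, 58.95
BETA_0STOP, BETA_1STOP = -115.50, -1.09
BETA_STOPS, BETA_STOPS_1 = -73.14, -6.57
BETA_HOUR = {6: 24.37, 7: 48.09, 8: 73.55, 9: 90.56, 10: 126.40,
             11: 147.10, 12: 189.30, 13: 277.10, 14: 300.60, 15: 329.20,
             16: 360.20, 17: 404.00, 18: 463.00, 19: 500.30, 20: 515.70,
             21: 536.10, 22: 464.90}

def hour_effect(hour: int) -> float:
    """Hour-of-disruption component of equation (3-3), relative to 5:00am."""
    if hour >= 6:
        return BETA_HOUR[min(hour, 22)]  # 11pm shares the 10pm coefficient
    if hour < 5:
        return BETA_DAWN * (5 - hour)    # per hour remaining until 5am
    return 0.0                           # 5am is the baseline

def predicted_delay(hour, nonstop_freq, nonstop_lf, onestop_freq,
                    cancelled, stops_left):
    """Evaluate equation (3-3) with the Table 3-4 estimates, in minutes."""
    empty_nonstops = nonstop_freq * (1.0 - nonstop_lf)
    return (BETA_0 + BETA_CANC * cancelled + hour_effect(hour)
            + BETA_0STOP * empty_nonstops
            + BETA_1STOP * onestop_freq
            + BETA_STOPS * stops_left
            + BETA_STOPS_1 * onestop_freq * stops_left)

# Hour profile: peak at 9pm, then non-increasing (wrapping past midnight) to 5am.
order = list(range(21, 24)) + list(range(0, 6))
profile = [hour_effect(h) for h in order]
peak_hour = max(range(24), key=hour_effect)
print(peak_hour, all(a >= b for a, b in zip(profile, profile[1:])))

# Disruption types, holding the other features fixed (no one-stop alternatives).
missed = predicted_delay(12, 5, 0.8, 0, cancelled=False, stops_left=0)
first_cxl = predicted_delay(12, 5, 0.8, 0, cancelled=True, stops_left=1)
last_cxl = predicted_delay(12, 5, 0.8, 0, cancelled=True, stops_left=0)
print(missed < first_cxl < last_cxl)
```

For instance, a hypothetical 8:00am cancellation of a non-stop flight with 10 daily non-stop alternatives at an 80% load factor and 20 daily one-stop alternatives, `predicted_delay(8, 10, 0.8, 20, True, 0)`, evaluates to about 370 minutes.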