LPMS data was analyzed at the level of individual events. Data on queuing and lockage times for each lock and dam facility were collected. Once the data was assembled, descriptive statistics were run to initially characterize the system. This yielded histograms that were used to select a probability density function (PDF) that best matched the empirical data. The PDF’s used in the model varied depending on the shape of individual histograms. Based on this analysis, the research team concluded that three PDF’s were useful – normal, gamma, and exponential distributions. Figure “X” gives one example of a PDF that was derived from information about the Newburgh lock and dam facility. A normal distribution offered the best fit in this case.
64 Figure X: Example of a Fitted Probability Density Function After a PDF has been selected, the DES can use that distribution to randomly select the duration of an event – in this case a lockage. This process was repeated for all defined events within the system. As shown in Figure Y, separate PDF’s were calculated for each lock and the system of event connected through model logic. Facility characteristics vary, however, as each has at least one lock for upstream‐ and downstream‐moving vessels. The DES model was run for a number of simulation times, ranging from 320 to 400 days of activity, for an average of 364.7 days (or one year).
65 Figure Y: Discrete Event Simulation Model, Illustrating Logic
Outcomes
After model validation, EZStrobe software offers the ability to complete preliminary visualizations and generate comprehensive statistical reports. Visualization can occur through a number of methods. The most common, however, is simply highlighting the model components to be visually depicted (as illustrated in Figure Y) or by taking advantage of the live reporting parameter (see Figure Z). The visualization of the live reporting parameter allows users to watch the simulation and note where large queues develop and infer other salient information about system operations. These can later be reported from the system as seen in Figure AA (next page).
66 Figure Z: Live Simulation from EZStrobe Figure AA: EZStrobe Report
67
Conclusion
DES modeling carries strengths and limitations. First, DES permitted a thorough exploratory analysis of the LPMS data. The final model highlighted potential areas of concern in the system where delays are the most likely to occur. Initial runs also revealed there were problems and inconsistences in the LPMS data that demanded correction before a statistical model could be produced. DES illustrated Ohio River system operations using an aggregated approach by obtaining averages for different events, such as the length of delays vessels are exposed to while waiting in lockage queues. The simulation gave researchers the power to manipulate some variables, or change PDF’s to determine how they would impact system operations.
Arguably, the deficiencies of DES modeling outweigh the advantages. For example, the LPMS data shows a clear seasonal trend in system operations (recall, traffic density increases during warmer months). But there are other factors that influence how quickly a vessel can move along the river. Placing additional variables into the DES would have resulted in an untenably complex model, one that is slow to estimate the duration of events. As noted, a key objective of the research was to develop an application that could be transferred to stakeholders.
Because EZStrobe is a proprietary modeling environment, it did not lend itself to widespread dissemination. Developing an original statistical model from scratch was therefore the most pragmatic option because it would be able to account for all of the critical variables (e.g. water level, season) that affect patterns of movement. While DES is relatively flexible, a statistical model gives users greater flexibility to independently adjust model factors. While the DES offered an ideal platform to conduct exploratory data analysis by clarifying aggregate system operations and providing the means to visualize traffic bottlenecks, the research team deemed the platform insufficient for developing a final product.
The Predictive Model: Modeling Delays on the Ohio River
Introduction The locks along the Ohio River serve as bottlenecks that slow movement of cargo. Hindrances along a supply chain can also interfere with cargo delivery, but delays at a transport bottleneck carry special importance. This is because the impact typically spreads beyond individual vessels to others that queue up. For that reason, the attractiveness of waterborne transport hinges not only on considerations unique to one cargo or business model, but also on complex features of the river system that can determine the speed and reliability of movement. This includes the structural reliability of locks and the nature of vessels plying their trade along the routes.
68
From a carrier’s perspective, the delay encountered at a lock stretches from when a vessel first arrives until it exits the lock. Total delay time results from two theoretically distinct processes: the actual duration of the lockage, and the amount of time spent in queue waiting for permission to begin a lockage. Sources of delay that influence one process may have little bearing on the next process, and vice versa. The models below therefore use two dependent variables: the delay between arrival and start of lockage (i.e., queuing time), and the amount of time that elapses between start and completion (i.e., lockage time). As described previously, we benefited from access to LPMS delay data starting in January 2002 and ending on September 30th, 2012. This data represented almost 600 thousand lockages and therefore provided more than one million data points when counting both types of delay. The aim of this chapter is to provide a comprehensive narrative for the multiple statistical models that were developed during the course of this project to explain (and predict) system operations. Using a narrative approach also provides readers with a better understanding of how we coped with (and responded to) challenges that arose during the modeling process. After presenting, discussing, and justifying the ways in which we parameterized our different models, we provided the reader with an overview of the results. While our models have been successful in accounting for the variability in lockage times, modeling the queuing delays provided more challenges and did not yield particularly robust results. The findings of this project clearly set in place the groundwork for future work that seeks to improve the modeling of vessel behavior along the Ohio River ‐ as well as other critical inland waterways that are vital for moving cargo across the U.S.
Analysis
The statistical distributions of queuing time and lockages are not normal, as is assumed in a classical regression model. Rather, in practice, the vast majority of delays are short. Most vessels begin their lockage almost as soon as they arrive, and most lockages run into no special difficulties that slow them down. Occasionally, though, either a lock or an individual vessel will run into particular trouble. This can result in very lengthy delays. In technical terms, the distribution is positively skewed, with a long tail stretching toward very high values. This combined with the bounded nature of the distribution on the left‐hand side – it is impossible to wait in queue or pass through a lockage and have a negative delay – the nature of these dependent variables might seem to argue against use of linear‐normal regression and instead require a Maximum Likelihood Model, on the order of a hazard or Operational Model.106 106 For some of the key models reported in the analysis to follow, we experimented with parametric hazard models – in particular the Weibull form, as recommended by Wu et al. (2001) – for purposes of checking the robustness of our results. These models did change the behavior of some of our control variables, but did not undermine the main conclusions presented here. Furthermore, to the extent the findings differed between the two modeling techniques, the linear regressions were less likely to produce coefficient estimates that were nonsensical because they were contrary to one‐tailed hypotheses.
69
We found that performing a log transformation on delay times, so as to reduce the disproportionate impact of outliers, converted the pair of dependent variables so they closely followed the standard normal distribution even before incorporation of systematic predictors. Earlier, we discussed the practical problems that accompany working with LPMS data (flaws that have been addressed through painstaking data corrections). Two additional methodological issues emerging from the nature of these dependent variables also are worth addressing: one is an opportunity for the advanced analysis to come, the other an almost intractable complication. The opportunity springs from the impact of logging the dependent variables. Queue delays and lockage delays may seem as though they bear little relation to each other (Most obviously, the wait in queue is contingent upon what happens with other vessels in the system – e.g. how many there are, what sorts of delays they are experiencing – whereas the speed of lockage itself likely will be unrelated to the amount of traffic).
Certainly it is true that extreme delays in queue and extremely slow lockages are so rare that they reveal no tendency to appear in tandem. Our two dependent variables are weakly correlated at the 0.19 level.107 The two procedures are so distinct that it might seem tempting to separate predictive models for the two stages completely, as other researchers have done (Wu et al., 2001). Yet some of the measured, and likely some of the unmeasurable, influences on vessel speed should work on both varieties of the duration. Logging the two delays reduces the impacts of stray and extreme values. This scenario draws more heavily on patterns among standard lockages, for which the queue delay and the lockage delay correlate modestly with each other. The logged versions of the queue delay and the lockage delay show a stronger correlation of 0.35, a pattern that has little chance of appearing purely by accident. That correlation represents a valuable opportunity, and as we move closer to a fully specified model of Queue Delay and Lockage Delay, we are able to employ regression methods that exploit the relationship between these two seemingly unrelated delays to gain more information about how each one behaves. The research presented here removes limitations that appeared in previous research, which was caused by analyzing each delay in isolation.
The essentially intractable dilemma posed by these lockage‐specific delay data, stems from the sequential nature of vessel movement through a lock chamber. Irrespective of what process drives speed through the bottleneck, delay values necessarily spike when a vessel obstructs traffic (blocking progress). The way that one vessel’s delay can back up a lock queue violates assumptions of independence found in conventional regression models. However, this interdependence bears little resemblance to the type of autocorrelation for which time‐series models have been designed. 107 Even this weak relationship achieves statistical significance in data with almost 600 thousand observations. The relationship did not appear by accident, but so many possible explanations – both theoretically interesting and trivial – might account for it that substantive significance is not worth exploring until we move closer to a properly specified model.
70
The frequency of vessel trips will have ebbs and flows, and the gap between one vessel’s arrival time and another vessel’s arrival time therefore varies widely. As such, no clean relationship can arise between one observation and the next. Nor would the spillover in delay from one vessel to another emerge simply from a measure of when each one arrived, or how many vessels tend to move around during any one period in the data. Usually a vessel only significantly delays its successors when suffering an uncharacteristically long delay. Predicting this lag in the data using a simple time‐series model would produce worse results than ignoring it entirely.
For example, suppose we hypothesize that movement upstream during high‐volume currents will slow vessel passage. Further suppose that three vessels are moving upstream toward a given lock. The effect of any hypothesized delay on the third vessel in queue must take into account not only the speed with which it moves upstream, but also the atypical slowness exhibited by the second vessel in queue. That sluggishness in turn reflects not just the effect of the water flow on the second vessel itself, but also the unexpectedly lengthy wait it endured while the first vessel completed the lockage process. Treating the long delay experienced by the third vessel as a primary function (of the presence of other vessels) would both exaggerate the importance of an increase in traffic and vastly underestimate the causal importance of river movement. A simpler model ‐ one that looks at how speeds change as river flow varies and eliminating as much of the dependence as possible through thorough model specification ‐ might err in the opposite direction by missing some of the dependence across observations. That imperfection would seem more desirable than the alternative, given that the eventual goal of this project is to implement the predictive models through a computer simulation of the river system that imposes additional bounds on vessel behavior. That is, if vessels in the simulation have optimistic speed estimates in the presence of other vessels, they still would end up proceeding at a slower pace once the simulation forced them to wait for the lock to clear. Thus, it makes sense for the statistical model to focus on the overall expected delay, and to allow a portion of the bottleneck effect to fluctuate with traffic in a simulation environment.108
108
In principle, a researcher would not be forced to choose between the two imperfect options described here. Rather than use a simple model that assumes independence outside of the systematic causal influences (and the atheoretical seasonal influences) that unite different observations, or apply a time‐series model developed for less‐complex conditions that threatens to make matters worse, one could imagine a model that makes a vessel’s estimated delay (a) a function of the previous vessel’s expected delay, (b) conditioned on the gap in time between them. This sort of complex recursive time‐series model is well past the scope of this report, however, and will have to remain a goal of future research.
71