130 Exploratory Analysis of Time-Space patterns in Smart Card Data
Table 6.7: Validation based on the overlap fractions $o_{ai}$ of combinations of the locations in each cluster $i$ that correspond to the locations in the set $L_a$ defined for a particular activity type $a$. If we see a percentage of 100%, this means that a certain cluster has captured all locations at which that activity type can be performed. If we see a percentage of 50%, half of the locations at which an activity can be performed are captured by the locations within that cluster. These percentages are given for three different types of temporal classifiers. The best situation is when a cluster captures the locations of as few activity types as possible, preferably with high values. Values above 40% are displayed in a bold font.
(a) Clustering obtained with a temporal classifier that generates labels from the duration and arrival times of activities.
Cluster home shop-day shop-eve. shop-major study work work-alw. work-eve.
A1 44% 0% 0% 0% 0% 0% 0% 0%
A2 0% 0% 0% 0% 50% 0% 0% 0%
A3 24% 0% 0% 67% 0% 43% 50% 20%
A4 4% 20% 25% 0% 0% 7% 0% 20%
A5 8% 20% 25% 0% 0% 14% 0% 20%
A6 12% 60% 50% 33% 0% 21% 0% 20%
A7 8% 0% 0% 0% 50% 14% 50% 20%
(b) Clustering obtained with a temporal classifier that generates labels from a week/weekend indicator and the duration of activities.
Cluster home shop-day shop-eve. shop-major study work work-alw. work-eve.
B1 44% 0% 0% 0% 0% 0% 0% 0%
B2 0% 0% 0% 0% 50% 0% 0% 0%
B3 16% 20% 25% 33% 0% 29% 0% 20%
B4 8% 40% 50% 0% 0% 14% 0% 20%
B5 32% 40% 25% 67% 50% 57% 100% 60%
(c) Clustering obtained with a temporal classifier that generates labels from a week/weekend indicator and the arrival time at activities.
Cluster home shop-day shop-eve. shop-major study work work-alw. work-eve.
C1 44% 0% 0% 0% 0% 0% 0% 0%
C2 0% 0% 0% 0% 50% 0% 0% 0%
C3 20% 40% 50% 33% 0% 36% 50% 40%
C4 8% 20% 25% 0% 0% 14% 0% 20%
C5 28% 40% 25% 67% 50% 50% 50% 40%
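The overlap fractions reported in Table 6.7 can be illustrated with a small sketch. It assumes, following the caption, that the fraction for activity type a and cluster i is the share of the locations in L_a that fall inside cluster i. The station names, cluster contents and activity sets below are made up for illustration and are not taken from the experiments.

```python
def overlap_fractions(clusters, activity_locations):
    """Return o[a][i] = |C_i & L_a| / |L_a| for every activity a and cluster i."""
    fractions = {}
    for activity, locations in activity_locations.items():
        fractions[activity] = {
            cluster: len(members & locations) / len(locations)
            for cluster, members in clusters.items()
        }
    return fractions


# Hypothetical toy instance: five stations partitioned into three clusters.
clusters = {"A1": {"s1", "s2"}, "A2": {"s3"}, "A3": {"s4", "s5"}}
# Hypothetical sets L_a of stations where each activity type can be performed.
activity_locations = {
    "study": {"s3", "s4"},
    "home": {"s1", "s2", "s4", "s5"},
}

o = overlap_fractions(clusters, activity_locations)
for activity, row in o.items():
    print(activity, {cluster: f"{value:.0%}" for cluster, value in row.items()})
```

Because the clusters partition the stations, the fractions of each activity type add up to 100% over the clusters, which is also what the columns of Table 6.7 show.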
combines duration and activity time yields more informative results than a classifier that combines a single one of these options with a week/weekend label.
6.7 Discussion and Future Research
We have shown that we can generate a large amount of synthetic smart card data. We can analyze the temporal and spatial properties of activities implied by the smart card data and make groups of similar stations, all within reasonable time. As the synthetic smart card data is generated from a predefined set of demand patterns, we can validate the quality of the obtained clusters of stations. We learn that the tested approach matches certain clusters to specific activity types, while other activity types are more difficult to distinguish. This makes sense, as some activity types may be similar, such as studying and working activities, while other locations offer different activity types at the same time, such as shopping, working and home activities.
From these results, we can reason that public transport operators do not know “everything” based on the smart card data. Even if there is a good way to segment passengers or activities, it is likely that some different activity types share temporal and spatial attributes and will therefore always be confused when looking solely at the smart card data. This relates strongly to the observation made in Chapter 4 that smart card data do not contain all information required to regard it as activity-based demand.
The results show that although in theory smart card data miss information, they undeniably contain more information than traditional ticket sales data or randomized manual passenger counts. Since some clusters could be associated with specific activity types, such as shopping or studying, there is fertile ground to develop methodologies that allow operators to explore the goals passengers pursue with their travels. The approach proposed in this chapter, generating synthetic smart card data and validating the outcome of such methods against it, will be a valuable tool in the development of future methods.
There are a number of ideas that can be considered for future improvement of the methodology tested, or even for alternative methodologies. Since we make use of basic k-means clustering, there is room to experiment with other data mining techniques. One idea could be to work directly with the temporal attributes of the activities instead of the labels we currently work with, or with a hybrid approach that assigns a likelihood to each label instead of picking a single label. This keeps more information available during the clustering process, which could potentially be exploited by more sophisticated clustering methods, but it also makes it more challenging to present the output concisely to the end user.
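The difference between picking a single label and keeping a likelihood per label can be made concrete with a minimal sketch. The labels, the likelihood values and the helper `station_vector` are hypothetical; how such likelihoods would be obtained from the temporal attributes is left open here.

```python
def station_vector(activities, labels, soft):
    """Aggregate the activities observed at one station into a distribution vector.

    Each activity is a dict mapping a label to its (assumed) likelihood.
    With soft=False only the most likely label of each activity is counted.
    """
    totals = dict.fromkeys(labels, 0.0)
    for likelihoods in activities:
        if soft:
            for label in labels:
                totals[label] += likelihoods[label]
        else:
            best = max(labels, key=lambda label: likelihoods[label])
            totals[best] += 1.0
    return [totals[label] / len(activities) for label in labels]


labels = ["work", "study"]
# Two hypothetical activities that are hard to tell apart.
activities = [
    {"work": 0.60, "study": 0.40},
    {"work": 0.55, "study": 0.45},
]

hard_vector = station_vector(activities, labels, soft=False)
soft_vector = station_vector(activities, labels, soft=True)
print(hard_vector, soft_vector)
```

In this toy case the hard labels describe the station as purely a work location, whereas the likelihood-based vector retains the information that the activities could almost equally well be study activities.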
Another idea to consider for future research is to use thresholds in the processing pipeline to produce binary vectors, indicating whether a certain type of label occurs with a high enough frequency at a station, instead of the distribution vectors we currently use. This would result in greater similarity for a station that has both
residential and working activities, even if the frequency of these activities is different. This could result in the need for fewer clusters, but at the cost of information related to the distribution of the activity types in the final summary.
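The effect of such a threshold can be sketched as follows. The label order, the two stations and the cut-off value of 0.20 are assumptions chosen for illustration only.

```python
import math


def binarize(distribution, threshold):
    """Mark each label as present (1) if its share reaches the threshold, else 0."""
    return [1 if share >= threshold else 0 for share in distribution]


def euclidean(u, v):
    """Euclidean distance, the dissimilarity measure used by standard k-means."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical label order: [home, work, shop]
station_x = [0.70, 0.25, 0.05]  # mostly residential, some work
station_y = [0.30, 0.65, 0.05]  # mostly work, some residential
THRESHOLD = 0.20  # assumed cut-off, not a value from the experiments

binary_x = binarize(station_x, THRESHOLD)
binary_y = binarize(station_y, THRESHOLD)

print(euclidean(station_x, station_y))  # positive: the distributions differ
print(euclidean(binary_x, binary_y))    # zero: both stations offer home and work
```

After thresholding, the two stations become identical, which is the gain in similarity described above. The information that one station is dominated by residential activities and the other by working activities is lost.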
Part III
Individual Decision Strategies Under Uncertainty
7 Passenger Route Choice in Case of Disruptions
Co-authors: Marie Schmidt, Leo Kroon, Anita Schöbel
This paper has been accepted after peer review for presentation at and publication in the proceedings of IEEE Conference on Intelligent Transport Systems, IEEE-ITSC2013.
7.1 Introduction
One of the major nuisances a passenger in railway transport can experience is a disruption, resulting in the cancellation of a number of train services. Not only do disruptions usually cause significant delays of the passengers’ journeys, but they may also confront the passenger with a complicated question: “What is the best way to continue my journey?” In such a situation, a little bit of information may be extremely valuable for making the right choice. As such, it is important that operators do the best they can in informing their passengers. However, even the operator himself may not have all information that would be required to give the best possible advice. In many cases it can take some time to assess the cause and severity of a disruption. In other situations the cause may have been resolved but the exact time before operations are back to normal is still uncertain.
In case good information is not available, the passenger faces a dilemma. Should he be optimistic about the duration of the disruption and hope it is over soon enough to wait for the planned fast connection, or should he be pessimistic and take an alternative that might take much longer than the disrupted connection? Both approaches may lead to an unfortunate outcome, as the optimistic passenger might wait much longer than anticipated before the disruption is over, while the pessimistic passenger
may find out that the disruption had vanished just after he departed on his lengthy detour.
In the past decades, algorithms for situations where the input arrives over time have been investigated under the name of “online algorithms”. Often the performance is measured by the so-called “competitive ratio”, which can be interpreted as the worst-case ratio of the obtained solution value to the best solution value achievable when the required information is available in advance.
In this chapter we study the quality of strategies in an online passenger waiting problem, where a passenger has to decide between waiting for the end of the disruption or taking a detour. We characterize the strategies of the passenger and analyze the competitive ratios of these strategies. We apply the results to some realistic disruption scenarios in order to show the applicability of this approach in decision support systems that can improve passenger guidance in a disrupted situation. Additionally, we investigate a version of the problem where we perform an average case analysis using different probability distributions.
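To give a feeling for what a competitive ratio measures in a wait-or-detour setting, consider the following minimal sketch. It is a deliberately simplified toy model and not necessarily the model analyzed in this chapter: the disruption lasts an unknown whole number of minutes, the direct connection takes `DIRECT` minutes once the disruption is over, the detour takes `DETOUR` minutes and can be started at any time, and the passenger follows a threshold strategy. All names and numbers are assumptions made for illustration.

```python
def online_arrival(threshold, duration, direct, detour):
    """Arrival time of a passenger who waits at most `threshold` minutes.

    If the disruption ends within the threshold, the passenger takes the direct
    connection as soon as it runs again; otherwise the detour is started at the
    threshold.
    """
    if duration <= threshold:
        return duration + direct
    return threshold + detour


def offline_arrival(duration, direct, detour):
    """Best arrival time for a passenger who knows the duration in advance."""
    return min(duration + direct, detour)


def worst_case_ratio(threshold, direct, detour, durations):
    """Largest online/offline ratio over all considered disruption durations."""
    return max(
        online_arrival(threshold, d, direct, detour) / offline_arrival(d, direct, detour)
        for d in durations
    )


DIRECT, DETOUR = 30, 90      # hypothetical travel times in minutes
DURATIONS = range(0, 241)    # hypothetical disruption durations in whole minutes

ratios = {t: worst_case_ratio(t, DIRECT, DETOUR, DURATIONS) for t in range(0, 121)}
best_threshold = min(ratios, key=ratios.get)
print(best_threshold, round(ratios[best_threshold], 3))
```

In this toy setting, leaving immediately and waiting very long both give a poor worst-case ratio. The threshold with the smallest worst-case ratio balances the two risks described in the introduction.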
7.2 Related Work
Over the years, many approaches dealing with decision making under uncertainty have been developed.
One popular way of analyzing online algorithms is competitive analysis, where the worst case ratio of the online solution compared to the offline solution is used to measure the quality of an online strategy (Sleator and Tarjan, 1985; Borodin and El-Yaniv, 1998). Another way of analyzing solution strategies is to introduce a probability distribution on the uncertain input parameters and consider the expected value of the solution value. An example of this approach is Fujiwara and Iwama (2002). In this chapter we refer to this approach as average-case analysis.
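The average-case view can be sketched in the same spirit. The example below assumes a simple wait-or-detour toy model with a threshold strategy and a made-up discrete probability distribution on the disruption duration. It illustrates the idea of evaluating a strategy by its expected solution value and does not reproduce a distribution or result from the literature cited above.

```python
def arrival_time(threshold, duration, direct, detour):
    """Arrival time when waiting at most `threshold` minutes before taking the detour."""
    if duration <= threshold:
        return duration + direct
    return threshold + detour


def expected_arrival(threshold, distribution, direct, detour):
    """Expected arrival time under a discrete distribution {duration: probability}."""
    return sum(
        probability * arrival_time(threshold, duration, direct, detour)
        for duration, probability in distribution.items()
    )


DIRECT, DETOUR = 30, 90  # hypothetical travel times in minutes
# Hypothetical distribution of the disruption duration in minutes.
DISTRIBUTION = {10: 0.5, 45: 0.3, 120: 0.2}

# Thresholds worth comparing: leave at once, or wait until a possible end time.
candidates = [0] + sorted(DISTRIBUTION)
expected = {t: expected_arrival(t, DISTRIBUTION, DIRECT, DETOUR) for t in candidates}
best_threshold = min(expected, key=expected.get)
print(expected, best_threshold)
```

Unlike the worst-case ratio, the expected value depends on the chosen distribution, so a different assumption about the disruption duration can favor a different threshold.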
The first competitive analysis of online strategies in the literature appeared in Sleator and Tarjan (1985), where the performance of different online update rules on a list data structure was compared to the optimal offline schedule, i.e., the best schedule if all information was known before.
Other frameworks to deal with uncertainty are robust optimization (Ben-Tal et al., 2009), which aims to find a solution which is feasible for all possible realizations of scenarios, and stochastic optimization (Birge and Louveaux, 1997), where a probability distribution on scenarios is given and a solution needs to be feasible with high expectation.
Several uncertain shortest path problems on networks have been considered in the literature. For example, the Canadian traveler problem (Papadimitriou and Yannakakis, 1991; Bar-Noy and Schieber, 1991) asks how to find a path through a network in which edges may be blocked, which is revealed to the traveler when an adjacent node is reached. A different class of path finding problems deals with the situation where edge-lengths are uncertain (Yu and Yang, 1998). This problem has been studied under