110 Detecting Activity Patterns from Smart Card Data

Figure 5.3: Network visualisation of the adjacency matrices based on the different labelings. Panels: (a) combined labeling, with nodes Overnight, LongAfternoon, LongEarly, LongNoon, ShortAfternoon, ShortEarly, ShortNoon and ShortEvening; (b) duration based labeling, with nodes Overnight, Long and Short; (c) start time based labeling, with nodes Afternoon, Early, Noon, Overnight and Evening.

time to visit a nearby location. The tenth triplet shows a pattern where two activities are started within the noon window. There is also evidence of people performing a long activity one day and a short activity the next day, and vice versa, as witnessed by triplets eight and nine in the duration based labeling.

When we compare these results to the analysis of the “other” activity label considered during the analysis of the Gatineau data by Devillaine et al. (2012), we see that the third triplet in the start time based labeling suggests a possible peak around 12:00 for at least some of the stations. They also found a peak around 16:00, which would correspond to the afternoon label in our case. However, if we consider the labels in the start time table, only the sixth and ninth triplets represent evening activities, and neither is as strong as the single noon triplet.
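The triplet percentages discussed above are sliding-window counts over the label chain of each smart card. The following is a minimal sketch of that counting step, not the authors' implementation: the function name `triplet_percentages`, the card identifiers and the toy chains are all hypothetical and only illustrate how a percentage such as those in Table 5.1 could be computed.

```python
from collections import Counter


def triplet_percentages(chains):
    """Return {triplet: percentage} over all consecutive label triplets.

    `chains` maps a card id to its chronologically ordered activity labels
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter()
    for labels in chains.values():
        # Slide a window of three consecutive activities along one card's chain.
        for i in range(len(labels) - 2):
            counts[tuple(labels[i:i + 3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {triplet: 100.0 * n / total for triplet, n in counts.items()}


# Toy data for illustration only; not taken from the study.
chains = {
    "card_a": ["Overnight", "Long", "Overnight", "Long", "Overnight"],
    "card_b": ["Overnight", "Short", "Overnight"],
}
percentages = triplet_percentages(chains)
```

In this toy input, four triplets are observed in total, two of which are (Overnight, Long, Overnight), so that triplet accounts for half of all observed triplets. Windows never cross from one card to another, which matches the statement that triplets are based on the consecutive activities of individual smart cards.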

5.7 Conclusions and Future Work

We have developed an approach to cluster temporal intervals derived from activity data at a station level using a parameterised penalty function and to aggregate the results, such that we obtain the most interesting time intervals in the data. We repeat this process to obtain robustness fractions. Based on the robustness fractions, we constructed a tree-based labeling procedure. The labels allow us to find the most frequent pairs and triplets of activity types observed in individual activity chains. While the typical intervals associated with home and work activities are dominant, we are able to identify shorter activities as well and provide some insight on their relation to other activities within the activity chains of individual passengers.

| Label 1 | Label 2 | Label 3 | Percentage of observed triplets |
|---|---|---|---|
| Full labeling | | | |
| Overnight | LongEarly | Overnight | 19% |
| LongEarly | Overnight | LongEarly | 16% |
| Overnight | LongNoon | Overnight | 4% |
| Overnight | ShortNoon | Overnight | 3% |
| ShortNoon | Overnight | ShortNoon | 2% |
| ShortAfternoon | Overnight | ShortNoon | 2% |
| Overnight | ShortEarly | Overnight | 2% |
| LongNoon | Overnight | LongNoon | 2% |
| ShortAfternoon | ShortAfternoon | Overnight | 2% |
| Overnight | ShortNoon | ShortNoon | 2% |
| Duration based labeling | | | |
| Overnight | Long | Overnight | 23% |
| Long | Overnight | Long | 20% |
| Short | Overnight | Short | 10% |
| Short | Short | Short | 9% |
| Overnight | Short | Overnight | 7% |
| Short | Short | Overnight | 6% |
| Overnight | Short | Short | 6% |
| Short | Overnight | Long | 6% |
| Long | Overnight | Short | 4% |
| Overnight | Long | Short | 2% |
| Start time based labeling | | | |
| Overnight | Early | Overnight | 20% |
| Early | Overnight | Early | 19% |
| Overnight | Noon | Overnight | 7% |
| Noon | Overnight | Noon | 5% |
| Noon | Overnight | Early | 4% |
| Afternoon | Overnight | Noon | 3% |
| Early | Overnight | Noon | 2% |
| Afternoon | Overnight | Early | 2% |
| Overnight | Afternoon | Overnight | 2% |
| Overnight | Early | Noon | 2% |

Table 5.1: The most frequent triplets of consecutive labels for each labeling method and the percentage with which they occur among all triplets detected when applying that particular labeling method. Triplets are generated based on the consecutive activities of individual smart cards. If the frequency of a triplet is 20%, this means that if you look at three consecutive activities of a single smart card, in 20% of the cases their labels correspond to the labels of the triplet.

Our current approach still has some drawbacks. The modular ring $\mathbb{Z}_U$ with $U = 24$ is quite a rigorous simplification, as we cannot distinguish between an activity that takes one hour and an activity that takes 25 hours. While this simplification allows us to get a general idea of what is happening within the system without having to look at too many numbers, it is likely that more caution is necessary if we want to construct the input for activity based models. Moreover, we ignore the distinction between weekdays and weekends, which has a very significant impact on travel behaviour. For the implementation of a valid simulation, it will be necessary to make this distinction. If we introduce these detailed descriptions of the activity intervals, it would also be necessary to reconsider the distance measure used. In future work a more rigorous mathematical basis could be used in its design as part of this process. Finally, we constructed our labeling algorithm by hand. A question is whether we can use automatic classification algorithms instead of our manually constructed labeling procedures.
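The loss of information caused by working modulo 24 hours can be made concrete with a small sketch. It assumes that timestamps are expressed as hours counted from midnight of some reference day and that both endpoints of an activity are reduced modulo U before the duration is taken; the helper `duration_on_ring` and the example hours are hypothetical and are not part of the original method description.

```python
U = 24  # length of the modular ring in hours, as stated in the text


def duration_on_ring(start_hour, end_hour, modulus=U):
    """Duration in hours as seen on the ring Z_modulus.

    Only the difference modulo `modulus` survives, so any whole number
    of extra days between start and end is invisible.
    """
    return (end_hour - start_hour) % modulus


# Hours counted from midnight of an arbitrary reference day (illustrative values).
one_hour = duration_on_ring(9, 10)   # 09:00 to 10:00 on the same day
next_day = duration_on_ring(9, 34)   # 09:00 to 10:00 on the next day, 25 hours later
```

Both calls yield the same value on the ring, which is exactly the ambiguity between a one-hour and a 25-hour activity described above. Keeping the unreduced difference `end_hour - start_hour` alongside the ring value would be one way to retain the distinction, at the cost of a more detailed interval description.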

Aside from further refinements of our methods, such as reducing the number of parameters to set and varying distance measures, there are two main topics for future research. First, we can use either the clustering output at the station level or the complete distribution of intervals observed at the stations to identify similar classes of stations. If we are able to reduce our stations to a small number of important classes, we can attempt to include some spatial aspects in the analysis of the activity chains. Second, we can consider how our findings can be used to improve demand generation procedures for agent-based simulation studies similar to the study in Chapter 4.

6 Exploratory Analysis of Time-Space patterns in Smart Card Data

Co-authors: Evelien van der Hurk, Leo Kroon and Peter Vervest. This chapter is a working paper.

6.1 Introduction

The introduction of smart card ticketing systems has resulted in a wealth of data for public transportation operators compared to analogue ticketing systems. Previously a lot of effort was required to make crude estimates of passenger volumes, often split in peak and off-peak volumes. Smart card ticketing provides detailed information related to time and location of arrivals and possibly departures for each passenger. It may come as no surprise that some practitioners think that with the availability of smart card data, the operators now know everything there is to know about the passengers.

Unfortunately smart card data does not contain all information relevant to transport planning. Researchers in transport demand modeling (Ben-Akiva et al., 1994) are very aware of the difference between stated choice data, i.e. information that states which travel option a passenger considers as best from a set of given alternatives, and revealed choice data, i.e. information about which actual choice a passenger took but not the considered alternative travel options.

Parallel to the introduction of smart cards in public transport, disruptive new business models were introduced in private transportation, often driven by informative smart phone applications that give detailed expectations about the journey a prospective customer is considering. Such developments give travelers more freedom in planning their journeys. In the near future it may also become possible that fleets of on-demand automated vehicles are introduced, decreasing the need for car ownership. An advantage of such a system is that, if designed properly, vehicle utilization can be increased, as currently most cars are not utilized for the majority of the time. The big challenge for this business model is whether it can be made reliable. If the chances of getting an on-demand vehicle are very low during peak hours, there is still an important reason for many individuals to keep owning cars. In order to decrease car ownership, it is important that the information technology used to organize such systems is user friendly and reliable. It is likely that this will require models that go beyond the scheduling of individual trips and take activity patterns of the customer base into account.

In order to provide good service quality to the passengers of public transport and other systems where vehicle sharing is employed to increase utilization, it is important to understand what goals passengers are trying to accomplish with their journeys and what alternatives they consider acceptable for achieving these goals. A passenger who has to catch a plane at the airport has other requirements related to the journey than a passenger who wants to have coffee with family. Nowadays, this type of information is barely taken into account, as public transport operators focus on moving certain volumes of people between origin-destination pairs as efficiently as possible. With the advent of potentially cheaper and smarter private transportation systems on the horizon, there is a great need to design public transport systems and the related information systems from the perspective of the passenger.

In previous chapters we have considered home-work patterns in smart card data as well as temporal patterns. In this chapter we focus on patterns that combine temporal and spatial properties, as activities performed by individuals happen during a certain time interval at a certain location. We consider how we can improve upon these methods by proposing a way in which the generated patterns can be validated. As we do not know the underlying patterns for actual smart card data, we develop a way to produce synthetic smart card data for which these patterns are known. We generate data for a mockup scenario based on the city of Utrecht in the Netherlands. We evaluate a clustering approach that groups stations based on similar temporal profiles and show that this validation method can also be applied to other methods that extract behavioral patterns from smart card data.

The remainder of this chapter is organized as follows: In Section 6.2 we consider and formalize the general structure of smart card data and its relation to activities. In Section 6.3, we introduce an algorithm that can generate synthetic smart card data based on predefined activity patterns that can also be used in the validation step. In Section 6.4, we propose a way to make groups of stations based on the distributions of labelled time intervals at each station. In Section 6.5, we introduce the demand patterns and synthetic smart card data set that we use in our experiments. In Section 6.6, we compute groups of similar stations based on the synthetic data set and validate how well these groups of stations match with the predefined demand patterns. In Section 6.7, we conclude with a brief discussion of the obtained results and sketch a direction for future research on the topic of methodologies for smart card data analysis.
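Because the synthetic data is generated from predefined demand patterns, the groups of stations that a method finds can be scored against the groups that were built into the data. This excerpt does not state which agreement measure the chapter uses, so the sketch below swaps in the Rand index, a standard pair-counting measure, purely as an illustration. The function `rand_index`, the station names and the group names are hypothetical.

```python
from itertools import combinations


def rand_index(found, expected):
    """Fraction of station pairs on which two groupings agree.

    Both arguments map a station name to a group identifier. A pair agrees
    if it is placed together in both groupings or apart in both groupings.
    """
    if set(found) != set(expected):
        raise ValueError("both groupings must cover the same stations")
    pairs = list(combinations(sorted(found), 2))
    if not pairs:
        return 1.0
    agree = 0
    for a, b in pairs:
        together_found = found[a] == found[b]
        together_expected = expected[a] == expected[b]
        agree += together_found == together_expected
    return agree / len(pairs)


# Illustrative stations and groups only; not the Utrecht scenario from the chapter.
expected = {"s1": "residential", "s2": "residential", "s3": "business", "s4": "business"}
found = {"s1": 0, "s2": 0, "s3": 1, "s4": 0}
score = rand_index(found, expected)
```

The measure only compares which stations are grouped together, so the group identifiers themselves do not need to match between the two groupings. A score of 1 means the recovered grouping reproduces the predefined one exactly.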
