
Table 6.1: Example dataset of processed smart card data

Smart Card ID  Departure Timestamp  Arrival Timestamp  Departure Location  Arrival Location
2348           2016-03-14 9:06      2016-03-14 9:18    Central Station     Main Square
2348           2016-03-14 15:03     2016-03-14 15:21   Main Square         Central Station
2348           2016-04-14 11:01     2016-04-14 11:03   Central Station     Shopping Mall
5231           2016-03-14 10:03     2016-03-14 10:33   Suburban Street     Main Square
5231           2016-03-14 14:37     2016-03-14 15:07   Main Square         Suburban Street
...            ...                  ...                ...                 ...

6.2 Activities and Smart Card Data

Smart card data come in various formats, and the technical details of the systems that collect the data vary between different public transport operators. Some ticketing systems collect only check-ins, while other systems collect both check-ins and check-outs. The latter removes the need to estimate check-out locations afterward, which is an advantage for the analysis. However, a number of methodologies that estimate check-out locations based solely on the check-ins are discussed in Pelletier et al. (2011).

We assume a smart card database consists of records that contain a smart card id (often the serial number of the media used), a time stamp, the location of the transaction, the type of action (check-in or check-out) and possibly additional information such as financial data (which we neglect in this chapter).

Often it is easier to link each check-in to its check-out and work with a table of journeys instead of raw transactions. This only requires sorting the raw transactions by smart card id and then by time stamp. Consecutive check-in and check-out pairs can then be combined easily, yielding a dataset with the structure displayed in Table 6.1.
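The sort-and-pair step described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the actual data: the record layout `(card_id, timestamp, location, action)`, the action labels `"in"` and `"out"`, the zero-padded time stamp strings (which sort chronologically as text) and the function name `pair_transactions` are all hypothetical.

```python
from operator import itemgetter


def pair_transactions(transactions):
    """Combine consecutive check-in/check-out records into journeys.

    Each transaction is a tuple (card_id, timestamp, location, action),
    where action is "in" or "out" (a hypothetical encoding).
    Returns tuples shaped like the rows of Table 6.1:
    (card_id, departure_ts, arrival_ts, departure_loc, arrival_loc).
    """
    journeys = []
    pending = None  # most recent check-in that has not been matched yet

    # Sort by smart card id first, then by time stamp.
    for card_id, ts, location, action in sorted(transactions, key=itemgetter(0, 1)):
        if action == "in":
            pending = (card_id, ts, location)
        elif action == "out" and pending is not None and pending[0] == card_id:
            journeys.append((card_id, pending[1], ts, pending[2], location))
            pending = None
        # Check-outs without a matching check-in are silently skipped here.
    return journeys
```

In this sketch a check-in that is never followed by a check-out on the same card is simply overwritten by the next check-in; a real pipeline would have to decide how to treat such incomplete journeys.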

From this dataset of journeys, we derive a similar dataset of activities. In order to describe this process, we first formalize the symbols used to describe both the journeys and the activities.

6.2.1 Formal definitions of journeys and activities

Let us now introduce the formal definitions of journeys and activities. We work with the assumption that individuals alternate between traveling (journeys) and performing activities. This assumption can be made without loss of generality if we allow activities that have no duration (i.e. their start and end times coincide) and journeys whose departure and arrival locations are the same. To define activities, we first define how to represent time and space:

114 Exploratory Analysis of Time-Space patterns in Smart Card Data

ownership in favor of on-demand automated vehicles. An advantage of such a system is that, if designed properly, vehicle utilization can be increased, as currently most cars are not utilized for the majority of the time. The big challenge for this business model is whether it can be made reliable. If the chances of getting an on-demand vehicle are very low during peak hours, there is still an important reason for many individuals to keep owning cars. In order to decrease car ownership, it is important that the information technology used to organize such systems is user friendly and reliable. It is likely that this will require models that go beyond the scheduling of individual trips and take the activity patterns of the customer base into account.

In order to provide good service quality to the passengers of public transport and other systems where vehicle sharing is employed to increase utilization, it is important to understand what goals passengers are trying to accomplish with their journeys and what alternatives they consider acceptable for achieving these goals. A passenger who has to catch a plane at the airport has different requirements for the journey than a passenger who wants to have coffee with family. Nowadays, this type of information is barely taken into account, as public transport operators focus on moving certain volumes of people between origin-destination pairs as efficiently as possible. With the advent of potentially cheaper and smarter private transportation systems on the horizon, there is a great need to design public transport systems and the related information systems from the perspective of the passenger.

In previous chapters we have considered home-work patterns in smart card data as well as temporal patterns. In this chapter we focus on patterns that combine temporal and spatial properties, as activities performed by individuals happen during a certain time interval at a certain location. We also consider how we can improve upon these methods by proposing a way in which the generated patterns can be validated. As we do not know the underlying patterns for actual smart card data, we develop a way to produce synthetic smart card data for which these patterns are known. We generate data for a mockup scenario based on the city of Utrecht in the Netherlands. We evaluate a clustering approach that groups stations based on similar temporal profiles and show that this validation method can also be applied to other methods that extract behavioral patterns from smart card data.

The remainder of this chapter is organized as follows: In Section 6.2 we consider and formalize the general structure of smart card data and its relation to activities. In Section 6.3, we introduce an algorithm that can generate synthetic smart card data based on predefined activity patterns that can also be used in the validation step. In Section 6.4, we propose a way to make groups of stations based on the distributions of labelled time intervals at each station. In Section 6.5, we introduce the demand patterns and synthetic smart card data set that we use in our experiments. In Section 6.6, we compute groups of similar stations based on the synthetic data set and validate how well these groups of stations match with the predefined demand patterns. In Section 6.7, we conclude with a brief discussion of the obtained results and sketch a direction for future research on the topic of methodologies for smart card data analysis.


• Time instants are modeled by choosing a temporal unit (e.g. minutes or seconds) and a time offset (for Unix timestamps this is the start of the first of January 1970). As a result, we can use common arithmetic on the natural numbers to work with time instants.

• Time durations are also modeled using the natural numbers, with the same temporal unit as we are using for time instants.

• A set of locations L that occur within our dataset.
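As a rough illustration of this representation, the sketch below maps two time stamps from Table 6.1 to natural numbers. The choice of minutes as the temporal unit, the interpretation of the time stamps as UTC and the helper name `to_instant` are assumptions made for the example only.

```python
from datetime import datetime, timezone

UNIT_SECONDS = 60  # temporal unit: one minute (an illustrative choice)


def to_instant(text):
    """Map a 'YYYY-MM-DD HH:MM' string, read as UTC, to whole units since the Unix epoch."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return int(moment.timestamp()) // UNIT_SECONDS


departure = to_instant("2016-03-14 09:06")
arrival = to_instant("2016-03-14 09:18")

# Durations use the same unit, so plain integer subtraction suffices.
duration = arrival - departure  # 12 (minutes)

# The set of locations L is simply the set of names occurring in the data.
locations = {"Central Station", "Main Square", "Shopping Mall", "Suburban Street"}
```

Because instants and durations share one unit, comparisons and differences need no special date arithmetic once the conversion has been done.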

Using our chosen representation of time and space, we can now define how to model activities:

Definition 2. An activity $A_i$ is a 3-tuple $(a_i, d_i, l_i)$ where $a_i \in \mathbb{N}$ is the time at which the activity starts, $d_i \in \mathbb{N}$ is the time at which the activity ends and a location $l_i \in L$ where the activity takes place. The duration of an activity is computed by $d_i - a_i$.

Suppose that some individual performs n activities at different locations. We can then consider the following activity sequence for this individual:

$$A_1 = (a_1, d_1, l_1),\quad A_2 = (a_2, d_2, l_2),\quad \ldots,\quad A_n = (a_n, \infty, l_n)$$

In this sequence, it must hold that $a_i \le d_i \le a_{i+1}$ for all $i$, since time always goes forward. Associated with a sequence of activities, we have a corresponding sequence of journeys. A journey can be defined as follows:

Definition 3. A journey $J_i$ is a 4-tuple $(d_{i-1}, a_i, l_{i-1}, l_i)$ where $d_{i-1}$ is the departure time at which the journey starts, $a_i$ is the arrival time at which the journey ends, $l_{i-1} \in L$ is the origin location of the journey and $l_i \in L$ is the destination location of the journey.

Suppose that we know the activity sequence of a certain individual. The sequence of journeys of the same individual will then have the following structure:

$$J_1 = (?, a_1, ?, l_1),\quad J_2 = (d_1, a_2, l_1, l_2),\quad \ldots,\quad J_n = (d_{n-1}, a_n, l_{n-1}, l_n)$$

The chronological order defined on the $a_i$'s and $d_i$'s also applies when we consider a sequence of journeys. Given a sequence of activities, we do not know the precise departure time and origin of the first journey, which is indicated with the question mark symbol in the above sequence.
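The step from an activity sequence to its journey sequence can be sketched as follows. The class and function names are hypothetical; `None` plays the role of the question mark, `math.inf` stands in for the open end of the last activity, and the chronological check assumes the non-strict ordering $a_i \le d_i \le a_{i+1}$.

```python
import math
from typing import List, NamedTuple, Optional


class Activity(NamedTuple):
    start: int      # a_i
    end: float      # d_i, or math.inf for the open-ended last activity
    location: str   # l_i


class Journey(NamedTuple):
    departure: Optional[float]  # d_{i-1}; None plays the role of "?"
    arrival: int                # a_i
    origin: Optional[str]       # l_{i-1}; None plays the role of "?"
    destination: str            # l_i


def journeys_from_activities(activities: List[Activity]) -> List[Journey]:
    """Derive J_1, ..., J_n from A_1, ..., A_n."""
    journeys = []
    previous = None
    for current in activities:
        if previous is None:
            # Departure time and origin of the first journey are unknown.
            journeys.append(Journey(None, current.start, None, current.location))
        else:
            # Enforce a_{i-1} <= d_{i-1} <= a_i before building J_i.
            if not (previous.start <= previous.end <= current.start):
                raise ValueError("activities are not in chronological order")
            journeys.append(
                Journey(previous.end, current.start, previous.location, current.location)
            )
        previous = current
    return journeys
```

An activity with an infinite end time that is not the last one in the sequence fails the chronological check, which matches the intent that only $A_n$ is open-ended.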

We exploit this relationship between journeys and activities throughout this chapter. Although smart card datasets typically contain information on journeys, we can transform a sequence of journeys into a sequence of activities and vice versa. This can be done in a streaming fashion: assuming the input is ordered in time, we only need to keep two journeys (or activities) in memory to produce an activity (or journey).
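A streaming conversion from journeys to activities might look like the generator below. It is a sketch under assumptions: the names are hypothetical, the journeys belong to a single smart card and arrive in chronological order, and each activity is placed at the destination of the preceding journey (the formal model assumes this equals the origin of the next journey, which real data may violate).

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Journey(NamedTuple):
    departure: int     # d_{i-1}
    arrival: int       # a_i
    origin: str        # l_{i-1}
    destination: str   # l_i


class Activity(NamedTuple):
    start: int      # a_i
    end: float      # d_i, or math.inf for the open-ended last activity
    location: str   # l_i


def activities_from_journeys(journeys: Iterable[Journey]) -> Iterator[Activity]:
    """Yield activities one by one, holding only two journeys at a time."""
    previous = None
    for current in journeys:
        if previous is not None:
            # The activity lasts from the previous arrival to the next departure.
            yield Activity(previous.arrival, current.departure, previous.destination)
        previous = current
    if previous is not None:
        # The end of the last activity is unknown, hence infinite.
        yield Activity(previous.arrival, math.inf, previous.destination)
```

Because the function is a generator over an arbitrary iterable, it can consume journeys straight from a sorted file or database cursor without loading the whole dataset.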
