
At the second-by-second (disaggregate) level, GPS data (despite cleaning and smoothing) can contain noise, which is an impediment to modelling. To deal with this, data is aggregated to the road segment level (see Section 4.3.1). For the purposes of this research, two forms of road segment aggregation are used:

1. Road speed segments; and
2. Road segments.

A road speed segment comprises a series of sequential and uninterrupted second-by-second observations which share a common speed limit, school zone status and trip. For example, they share the same speed limit and are either all in a school zone or all outside one. Figure 5-7 is an illustrative example of road segments based on the speed limit of the road. Importantly, a road speed segment may include observations from more than one physical road but may not include driving by more than one driver. In addition, the start and end of a trip are always the start and end of a road speed segment.

Figure 5-7: Illustration of road speed segments

Common driver, vehicle, trip and speed limit characteristics are kept unchanged in the aggregated dataset. However, due to the aggregation, a number of aggregate variables are required to account for variables that change within a road speed segment. Most of these relate to acceleration, braking and speeding behaviour, as the acceleration during a segment may change and a driver could exceed the speed limit for none, some or all of a particular segment and do so at various magnitudes. A summary of these additional variables is shown in Table 5-3. For most analyses, the distance (as a proportion of the total segment distance) is used as the measure of speeding. This reduces the loss of information that occurs as a result of using categorical variables.

Table 5-3: Road segment aggregated variables

Variable | Description
NumObs | Number of second-by-second observations included in the road speed segment
TotSegDist | Total segment distance
Rain | Binary variable indicating if there was rainfall recorded for at least 50 percent of observations included in the segment (1) or not (0)
AvgSpeed | Average speed recorded within the segment where speed > 0 km/h

Speeding Variables
Speed1S | Binary variable indicating if the driver exceeded the speed limit by 1 km/h or more for at least 20 percent of observations included in the segment
DistSpeed75P | Total distance driven at a speed exceeding 75 percent of the speed limit
DistSpeed01, DistSpeed05, DistSpeed10, DistSpeed15, DistSpeed20 | Total distance driven at or above 1 km/h, 5 km/h, 10 km/h, 15 km/h or 20 km/h above the speed limit
Speed75Pp | Proportion of observations recorded in excess of 75 percent of the speed limit
SpeedO1p, SpeedO5p, SpeedO10p, SpeedO15p, SpeedO20p | Proportion of observations recorded at or above 1 km/h, 5 km/h, 10 km/h, 15 km/h or 20 km/h above the posted speed limit
SpeedD75Pp | Proportion of distance recorded in excess of 75 percent of the speed limit
SpeedD1p, SpeedD5p, SpeedD10p, SpeedD15p, SpeedD20p | Proportion of distance recorded at or above 1 km/h, 5 km/h, 10 km/h, 15 km/h or 20 km/h above the posted speed limit

Acceleration and Braking Variables
Accel0P, Accel1P, Accel2P, …, Accel9P | Proportion of acceleration events where acceleration is ≤ 1 m/s², ≤ 2 m/s², ≤ 3, ≤ 4, ≤ 5, ≤ 6, ≤ 7, ≤ 8, ≤ 9 and > 9 m/s²
Brake0P, Brake1P, Brake2P, …, Brake9P | Proportion of braking events where acceleration is ≤ 1 m/s², ≤ 2 m/s², ≤ 3, ≤ 4, ≤ 5, ≤ 6, ≤ 7, ≤ 8, ≤ 9 and > 9 m/s²
Accel0Pd, Accel1Pd, Accel2Pd, …, Accel9Pd | Proportion of acceleration events where acceleration is ≤ 10%, ≤ 20%, ≤ 30%, ≤ 40%, ≤ 50%, ≤ 60%, ≤ 70%, ≤ 80%, ≤ 90% and ≤ 100% of the maximum acceleration recorded for that driver
Brake0Pd, Brake1Pd, Brake2Pd, …, Brake9Pd | Proportion of braking events where negative acceleration is ≤ 10%, ≤ 20%, ≤ 30%, ≤ 40%, ≤ 50%, ≤ 60%, ≤ 70%, ≤ 80%, ≤ 90% and ≤ 100% of the maximum acceleration recorded for that driver
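As a rough illustration of how some of the variables in Table 5-3 could be derived for a single segment, the following Python sketch computes the observation-based and distance-based speeding proportions together with NumObs, TotSegDist, Rain and AvgSpeed. The input field names (`speed`, `speed_limit`, `dist_m`, `rain`) and the sample values are assumptions made for the example; the study's actual processing code is not described in the text.

```python
def aggregate_segment(observations, thresholds=(1, 5, 10, 15, 20)):
    """Summarise one road speed segment (input field names are hypothetical)."""
    num_obs = len(observations)
    total_dist = sum(o["dist_m"] for o in observations)
    moving = [o["speed"] for o in observations if o["speed"] > 0]
    summary = {
        "NumObs": num_obs,
        "TotSegDist": total_dist,
        # 1 if rainfall was recorded for at least 50 percent of observations.
        "Rain": int(sum(o["rain"] for o in observations) >= 0.5 * num_obs),
        # Average over observations where the vehicle was moving.
        "AvgSpeed": sum(moving) / len(moving) if moving else 0.0,
    }
    for k in thresholds:
        # Observations at or above k km/h over the posted speed limit.
        over = [o for o in observations if o["speed"] - o["speed_limit"] >= k]
        dist_over = sum(o["dist_m"] for o in over)
        summary[f"SpeedO{k}p"] = len(over) / num_obs
        summary[f"SpeedD{k}p"] = dist_over / total_dist if total_dist else 0.0
    return summary


# Illustrative segment with four one-second observations (not study data).
segment = [
    {"speed": 0.0, "speed_limit": 50, "dist_m": 0, "rain": True},
    {"speed": 48.0, "speed_limit": 50, "dist_m": 10, "rain": True},
    {"speed": 52.0, "speed_limit": 50, "dist_m": 20, "rain": False},
    {"speed": 61.0, "speed_limit": 50, "dist_m": 30, "rain": False},
]
summary = aggregate_segment(segment)
```

In this example half of the observations are at least 1 km/h over the limit, yet they account for five sixths of the distance, which shows why the distance-based proportion can differ noticeably from the observation-based one.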

Road speed segment aggregation is used for analyses where the speed limit of the road is considered to be the main unit of analysis. Where more detailed temporal and spatial data is necessary, road segments (see Section 7.5) are used instead.

5.5 Survey results

A number of surveys were conducted during different phases of the study (recruitment, completion, etc.). These surveys contain quantitative and qualitative data. Qualitative data requires coding, and some quantitative data was recalculated and/or reclassified. This is done to combine similar responses and reduce the complexity of the data.

5.5.1 Demographics

Driver demographics were collected during recruitment. This includes age, gender, occupation, number of crashes, licence type as well as some basic vehicle information (make, model, year of manufacture and transmission type). In addition, age, gender and relationship data was collected for all household members. Information about the household location was requested but can also be derived from analysing the GPS data (Ellison et al., 2010).

The processing conducted on the demographic data was limited to categorising or re-categorising demographics to create variables with fewer but larger numbers of drivers in each category. This was predominantly used for age, gender and vehicle year of manufacture. Table 5-4 covers the most commonly used categories but other configurations have also been used for specific analyses. If this is the case, it is mentioned in the section covering that particular part of the analysis.

Table 5-4: Common categorisation of demographic variables

Variable | Categories
Age (2 categories) | 18-30, 31-65
Age (3 categories) | 18-30, 31-45, 46-65
Age (4 categories) | 18-25, 26-30, 31-45, 46-65
Licence Type | Learner/Provisional, Full
Vehicle Model Year | <= 1999, 2000 to 2004, >= 2005
Vehicle Type | Sedan, Hatchback, Other
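The categorisations in Table 5-4 amount to simple binning rules. A minimal Python sketch is given below; the function names are hypothetical and the band boundaries are taken directly from the table.

```python
# Age bands from Table 5-4, keyed by the number of categories.
AGE_BANDS = {
    2: [(18, 30), (31, 65)],
    3: [(18, 30), (31, 45), (46, 65)],
    4: [(18, 25), (26, 30), (31, 45), (46, 65)],
}


def age_category(age, scheme=3):
    """Return the age band label for the chosen scheme, or None if out of range."""
    for low, high in AGE_BANDS[scheme]:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def model_year_category(year):
    """Return the vehicle model year category used in Table 5-4."""
    if year <= 1999:
        return "<= 1999"
    if year <= 2004:
        return "2000 to 2004"
    return ">= 2005"


example = (age_category(28, scheme=2), age_category(28, scheme=4), model_year_category(2002))
```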

5.5.2 Psychological survey

After recruitment, drivers completed a five-section, fifty-question psychological survey adapted from a previous study by Machin and Sankey (2008). The survey was conducted online and covered a range of factors including personality, risk perception and self-reported driving behaviour. See Greaves and Ellison (2011) for more details on the background of the psychological survey.

The responses to the survey are all nominal variables. Depending on the specific analysis they are used either as standalone variables or as part of the following eight composite personality scales:

• Speeding;
• Aggression;
• Altruism;
• Excitement;
• Worry and Concern;
• Likelihood of Accident;
• Efficacy; and
• Aversion to Risk.

These composite scales are the average responses to the questions which make up each of these scales.
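The averaging step can be sketched as follows. The mapping of questions to scales shown here is entirely hypothetical (the text does not list which questions belong to which scale), as are the question identifiers and response values.

```python
from statistics import mean

# Hypothetical mapping of question identifiers to two of the eight scales.
SCALE_ITEMS = {
    "Speeding": ["q1", "q2", "q3"],
    "Aggression": ["q4", "q5"],
}


def composite_scales(responses, scale_items=SCALE_ITEMS):
    """Average a driver's numeric responses over the questions in each scale."""
    return {
        scale: mean(responses[question] for question in questions)
        for scale, questions in scale_items.items()
    }


# Illustrative responses for one driver (not study data).
driver_responses = {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q5": 3}
scales = composite_scales(driver_responses)
```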

The data collected in this survey is used to incorporate drivers' inherent personality characteristics into their driver and vehicle profile. It also captures drivers' perceptions of the risk associated with a number of driving behaviours, including speeding (by 10 and 20 km/h), using a mobile telephone and running red lights.

5.5.3 Exit survey

After the completion of the GPS data collection period, study participants completed an exit survey. The purpose of this survey was to understand (generally) how participants felt about the study and its components. It also served to assist in determining if changes detected in behaviour by the GPS device were changes drivers were cognisant that they were making. In all, of the 106 drivers determined to have valid 'before' and 'after' data, 103 completed the exit survey.

The exit survey was conducted online (see Figure 5-8) and consisted of a number of multiple choice and open-ended questions. Each participant in the study was provided with a unique URL with which to access the survey to ensure that responses could be accurately matched with the participant's other study data.

Figure 5-8: Screenshot of exit survey

The primary use of the exit survey in this research is to account for the influence of the financial incentive on changes in behaviour and to identify those drivers that were more aware of their speeding behaviour. This assists in answering the second set of hypotheses, which aims to determine if making drivers more aware of their driving behaviour makes them safer drivers (see Section 4.1.2). With this aim in mind, a set of indicator variables, shown in Table 5-5, was developed. Each survey response was then manually coded into the indicator variables. Each Boolean variable indicates if that particular aspect was mentioned (Y) or not (N). It did not matter where in the survey it was mentioned. In some cases the same aspect was mentioned in responses to more than one question but these were not codified differently. Each record in the codified dataset contains a user ID, a unique variable, which functions as the primary key. This is the same variable used to identify the vehicle in the other datasets and therefore simplifies aggregation and analysis.
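Because the user ID is shared across datasets, attaching the coded survey indicators to the aggregated segments reduces to a key lookup. The sketch below illustrates the idea with hypothetical field names (`user_id`) and made-up records; it is not the study's actual merging code.

```python
def attach_survey(segments, survey_by_user):
    """Attach each driver's coded survey indicators to their segments.

    `survey_by_user` maps a user ID (the primary key) to a dict of indicator
    variables. Segments from drivers without a survey record are kept unchanged.
    """
    merged = []
    for segment in segments:
        indicators = survey_by_user.get(segment["user_id"], {})
        merged.append({**segment, **indicators})
    return merged


# Illustrative records only (not study data).
segments = [
    {"user_id": "A01", "SpeedD1p": 0.40},
    {"user_id": "B02", "SpeedD1p": 0.10},
]
survey_by_user = {"A01": {"Speeding (awareness)": "Y"}}
merged = attach_survey(segments, survey_by_user)
```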

Table 5-5: Variables created for use in exit survey analysis

Variable Name | Description

Financial aspects
Incentive (as motivator) | Was the financial incentive mentioned as a motivator for participating in the study?
Incentive (charge phase) | Was the ability to earn money mentioned in the context of the introduction of the charging phase?
Incentive (post-survey) | Was the remaining financial incentive at the end of the study mentioned?
Made money [78] | Did the driver make money / Was the remaining incentive greater than $0.00?

Speeding
Speeding (any mention) | Was speeding mentioned in response to any open-ended question in any context?
Speeding (in denial) | Was there an indication that the driver was sceptical/disbelieving of the speeding behaviour they were shown during the study?
Speeding (awareness) | Was there an indication that the driver became more aware of their speeding behaviour than they had been before the charging phase?
Speeding (self-reported) | Did the driver indicate that they had reduced speeding in the charging phase 'sometimes', 'often' or 'always'? 'Not at all' and 'Occasionally' responses were treated as no.
Speeding (reduce > incentive) | Did the driver indicate that they would have reduced their speeding in the charging phase if the incentive had been doubled or tripled? 'Sometimes', 'often' and 'always' responses for double and/or triple incentives were treated as yes.
Speeding (any reduce) | Did the driver indicate they did or would have reduced their speeding in the charging phase for the current incentive and/or double incentive and/or triple incentive?
Speeding (GPS) [79] | Did the GPS device record a reduction in speeding as a proportion of total distance in the charging phase compared to the 'before' phase?
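The three self-reported speeding indicators in Table 5-5 follow a fixed rule for collapsing frequency responses into Y or N. A minimal Python sketch of that rule is shown below; the function names and the way the three responses are passed in are assumptions for illustration, since the survey responses were coded manually in the study.

```python
YES_RESPONSES = {"sometimes", "often", "always"}
NO_RESPONSES = {"not at all", "occasionally"}


def code_frequency(response):
    """Collapse a frequency response into 'Y' or 'N' as described in Table 5-5."""
    answer = response.strip().lower()
    if answer in YES_RESPONSES:
        return "Y"
    if answer in NO_RESPONSES:
        return "N"
    raise ValueError(f"Unexpected response: {response!r}")


def code_speeding_indicators(current, double, triple):
    """Derive the self-reported speeding indicators for one driver.

    Arguments are the driver's responses for the current, doubled and
    tripled incentive respectively.
    """
    self_reported = code_frequency(current)
    higher_codes = (code_frequency(double), code_frequency(triple))
    higher = "Y" if "Y" in higher_codes else "N"
    any_reduce = "Y" if "Y" in (self_reported, higher) else "N"
    return {
        "Speeding (self-reported)": self_reported,
        "Speeding (reduce > incentive)": higher,
        "Speeding (any reduce)": any_reduce,
    }


# Illustrative responses only (not study data).
coded = code_speeding_indicators("Occasionally", "Not at all", "Often")
```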

5.6 Summary

The data used in this thesis comprise a number of disparate but related datasets. To allow statistical analyses to be conducted, these individual datasets needed to be cleaned, restructured and combined. This chapter described the processes involved in accomplishing this such that an observation in one dataset can be directly related to the relevant observations in the other datasets.

[78] Derived from GPS data.
[79] Derived from GPS data.

The spatial datasets (Section 5.2) represented the road environment in the study area. These include the location of intersections, school zones, rainfall and other characteristics. These were matched to each second-by-second GPS observation based on the position (latitude and longitude) of each. The GPS data itself was in turn used to detect the driving behaviours of interest (speeding, acceleration and braking) using the methods described in Section 5.3. Subsequently, it became possible to aggregate the GPS observations (over 80 million in total) to road speed segments, whereby a new road speed segment started every time the speed limit changed. These aggregated segments, together with the coded survey results completed by each driver, are in turn used as the unit of analysis and independent variables in the aggregate analyses presented in Chapter 6.

6 RESULTS AND DISCUSSION: AGGREGATE ANALYSES

This chapter provides an aggregate analysis of speeding behaviour. The aim is two-fold. First, the analysis allows for a better understanding of the characteristics of the dataset and thereby aids in determining the best methods of studying a change in a driver's behaviour that occurs after drivers are made aware of their speeding behaviour. Secondly, this chapter demonstrates, through a series of models, that due to the high degree of noise in this dataset, a disaggregate analysis is necessary to isolate the inherent driver characteristics that are of interest. These analyses were conducted in an aggregate form using the road speed segments discussed in Section 5.4 and the aggregate behavioural variables created for each of these segments. Similar analyses were conducted for acceleration and braking behaviour but those performed worse than the already poor models of speeding and are therefore not presented here. In this chapter only data from the 'before' phase are used.
