Analysis of the effect of bus stops on the bus speed regarding the usage of public bus fleet as probe vehicles


by

Ignacio Ponsoda Llorens

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science

in Geospatial Technologies

Supervisor Joaquin Huerta

Department of Information Systems Universitat Jaume I

Co-Supervisor Jesús de Diego

Head of Geospatial Department IDOM Consulting

Co-Supervisor Marco Painho

NOVA Information Management School Universidade Nova de Lisboa


Acknowledgements

This thesis would not have been possible without the support of the people who, in one way or another, have backed me during this master's programme. I want to especially thank my family for their constant support and sacrifices. Thanks to Rosa, and to my classmates and friends for their guidance.

I want to thank all the teachers I met during this program and, of course, to express my gratitude to Dr. Christoph Brox, Dr. Joaquín Huerta, and Dr. Marco Painho, as well as Elena Martínez and Karsten Höwelhans, for making this amazing experience possible. I would like to give particular thanks to my supervisors, Dr. Joaquín Huerta, Jesús de Diego and Dr. Marco Painho, for providing valuable feedback.

Finally, I would like to show my gratitude to the Empresa Municipal de Transportes de Madrid for providing the data, and especially to Andrés Recio for his feedback on this project. And naturally, thanks to IDOM for giving me the chance to develop this master thesis at their organization.


List of Tables

1.1 Publications regarding approaches about the consideration of bus location data affected by bus stop.

2.1 Types of traditional traffic collectors according to [Leduc 2008]

3.1 List of Node.js packages used during the thesis

3.2 List of R packages used during the thesis

3.3 Data collected fields explanation

3.4 Road network fields explanation

3.5 Bus stops fields explanation

4.1 Speeds validation per line.

4.2 Mean speed per lines with different approaches

4.3 Mean speed difference per lines with different approaches comparing to EMT speeds


List of Figures

1.1 Pre-processing stage methodology.

1.2 Processing stage methodology.

1.3 Post-processing stage methodology.

2.1 Location of Madrid.

3.1 Diagram of the layers join step.

3.2 Diagram about the collection workflow.

3.3 Bus lines used for the Use Case

3.4 Heatmap of the data collected.

3.5 Density of points per hour.

3.6 Diagram of the speed calculation.

3.7 Diagram of the sections assignment.

3.8 Diagram of the stops assignment.

4.1 Data exploration per attribute.

4.2 Speed over time per day of the week.

4.3 Speed histogram per line.

4.4 Speed density per line.

4.5 Speed outliers.

4.6 Speed of points with stop assigned.

4.7 Speed of buses affected by bus stop that not stop on it, according to [Uno et al. 2009].

4.8 Speed for buses affected by a bus stop but not stopping on it.

4.9 Shiny methodology.

4.10 Shiny application that compare the speed from the points according to [Uno et al. 2009] approach.

4.11 Shiny application with the mean speed for each section of the road network.


Abbreviations

API Application Programming Interface.

AVL Automatic Vehicle Location.

EMT Empresa Municipal de Transportes de Madrid (public transport company of Madrid).

GIS Geographic Information System.

GPS Global Positioning System.

ITS Intelligent Transportation System.

LBS Location-Based-Service.

NoSQL Not only SQL (non-relational databases).

OOP Object-Oriented Programming.

SAE Sistema de Ayuda a la Explotación.

SQL Structured Query Language.

VMT Vehicle Miles Traveled.


Abstract

Public bus fleet location data has emerged in recent years as an affordable opportunity for local governments to monitor city traffic. However, the speed data calculated from the location of the public bus fleet tends to be affected by bus stops. Consequently, the tendency in this field is to discard the speeds affected by bus stops for traffic monitoring. Several approaches have been developed to identify the bus data affected by bus stops.

In this work, the effect on traffic monitoring of bus location data affected by a bus stop is tested through a case study in La Castellana, one of the main arteries of Madrid, the capital city of Spain, using data from its public urban transport company, the Empresa Municipal de Transportes (EMT).

The analysis of the results concludes that the use of public bus fleet location data affected by bus stops biases traffic monitoring. However, it also concludes that this bias is mainly caused by the buses dropping or collecting passengers.

Keywords: probe vehicle, location data, traffic monitoring, public bus fleet

Reproducibility self-assessment: 1, 1, 3, 2, 1 (input data, preprocessing, methods, computational environment, results).

The code used has been published in github/Ponsoda, as well as the data collected from the API and processed to calculate the speed of each point.


Preface

This thesis is an original work by Ignacio Ponsoda Llorens. No part of this thesis has been previously published.


Chapter 1 Introduction

This thesis is based mostly on the services of public transportation systems, which have been combined with novel technologies to bring new capabilities to users and decision-makers, creating the concept of Intelligent Transportation System (ITS). In this chapter, the context is analyzed in section 1.1. Then, the problem and the objectives are defined in section 1.2. Finally, the related work is reviewed in section 1.3 and the approach and methodology used are explained in section 1.4.

1.1 Context

According to [Freeman 1983], since the industrial revolution the public transportation system has become fundamental for the development of modern cities, their society and their economy. Public transportation allows the population to travel around the city at low fares. It enables commuting, and it is fundamental for local governments to expand their cities and provide basic services to citizens without the need to duplicate them [Saeidizand 2015].

The main factor behind the increasing importance of public transportation is the growth of cities and their populations. The urban population is expected to grow by 80% between 2010 and 2050, from 3.5 billion to 6.3 billion citizens [Saeidizand 2015]. This growth affects urban mobility by increasing its volume and its importance for human life. New challenges for the public transportation system have arisen in relation to achieving universal accessibility, keeping reliable timetables and improving customer safety.

Moreover, considering the current awareness of climate change and the Sustainable Development Goals 2030, urban transportation systems have become a cornerstone to make cities more sustainable, through the improvement of public infrastructures and the implementation of new technologies. Specifically, there are four sub-indicators of the Sustainable Development Goals 2030, retrieved from [Johnston 2016], that have a direct impact on the public transportation system:

• 9.1.1 “Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, with a focus on affordable and equitable access for all”

• 9.1.2 “Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure”

• 11.2.1 “Provide access to safe, affordable, accessible and sustainable transport systems for all”

• 12.c.1 “Rationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances” [Chadil et al. 2008]

The ITS concept appeared in Europe in the early 1970s as a way to improve the relationship between vehicles, roads, and users. Moreover, ITS provides an intelligent integration between the different components of the urban network to reduce emissions and achieve efficient energy consumption [Xiong et al. 2012].

Accordingly, ITS is essential to apply novel technologies that enhance the public transportation system and provide updated information to customers. [Bekhor et al. 2013] argue that, with ITS, it is possible to improve public transportation performance without major modifications to the architecture of the transportation system or its equipment while, as [Qi 2008] advocates, reducing energy consumption and increasing efficiency.

ITS can provide updated traffic flow data, which is key in the current “Information Era”. This information allows citizens to choose their daily transportation wisely, as it gives them the possibility to decide their transport method beforehand, based on updated traffic information. According to [Qi 2008], when customers know the state of public transportation, their trust in it and their willingness to pay for public services increase.

Furthermore, updated traffic information allows decision-makers to manage transport with a global picture of transportation in the city [Singla and Bhatia 2016], and to detect traffic congestion and possible incidents in order to prevent traffic issues by improving the infrastructure [Dziekan and Kottenhoff 2007], changing traffic rules and improving customer-oriented information [Bacon et al. 2011]. For example, [Dziekan and Kottenhoff 2007] explain that improving at-stop real-time displays increases customer satisfaction while reducing the waiting time by an average of 16%.

ITS makes it possible to monitor traffic flow and obtain reliable traffic data through conventional or mobile traffic data collectors [Leduc 2008]. Conventional traffic collectors are fixed, and the data they collect is representative only of specific road sections [Leduc 2008]. Alternatively, Automatic Vehicle Location (AVL) collects traffic data that covers a significant portion of the road network [Berkow et al. 2008].

The detailed real-time traffic data provided by AVL is necessary to maintain safe and sustainable cities, allowing users to find the optimal transport method among public and private options [Derevitskiy et al. 2016].

In recent years, public bus fleets have started to work as AVL systems by incorporating Location-Based-Service (LBS) systems to detect their position in real time [Kamran and Haas 2007]. The bus fleet location data provides continuous traffic information for different parts of the city [Bacon et al. 2011], while making it possible to track traffic fluctuations based on the historical data collected by different bus lines under different circumstances [Uno et al. 2009].

Several studies have demonstrated that public bus fleet AVL makes it possible to monitor the traffic of the whole road network and to detect possible congestion based on historical data [Pu et al. 2009], [Zhu et al. 2013], [Bertini and Tantiyanugulchai 2004], [Uno et al. 2009].

1.2 Problem definition and objective

In order to face the challenges of sustainable transportation, the public transportation system has to become a reliable alternative to private vehicles within cities. To do so, ITS has to bring users updated information about the state of the public services.

In the case of traffic information and the use of the public bus fleet as a probe vehicle, the data provided by LBS has to accurately monitor the traffic flow. The use of the public bus fleet Global Positioning System (GPS) data to monitor real-time traffic flow in cities has been researched in recent years. This is done without significant modifications to the bus structure, making the fleet an economical, wide-coverage traffic data collector.

According to [Pu et al. 2009], the main impediment to accurate traffic monitoring is the negative effect of bus stops on the collected speeds. However, there is no consensus on how to identify the data affected by stops when the public bus fleet is used to monitor city traffic, nor on the effect of those stops in terms of bus speed variation.

Hence, the final objective of this thesis is to determine the correlation between bus stops and the speeds of the buses affected by those stops. Moreover, it aims to determine the optimal way to identify the data affected by bus stops in order to achieve accurate traffic monitoring with public bus fleet location data.


1.3 Related work

Traffic monitoring has two main beneficiaries. The first are the citizens, for whom more traffic information translates directly into more trust in the transportation system, as explained in subsection 1.3.1. The second are the decision-makers, for whom traffic knowledge makes it possible to improve the behaviour of public transportation and to detect possible traffic congestion, as explained in subsection 1.3.2. Different approaches to the effect of bus stops on traffic monitoring are analyzed in subsection 1.3.3.

1.3.1 Bus arrival prediction

Regarding the possibility of improving at-stop information for public vehicles by using GPS data, [Singla and Bhatia 2016] argue that real-time location data, combined with historical location datasets, makes it possible to understand public transport behavior and therefore improve the forecast of bus arrivals at bus stops.

A problem for predictions based on location data is the accuracy of the GPS, whose readings tend to be inconsistent with the road network. This problem can be aggravated if there are buildings surrounding the GPS sensor [Cao and Krumm 2009], [Weng et al. 2016]. Accordingly, to perform arrival predictions, it is necessary to assign the location data to the road network by assigning each location element to the nearest pair of coordinates of the road network.

A real-time Geographic Information System (GIS) based on GPS data is used in [Weng et al. 2016] to establish the location of bus stops by finding patterns in where the buses tend to reduce their speed. A map containing the bus-system information is used to assign the points to the bus lines, and bus stops are detected when several GPS points have a speed of 0 km/h at a similar location.

In turn, [Xinghao et al. 2013] predict the arrival time of buses based on the historical data collected. Bus behaviour patterns are found in the historical data, which, combined with real-time location inputs, make it possible to predict the arrival time of the buses at the bus stops.

1.3.2 Monitoring road traffic using GPS data

Traditional collectors usually gather traffic data only on main arterial roads and freeways, mainly due to their capabilities and characteristics. According to [Tantiyanugulchai and Bertini 2003], 40% of the Vehicle Miles Traveled (VMT) happen on main arterial roads, where the main problem for traffic monitoring is the diversity of origin and destination points of the vehicles, which disturbs the traffic data collection.

Nevertheless, it is just as crucial to monitor the traffic flow on secondary roads, as they represent 60% of the VMT [Tantiyanugulchai and Bertini 2003].

An alternative method for data collection was proposed in [Berkow et al. 2008] to cover a substantial part of the road network by using public buses as probe vehicles.

According to [Derevitskiy et al. 2016], city traffic can be monitored by extending the historical bus traffic conditions to road sections with a lack of bus location data.

To use probe vehicles as traffic collectors, [Xiaohui et al. 2006] combine real-time GPS data from taxis and buses to determine the traffic flow in the city. First, map matching is performed to assign the GPS data to the pertinent road sections. Then, the unusable data is removed based on time and distance parameters. Lastly, the mean speed of the vehicles assigned to each road section is calculated.

The result is a real-time traffic flow map, which allows the comparison of current traffic flow with historical data. [Pu et al. 2009] compare the data collected from a public bus on its regular route with data from a probe vehicle used specifically for collection purposes, to determine the quality of bus location data for monitoring city traffic.
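The three steps attributed to [Xiaohui et al. 2006] (match, filter, average) can be illustrated with a minimal Python sketch. It assumes the records are already map-matched to a road section; the tuple layout and the two thresholds (`max_gap_s`, `max_dist_m`) are illustrative assumptions, not values taken from the cited paper.

```python
from collections import defaultdict
from statistics import mean


def mean_speed_per_section(records, max_gap_s=120.0, max_dist_m=2000.0):
    """Average speed (km/h) per road section.

    records: iterable of (section_id, elapsed_s, distance_m) tuples, one per
    pair of consecutive GPS fixes already matched to a road section.
    The thresholds are hypothetical placeholders for the time and distance
    filters mentioned in the text.
    """
    speeds = defaultdict(list)
    for section_id, elapsed_s, distance_m in records:
        # Discard records with an unusable time gap or travelled distance.
        if not 0.0 < elapsed_s <= max_gap_s:
            continue
        if not 0.0 <= distance_m <= max_dist_m:
            continue
        speeds[section_id].append(distance_m / elapsed_s * 3.6)  # m/s -> km/h
    return {section_id: mean(values) for section_id, values in speeds.items()}
```

Sections whose records are all filtered out simply do not appear in the returned dictionary, which keeps "no data" distinct from "zero speed".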

In the case of [Jurewicz et al. 2017], the speed calculated from bus GPS data is assigned to the closest road segment. The speed is calculated by using two points known beforehand and covered by Floating Car Data (FCD). The resulting speed is validated by comparing it with the speed gathered from the static collectors.

Identifying patterns in the traffic data

The major difficulties in detecting traffic incidents with public bus fleet location data stem from the different behaviour of buses compared with private vehicles. In [Bekhor et al. 2013], traffic conditions and at-stop delays are collected by using the public bus fleet as probe vehicles. [Berkow et al. 2008] define an algorithm to automatically determine the congestion level based on bus directions.

Traffic patterns for massive congestion and traffic incidents can be identified from GPS data, as [Kamran and Haas 2007] show in their work. GPS-based accident detection is done per road network segment by comparing the real-time speed with historical speeds in similar conditions.
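The comparison of real-time speed against historical speeds can be sketched in a few lines of Python. This is only an illustration of the general idea: the function name and the `ratio` threshold are assumptions, and the cited work may use a different decision rule.

```python
from statistics import mean


def is_incident_suspected(current_speed_kmh, historical_speeds_kmh, ratio=0.5):
    """Flag a segment when its current speed is well below its history.

    historical_speeds_kmh: speeds observed on the same segment under
    similar conditions (for example, same weekday and hour).
    ratio: hypothetical threshold; 0.5 means "below half the usual speed".
    """
    if not historical_speeds_kmh:
        # Without history there is no baseline to compare against.
        return False
    return current_speed_kmh < ratio * mean(historical_speeds_kmh)
```

For a segment whose usual speeds are around 32 km/h, a current speed of 10 km/h would be flagged, while 25 km/h would not.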

With a similar approach, [Akulakrishna et al. 2014] store real-time traffic data from the public bus fleet in cells. The closer the cell is to the location of the point, the more accurate the traffic information stored in the cell. As the traffic data is stored in those cells, queries can be performed more efficiently. This permits quick congestion detection and the identification of alternative routes to avoid the congestion.

Likewise, [Zhu et al. 2013] use the bus probe data to estimate road traffic conditions based on speed variations.

[Bacon et al. 2011] establish a journey time estimation based on bus location data. Moreover, this bus location information permits the detection of incidents along the routes and the prediction of possible road congestion after incidents.

1.3.3 Bus stops effect in traffic monitoring

The use of the public bus fleet as probe vehicles to monitor city traffic has to face an issue related to bus behaviour. According to the papers listed in Table 1.1, bus stops have a negative effect on the speed of the buses, which has direct repercussions on the use of speeds calculated from bus location data to monitor the traffic of the road network.

Based on this negative relation, different approaches have been developed to identify the affected bus data and discard it for traffic monitoring. [Berkow et al. 2008] detail a subtraction method to detect buses dropping or collecting passengers based on their direction and their distance to the bus stop. A 50-meter buffer is established around the bus stop, so that, if the bus is heading to the bus stop and is located inside the buffer, the bus arrival time at the bus stop is estimated. For [Xiaohui et al. 2006], a bus is dropping or collecting passengers when the data collected remains at the same location for more than two consecutive minutes.

[Uno et al. 2009] establish two conditions to determine when a bus is dropping or collecting passengers at a bus stop. These conditions relate to the speed of the bus, which has to be less than 3 km/h, and its distance to the bus stop, which has to be less than 23.7 meters. If both conditions are true, the bus is considered to be stopping at a bus stop and its speed is not included in traffic monitoring.
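The rule attributed to [Uno et al. 2009] can be written as a short Python sketch. The two thresholds come from the text; the function names and the dictionary keys (`speed_kmh`, `distance_to_stop_m`) are hypothetical and chosen only for illustration.

```python
SPEED_LIMIT_KMH = 3.0     # threshold quoted in the text
DISTANCE_LIMIT_M = 23.7   # threshold quoted in the text


def is_stopping(speed_kmh, distance_to_stop_m):
    """True when both conditions hold: slow AND close to a bus stop."""
    return speed_kmh < SPEED_LIMIT_KMH and distance_to_stop_m < DISTANCE_LIMIT_M


def speeds_for_monitoring(points):
    """Keep only the speeds of points not classified as stopping.

    points: iterable of dicts with the hypothetical keys
    'speed_kmh' and 'distance_to_stop_m'.
    """
    return [
        p["speed_kmh"]
        for p in points
        if not is_stopping(p["speed_kmh"], p["distance_to_stop_m"])
    ]
```

Note that a slow bus far from any stop (for example, in congestion) is kept, because the rule requires both conditions at once.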

All reviewed academic publications related to the use of bus data affected by bus stops are listed in Table 1.1, in chronological order. According to the literature reviewed, there is a consensus about the negative effect on traffic monitoring of using speeds from stopping buses, but there is no consensus about the exact effect or about how to avoid it when using the public bus fleet as probe vehicles.

Table 1.1: Publications regarding approaches about the consideration of bus location data affected by bus stop.

Paper | Bus Data Affected by Bus Stop
[Tantiyanugulchai and Bertini 2003] | At a range of 30 meters to the bus stop (only when the bus has opened the doors).
[Bertini and Tantiyanugulchai 2004] | At a range of 30.48 meters (100 ft).
[Xiaohui et al. 2006] | 2 minutes stopped at the same location.
[Berkow et al. 2008] | At a range of 50 meters.
[Uno et al. 2009] | At a range of 23.7 meters and with speed less than 3 km/h.
[Pu et al. 2009] | At a range of 60.96 meters (200 ft).
[Derevitskiy et al. 2016] | At a range of 100 meters.
[Stoll et al. 2016] | At a range of 9.15 meters.
[Weng et al. 2016] | At a range of 50 meters.
[Glick and Figliozzi 2018] | At a range of 15 meters.

1.4 Approach

In [Uno et al. 2009], the negative effect of bus stops on speed is avoided by removing the speed data of those buses that are dropping or collecting passengers. The paper considers that a bus is dropping or collecting passengers “from before start decelerate just after finish acceleration again”. Essentially, the paper determines that a bus is stopping when it has a speed of less than 3 km/h and is less than 23.7 meters from a bus stop.

The proposal of this thesis is to test the speed correlation of the data affected by bus stops, using the methodology proposed by [Uno et al. 2009] to detect stopping buses. The analysis focuses on the effect of the points affected by a bus stop that are not dropping or collecting passengers according to [Uno et al. 2009]. Our hypothesis is that even those buses that are somehow affected by bus stops but are not stopping at them have a negative effect on the speed and, therefore, those points have to be disregarded for traffic monitoring. Real-time traffic monitoring is out of the scope of this research.

There are some considerations to determine the validity of our approach:

• Compare the mean speeds obtained with the [Uno et al. 2009] approach to detecting buses stopping at a bus stop with the mean speeds obtained with our detection approach.

• Use the road network sections as a homogeneous entity to assign the speed data and to perform the comparison analysis.

• Do not use a bus stop buffer distance to determine the impact of a bus stop on a bus. Instead, the stops and the bus location data are assigned to their corresponding road network section.

• Speeds calculated by using the bus location data are validated with the speed dataset provided by the public transportation company of Madrid, the EMT.

1.4.1 Methodology

The first stage of the research was the data collection, which is represented in Figure 1.1. The data used in this thesis comes from the EMT. The spatial layers (bus lines, road network divided into sections, and bus stops) come from a geodatabase, and the bus location data from the EMT Application Programming Interface (API) service. A MongoDB database was used during the thesis.

Before uploading the data from the geodatabase to MongoDB, the layers were joined using ArcPy capabilities. The objective was to enrich the road network layer and the bus stops layer with bus line attributes. Then, the enriched layers were uploaded to MongoDB by using a Node.js script that adapts the spatial layers to the MongoDB schema. More detailed information on this process can be found in subsection 3.2.1.

To collect the location data from the EMT API, a Node.js script is used to make periodic requests and fill the database. In the script, a schema is employed to fit the data to the requirements of the thesis analysis before adding it to MongoDB. Detailed information on this process can be found in subsection 3.2.2.

Figure 1.1: Pre-processing stage methodology.

The second stage of the research is the processing phase, which is represented in Figure 1.2. This stage is divided into three parts: speed calculation from the GPS data, assignment of the road network section ID to the collected data and assignment of the bus stop ID to the collected data.

The speed calculation is based on the [Weerapanpisit 2019] approach, which determines the distance between points from their position along the length of the bus line. The location data provided by the EMT contains a field with the distance covered by the bus along the bus line. Using the difference in this distance between two points and the difference between their timestamps, the speed at each point can be calculated.
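As a minimal sketch of this calculation in Python, the speed at a point can be derived from two consecutive records of the same bus. The field names `distance_m` and `timestamp` are assumptions for illustration and do not reproduce the actual EMT API schema.

```python
from datetime import datetime


def speed_kmh(previous, current):
    """Speed between two consecutive records of the same bus.

    Each record is a dict with the hypothetical keys 'distance_m'
    (distance covered along the bus line) and 'timestamp' (datetime).
    Returns None when the time difference is not positive.
    """
    elapsed_s = (current["timestamp"] - previous["timestamp"]).total_seconds()
    if elapsed_s <= 0:
        return None
    covered_m = current["distance_m"] - previous["distance_m"]
    return covered_m / elapsed_s * 3.6  # m/s -> km/h


previous = {"distance_m": 1000.0, "timestamp": datetime(2020, 1, 15, 12, 0, 0)}
current = {"distance_m": 1100.0, "timestamp": datetime(2020, 1, 15, 12, 0, 20)}
example_speed = speed_kmh(previous, current)  # 100 m in 20 s
```

In the example, 100 meters covered in 20 seconds gives 5 m/s, that is, 18 km/h.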

With the distance covered by the buses, together with the bus line information, it is possible to assign the points to their respective road network segments and, at the same time, determine whether there are bus stops of the bus line in the road network segment where the bus was located.
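One way to picture this assignment is a lookup of the distance covered along the line against the cumulative start offsets of the sections. The Python sketch below is hypothetical: the data layout (`section_starts_m`, `section_ids`, `stops_by_section`) is assumed for illustration, and the procedure actually used in the thesis is the one described in Figures 3.7 and 3.8.

```python
from bisect import bisect_right


def assign_section(distance_m, section_starts_m, section_ids, line_length_m):
    """Return the ID of the section containing a distance along the line.

    section_starts_m: sorted start offsets of each section, beginning at 0.
    section_ids: section IDs in the same order as section_starts_m.
    Returns None when the distance falls outside the line.
    """
    if not 0.0 <= distance_m <= line_length_m:
        return None
    index = bisect_right(section_starts_m, distance_m) - 1
    return section_ids[index]


def stops_in_section(section_id, stops_by_section):
    """Bus stops of the line located in the given section (may be empty)."""
    return stops_by_section.get(section_id, [])
```

Because the lookup uses the distance along the line rather than raw coordinates, it does not depend on GPS positional accuracy.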

The speed calculation algorithm is explained in detail in subsection 3.2.2. The road network section ID assignment and the bus stop ID assignment are explained in detail in Figure 3.7 and Figure 3.8 respectively.

Figure 1.2: Processing stage methodology.

The final stage of the thesis is the data exploration and data visualization, which is represented in Figure 1.3. The collected data, enriched in the second stage with the road network section and bus stop information, is analyzed using the R environment.

At this stage, a complete analysis of the results is carried out, along with the creation of two Shiny dashboards for visualizing the results. The workflow of the Shiny applications is explained in detail in Figure 4.9.

Figure 1.3: Post-processing stage methodology.

1.5 Outline

The rest of the document is structured as follows. Chapter 2 contains the theoretical background and the main concepts needed to understand the thesis. In chapter 3, the collected data is explained together with the different data processing steps. Chapter 4 presents an analysis of the results, the applications developed for their visualization and the discussion of the results. Finally, chapter 5 presents the conclusions of the thesis and the remaining work on this topic.


Chapter 2 Theoretical background

This chapter introduces the theoretical background related to the traffic monitoring concepts and stages. The aim of this chapter is to better understand the capabilities and advantages of using the public bus fleet as a probe vehicle.

The chapter is divided into six sections. Section 2.1 gives a brief introduction to ITS. Section 2.2 explains the speed calculation formula for points projected on the Earth. Section 2.3 explains what map matching is, why it is necessary for speed calculation, how it is done and the different approaches to assigning GPS data to road networks. Section 2.4 explains different procedures for traffic data collection, along with their advantages and disadvantages. Section 2.5 explains the use of databases in a geospatial context. The last section of this chapter, section 2.6, focuses on the location where the analysis is tested.

2.1 ITS

ITS is a key concept in this thesis, as its different parts (traffic data collection and traffic monitoring) fall under the ITS topic.

Definition 1: “ITS refers to efforts that apply information, communication, and sensor technologies to vehicles and transportation infrastructure in order to provide real-time information for road users and transportation system operators to make better decisions. ITS aim to improve traffic safety, relieve traffic congestion, reduce air pollution, increase energy efficiency, and improve homeland security.” Traffic Flow Theory (2016)

2.2 Speed calculation

The general equation (2.1) of the speed calculation is the division between distance and time.

s = d/t (2.1)

where s represents speed, d represents distance and t represents time.

To calculate a distance on the Earth, it is necessary to consider its spherical shape; therefore, the distance between two coordinates is determined by the shortest curve between them along the surface of the Earth [Singla and Bhatia 2016].

Thus, to determine the distance between two collected points, the Haversine equation (2.2) is used, and the result is combined with the timestamp difference to calculate the speed.

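$$a = \sin^2\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\Delta\lambda}{2}\right) \qquad (2.2)$$

$$c = 2 \, \operatorname{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right)$$

$$d = R \, c$$

where φ represents latitude, λ represents longitude and R represents the Earth's radius (mean radius = 6,371 km). Source: http://www.movable-type.co.uk/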
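Equation (2.2) translates almost directly into code. The following Python sketch is one possible transcription, using the mean Earth radius quoted above; the function name and the choice of degrees as input units are assumptions made for the example.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius, as quoted in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = radians(lat2 - lat1)
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * c
```

As a sanity check, one degree of latitude along a meridian corresponds to R·π/180, roughly 111.19 km, which the function reproduces for the points (0, 0) and (1, 0). Dividing the resulting distance by the timestamp difference gives the speed of equation (2.1).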


Based on the approach that [Weerapanpisit 2019] presented at the Geomundus 2019 conference for the calculation of at-stop bus arrivals, for vehicles with a predefined path (which is projected on the Earth beforehand) it is possible to use the length of the path to calculate the speed. Once the location data is placed along the path, the distance between two points can be calculated, and the speed derived from that distance.

2.3 Map matching

Definition 2: “Map matching denotes a procedure that assigns geographical objects to locations on a digital map. The most typical geographical objects are point positions obtained from a positioning system, often a GPS receiver.” Encyclopedia of Database Systems (2009)

Map matching has been approached in many different ways in research. [Singla and Bhatia 2016] use the point of the road network closest to the collected point to associate it with a specific segment. [Xiaohui et al. 2006] extract the origin and destination from the bus location data to get the direction of the vehicles. Based on those directions, a road assignment is performed within a maximum buffer of 15 meters, selecting the road section with the lowest vertical distance to the collected point. Meanwhile, [Weng et al. 2016] assign the collected data to the road network with the minimal vertical distance, using the bus direction to increase the accuracy.

[Zhou et al. 2016] assign the collected data to the road network when it is inside a given buffer, selecting the segment with the lowest Euclidean distance to the point. A potential problem with this approach, especially when a complex road network is used, is the chance of assigning the GPS points to the wrong road network segment due to the low accuracy of the GPS, as [W.Y. Ochieng and Noland 2003] explain.
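Buffer-based matching by Euclidean distance can be sketched in Python as follows. This is an illustration of the general technique, not the implementation of any cited paper: it assumes projected coordinates in meters, straight two-point segments, and a default buffer of 15 meters borrowed from the figure quoted for [Xiaohui et al. 2006].

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all (x, y) in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Projection parameter of p onto the segment, clamped to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_to_segment(p, segments, buffer_m=15.0):
    """ID of the closest segment within buffer_m, or None if none qualifies.

    segments: dict mapping a segment ID to its (start, end) coordinates.
    """
    best_id, best_distance = None, buffer_m
    for segment_id, (a, b) in segments.items():
        distance = point_segment_distance(p, a, b)
        if distance <= best_distance:
            best_id, best_distance = segment_id, distance
    return best_id
```

With two parallel streets 20 meters apart, a point 12 meters from the first and 8 meters from the second is matched to the second, which shows how a GPS error of a few meters can flip the assignment in a dense network.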


Based on the [Weerapanpisit 2019] methodology, and to avoid the GPS problems related to signal and accuracy, it is possible to take advantage of the data contained in the collected points, which is explained in detail in section 3.3, to determine the exact road network segment where the bus is located, reducing possible errors.

2.4 Traffic data collection

Definition 3: “Collection of traffic data by means of manual turning counts, placement of automatic traffic recorders and/or intelligent traffic cameras, review of historical published data, and in-person surveys. Many forms of data can be collected including traffic volumes, travel speeds, vehicle classification, origin/destination, pedestrian/bicycle volumes, etc.” Dynamic 2016.

A key element of a competent ITS is a reliable and updated source of data [Leduc 2008]. To collect speed data, there are two main possibilities: the traditional “in situ” collectors and the mobile collectors. They are not mutually exclusive, so both can be used in parallel to obtain more accurate data [Leduc 2008].

2.4.1 Traditional “In-situ” collection

According to [Leduc 2008], there are seven types of spot-speed data sources, listed in Table 2.1.

Table 2.1: Types of traditional traffic collectors according to [Leduc 2008]

Type | Collection Method
Pneumatic road tubes | Measure the pressure changes that a vehicle provokes as it passes over the tube
Piezoelectric sensors | Detect the changes by converting them to electrical charges
Manual counts | Trained observers measure the transit
Passive and active infra-red | Detection based on the infrared energy from the detection area
Passive magnetic | Count the speed by using two sensors
Microwave radar | Speed data about objects at a distance
Video image detection | Different video techniques

The main problem with traditional collectors is their coverage range. The amount of collected data depends directly on the number of traffic detectors installed. Therefore, this type of collector cannot cover the whole road network due to the cost of installation and maintenance [Xiaohui et al. 2006], [Bekhor et al. 2013], [Bacon et al. 2011].

As a result of this limitation, the data collected usually reflects the specific traffic conditions of the road segment where the collectors are located, and this data cannot be representative of the whole road network [Derevitskiy et al. 2016], [Jurewicz et al. 2017].

2.4.2 Probe vehicles

Definition 3: “Probe vehicles are one of the most effective methods for collecting road traffic data because of their wide coverage area over time and space. In particular, global positioning system (GPS)-equipped probe vehicles that report their position and speed are commonly used at present.” [Seo and Kusakabe 2015]

Traffic flows differ inside and outside cities, and circulation also differs between roads of the same city. Road traffic can also vary depending on the day of the week, the hour of the day or the influence of an external factor [Bacon et al. 2011]. Therefore, to achieve optimal traffic monitoring in cities, it is necessary to collect up-to-date traffic data from different roads at different moments.

Probe vehicles are less expensive than traditional in-situ collectors, especially in the long term, and, as they are not static, they cover a bigger percentage of the road network [Jurewicz et al. 2017].

Probe vehicles permit the collection of real-time traffic data. This is done by using GPS data to record the vehicle coordinates over time; these coordinates are then assigned to a road network segment [Leduc 2008], [Bacon et al. 2011]. Probe vehicle data constantly updates the traffic flow, which permits the storage of historical data for different traffic scenarios. According to [Kamran and Haas 2007], the quality of the data provided by probe vehicles depends heavily on its integration with the map; therefore, an accurate map is almost as important as the data collected.

Either vehicles already in operation or vehicles specifically assigned to the collection can be used as probe vehicles. The data collected by such dedicated vehicles is limited temporally and spatially, with no historical data, and it also requires the use of an extra vehicle just for traffic data collection [Tantiyanugulchai and Bertini 2003].

However, probe vehicles usually do not cover low-volume roads. A positive factor for the use of buses as probe vehicles is the possibility of using local public transportation to collect the data without the need for dedicated vehicles [Bacon et al. 2011]. [Pu et al. 2009] demonstrated that the use of buses as probe vehicles permits the measurement of the traffic flow in urban areas. According to [Zhou et al. 2016], road networks are covered significantly by bus probe data, as measured in big cities like London (75%) or Singapore (79%).

2.5 Spatial databases

Definition 4: “Spatial databases maintain space information which is appropriate for applications where there is a need to monitor the position of an object or event over space. Spatial databases describe the fundamental representation of the object of a dataset that comes from spatial or geographic entities. A spatial database supports aspects of space and offers spatial data types in its data model and query language.” [Samson et al. 2017].

There are two main types of databases, Structured Query Language (SQL) and Non Structured Query Language (NoSQL):

- SQL databases have the advantage of providing faster performance for complex queries and for establishing relations over the data. They have structured data and are highly related to Object-Oriented Programming (OOP) [Sharma and Dave 2012].

- NoSQL databases perform well when storing any type of data because they do not need a predefined schema and are based on identification keys. Moreover, this type of database is more scalable than SQL databases, but its performance with relational data is worse than that of a SQL database [Gyorödi et al. 2015].

A database with the capacity to store spatial data while maintaining its spatial characteristics is called a spatial database.

Spatial databases do not exclusively store spatial data. This “type” of database is usually an extension of a non-spatial database, or a set of capabilities inside it, that allows spatial features to be managed properly. Spatial databases are usually related to GIS, although it is not always possible to use spatial database information directly in GIS applications [Güting 1994].

2.6 Case study

In order to test the approach of this thesis, Madrid (Figure 2.1), the capital city of Spain, has been selected to perform the analysis. Its population is about 3.3 million, with a metropolitan area of around 6.6 million [Instituto Nacional de Estadística 2018], making it the most populated city in Spain.

Figure 2.1: Location of Madrid.

The population of Madrid commutes daily, usually by public transportation. Public transportation services have become crucial in Madrid, with different modalities (bus, metro, train and a bike sharing system) that are, in principle, managed by the public transportation company of Madrid, the EMT.

EMT is a public company constituted in 1947 to manage public transportation in Madrid. Its infrastructure consists of 2,050 buses, 212 bus lines with a total length of 3,794.8 km, and 10,515 stops, with a total of 420.2 million travelers per year (2018). Moreover, this company provides open information about the public transportation system of Madrid, including the bus fleet position information used in this thesis.

Specifically, the study is located on the arterial road of La Castellana, which supports heavy traffic and along which several bus lines run partially or totally.

Chapter 3 Data and application

In this chapter, section 3.1 presents the database used in this thesis as well as the software and programming languages. Section 3.2 explains the layer joining procedure and the data collection procedure. Section 3.3 analyses the data collected and the spatial layers provided by the EMT. Finally, section 3.4 calculates the speed from the location data and assigns the road network sections and the bus stops to the points.

3.1 System architecture

This section explains the database in subsection 3.1.1 as well as the software and the different programming languages used in this thesis in subsection 3.1.2.

3.1.1 Database

The NoSQL database MongoDB has been used for this thesis, together with the GUI MongoDB Compass to perform data analysis, especially in the first stages of the project.

The bus location data is provided by the EMT of Madrid through its REST API service. A GET request with the user information returns an access token. This access token gives access to data about different services of the EMT, which also maintains a portal where its open services are published.

Specifically, the collection service used in this project makes it possible to collect the location of the buses of Madrid (as point spatial data) in real time, with a variable temporal resolution of between 14 and 18 seconds, by using the collection ID ff594c7a-8a7c-423a-8a06-c14a4fac5bff. The spatial attributes are saved by using the 2dsphere index of MongoDB. For further information about the EMT collection service, there is a complete reference in their GitLab profile.

The location data collected was subject to two controls. Firstly, the data had to match the API request filter, which filters per bus line. Secondly, each collected point was assigned a unique index to avoid duplicates. The collection workflow is explained in detail in subsection 3.2.2.

3.1.2 Software

During the development of the thesis, different programming languages were used depending on the requirements, as explained in subsection 1.4.1.

To join the EMT data about bus lines, road network and bus stops, the Python programming language was used. This process is explained in detail in subsection 3.2.1.

To perform the data collection from the EMT API, as well as the speed calculation and the assignment of the collected data to road network sections and bus stops, different Node.js packages were used. Those packages are listed in Table 3.1.

Table 3.1: List of Node.js packages used during the thesis

| Package | Use |
| --- | --- |
| Axios | Perform the API requests. |
| Fs | File system control of Node.js. |
| Mongodb | Use the Mongo client. |
| Mongoose | Create a schema to store the information from the API. |
| Mongoose-GeoJSON | Capability to store geographic data with the Mongoose schema. |
| Node-jq | Transform GeoJSON files to Mongo spatial JSON files. |
| Node-tictoc | Control the running time of the scripts. |

Finally, the data exploration and data visualization were done by using R version 3.6.2. Apart from the packages included by default in R, the packages used are listed in Table 3.2:

Table 3.2: List of R packages used during the thesis

| Package | Use |
| --- | --- |
| DataExplorer | Simple data exploration. |
| Dplyr | Data manipulation. |
| Ggplot2 | Graphs with the results. |
| Ggspatial | Graphs with the results in a spatial context. |
| Leaflet | Results visualization over maps. |
| Leaflet.extras | Extra capabilities related to the Leaflet package. |
| Mongolite | Connection with MongoDB. |
| Plotly | Dynamic graphs. |
| Lubridate | Control data related to dates. |
| Rgdal, Rgeos | Import spatial data. |
| Rnaturalearthdata, Rnaturalearth | Location maps. |
| Shiny | Dynamic data visualization. |
| Shinydashboard | Dashboard creation with the Shiny package capabilities. |

3.2 Pre-processing data

In this section, the different phases of the data pre-processing are described: firstly, the logic followed to join the different spatial layers, in subsection 3.2.1, and secondly, the collection algorithm, in subsection 3.2.2.

3.2.1 Joining layers

The first step of the pre-processing, reflected in Figure 3.1, is to enrich the bus stops layer and the road network sections layer with the bus lines information, in order to later relate them to the collected points. For this step, an Esri geodatabase provided by EMT was used.

The Esri geodatabase was composed of a layer of the road network divided into sections, a layer with bus stops information and a layer with bus lines data. These layers were joined by using the ArcPy package to get two new layers with combined information.

On the one hand, the road network sections layer was joined with the bus lines layer to add the bus lines information (Line ID, direction and shape length) to the former.

On the other hand, the bus lines layer was joined with the bus stops layer, in order to determine which bus lines are assigned to each bus stop.

To add the spatial layers resulting from the join step to MongoDB, a Node.js script was employed. The script uses jq to adapt the format of the layers to the spatial format that MongoDB accepts, as explained in section 3.1.

3.2.2 Data collection

The second step of pre-processing was the data collection from the EMT API by using Node.js (section 3.1), as reflected in Figure 3.2. To achieve the data collection, the Axios package was used to request the access token and to collect the data from the API. After the API request, the Mongoose package was used to establish a schema, with a unique index, to organise the data while filling the database.

Figure 3.1: Diagram of the layers join step.

The collection algorithm was divided into two parts. In the first part, a GET request was performed to acquire the access token. In the second part, a POST request was made to collect the selected data. To filter the bus lines selected for the analysis, a JSON filter was used (section 3.3).

Figure 3.2: Diagram about the collection workflow.

To perform the analysis, a total of 11 representative bus lines were selected at the arterial of La Castellana: 5, 7, 12, 14, 16, 27, 40, 45, 126, 147 and 150, which involve 1068 different buses (Figure 3.3). The selected buses run from 5:35 AM to 11:12 PM, and from 7:00 on Sundays, with different frequencies. For further information about the bus schedules, visit the EMT section dedicated to the buses.

Figure 3.3: Bus lines used for the Use Case

The public bus location data has not been stored by the EMT for any purpose in recent years; therefore, there is no historical data record to work with. For this reason, it was necessary to collect the location data to fill the MongoDB database. Bus location data was collected for seven days, from 10 December 2019 at 9:00 AM to 17 December 2019 at 9:00 AM. The total number of points collected was 942680.

A preliminary analysis of the data characteristics showed that the bus location data does not have a fixed update interval, which fluctuates between 14 and 18 seconds. As a consequence, it is possible to get duplicate points. In order to avoid data duplication, a unique value is established by combining two attribute fields: a new field called unique ID was created from the combination of the bus ID and the collection date field.
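
The duplicate control can be illustrated with a short in-memory sketch. In the thesis the uniqueness is enforced by a unique index in the database; the version below only mimics that behaviour, and the dictionary keys (`bus`, `date`, `meters`) and the underscore separator are assumptions made for the example.

```python
def unique_id(bus_id, date_ms):
    """Combine the bus ID and the collection date (in milliseconds) into one key."""
    return f"{bus_id}_{date_ms}"

def deduplicate(points):
    """Keep the first point seen for each unique ID, preserving the input order."""
    seen = set()
    kept = []
    for point in points:
        key = unique_id(point["bus"], point["date"])
        if key not in seen:
            seen.add(key)
            kept.append({**point, "unique_id": key})
    return kept

# Hypothetical sample: the second record repeats the first one.
points = [
    {"bus": 4512, "date": 1575968400000, "meters": 120},
    {"bus": 4512, "date": 1575968400000, "meters": 120},  # same bus, same timestamp
    {"bus": 4512, "date": 1575968416000, "meters": 185},
]
print(len(deduplicate(points)))
```

Because the API may return the same position in two consecutive requests, the key is built only from fields that identify the observation (bus and timestamp), not from the time at which the request was made.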

For the data collection code used in this section, see Appendix C.1.

3.3 Analysis of the data collected

Bus data collected: point spatial type with bus information. The attributes of the bus data collected are listed in Table 3.3.

Table 3.3: Data collected fields explanation

| Fields | Explanation |
| --- | --- |
| Stop ID | ID of the next stop. After the bus passes a bus stop, it changes to the next Stop ID in the route. |
| Meters | Number of meters covered by the bus on its route. The meters are calculated by an odometer, and the position over the bus line is checked each time that the Stop ID changes. The meters value should not differ by more than 15% from the distance covered according to the GPS; otherwise, the meters field is reset based on the GPS data value. |
| Coordinates | Coordinates in UTM and WGS. UTM coordinates come from WGS coordinates converted by the Sistema de Ayuda a la Explotación (SAE). |
| Date | Value in milliseconds of the date on which the GPS data point was collected. |
| Direction | Direction of the bus. It can be direction 1 or 2 depending on the route of the bus. |
| Status | Status of the bus at the moment of the data collection. The status when the bus is working is 5. |
| Bus | ID of the bus. |
| Geometry | Spatial type and coordinates in GeoJSON format. |
| Line ID | ID of the line. |
| Unique ID | Combination of the date field and the bus ID field. |
| Time | Obtained by converting the date field to a human-readable format. |

The data is mainly located on the arterial of La Castellana, although some sections of the bus line routes lie around the arterial road, as reflected in Figure 3.4. Figure 3.5 shows that the data is distributed between 6 AM and 1 AM, with a maximum around 11 AM.

Road network: line spatial type divided into sections. The attributes of the road network are listed in Table 3.4.

Table 3.4: Road network fields explanation

| Fields | Explanation |
| --- | --- |
| Section ID | ID of the road section. |
| Line ID | ID of the line. |
| Direction | Direction of the bus. It can be direction 1 or 2 depending on the route of the bus. |
| Shape length | Projected length of the road section. |
| Cumulative length | Cumulative length calculated from the beginning of the bus line. |

Stops: the bus stops layer is a line spatial type. The attributes of the data are listed in Table 3.5.

Table 3.5: Bus stops fields explanation

| Fields | Explanation |
| --- | --- |
| Section ID | ID of the road section where the bus stop is located. |
| Stop ID | ID of the bus stop. |

3.3.1 Data location procedure

The GPS receiver installed in the bus fleet of Madrid is a U-blox NEO-M8L, which supports GPS/QZSS, GLONASS, BeiDou and Galileo. For the precision test provided by the EMT, see Annex A.1.

As noted in [Weng et al. 2016], GPS accuracy can suffer inside cities due to the signal interference produced by surrounding buildings.

Taking advantage of the information contained in the data collected, specifically the meters field (subsection 3.2.2), and based on the approach of [Weerapanpisit 2019], it is viable to calculate the speed without depending on GPS accuracy. The length and cumulative length fields from the road network layer can be combined with the meters field from the collected points to determine the distance between points.

Figure 3.4: Heatmap of the data collected.

Figure 3.5: Density of points per hour.

3.4 Processing data

During the processing phase, the speed is calculated taking into account specific conditions explained in detail in subsection 3.4.1. After the speed is calculated, the points are assigned to the road network sections in subsection 3.4.2, and finally, the bus stops affecting a point are assigned to it, also in subsection 3.4.2.

3.4.1 Speed algorithm

The first step of the processing phase was the speed calculation, which is reflected in Figure 3.6. Before calculating the speed of the buses, it was necessary to establish specific conditions to avoid biased speed values.

The first condition was not to assign a speed to the first point of each bus ID, as it is not possible to calculate speed with a single point. The second condition was to limit the maximum speed when a bus changes its destination stop. This is because, according to EMT, when the destination bus stop changes, the meters field of the point should not differ by more than 15% from the distance covered according to the GPS. If it does, the meters field is reset based on the GPS distance covered along the line. As the speed calculation is based on the meters field, each time that the field is recalculated the algorithm checks whether the speed is greater than 50 km/h (the maximum speed allowed in La Castellana); if so, the speed is discarded.

The third condition was to reject incoherent results.

The specific conditions were:

• Avoid using a negative distance difference, which may indicate that the bus finished the route and will stop for a while.

• Discard the speed if the calculation result is NaN.

• Discard the speed if the calculation result is infinite.

• Discard the speed if the time difference between two points was more than one minute, to control for periods with a lack of data.

For the speed algorithm code used in the thesis, see Appendix C.2.
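
The conditions above can be summarised in a short sketch. It is not the code of Appendix C.2: it is a minimal Python illustration in which the point representation (dictionaries with the hypothetical keys `meters`, `date` and `stop_id`) is an assumption. A zero time difference is rejected explicitly because, in Python, the division would raise an error instead of returning NaN or infinity.

```python
import math

MAX_SPEED_KMH = 50.0  # speed limit in La Castellana (from the text)
MAX_GAP_S = 60.0      # maximum time allowed between two points (from the text)

def speed_kmh(prev, curr):
    """Return the speed between two consecutive points of one bus, or None if rejected.

    Points are dicts with hypothetical keys: 'meters', 'date' (ms) and 'stop_id'.
    """
    if prev is None:
        return None  # first point of a bus: a single point gives no speed
    distance_m = curr["meters"] - prev["meters"]
    elapsed_s = (curr["date"] - prev["date"]) / 1000.0
    if distance_m < 0:
        return None  # negative distance: the bus probably finished its route
    if elapsed_s <= 0 or elapsed_s > MAX_GAP_S:
        return None  # zero time would give NaN/infinity; long gaps mean missing data
    speed = distance_m / elapsed_s * 3.6  # m/s to km/h
    if not math.isfinite(speed):
        return None
    if curr["stop_id"] != prev["stop_id"] and speed > MAX_SPEED_KMH:
        return None  # the meters field may have been reset at the stop change
    return speed

a = {"meters": 100, "date": 0, "stop_id": 12}
b = {"meters": 180, "date": 16000, "stop_id": 12}
print(speed_kmh(a, b))  # 80 m covered in 16 s
```

Returning `None` instead of a sentinel speed keeps rejected points distinguishable from buses that are genuinely stationary at 0 km/h.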

Figure 3.6: Diagram of the speed calculation.

3.4.2 Assign road network sections to points

The second step of the processing phase is the assignment of road network sections to the bus points, which is explained graphically in Figure 3.7. Before assigning the bus points to the road network sections, the points without speed information were excluded, as they were not significant for the study.

To assign the road network sections to the bus points, it was necessary to determine the bus line and the direction of the bus point in order to associate them with the road network segments with the same bus line and direction.

Moreover, to determine the exact road network section where the point was located, the meters field was combined with the length and cumulative length fields from the road network sections. For a road network section to be assigned to the bus point, the meters field had to be greater than the cumulative length but lower than the sum of the cumulative length and the length of the section.

Figure 3.7: Diagram of the sections assignment.

For the road network section assignation code used in the thesis, see Appendix C.3.
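
As an illustration of this rule (not the code of Appendix C.3), the following Python sketch assigns a section to a point by comparing the meters field with the cumulative length and length of each section of the same line and direction. The dictionary keys and the sample values are hypothetical, and including the lower boundary in the interval is an assumption.

```python
def assign_section(point, sections):
    """Return the ID of the road section that contains the point, or None."""
    for section in sections:
        same_route = (section["line_id"] == point["line_id"]
                      and section["direction"] == point["direction"])
        if not same_route:
            continue
        start = section["cumulative_length"]
        end = start + section["shape_length"]
        # Assumption: the start of the section is included, the end is not.
        if start <= point["meters"] < end:
            return section["section_id"]
    return None  # e.g. negative meters or a route longer than the fixed network

# Hypothetical sections of line 27, direction 1.
sections = [
    {"section_id": "A", "line_id": 27, "direction": 1,
     "cumulative_length": 0.0, "shape_length": 250.0},
    {"section_id": "B", "line_id": 27, "direction": 1,
     "cumulative_length": 250.0, "shape_length": 300.0},
]
print(assign_section({"line_id": 27, "direction": 1, "meters": 310.0}, sections))
print(assign_section({"line_id": 27, "direction": 1, "meters": -5.0}, sections))
```

Because the lookup relies only on the odometer-based meters field, it does not depend on the GPS position of the point.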

Assign bus stops to points

The third step of the processing phase was the assignment of the bus stops to the bus points, as reflected in Figure 3.8. This algorithm was designed with the bus points collected from the EMT API and the road network layer and bus stops layer provided by the EMT.

Before performing the assignment, the points without speed information were excluded, as they were not significant for the study. Then, the ID of the road network section assigned to each bus point in subsection 3.4.2 was related to the ID of the road network section where the bus stops are located. Finally, the bus line ID of the bus point was used to determine which bus stop was related to each bus line.

For the bus stop assignation code used in the thesis, see Appendix C.4.
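
A minimal sketch of this matching, under stated assumptions, is shown below. It is not the code of Appendix C.4: the keys `section_id`, `line_id`, `line_ids`, `speed` and `stop_id` are hypothetical names for the fields described in the text.

```python
def assign_stop(point, stops):
    """Return the ID of a stop on the point's section that serves its line, or None."""
    if point.get("speed") is None or point.get("section_id") is None:
        return None  # points without speed or section are excluded beforehand
    for stop in stops:
        if (stop["section_id"] == point["section_id"]
                and point["line_id"] in stop["line_ids"]):
            return stop["stop_id"]
    return None

# Hypothetical stops: two on section "B", serving different lines.
stops = [
    {"stop_id": 101, "section_id": "B", "line_ids": {27, 40}},
    {"stop_id": 102, "section_id": "B", "line_ids": {5}},
]
point = {"line_id": 27, "section_id": "B", "speed": 8.4}
print(assign_stop(point, stops))
```

Filtering by line as well as by section prevents a bus from being linked to a stop that it passes but never serves.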

Figure 3.8: Diagram of the stops assignment.

Chapter 4 Results and discussion

This chapter presents and discusses the results of the thesis. Section 4.1 covers the results: the speed validation in subsection 4.1.1, the analysis and exploration of the speed results in subsection 4.1.2, the efficiency of the assignment of road network sections and bus stops to the bus points in subsection 4.1.3, and the visualization of the results in subsection 4.1.4. Section 4.2 discusses the results, and section 4.3 describes the limitations faced during the development of the thesis.

4.1 Results

In this section, the results of the thesis are presented. In subsection 4.1.1, the calculated speeds are compared with the speeds provided by the EMT. In subsection 4.1.2, the speeds are analysed in detail. Subsection 4.1.3 analyzes the results of the assignment of road network sections and bus stops to the collected points, and finally, in subsection 4.1.4, the results are adapted to be dynamically visualized with Shiny applications.

4.1.1 Speed validation

To determine the quality of the calculated speeds, the data is validated by using the speed information provided by the SAE of EMT. This information contains speed values for all bus lines, although it is not broken down by segment or time slot.

To perform the speed validation, the difference between the mean speed per line calculated in the thesis and the speed per line provided by the EMT is computed. The results are listed in Table 4.1; the mean speed difference is 2.68 km/h.

Table 4.1: Speeds validation per line.

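| Line ID | Points speeds (km/h) | EMT speeds (km/h) | Difference (km/h) |
| --- | --- | --- | --- |
| 5 | 9.27 | 13.43 | 4.16 |
| 7 | 11.47 | 13.76 | 2.29 |
| 12 | 9.84 | 12.54 | 2.7 |
| 14 | 12.59 | 13.93 | 1.34 |
| 16 | 11.02 | 12.63 | 1.61 |
| 27 | 11.35 | 14.25 | 2.9 |
| 40 | 10.86 | 14.65 | 3.79 |
| 45 | 11.17 | 12.93 | 1.76 |
| 126 | 11.5 | 14.62 | 3.12 |
| 147 | 11.17 | 13.62 | 2.45 |
| 150 | 12.31 | 15.74 | 3.43 |
| Mean | 11.14 | 13.82 | 2.68 |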
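
The arithmetic behind Table 4.1 can be reproduced with a few lines of Python. The values are copied from the table; the variable names are only illustrative.

```python
# Mean speed per line (km/h), copied from Table 4.1.
thesis = {5: 9.27, 7: 11.47, 12: 9.84, 14: 12.59, 16: 11.02, 27: 11.35,
          40: 10.86, 45: 11.17, 126: 11.5, 147: 11.17, 150: 12.31}
emt = {5: 13.43, 7: 13.76, 12: 12.54, 14: 13.93, 16: 12.63, 27: 14.25,
       40: 14.65, 45: 12.93, 126: 14.62, 147: 13.62, 150: 15.74}

# Per-line difference between the EMT speed and the speed calculated from the points.
diff = {line: round(emt[line] - thesis[line], 2) for line in thesis}
mean_diff = sum(diff.values()) / len(diff)

print(diff)
print(mean_diff)
```

The per-line differences match the last column of the table, and their mean agrees with the reported 2.68 km/h to within 0.01 km/h; the small gap depends on whether the mean is rounded or taken as the difference of the two rounded column means.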

4.1.2 Analysis of the speed results

In this section, the results achieved in section 3.4 are analyzed. Firstly, the speeds resulting from the thesis, validated in Table 4.1, are explored. Secondly, the performance of the road network section assignment and of the bus stop assignment is analyzed (subsection 4.1.3).

For the analysis of the results, the GUI MongoDB Compass was used first, to query the data and to perform a first exploration of the results.

As presented in this subsection, the R environment was connected with MongoDB to analyze the results of the thesis. The mean speed was analyzed along with the density of bus speed per bus line. Moreover, histograms of bus speeds for the [Uno et al. 2009] approach and for the thesis approach were included, as well as the correlation between bus speed and bus stops for both approaches.

For the results analysis code used in the thesis, see Appendix C.5.

Speed results exploration

The points collected from the EMT total 942680. Of those, and based on the filters applied for the speed calculation (subsection 3.4.1), 600980 contain speed values, which represents 63.752% of the total number of points, as represented in Figure 4.1.

Figure 4.1: Data exploration per attribute.

The weekly speed distribution per day of the week and line, reflected in Figure 4.2, shows that the mean speed reaches its maximum during the weekends.

Figure 4.2: Speed over time per day of the week.

As reflected in Figure 4.3, the most frequent speed is 0 km/h, and the number of points per speed has a negative linear relation with the speed. Moreover, Figure 4.4 reflects the speed density per bus line, to better understand the distribution of the speed and the behaviour of each line.

Figure 4.3: Speed histogram per line.

Figure 4.4: Speed density per line.

The maximum speed considered in this project was 50 km/h, as it is the maximum speed allowed in La Castellana, as explained in subsection 3.4.1. Based on this consideration, 187 outliers were detected, which are reflected in Figure 4.5. Out of the total number of points with speed values (600980), the outliers represent 0.031%.

Figure 4.5: Speed outliers.

On the other hand, a total of 130903 collected points have a speed equal to 0 km/h, which represents 21.781% of the points with speed values (600980).

4.1.3 Analysis of the road network section assignation

In total, 130 of the 600980 points with speed information were not assigned to any road network section, which represents 0.021%. The reason why these points were not assigned is that some collected points contain a negative value in the meters field, which is key in the road network section assignment, as explained in subsection 3.4.2. Moreover, it is necessary to consider that the configuration of the bus lines is modified daily while the network used in this project was fixed, so possible mismatches in the distance calculation were assumed.

Analysis of the bus stop assignation

To test the thesis hypothesis, it was necessary to assign the bus stops to the buses that, based on the EMT data, were affected by them. According to our hypothesis, this happens when the point and the bus stop are located on the same road network segment and have the same bus stop ID.

Figure 4.6: Speed of points with stop assigned.

Analyzing the number of points affected by bus stops, there were 250246 points with a bus stop assigned, the speeds of which are reflected in Figure 4.6. The points with a bus stop assigned represent 41.639% of the total number of points that contain speed values (600980).

A total of 72319 points with speed equal to 0 km/h were assigned to a bus stop. This represents 55.246% of all points with speed equal to 0 km/h (130903), and 28.903% of all the points with a bus stop assigned (250208).

The bus stops were assigned to the points based on the [Uno et al. 2009] approach to detect buses stopping at a bus stop and on the at-stop bus time considerations of [Zhang et al. 2018]. [Zhang et al. 2018] consider that the average time a bus is stopped at a bus stop is 12.9 seconds. As the mean speed of the whole dataset is 11.4 km/h, the speed of a bus that did not stop at its assigned bus stop should not be less than 3.215 km/h.

Accordingly, of the 250246 points with a bus stop assigned, just 109863 are actually stopping at the bus stop (section 1.4). The effect on the speed of the [Uno et al. 2009] approach is reflected in Figure 4.7 and Figure 4.8. In addition, the mean speeds calculated with the different bus stop detection approaches are listed in Table 4.2.
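
One reading of this classification can be sketched as follows. The 3.215 km/h threshold comes from the text; the rule itself (a point counts as stopping when it has a bus stop assigned and its speed is below the threshold), the key names and the sample values are assumptions made for illustration.

```python
STOP_SPEED_THRESHOLD_KMH = 3.215  # threshold given in the text

def is_stopping(point):
    """True if the point has a stop assigned and is slower than the threshold."""
    return point["stop_id"] is not None and point["speed"] < STOP_SPEED_THRESHOLD_KMH

def mean_speed_excluding_stopping(points):
    """Mean speed of the points that are not classified as stopping."""
    speeds = [p["speed"] for p in points if not is_stopping(p)]
    return sum(speeds) / len(speeds) if speeds else None

# Hypothetical sample of points with speed values.
points = [
    {"stop_id": 101, "speed": 0.0},    # stop assigned and stationary: stopping
    {"stop_id": 101, "speed": 12.0},   # stop on its section, but passing by
    {"stop_id": None, "speed": 2.0},   # slow, but no stop assigned
    {"stop_id": None, "speed": 22.0},
]
print(mean_speed_excluding_stopping(points))
```

Slow points without an assigned stop are kept on purpose, since they may reflect congestion rather than passenger boarding.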

Figure 4.7: Speed of buses affected by a bus stop that do not stop at it, according to [Uno et al. 2009].

Figure 4.8: Speed for buses affected by a bus stop but not stopping on it.

Table 4.2: Mean speed (km/h) per line with different approaches

| Line ID | Affected stop no stopping | Affected Stop | Not Affected Stop | [Uno et al. 2009] |
| --- | --- | --- | --- | --- |
| 5 | 13.8 | 9.27 | 10.35 | 11.6 |
| 7 | 16.39 | 11.47 | 12.98 | 14 |
| 12 | 14.6 | 9.84 | 11.47 | 13.3 |
| 14 | 16.3 | 12.59 | 13.09 | 14.4 |
| 16 | 15.8 | 11.02 | 12.08 | 13.8 |
| 27 | 15.9 | 11.35 | 13.23 | 14.3 |
| 40 | 16.2 | 10.86 | 11.47 | 13.3 |
| 45 | 14.5 | 11.17 | 11.73 | 12.9 |
| 126 | 15.9 | 11.5 | 12.94 | 14 |
| 147 | 15.7 | 11.17 | 12.51 | 13.8 |
| 150 | 18.10 | 12.31 | 12.56 | 14.6 |
| Mean | 15.879 | 11.141 | 12.218 | 13.77 |

4.1.4 Visualization of the results

To visualize the results, two Shiny applications were created in R, as represented in Figure 4.9. These applications allow users to filter and visualize the thesis results dynamically.

Figure 4.9: Shiny methodology.

The first Shiny application compares the speed of the points according to the [Uno et al. 2009] stop detection approach with the speed according to the thesis approach, as reflected in Figure 4.10. The speed values can be filtered in the application by line and bus direction. Moreover, it displays the speed density, a histogram of the speed and a summary of the speed data.

Figure 4.10: Shiny application that compares the speed of the points according to the [Uno et al. 2009] approach.

The second Shiny application shows the mean speed in each section of the road network, as reflected in Figure 4.11. The speeds are represented with a color ramp (green, orange and red, where red represents low-speed traffic), which is based on the quantiles of the selected speed data.

Figure 4.11: Shiny application with the mean speed for each section of the road network.

This application monitors the traffic in La Castellana, based on the mean speed of the buses per road network section. The speed data can be filtered by date, and it is possible to include or exclude the points with bus stops assigned according to the [Uno et al. 2009] approach.

4.2 Discussion

The final objective of this thesis was to analyse the effect of bus stops on the bus speed and determine whether the public bus fleet can optimally monitor the traffic in cities.

In this section, the effects of bus stops for traffic monitoring are discussed.

Two different approaches have been applied in this thesis to identify, from location data, buses stopping at a bus stop. The mean speeds resulting from those approaches are compared with the speeds from EMT. Table 4.3 shows that the mean speeds closest to the EMT speeds are achieved by using the [Uno et al. 2009] considerations to determine which buses stop at the bus stop.

Table 4.3: Mean speed difference per lines with different approaches comparing to EMT speeds

| Line ID | Affected stop no stopping | Affected Stop | Not Affected Stop | Uno Approach |
| --- | --- | --- | --- | --- |
| 5 | 0.37 | -4.16 | -3.08 | -1.83 |
| 7 | 2.64 | -2.29 | -0.78 | 0.24 |
| 12 | 2.06 | -2.7 | -1.07 | 0.76 |
| 14 | 2.37 | -1.34 | -0.84 | 0.47 |
| 16 | 3.17 | -1.61 | -0.55 | 1.17 |
| 27 | 1.65 | -2.9 | -1.02 | 0.05 |
| 40 | 1.55 | -3.79 | -3.18 | -1.35 |
| 45 | 1.57 | -1.76 | -1.2 | -0.03 |
| 126 | 1.28 | -3.12 | -1.68 | -0.62 |
| 147 | 2.08 | -2.45 | -1.11 | 0.18 |
| 150 | 2.36 | -3.43 | -3.18 | -1.13 |
| Mean | 2.05 | -2.68 | -1.60 | -0.19 |

The first approach considered that, when a bus point has the same bus line ID and road network section ID as a bus stop, the bus is always stopping at the bus stop. In this case, the correlation between speed and the existence of stops was -0.0302431. This correlation indicates that the points with a bus stop assigned with this rule were slightly more likely to have lower speeds than the ones without a bus stop assigned.

The complete results can be found in Annex B.1.

The second approach assigns bus stops only to those points that, according to [Uno et al. 2009], are stopping at a bus stop, as explained in subsection 4.1.3. In this case, the correlation between speed and bus stops was -0.649483. This correlation indicates that the points with a bus stop assigned were considerably more likely to have a lower speed than the ones without it.

The complete results can be found in Annex B.2.

Therefore, the lowest (most negative) correlation between bus stops and speed is achieved by adapting the [Uno et al. 2009] approach to detect buses stopping at a bus stop. This approach captures the highest effect of bus stops on the bus speed.
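
To make the meaning of these coefficients concrete, the sketch below computes a correlation between speed and a binary stop indicator. The thesis does not state which coefficient was used; Pearson's coefficient (equivalent to the point-biserial correlation when one variable is binary) is assumed here, and the sample values are made up for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Made-up sample: speed in km/h and whether the point is classified as stopping.
speeds = [0.0, 1.5, 2.0, 14.0, 18.5, 22.0]
stopping = [1, 1, 1, 0, 0, 0]
print(pearson(speeds, stopping))
```

A coefficient close to -1 means that stopping points are concentrated at low speeds, whereas a value close to 0, as in the first approach, means that having a stop assigned says little about the speed of the point.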

4.3 Limitations

During the project implementation, some external and internal limitations were encountered, which restricted the final results to some extent.

An external limitation was the performance of the API which provides the bus location data. The API had problems with its server, which made it difficult to collect data until December. Further, the fact that no historical bus location data was available to validate the data collection results and to forecast the speed calculation results forced us to seek alternatives. Moreover, the data source provided for the speed validation did not include speed information per segment, which reduced the possibilities to further evaluate the results.

One of the main limitations encountered was related to data management. The size of the dataset, combined with a lack of knowledge about big data tools, increased the time needed to perform the analysis and led to problems with the optimization process.

Another limitation was related to the nature of MongoDB. A NoSQL database was used in the project because the initial plan was to use the database exclusively as a data container, and MongoDB was the tool available at the time that best suited this need. During the course of the thesis, the requirements to achieve the thesis objectives showed that a relational database would have been more appropriate.

For the analysis of the results, the use of a fixed road network, instead of the daily updated one, led to a situation where the meters field of some of the collected data did not match the road network length on which the speed algorithm is based (subsection 3.4.1).

Chapter 5 Conclusion

This thesis presents an analysis of the effect of bus stops on bus speed regarding the usage of a public bus fleet as probe vehicles, based on the [Uno et al. 2009] proposal to detect stopping buses according to their distance to the bus stop and their speed. The thesis hypothesis has been tested on the arterial road of La Castellana in Madrid, the capital city of Spain. The advantages of excluding the speed of buses stopping at bus stops when monitoring traffic have been demonstrated.

To determine which data was affected by a bus stop, distinct phases were accomplished. Firstly, data collection was performed and the bus speed was calculated by taking advantage of the characteristics of the data, in combination with the bus line information. Then, the collected data was enriched with information about the bus stops and road network sections related to the location, direction and bus line ID of the data.

The results of the thesis show that the influence of the bus stops on the speed was higher when the [Uno et al. 2009] approach to detect points affected by bus stops was considered. In the case of the thesis approach, the influence of the stops on the speed was lower. Therefore, by removing the points that the [Uno et al. 2009] approach identifies as buses affected by bus stops, the speed values coming from the public bus fleet would be more appropriate for traffic monitoring. The results are believed to be of practical interest for professionals interested in using a bus fleet as a reliable probe vehicle data source. The main contribution of this thesis is to determine the
