September 2019
November 2019
1. INTRODUCTION
2. DATASETS RELATED TO CITIES IN REAL-TIME
2.1. Public Car Parks Dataset
2.2. Public Bicycles Dataset
2.3. Traffic Dataset
2.4. Public Transport Dataset
2.5. Air Quality Dataset
2.6. Noise Pollution Dataset
3. CONCLUSIONS
4. REFERENCES
Content prepared by Jose Luis Marín, expert in Digital Transformation and open data.
This study has been developed within the framework of the Aporta Initiative, developed by the Ministry of Economy and Business, through the Public Business Entity Red.es, and in collaboration with the Ministry of Territorial Policy and Public Function. The contents and points of view reflected in this publication are the sole responsibility of the author. The Aporta team does not guarantee the accuracy of the data included in the study. The use of this document implies the express and full acceptance of the general reuse conditions referred to in the legal notice shown at:
http://datos.gob.es/es/aviso-legal
Objective:
The objective of this report is to stimulate the publication of datasets in the cities that can be reused in real-time. To this end, a set of national and international applications and initiatives in this field have been compiled and described, analyzing their potential impact on citizens and the challenges that their deployment represents, in addition to the technologies that enable them.
For the preparation of this report, we have used the datasets suggested to be published in real-time by the Spanish Federation of Municipalities and Provinces (FEMP) in its report “Open Data 2019 - 40 datasets to be published by the Local Entities”.
This publication constitutes one of the few references in the world where the most relevant datasets are systematically collected and analyzed in the context of the local entity competencies.
Structure description:
The first part of the report introduces the current challenges facing cities and how they can be addressed with the use of open datasets in real-time.
The second part presents a selection of use cases related to real-time datasets in
the fields of transport and mobility, and the environment.
1. INTRODUCTION
According to the United Nations, around half of humanity already lives in cities, and this proportion will keep growing in the coming years. Some estimates place 70% of the world's population in urban areas by 2050. This situation requires a change in how cities are conceived.
The need for sustainable development of cities makes Smart Cities - and with them open data - become one of the main responses to the great challenges posed by this trend. Smart city projects and initiatives, in any of their expressions, have considered open data as one of their most valuable assets and a fundamental part of their service development and infrastructure deployment strategies for more than a decade.
The benefits of opening data on cities
The European Committee of the Regions already recognized in 2012 the importance of open data at regional and local level due to its potential to become valuable assets for citizens, businesses and public authorities. Among the main advantages of the exploitation of open data from cities are the potential improvements in urban mobility, energy efficiency or environmental conditions.
The applications in each of the different areas are numerous and, as evidenced by the different
impact assessments that are published, these applications do not stop growing as a result of
the virtuous circle that begins with the publication of data. In addition, we increasingly have a
better understanding of the contribution that open data can make to solve the different
problems that cities face in diverse fields.
Take as an example the case of the city of London and the opening of transport data:
• Considering only the benefits for citizens, the opening of real-time data on public transport has a direct impact on economic and time savings, while increasing their quality of life. People can plan their routes more efficiently, avoid traffic jams and setbacks and integrate alternative models of moving around the city, such as cycling or walking.
• The second consequence of this efficiency is in the impact that is achieved on the environment and on the health of citizens, since almost 50% of the emissions in cities come from transport.
As we can see, the opening of real-time data has a positive impact on different areas and facets at the same time.
Throughout the report we will see more examples of how environmental improvements have a
high impact on public health and, consequently, on the quality of life of citizens. For example,
people with some type of heart or respiratory condition can plan better their time outdoors. The
availability of public and accessible information allows citizens to be aware of existing pollution,
which favours a participatory and critical attitude towards public policies in this area and an
individual behaviour that is more respectful of the environment.
Open city data
Currently, the management of open data in cities is much more complex than a few years ago. It is not enough to publish datasets; cities are also required to address issues such as licensing standards, publication formats, ontologies and vocabularies for each domain, interoperability with other stakeholders, data publication platforms, data collection strategies and service level agreements on published data. In addition, the challenges posed by real-time data publication require more sophisticated mechanisms than current APIs or data catalogues to manage events that are generated by smart city sensors with high density and frequency.
On the other hand, citizens' expectations regarding the accuracy of services and the levels of information they expect from cities are also increasing. This creates a very positive pressure to publish open data that helps solve the problems that directly concern citizens. A 2016 study on Digital Citizens shows that the expectations of United States citizens regarding digital government services increased by 15% from 2014, and that satisfaction with current services doubled during the same period.
We must not forget that achieving a significant impact through open data requires cooperation
between different administrations. Cities should work with other cities and with regional and
national governments to ensure that their coordinated efforts in data opening are useful to
provide more comprehensive and useful solutions.
In this line, it is worth highlighting the “Open Cities” project, whose main objective is the full development of Open Government policies in cities, including, of course, a decisive impulse to the publication of open data. This is a project promoted by a consortium formed by the City Councils of A Coruña, Madrid, Santiago de Compostela and Zaragoza together with Red.es, and it is aligned with the National Plan of Smart Cities. Among the actions of the project, it is worth mentioning the one that will lead to the publication of a catalogue of 11 documented vocabularies with examples of use cases and available in several representative languages.
Three of these vocabularies correspond to the datasets under analysis in this report: public bicycle, traffic and noise pollution.
The value of real-time data
Among the multitude of open datasets that are published in cities, there is a subset with peculiarities that, on the one hand, make them tremendously valuable but, on the other, make their management and publication expensive and complex. These are the data whose greatest utility is achieved when we can access them in real-time, that is, with a minimum delay after they are captured or generated.
In the context of open data, there is no precise definition of what “available in real-time” means, although, in general, we speak about update frequencies in the order of minutes. It is difficult to find open data publishing systems that are prepared to manage updates below five minutes in a reliable way. In most cases they are more reliable with daily, weekly or monthly updates.
The systems needed to manage datasets in real-time are much more complex than those that manage datasets where such availability is not critical. This is because, if the data were published with a longer delay, they could lose their main utility, having already been superseded by a newer value. For example, if the sensor that checks the occupancy of a parking space provides data only once a day, the information is of little use to users who want to know whether they can use that space at a given time.
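The parking example can be made concrete with a simple freshness check. The sketch below is illustrative only: the function name `is_fresh` and the five-minute tolerance are assumptions chosen for the example, not values prescribed by this report.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(captured_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if a reading is recent enough to still be useful."""
    return now - captured_at <= max_age


now = datetime(2019, 11, 1, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(minutes=5)  # assumed tolerance for a parking-space reading

# Hypothetical readings: one streamed by a sensor, one from a daily export.
reading_from_sensor = datetime(2019, 11, 1, 11, 58, tzinfo=timezone.utc)
reading_from_daily_export = datetime(2019, 10, 31, 23, 59, tzinfo=timezone.utc)

print(is_fresh(reading_from_sensor, now, max_age))        # 2 minutes old
print(is_fresh(reading_from_daily_export, now, max_age))  # about 12 hours old
```

Under these assumptions, only the first reading would still help a driver decide where to park.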
Sometimes real-time data is mistakenly identified with data that are captured in large quantities
and measured with very high frequencies: vehicle position data, airborne particle concentration
data, etc. Although in many cases it is so, especially if we think of sensors that capture events
that change very quickly, the key attribute of real-time data is its availability with a minimum delay after capture. With this consideration, the concept of real-time is substantially extended, since it also covers data that, although updated only daily or weekly, must be available as soon as they are generated.
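The distinction between update frequency and publication delay can be sketched as follows. The `Release` class, the dataset names and the five-minute limit are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Release:
    name: str
    update_interval: timedelta  # how often a new value is generated
    generated_at: datetime      # when the latest value was produced
    published_at: datetime      # when that value became available to reusers

    @property
    def delay(self) -> timedelta:
        return self.published_at - self.generated_at


def available_in_real_time(release: Release,
                           max_delay: timedelta = timedelta(minutes=5)) -> bool:
    # Only the publication delay matters here, not the update interval.
    return release.delay <= max_delay


# A weekly dataset published two minutes after it is generated.
weekly_summary = Release("weekly noise summary", timedelta(days=7),
                         datetime(2019, 11, 4, 0, 0), datetime(2019, 11, 4, 0, 2))
# A high-frequency dataset that reaches reusers three hours late.
vehicle_positions = Release("vehicle positions", timedelta(seconds=30),
                            datetime(2019, 11, 4, 8, 0), datetime(2019, 11, 4, 11, 0))

for release in (weekly_summary, vehicle_positions):
    print(release.name, available_in_real_time(release))
```

In this sketch the weekly dataset counts as available in real-time while the high-frequency one does not, which is the point made in the paragraph above.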
In this context, the second part of the report will review the real-time datasets that cities can
use to achieve greater impact by generating new services for their citizens and visitors in the
areas of transport and mobility, and the environment.
2. DATASETS RELATED TO CITIES IN REAL-TIME
In this chapter, we analyse a series of datasets whose publication in real-time is providing valuable solutions to the challenges that cities face. To select the datasets, we took as a starting point the report “Open Data 2019 - 40 datasets to be published by the Local Entities”, published by the open data group of the Network of Local Entities for Transparency and Citizen Participation of the Spanish Federation of Municipalities and Provinces (FEMP).
The document lists 40 datasets that local entities are recommended to publish in their open data catalogues to encourage reuse and generate value. It is mainly oriented to medium-large cities, but its ambition is to set a knowledge framework that can also be used by smaller municipalities.
Among these 40 datasets there are 6 for which a minimum refresh rate in real-time is
suggested, that is, that the data is available at the same time it is generated or with the
minimum possible delay. We look at these 6 sets because they have a special potential to
generate solutions that improve the quality of life of the population of cities.
Of the six datasets, four belong to the Transport category and the other two to the Environment category. These are two of the five data domains that the European Commission has identified as having high value and greater demand from citizens and businesses, and they are among the most reused in the European Open Data Portal.
For each of these six datasets, we include a fact sheet with the results of the analysis, presented with the following structure:
1. Description
2. Stakeholders and publication standards
3. Possible real-time applications
4. Potential impact of possible applications
5. Examples of dataset use (national or international)
6. Technologies and data involved
7. Challenges
2.1. Public Car Parks Dataset
Description
The real-time availability of data related to the state of parking occupancy in cities opens the possibility of resolving or introducing greater efficiency in the mobility of citizens and in the environmental impact of said mobility.
In a first stage, the cities opened the location data of the public car parks and applications were developed to locate them, although in general, these first versions did not include information on the occupation of their slots. The location data of the public car parks are also integrated in all commercial navigation applications.
Smart parking solutions go one step further and aim to solve the challenge of parking a vehicle in a city by managing information in real-time, including not only the car park location but the occupancy status of each of their slots.
For this, it is necessary in many cases to have public data and others that come from sensors that are owned by the different companies that operate public car parks.
In the most advanced cases of intelligent parking spaces, it is suggested not only to work with public car parks but also to consider the problem globally, including surface parking spaces at any point in the city and its relation to other public transport means.
Stakeholders and publication standards
The most advanced use cases arise from collaboration between cities, parking equipment providers, parking payment solution providers, car park operators, car manufacturers and even specialized providers of parking-related data in real-time.
The need to move forward in cooperation schemes between stakeholders has led to the founding of the Alliance for Parking Data Standards (APDS), formed by the International Parking and Mobility Institute (IPI), the British Parking Association (BPA) and the European Parking Association (EPA). The mission of the APDS is to develop, promote and maintain a global standard that allows organizations to share parking data on platforms around the world. Version 2.0 of the generated documentation (data model, use cases and data dictionary) has been available since June 2019 and includes the draft standard for the following data domains: parking place, rate, occupancy, right, session and observation.
The dataset is also included in Standard UNE 178301:2015 Smart Cities Open Data.
Possible real-time applications
• Mapping of free public parking spaces: saving time and fuel for people traveling in vehicles, and reducing emissions are some of the main benefits derived from higher efficiency in vehicle parking.
• Calculation of the parking price dynamically: with a price calculated according to the demand, cities can optimize the income they earn from the parking spaces they manage and improve traffic conditions.
• Improvement of multimodality in urban transport: optimizing the way in which users park their vehicles allows routes combined with other public transport systems (bus, subway, tram or public bicycles) or alternative mobility forms in the city (walking, bicycle, etc.)
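The demand-based pricing idea in the list above can be sketched as a simple rule that nudges the hourly rate up when occupancy is high and down when it is low. Every number here (target band, step, minimum and maximum rate) is a hypothetical value chosen for illustration and does not describe the rule used by any particular city.

```python
def adjust_hourly_rate(current_rate: float, occupancy: float,
                       low: float = 0.60, high: float = 0.80,
                       step: float = 0.25,
                       min_rate: float = 0.50, max_rate: float = 6.00) -> float:
    """Return the next hourly rate given the observed occupancy ratio (0..1).

    All thresholds and amounts are illustrative assumptions.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be between 0 and 1")
    if occupancy > high:
        new_rate = current_rate + step   # demand is high: raise the price
    elif occupancy < low:
        new_rate = current_rate - step   # demand is low: lower the price
    else:
        new_rate = current_rate          # within the target band: keep it
    # Keep the rate within the allowed limits.
    return min(max_rate, max(min_rate, new_rate))


print(adjust_hourly_rate(2.00, 0.90))
print(adjust_hourly_rate(2.00, 0.40))
print(adjust_hourly_rate(2.00, 0.70))
```

A rule of this kind depends on occupancy data arriving with little delay; with stale data the price would react to demand that no longer exists.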
Potential impact of possible applications
According to a study published by INRIX, the search for parking spaces is the main component of the cost of traveling in a vehicle. According to this study, the fuel and time lost when searching for a place to park had an average cost of about €1,400 for each German driver and €1,200 for each driver in the United Kingdom in 2017.
In Madrid, on average, a person loses twenty minutes each time they look for a parking space, and 32% of the fines imposed are due to incorrect parking. In addition, only 27% of Madrid's residential buildings have parking, and 30% of urban traffic and traffic jams are attributed to people looking for a place to park their car.
Therefore, there is a great potential impact, both economic and on the quality of life of citizens, if the time needed to park a vehicle is reduced.
Additionally, all the time that vehicles spend circulating in search of a parking space contributes to climate change. Already in 2007, Donald Shoup, author of “The High Cost of Free Parking”, calculated that the excess kilometres due to parking searches produced more than 730 tons of CO2 in a single year in just a 15-block area of Los Angeles.
On a global level, the potential impact on the environment derived from reducing these emissions thanks to a more efficient search for parking is enormous.
Examples of dataset use
• Málaga: The city publishes in real-time the occupancy data of its 14 public car parks. One project not only displays the current occupancy of each car park but also calculates an occupancy forecast up to seven days in advance. This project, winner of the first open data contest in the city of Málaga, provides the same service using data from cities such as Zaragoza, Valencia, Glasgow or Birmingham. We can find similar examples in numerous medium-sized cities and in virtually every major city in the world.
• San Francisco: The city has implemented a demand-based pricing system that covers the 28,000 parking spaces available on public streets and the fourteen car parks operated by the city. This means that prices may fluctuate depending on the block, time of day or day of the week. Demand-based rates aim to get vehicles off the road and parked as quickly as possible, in order to improve traffic conditions and the driving experience, as well as to reduce emissions. Big cities like New York or Los Angeles have similar initiatives.
• Parkopedia: Most parking information systems cover the scope of a particular city. This service, however, aggregates parking data from 15,000 cities in eighty-nine countries. It offers users real-time parking information on more than seventy million parking spaces worldwide. It uses open data as well as data from companies that operate parking facilities, and it serves users through apps, navigation system providers and car companies.
Technologies and data involved
• Occupancy sensors: The sensorization of parking spaces is one of the approaches to exactly know the occupancy rate.
• Cameras: Images captured by digital cameras provide richer information than sensors, but this information has a higher processing cost.
• Computer vision: Artificial intelligence techniques applied in computer vision are the basis for extracting data from images captured by cameras: license plate recognition, space occupancy, etc.
• Big Data Software: Capturing, storing and processing all relevant data to provide real-time parking information requires specialized platforms capable of working with large and varied amounts of information.
• Online payment methods: Parking in public car parks is associated with the payment for the time of occupation, so the automation of payment systems is another source of efficiency.
• GPS navigation systems: The vehicle positioning data, together with the parking position data, make it possible to trace routes to the place where the free space is located.
• Other related datasets: opening hours, availability of electric chargers, parking fees, accessibility for people with reduced mobility, etc.
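As a rough illustration of how location and occupancy data combine, the sketch below picks the closest car park that still has free spaces, using the haversine great-circle distance. The car park names, coordinates and field names are invented for the example; a real navigation system would route over the road network rather than measure straight-line distance.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_with_free_spaces(lat: float, lon: float, car_parks: list):
    """Return the closest car park with at least one free space, or None."""
    candidates = [p for p in car_parks if p["free_spaces"] > 0]
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))


# Hypothetical real-time snapshot (names and coordinates are made up).
car_parks = [
    {"name": "A", "lat": 36.721, "lon": -4.42, "free_spaces": 0},
    {"name": "B", "lat": 36.730, "lon": -4.42, "free_spaces": 10},
    {"name": "C", "lat": 36.750, "lon": -4.42, "free_spaces": 5},
]

best = nearest_with_free_spaces(36.72, -4.42, car_parks)
print(best["name"] if best else "no free spaces")
```

Car park A is the closest but is full, so the sketch skips it, which is only possible when occupancy is published alongside location.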
Challenges
• Autonomous cars: the imminent increase in autonomous vehicles can lead to parking becoming a less and less important problem in the future for citizens. However, it will be essential for vehicles not to make unnecessary journeys.
• Collaboration between nearby towns: many of the current solutions are restricted to the scope of a single city, and the user must manage different solutions for different cities. In many cases, to be truly useful, parking solutions should cover a wider geographical area that encompasses nearby towns. For this, collaboration between different authorities is essential.
• Predictive systems: Solutions that only offer real-time information are not accurate enough because parking is so variable that, even in the time it takes to reach a place, it may no longer be available. In this scenario, the incorporation of predictive systems helps to improve the usefulness of these systems.
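A very simple predictive baseline, sketched below, averages past occupancy for the same weekday and hour. This is a deliberately naive stand-in chosen for illustration, not the method used by any of the projects mentioned in this report, and the sample history is invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_profile(history):
    """Average occupancy per (weekday, hour) from (timestamp, ratio) pairs."""
    buckets = defaultdict(list)
    for timestamp, occupancy in history:
        buckets[(timestamp.weekday(), timestamp.hour)].append(occupancy)
    return {slot: mean(values) for slot, values in buckets.items()}


def forecast(profile, when: datetime, default=None):
    """Predict occupancy at `when` from the profile, or `default` if unseen."""
    return profile.get((when.weekday(), when.hour), default)


# Invented history for one car park: three Monday mornings, one afternoon.
history = [
    (datetime(2019, 10, 14, 9, 10), 0.80),
    (datetime(2019, 10, 21, 9, 40), 0.90),
    (datetime(2019, 10, 28, 9, 5), 0.70),
    (datetime(2019, 10, 28, 15, 0), 0.40),
]

profile = build_profile(history)
# Expected occupancy for the following Monday at 09:30.
print(forecast(profile, datetime(2019, 11, 4, 9, 30)))
```

A driver who is twenty minutes away could be shown this expected occupancy alongside the current reading.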
2.2. Public Bicycles Dataset
Description
In many cities of the world, cycling is considered as an efficient means of transport in daily mobility to reduce air pollution, traffic congestion and carbon emissions. Just in Spain, in 2018 there were 52 active systems, and more than 1600 worldwide.
Public bicycle sharing systems are not new, but thanks to the technological advances of recent decades they have been deployed and implemented more widely, improving both the applicable business models and the user experience.
In addition, the possibility of collecting and analysing real-time data about the use of the systems has allowed the resolution of some pitfalls that made them unattractive to users, such as the possibility of knowing in advance if a bicycle was available at a certain point.
In most of the cities where shared bicycle systems have been implemented, datasets with station location have been released as open data. However, cities that share real-time data related to the availability of bicycles or free points at stations are still a minority. This information is usually only available for official apps.
The most advanced cities, such as Oslo or New York, go further and publish data about the paths taken by users, including even demographic data.
Stakeholders and publication standards
In the most common use cases, cities, their public transport authorities and providers of shared bicycle systems are involved. It is also common for smart city programs to include the development of bicycle transport as part of their activities related to the improvement of intermodality and mobility in general.
The General Bikeshare Feed Specification (GBFS), promoted by NABSA, is the open data standard for publicly sharing data from shared bike systems in real-time. Since its publication, it has been adopted by more than two hundred and thirty shared bicycle or scooter systems worldwide.
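To give an idea of what consuming such a feed looks like, the sketch below reads a small document shaped like a GBFS `station_status` file and lists stations where a bicycle can be picked up or returned. The sample data is invented and embedded in the code so that the example runs without network access; a real client would download the feed from the URL advertised by each system. Early GBFS versions encode the status flags as 0/1 instead of booleans, which the truthiness checks below also accept.

```python
import json

# Invented sample in the shape of a GBFS station_status feed.
SAMPLE_STATION_STATUS = """
{
  "last_updated": 1572609600,
  "ttl": 60,
  "data": {
    "stations": [
      {"station_id": "1", "num_bikes_available": 0, "num_docks_available": 20,
       "is_installed": true, "is_renting": true, "is_returning": true},
      {"station_id": "2", "num_bikes_available": 7, "num_docks_available": 0,
       "is_installed": true, "is_renting": true, "is_returning": true},
      {"station_id": "3", "num_bikes_available": 4, "num_docks_available": 9,
       "is_installed": true, "is_renting": false, "is_returning": true}
    ]
  }
}
"""


def stations_with_bikes(feed: dict) -> list:
    """IDs of stations that are renting and have at least one bicycle."""
    return [s["station_id"] for s in feed["data"]["stations"]
            if s.get("is_renting") and s["num_bikes_available"] > 0]


def stations_with_free_docks(feed: dict) -> list:
    """IDs of stations that accept returns and have at least one free dock."""
    return [s["station_id"] for s in feed["data"]["stations"]
            if s.get("is_returning") and s["num_docks_available"] > 0]


feed = json.loads(SAMPLE_STATION_STATUS)
print("pick up at:", stations_with_bikes(feed))
print("return at:", stations_with_free_docks(feed))
```

The `ttl` field tells clients how many seconds may pass before the feed should be requested again, which ties back to the notion of freshness discussed in the introduction.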
Possible real-time applications
• Mapping and prediction of free slots at bicycle stations: The use and expansion of these systems is closely related to the user experience. The more accurate the availability information, the more people will adopt the system.
• Calculation of intermodal itineraries: Applications that calculate the best routes can integrate the availability and position data of public bicycle systems together with other means of public or private transport to offer the best possible alternatives, considering all the possibilities.
• Trajectory data study: decisions such as the extension of the system, the investment in new bike lanes or the balancing of the stations benefit greatly from the analysis of system usage data. These studies are often carried out by professional researchers, but it is also common to find work by amateur citizens.
Potential impact of possible applications
According to a study conducted by IESE researchers, each euro invested in these systems yields a return of between 1.37 and 1.72 euros. They reached this conclusion after valuing income, job creation, the effects on related local sectors and the increased household demand that results from such job creation. Although the economic impact alone does not cover the cost of the systems in all cities (the return ranges between 0.79 and 1.14 euros for each euro invested), its combination with health benefits makes them fully profitable, as the final results of the report show.
The study does not calculate the potential impact on the environment of the emissions avoided because trips are no longer made by other means of transport. Some studies do not consider it relevant because the users of these systems do not come from private transport but from other means of public transport, and therefore the effect on emissions would be smaller.
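The two ranges quoted above can be compared with a few lines of arithmetic. The sketch assumes, purely for illustration, that the total return is the sum of the economic return and the remaining benefits; the study itself may combine them differently.

```python
# Figures quoted in the text, in euros returned per euro invested.
economic_return = (0.79, 1.14)  # economic impact alone (low, high)
total_return = (1.37, 1.72)     # including health benefits (low, high)

# Assumption: total = economic + other benefits (simple additive model).
implied_other_benefits = tuple(
    round(total - economic, 2)
    for total, economic in zip(total_return, economic_return)
)

# Under that assumption, does each scenario recover the investment?
covers_cost_economic_only = tuple(value >= 1.0 for value in economic_return)
covers_cost_total = tuple(value >= 1.0 for value in total_return)

print(implied_other_benefits)
print(covers_cost_economic_only, covers_cost_total)
```

Under this additive reading, the non-economic benefits would be worth roughly the same amount per euro at both ends of the range, and they are what moves the low-end scenario above break-even.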
Examples of dataset use
• Hangzhou, China: The city of Hangzhou, with a population of around seven million inhabitants, has the world's largest shared bike program by far. There are between 66,500 and 78,000 bicycles, spread over around 2,700 stations. The program uses the concept of last mile to ensure that users can easily move from public transport stops to their destinations by bicycle to complete their trip in the best conditions. Data analysis has played a key role in the rapid expansion of the program to reach its impressive current size.
• Oslo and New York: For the sake of transparency, cities like Oslo and New York publish an API and historical data on how their bicycles have been used. This has resulted in numerous studies (such as those by Todd W. Schneider or Jon Olav H. Eikenes), sometimes carried out by independent individuals, which have contributed to the improvement of the systems.
• Google: Google Maps allows you to locate shared bike stations and determine how many bicycles are available in twenty-four cities in sixteen countries, including Madrid and Barcelona. It also shows whether there is an empty space at a station near the destination so that users can leave their bicycle.
Technologies and data involved
• GPS positioning receivers: in order to send positioning coordinates to the network. Today, this information is used to identify and locate locked bicycles. It is foreseeable that shared bike companies will start using this system to collect travel data for user profiles or to offer additional services, such as on-the-road commercial promotions.
• Systems to prevent theft: There are several approaches to preventing bicycle theft, and all incorporate a variety of integrated circuits: a 32-bit MCU to coordinate and manage processing, a Bluetooth SoC (system on a chip) and NFC tags for identification and communication, and MEMS (microelectromechanical systems) to detect tampering. Together, these components allow users to lock the bicycle and leave it for the next person at the end of their journey.
• Solar panels and other electronic components: these devices make better use of the energy produced by movement, for example by using batteries to store it and power the controllers of the bicycles' automatic locks.
• Online payment methods: The use of public bicycles is associated with payment for time of use or subscription periods in which bicycles with certain restrictions can be used. The automation of payment systems is a source of efficiency.
• Other related datasets: data on the paths taken by the users, calculation of the availability forecast, etc.
Challenges
• Redistribution of bicycles: The need to redistribute bicycles between stations is a drag on the sustainability of the systems. In London, although the system is almost eight years old, TfL continues to subsidize the system with more than three million pounds per year.
In New York, a solution has been found thanks to open data that consists in encouraging users themselves to make the distribution. The “Bike Angels” program rewards people who contribute to rebalancing the system. The program was designed based on research conducted by Cornell University.
• Stationless systems (Dockless): These innovative systems are potentially more comfortable for users, as they are not obliged to start and end their journeys at the predetermined stations. However, the redistribution logistics of bicycles is more complex, there may be reliability problems and there are doubts about the financial sustainability of the business model itself.
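The redistribution problem described above can be explored with open trip data. The sketch below counts arrivals minus departures per station to show which stations tend to empty and which tend to fill up. The station names and trips are invented, and this is only a basic descriptive measure, not the approach behind any of the programs mentioned.

```python
from collections import Counter


def net_flow(trips):
    """Arrivals minus departures per station from (start, end) trip pairs."""
    flow = Counter()
    for start, end in trips:
        flow[start] -= 1  # a bicycle leaves the start station
        flow[end] += 1    # and arrives at the end station
    return dict(flow)


# Invented trips: most riders go downhill to the centre and do not ride back.
trips = [
    ("hill", "centre"),
    ("hill", "centre"),
    ("park", "centre"),
    ("centre", "park"),
]

flows = net_flow(trips)
needs_bikes = sorted(s for s, f in flows.items() if f < 0)
needs_docks = sorted(s for s, f in flows.items() if f > 0)
print(flows)
print("tends to empty:", needs_bikes, "tends to fill:", needs_docks)
```

Stations with a persistently negative balance are candidates for receiving bicycles, either by truck or through user incentives.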
2.3. Traffic Dataset
Description
Traffic data in cities is one of the most complex to manage due to the multiple dimensions that are affected. It includes not only the traffic of vehicles on the roads, but also information as varied as the locations of potholes or broken-down traffic lights.
Problems such as traffic congestion, very serious in many cities of the world, and traffic accidents, which are one of the main causes of death in many countries of the world, greatly affect the life of the population of the cities. The study of road traffic information helps, among other things, to better understand mobility problems in cities.
In general, we find that the different problems are addressed with a mixture of data that comes from the public sector along with data collected by the citizens themselves or companies that have fleets of vehicles circulating through the cities.
The data collected by taxis and other personal mobility systems, such as those in the New York City FTA, deserve special mention. These data have a dual utility: they contribute to the understanding and improvement of public transport systems while also recording the state of road traffic. Companies like Uber, through their “Movement” initiative, also provide data and tools for cities to understand and address the challenges of urban transport more deeply.
Stakeholders and publication standards
In the complex ecosystem of traffic data, we find many stakeholders, from the cities themselves and the equipment manufacturers to passenger transport, car rental and logistics companies. Even within the cities themselves, multiple departments or agencies contribute or use this data.
The electronic language used in Europe for the exchange of traffic data and information is DATEX II, funded in part by the European Commission.
Traffic information and traffic management information are distributed without depending on the language and presentation format.
The dataset is also included in Standard UNE 178301:2015 Smart Cities Open Data.
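Because DATEX II publications are exchanged as XML, a reuser typically starts by parsing the document and extracting the records of interest. The sketch below shows the general pattern with Python's standard library. The XML is a heavily simplified, hypothetical structure whose element names are invented for this example; it is not schema-valid DATEX II, whose real documents are namespaced and far richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified traffic publication (NOT the real DATEX II schema).
SAMPLE_XML = """
<trafficPublication>
  <situation id="s1">
    <type>accident</type>
    <road>A-6</road>
    <severity>high</severity>
  </situation>
  <situation id="s2">
    <type>roadworks</type>
    <road>M-30</road>
    <severity>low</severity>
  </situation>
</trafficPublication>
"""


def parse_situations(xml_text: str) -> list:
    """Turn each <situation> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": situation.get("id"),
            "type": situation.findtext("type"),
            "road": situation.findtext("road"),
            "severity": situation.findtext("severity"),
        }
        for situation in root.findall("situation")
    ]


situations = parse_situations(SAMPLE_XML)
high_severity = [s["road"] for s in situations if s["severity"] == "high"]
print(high_severity)
```

With a real feed, the same pattern applies, but the element paths and namespaces must be taken from the published DATEX II schema.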
Possible real-time applications
• Visualization of the traffic situation: The real-time visualization of the traffic situation and the events that can occur on a road is of great importance for the citizens.
• Studies on road transport: With the different datasets available, private companies, researchers and citizens are conducting interesting studies to understand different aspects of road traffic, such as comparing travel times between different modes of transport and traffic situations.
• Prediction of traffic accidents or traffic jams: There are numerous investigations that attempt to create predictive models on different traffic- related events, for example, to calculate the risk of an accident occurring at a specific place and time or to calculate the risk of traffic congestion.
However, these problems are affected by numerous factors and vary according to spatial and temporal environments, which makes modelling and prediction difficult.
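Before any prediction, a traffic visualization needs a way to turn raw measurements into a congestion level. One simple option, sketched below, compares the observed speed on a road segment with its free-flow speed. The thresholds and labels are assumptions made for this example, not values taken from any standard or from the studies cited in this report.

```python
def congestion_level(observed_kmh: float, free_flow_kmh: float) -> str:
    """Classify a road segment from the ratio of observed to free-flow speed.

    The 0.75 and 0.50 thresholds are illustrative assumptions.
    """
    if free_flow_kmh <= 0:
        raise ValueError("free-flow speed must be positive")
    ratio = observed_kmh / free_flow_kmh
    if ratio >= 0.75:
        return "free"
    if ratio >= 0.50:
        return "slow"
    return "congested"


# Hypothetical readings for three segments with a 50 km/h free-flow speed.
for speed in (45, 30, 10):
    print(speed, congestion_level(speed, 50))
```

A predictive model would go further by estimating the future value of this ratio from history, weather and events, which is where the modelling difficulties mentioned above arise.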
Potential impact of possible applications
Traffic congestion is an increasingly important problem for many metropolitan areas. For example, the World Bank estimated the daily cost to the economy of traffic jams in the city of Manila at 60 million dollars. INRIX calculated in 2018 that the city of Los Angeles loses more than 9,000 million dollars a year. With these figures, any small improvement in urban planning decisions or in the way citizens organize their transport has a great return, both economically and in people's quality of life.
Car accidents are another big problem related to road traffic. According to ASIRT, almost 1.3 million people die in the world every year due to car accidents and up to 50 million people are injured. In Europe alone, statistics indicate about 50 road deaths per million inhabitants (data from 2017).
Every accident that is avoided has a great return to society.
Examples of dataset use
• Swedish Transport Administration: It collects traffic information 24 hours a day, all year round, and provides it free of charge to those who wish to subscribe to create their own services. All traffic information is provided in XML format following the European DATEX II standard.
• Open Transport Partnership: Three passenger transport companies that together cover more than thirty countries and millions of customers are working with the World Bank and other partners to make the traffic data collected from their drivers' GPS transmissions available to the public through an open data license. The goal is to help transport agencies in countries with limited resources to make better decisions based on data that were previously out of reach.
• Here: The company offers products for drivers that analyse, predict and map traffic in real-time. The data are aggregated from multiple sources, including high-quality vehicle sensor data, historical traffic records and open data from government sources.
Technologies and data involved
• Traffic data acquisition systems: A wide variety of technologies are used to acquire data, including magnetic induction loops, piezoelectric, fibre or magnetic sensors, microwave radars, acoustic sensors, ultrasonic sensors, etc. Directive 2010/40/EU on intelligent transport systems provides the frame of reference for using the information from these systems in the European Union.
• Cameras: Cameras with a traffic sensor are used for different purposes, such as measuring traffic flow, adjusting traffic light timing dynamically and monitoring traffic code violations.
• Drones: It is increasingly common to complement traffic surveillance systems with drones equipped with high resolution cameras for monitoring traffic status.
• Other related datasets: traffic accident location data, road status data, locations of radars and surveillance cameras, weather data, etc.
Challenges
• Lower cost of collecting information: Traditional methods of collecting traffic data depend on field work (labour-intensive) or sensor data networks (capital intensive). The first is slow and results in poor quality data, and the second requires substantial capital and maintenance expenditures, while only covering a small portion of a metropolitan area. Processing and consolidating data that comes from heterogeneous acquisition systems also has a high cost.
• Improve communications networks: The large extent of many metropolitan areas makes it economically unfeasible to connect all the traffic monitoring elements by fibre optic. The connectivity that comes with a fibre connection allows the traffic signal maintenance team to make almost real-time changes to the signalling elements from a traffic control centre. Where this type of connectivity does not exist, the deployment of these changes is much slower. The arrival of 5G networks will contribute to the improvement of all these systems, which in many cases still depend on citizens to report failures.
• Protect the privacy of citizens: The large amount of data and images collected on the mobility of people presents significant ethical and privacy protection challenges. Even when the data collected by the growing number of traffic information acquisition systems are perfectly anonymized, they must be subject to strong information security measures.
2.4. Public Transport Dataset
Description
Transport data is probably the most requested type of open data by entrepreneurs, activists and citizens. In the last decade, real-time public transport data has become an essential part of urban life. From the screens with information on the arrival of trains or buses, to the applications for smartphones that help citizens plan journeys, open data is contributing to numerous solutions that improve urban mobility.
It is not just a matter of publishing a single dataset, but of implementing complex and expensive strategies that cover up to 80 datasets (more than 75% of them through APIs) in the case of the city of London. These strategies include operational and corporate information on all means of transportation, schedules, service status and interruption information.
The main benefits of publishing these datasets, reported in numerous studies, include shorter waiting times (citizens use an application to time their trip to a stop or station), shorter travel times (citizens adjust their itineraries) and greater use of public transport.
In the long term, the impact is expected to be even greater with the vision of a future where the real-time integrated data of all transport options allow a true urban mobility system that is more comfortable than the use of private cars.
Stakeholders and publication standards
The stakeholder ecosystem is broad and includes public transport authorities, service providers related to the passenger transport process (planning, operation and information), software product providers of different processes and consulting teams and specialists in the field of public transport in the broadest sense.
The GTFS (General Transit Feed Specification) format, initially proposed by a Google employee in 2005, makes it possible to publish both static schedule information and real-time information based on the location of the vehicle and on service changes along the route.
The European Committee for Standardization (CEN), for its part, maintains the Service Interface for Real-Time Information (SIRI) specification, which is an XML protocol that allows information on public transport services and vehicles to be shared in real-time in a distributed manner. Similarly, CEN maintains NeTEx, which is a technical standard for sharing public transport schedules and related data. SIRI and NeTEx comply with the CEN Transmodel, which is the European reference data model for public transport.
The dataset is also included in Standard UNE 178301:2015 Smart Cities Open Data.
Possible real-time applications
• Check the arrival of the next train / bus: A multitude of apps have been implemented in the cities where this dataset has been released so that travellers can decide when they should leave for the train or bus station.
• Route optimization: Since public transport datasets were first published, there have been numerous cases in which individuals and organizations have proposed alternative routes to the transport authorities to improve the service.
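As an illustration of the first application, the following is a minimal sketch of how an app might look up the next scheduled arrivals at a stop from a GTFS static feed. It assumes the standard column names of the GTFS stop_times.txt file; the sample rows, the stop and trip identifiers and the function names are hypothetical. It uses only the static schedule, so a real application would also need to merge in the real-time feed published by the operator.

```python
import csv
import io

# Hypothetical sample rows laid out like a GTFS stop_times.txt file.
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,S1,1
T1,08:10:00,08:10:30,S2,2
T2,08:20:00,08:20:30,S1,1
T3,24:05:00,24:05:30,S1,1
"""


def to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS time to seconds (hours may exceed 23
    for trips that run past midnight of the service day)."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_arrivals(stop_times_csv, stop_id, now, limit=2):
    """Return (trip_id, minutes_to_wait) for the next scheduled arrivals."""
    now_s = to_seconds(now)
    upcoming = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        if row["stop_id"] != stop_id:
            continue
        wait = to_seconds(row["arrival_time"]) - now_s
        if wait >= 0:  # ignore vehicles that have already passed
            upcoming.append((wait, row["trip_id"]))
    upcoming.sort()
    return [(trip, wait // 60) for wait, trip in upcoming[:limit]]


print(next_arrivals(SAMPLE_STOP_TIMES, "S1", "08:05:00"))
```

With the sample data, a traveller at stop S1 at 08:05 would see trip T2 arriving in 15 minutes, followed by the after-midnight trip T3.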
Potential impact of possible applications
One of the best studied cases is that of the city of London, which began publishing data in real-time ten years ago as one of the pioneer cities worldwide. According to a study by Deloitte for Transport for London (TfL) in 2017, the publication of free open data in real-time on public transport contributes about 130 million pounds per year to the economy of the city. The estimated cost of the publication is £1 million per year.
The benefits observed, which may be applicable to other large cities, include the creation of jobs, the creation of high-value start-ups, time savings for citizens and savings in the creation of new apps. Additionally, TfL shares data in areas such as traffic information with other private partners.
Examples of dataset use
• Bus Turnaround Coalition: It is using open data to raise awareness about the state of the New York City bus system, where almost 2.5 million passengers use the bus every day. It has assigned each bus route a report card that reflects the speed, accuracy and volume of passengers, among other factors.
• Transfermuga: This is a cross-border, multimodal and multilingual passenger information portal for mobility between France and Spain, specifically between the regions of Aquitaine-Euskadi-Navarra, which uses open data to facilitate inter-city transport by bicycle, ship, car, taxi, bus, train and plane.
• Moovit: It is an application that allows users to see public transport options, such as bus arrival times, maps and train schedules around the world. The company aggregates data on different modes of transport from countries around the world. These data include schedules, and bus and train stations.
Other companies that use open public transport data are Waze, Google, Citymapper, Bus Checker, or Mapway.
Technologies and data involved
• Automatic vehicle location (AVL): The systems for automatically determining and transmitting the geographic location of a vehicle are an essential element for the availability of public transport datasets. They represent an important part of the necessary investment in bus networks.
• Intelligent transport systems: An intelligent transport system (ITS) is an advanced application that aims to provide innovative services related to different modes of transport and allow users to be better informed and make use of transport networks in a more secure, coordinated and intelligent way.
Challenges
• Collaboration between cities and between countries: In many areas economic activity transcends the scope of a city. Achieving efficient and coordinated public transport between population centres (which may be in different regions and even in different countries) is an important challenge where open data facilitates the necessary coordination between authorities.
• Cost of data distribution: For real-time data there are significant costs to operate a data distribution model with an adequate quality of service for reuse. Therefore, it becomes a public spending problem for which solutions such as scalable distribution systems must be developed through aggregators.
2.5. Air Quality Dataset
Description
The public's concern for the quality of the air we breathe is increasing.
Some people talk about the new climate. This is understandable considering that pollution levels affect the health and well-being of people, especially children and sick people with heart or lung conditions, but also anyone who spends time outdoors.
This growing interest, along with the decrease in the price of sensors, is making environmental data increasingly available and accessible through reliable sources. They are available not only as aggregated statistical data, such as the well-known AirBase by the European Environment Agency, but also in real-time through public and private sources. The data contributed by people who install their own sensors and share their readings with the community add to those from the public administrations responsible for monitoring air quality, expanding data availability and the capacity to accumulate large amounts of environmental data.
Air quality data is promoted and used by cities in different ways. Air quality information serves to develop public health policies, but also for decision-making in other areas such as urban planning, for example, to determine where it is better to build a school or hospital based on air quality. These data are also important for other types of public administrations such as the agencies responsible for agricultural policy, since air quality impacts the development of crops; or energy agencies since the optical thickness of the atmosphere, which is an important factor for renewable energy sources, is affected by air quality.
This dataset is one of the fifteen analyzed by the Open Data Index.
Stakeholders and publication standards
There is a large number of stakeholders in the creation and use of datasets related to air quality. In addition to environmental, energy, public or health agencies, or the cities themselves, we find a very active ecosystem of researchers, activists and citizens who are contributing to a better understanding of the nature, impact and state of air quality.
A great example is the OpenAQ platform that aggregates historical and real-time data on air quality that comes from governments and research institutions and offers all its resources in the form of open source.
Another example of a global portal is the World Air Quality Index that provides real-time city and country air quality metrics from more than seventy countries, including data from nine thousand stations in eight hundred cities.
The list of key pollutants that are monitored is defined by the World Health Organization. Although there is not yet a standard for the publication of the dataset, one of the commonly accepted formats is the one proposed by the OpenAQ platform.
The dataset is also included in Standard UNE 178301:2015 Smart Cities Open Data.
Possible real-time applications
• Predict air quality: As with the weather forecast, there are models to predict air pollution levels and the resulting air quality. Many of these predictive models are even more complex than those used to predict weather conditions. They are mathematical simulations of how pollutants are dispersed in the air. If they are associated with the location, they help citizens to make better decisions about how they spend their time indoors or outdoors, depending on the expected state of the atmosphere, thus reducing the adverse effects on their health and the associated costs.
• Increase awareness against pollution: The publication of open data on air quality is essential to involve citizen volunteers in the fight against air pollution. Events such as the #AirHack in Leeds contribute to raising awareness and to increasing the number of volunteers exploring how open data could be used to address challenges such as increasing citizen participation in air quality problems.
Potential impact of possible applications
Air pollution prediction is an investment that has an impact on multiple levels: individual, community, national and global. If people are aware of the variations in the quality of the air they breathe and the effect of pollutants on their health, there is a greater chance of motivating changes in both individual behavior and public policies.
Governments also make use of predictions to establish procedures to reduce the severity of local pollution levels. Higher global awareness has the potential to achieve a cleaner environment and a healthier population.
The World Bank's “The Cost of Air Pollution” report estimated that the total global cost of premature deaths and diseases due to air pollution is $5 trillion, including the loss of $225 billion annually in labour income. The report also shows that, while the air in high-income countries has become cleaner, conditions have worsened in low and middle-income countries, including serious deterioration in some countries such as Bangladesh and India.
The lack of easy access to air quality data also deepens inequality in access to clean air, as it prevents communities from working on improving their air. Air pollution is responsible for one in every 8 deaths in the world.
According to WHO, 92% of the world's population lives in areas that have unhealthy levels of air pollution, and air pollution is considered "a silent killer," which kills 7 million people a year throughout the world.
Examples of dataset use
• AeroState: It is a start-up that provides predictions and analysis of air quality worldwide. The information has a great level of detail (blocks in a city) and its products are focused on helping cities. As an example, the stations that measure the concentration of different pollutants, such as NO2 and O3, can be displayed on maps. For each station, charts show the most relevant measurements together with a twenty-four-hour forecast obtained from the AeroState API, using data from the OpenAQ platform.
• In the air, Madrid: It is a visualization that aims to make visible the microscopic agents of the Madrid air (gases, particles, pollen, diseases, etc.), to see how they develop, react and interact with the rest of the city.
The project proposes a platform for decision making and individual and collective awareness.
Technologies and data involved
• Low-cost sensors: The portable and low-cost measuring devices connected through WiFi networks are available today at a fraction of the cost that conventional fixed systems used to have. This allows localities to greatly expand the number of measurement points and even their mobility.
• Other related datasets: meteorological data, noise pollution data, etc.
Challenges
• Increase the network of measurement points: the deployment and maintenance of the sensor networks of the cities is a considerable investment. However, it is essential to work to improve the environmental conditions in which we develop our lives.
• Improve measurement quality: Low-cost air quality sensors help raise awareness of the real need to monitor air quality, but often lack advanced technology to provide accurate measurements of environmental conditions. Since the air quality fluctuates at a faster rate than the weather during the course of the day, calibration and assurance of the quality of the measurements is essential to be able to make correct decisions based on this data.
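The calibration mentioned in the last challenge can be sketched as follows. This is a minimal illustration, assuming that a low-cost sensor has been placed next to a reference station for some time and that a straight line is a reasonable correction; in practice, temperature, humidity and sensor drift usually require richer models. The readings and function names are hypothetical.

```python
def fit_linear_calibration(raw, reference):
    """Least-squares fit of: reference ~ slope * raw + intercept."""
    n = len(raw)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_raw = sum(raw) / n
    mean_ref = sum(reference) / n
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be equal")
    sxy = sum((x - mean_raw) * (y - mean_ref) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_raw
    return slope, intercept


def calibrate(value, slope, intercept):
    """Apply the fitted correction to a new low-cost sensor reading."""
    return slope * value + intercept


# Hypothetical paired readings (e.g. NO2 in µg/m³) taken at the same time:
raw_readings = [10.0, 20.0, 30.0, 40.0]        # low-cost sensor
reference_readings = [8.0, 13.0, 18.0, 23.0]   # reference station

slope, intercept = fit_linear_calibration(raw_readings, reference_readings)
print(round(slope, 3), round(intercept, 3))
print(round(calibrate(50.0, slope, intercept), 1))
```

The fitted slope and intercept would then be applied to every new reading from that sensor before publication, and refitted periodically as the sensor ages.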
2.6. Noise Pollution Dataset
Description
Noise pollution can be defined as the presence of intrusive and unnecessary sounds that can seriously influence human mental and physical health.
Different people may respond differently to the same level of noise, but above certain levels, noise affects everyone.
Typical sources of noise pollution in cities are transportation, due to road, rail and air traffic; construction and industry; and the premises like shops, restaurants and bars.
Noise pollution is often cited as one of the main factors in reducing the quality of life in large cities. For example, in New York, more than 420,000 complaints about noise were recorded in 2016, more than double the number recorded five years earlier.
Noise is a generally underestimated threat that can cause a series of short and long-term health problems, such as sleep disorders, cardiovascular damage, worse work and school performance, hearing impairment, depression, stress, diabetes, etc.
It has been described as an "ignored pollutant" that has not been taken seriously enough compared with other environmental threats. This is also seen in the lesser development of initiatives to collect and exploit open data related to noise pollution.
Stakeholders and publication standards
Stakeholders include cities and their environmental agencies, manufacturers of measurement equipment and management and processing software and agencies responsible for public health.
No standard for sharing the data captured by measurement stations has been identified. The thresholds and measurement values come from legislative sources, which in the case of the European Union are harmonized by Directive 2002/49/EC.
Possible real-time applications
• Updated noise maps: city noise maps could be built more dynamically and updated, replacing the current way of developing them based on static studies.
• Authorities actions: having real-time data would allow the design of reactive actions against unforeseen situations such as excessive traffic, unauthorized events, etc.
Potential impact of possible applications
According to the WHO, 466 million people worldwide have disabling hearing loss, and 34 million of them are minors. Exposure to excessive noise is one of the acquired causes that can be prevented to avoid a part of these cases. According to the WHO, 60% of the hearing loss in children under 15 is attributable to causes that can be prevented.
In the European Union (EU) alone, about 40% of the population is exposed to road traffic noise at levels above 55 dB(A); 20% are exposed to levels above 65 dB(A) during the day; and more than 30% are exposed to levels above 55 dB(A) at night.
The potential impact of reducing noise pollution levels on public health and the quality of life of citizens is very large, especially in big cities or cities that grow at high speed.
Examples of dataset use
• Madrid (Spain) and other cities: The most frequent use of the dataset is the visualization of the noise level data captured by the measurement stations along with their evolution over time. These data are of great importance for the development of instruments for the evaluation and management of environmental noise and for the definition of policies and norms regarding noise. In the case of Madrid, the Strategic Noise Map 2019 contains maps that serve, among other things, to make a global assessment of the exposure of citizens to environmental noise, to make global predictions, or to enable the adoption of data-based action plans on noise pollution containing the most appropriate corrective measures.
• Dublin (Ireland): The city measures noise pollution during the day and night on the roads and uses these data to review noise maps and strategic action plans. The agency responsible for managing environmental noise publishes maps with the locations of its permanent sound level monitoring stations. For each station the latest measurement is shown. To follow the evolution of the noise level, the hourly data can also be accessed and are displayed in a graph.
Technologies and data involved
• Measuring stations: they are usually composed of outdoor omnidirectional microphones, fitted with anti-bird and windscreen protection, connected to a statistical noise analyser that sends the data through the available communications network.
• Other related datasets: noise complaint data, data from sources that are the cause of noise pollution (traffic, public transport, etc.).
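As a hedged illustration of the kind of processing a statistical noise analyser performs, the sketch below combines individual sound level samples into an equivalent continuous level. Decibels are logarithmic, so the samples are averaged as energies rather than as plain numbers. The 55 dB(A) and 65 dB(A) bands are the exposure levels cited earlier in this section; the sample values and function names are hypothetical.

```python
import math


def equivalent_level(levels_db):
    """Energy average of sound levels: 10 * log10(mean(10 ** (L / 10)))."""
    if not levels_db:
        raise ValueError("at least one sample is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def exposure_band(leq_db):
    """Place a level in the exposure bands quoted in the text."""
    if leq_db > 65:
        return "above 65 dB(A)"
    if leq_db > 55:
        return "above 55 dB(A)"
    return "55 dB(A) or below"


# Hypothetical samples from one station: two quiet and two loud periods.
samples = [50.0, 50.0, 70.0, 70.0]
leq = equivalent_level(samples)
print(round(leq, 1), exposure_band(leq))
```

For these samples the arithmetic mean would be 60 dB(A), while the energy average is about 67 dB(A), which shows how short loud periods dominate the exposure figure.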
Challenges
• Collect more accurate data: the deployment of measuring stations is expensive both for the necessary investment and for the maintenance it requires. The need for a greater number of measurement points that collect accurate data that can be used for decision making is a challenge that can be solved with the development of low-cost instruments.
• Design effective actions: if we compare it with other datasets, there is still great untapped potential to design solutions that improve the quality of life of citizens.
3. CONCLUSIONS
In the last decade, open data has become an increasingly popular topic among public authorities and technical staff of local entities, just as the expectations of citizens have increased, who are increasingly aware of its potential to improve the quality of life in cities.
Smart city initiatives also help cities establish more ambitious open data initiatives because data management is one of the big challenges they face. Having an established Open Data strategy is one of the most important aspects to achieve real impact on the life of citizens. The environment and transport are the two areas where cities' real-time open data policies have the greatest potential impact.
When it comes to real-time data, a well-defined strategy is even more important, since the publication and management of open data in real-time is a major economic, technical and organizational challenge. In general, most open data portals are not prepared to offer real-time data and only a minority of local entities publish them. There is also a clear need to develop better systems that go beyond the usual APIs and data catalogues.
The generally high cost of deploying measurement networks based on fixed or mobile stations, and of connecting them to high-speed communication networks, is a brake on many initiatives to capture and publish data in real-time. This difficulty is being addressed with the deployment of low-cost devices, and data acquisition is being complemented by data sharing agreements with private companies. In some cases, data are also collected by people who provide their own measuring equipment or volunteer their time, sending data through specific apps installed on their mobile devices.
The publication of real-time data on public transport, air quality or road traffic is of great interest to citizens and generates virtuous circles from which cities and their inhabitants obtain large returns that are not only economic. Improvements in health, time savings or the quality of rest are difficult to quantify but have a great effect on citizens' satisfaction and quality of life.
The impact that is achieved with the publication of real-time data, in general, is not exclusive to its own scope but usually has effects on other related areas. For example, the impact of improvements derived from the publication of real-time traffic data does not only generate improvements in road travel, but also has an impact on the environment or public health.
Finally, the challenge of using real-time data to improve different aspects of life in cities is not exclusive to big cities, but also affects medium-sized populations. The problems that concern citizens are shared even if the scale is different. Although we find greater development in big cities, which try to solve big problems and, in turn, have more resources to do so, there are many initiatives in small and medium cities that have started with small steps and have achieved important results with fewer resources.