Instituto Tecnológico y de Estudios Superiores de Monterrey


Campus Ciudad de México

School of Engineering and Sciences

Smart Water Grid: Data analysis and modeling for a water distribution branch in Mexico City.

A dissertation presented by

David Barrientos Torres

Submitted to the

School of Engineering and Sciences

in partial fulfillment of the requirements for the degree of Master of Science

In Engineering


Tlalpan, CDMX, June 16th, 2022

Instituto Tecnológico y de Estudios Superiores de Monterrey

Campus Ciudad de México School of Engineering and Sciences

The committee members hereby certify that they have read the thesis presented by David Barrientos Torres and that it is fully adequate in scope and quality as a partial requirement for the degree of Master of Science in Engineering.

_______________________
Dr. Martín Rogelio Bustamante Bello
Instituto Tecnológico y de Estudios Superiores de Monterrey
Principal Advisor

_______________________
Dr. Javier Izquierdo Reyes
Instituto Tecnológico y de Estudios Superiores de Monterrey
Committee Member

_______________________
Dr. Enrique Muñoz Díaz
Instituto Tecnológico y de Estudios Superiores de Monterrey
Committee Member

_______________________
Dr. Rubén Morales Menéndez
Dean of Graduate Studies
School of Engineering and Sciences

Tlalpan, CDMX, June 16th, 2022


I, David Barrientos Torres, declare that this thesis titled “Smart Water Grid: Data analysis and modeling for a water distribution branch in Mexico City” and the work presented in it are my own. I confirm that:

● This work was done wholly or mainly while in candidature for a research degree at this University.

● Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

● Where I have consulted the published work of others, this is always clearly attributed.

● Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

● I have acknowledged all the main sources of help.

● Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

___________________________

David Barrientos Torres

Tlalpan, CDMX, June 16th, 2022


To all my loved ones. Thanks for all your unconditional confidence, support, patience, and encouragement. You were my main motivation for pushing through this work.


I would like to express my deepest gratitude to all those who have walked side by side with me along this path. Thanks for all the support and confidence to José Luis Pablos, Rogelio Bustamante, Daniel Chavarría, Dante Chavarría and Eduardo Vivanco; without them this work would not have been possible.


by

David Barrientos Torres

Abstract

Water scarcity in cities is one of the main problems in the world, and water security is one of the objectives of the United Nations for 2030. A methodology for anomaly detection is proposed using data analysis, ARIMA models and transfer function models. Real data from flow sensors of several tanks of a branch of the water distribution system of Mexico City were used for the implementation and validation of the methodology. The resulting models and alerts could improve efficiency in the water distribution service through the early detection of wrong measurements and possible leakages.


List of Figures

1.1 Mexico City urban extension 1950 - 2010

1.2 Sources of Mexico City water supply, 2012

2.1 Schematic representation of smart water management technologies and tools

3.1 Alerting system methodology flowchart

3.2 Variables by tank measured (1, 2 and 4) and not measured (3)

3.3 Diagram of connections of branch “Santa Lucía”

3.4 Exit water flow of two weeks of Santa Lucía 4

3.5 Entry flow and water level from February 28th to March 31 from Santa Lucía 5

3.6 Data pre-processing methodology flowchart

3.7 Entry Flow of Santa Lucía 2, with highlighted missing regions

3.8 First 600 observations of entry Flow of Santa Lucía 2

3.9 Density plot (left) and ECDF (right) of the entry flow from Santa Lucía 2

3.10 Comparison of normal, gamma and lognormal distribution fitted to the entry flow of Santa Lucía 2

3.11 Exploratory Data Analysis of Gamma Distribution fitted to Entry Flow of Santa Lucía 2

3.12 Exploratory Data Analysis of Log-normal distribution fitted to Entry Flow of Santa Lucía 2

3.13 Plots of the limits for three confidence intervals (CI): 70% (up-left), 80% (up-right) and 90% (low)

3.14 Entry flow of Santa Lucía 2 time series with highlighted missing regions at different limits

3.15 Entry flow of Santa Lucía 2 time series with weighted moving average imputed values in red at different limits

3.16 Forecasting methodology flowchart

3.17 General process for forecasting using an ARIMA model

3.18 Santa Lucía 2 entry water flow, from August 25th to August 30th

3.19 Differencing of 95 observations of the original data and the transformed time series

3.20 Time series, ACF and PACF plot of the original base case, the entry flow of Santa Lucía 2, in its incomplete form (left) and imputed form (right)

3.21 ACF of residuals of complete time series of entry flow from Santa Lucía 2

3.24 Forecast of 1 day (96 observations) ahead from the observation 672 of the winner model for the entry flow of Santa Lucía 2

3.25 Forecasting of 1 week (672 observations) of entry flow of Santa Lucía 2 time series limited at a 95% probability and with imputations of weighted moving average

3.29 General process for forecasting using a Transfer Function model

3.30 Transfer function model case for entry flow of Santa Lucía 5 as input model and the entry flow of Santa Lucía 4 as output model

3.31 Cross correlation of input and output series without prewhitening

3.32 Cross-correlation function of input and output series prewhitened

3.33 Forecast of 1 day using a transfer function model

3.34 Forecasting of 1 week with a transfer function model

3.35 Data evaluation flowchart

3.36 Alarms generated by the comparison of 1 week ahead of the forecasted values


List of Tables

1.1 Mexico City’s monthly water supply from 2005 to 2016 and the institutions responsible for it

2.1 Works toward SWG technologies before 2016

2.2 Recent global approaches toward SWG technologies

2.3 Water Balance according to IWA

3.1 Variables measurable in the case of study

3.2 Some summary statistics about a section of the dataset

3.3 Comparison of AIC and BIC of two distributions adjusted to entry flow of Santa Lucía 2

3.4 Limits calculated following the log-normal distribution fitted to Entry Flow of Santa Lucía 2

3.5 Total not available data points and number of gaps after applying limits and eliminating data outside limits for 672 observations of entry flow of Santa Lucía 2

3.6 ARIMA models residuals summary

4.1 Resulted fitted distributions and AIC for the data of one year

4.2 Correlation of variables modeled

4.3 Limited series of Entry flow of Santa Lucía 2

4.4 Limited series of Exit flow of Santa Lucía 2

4.5 Limited series of entry flow of Santa Lucía 3

4.8 Resulting best fitted ARIMA models for the variables

4.9 Errors from forecasts of models ARIMA

4.10 Resulting best fitted possible Transfer Function models for the variables

4.11 Errors from forecasts of the best fitted Transfer Function models

4.12 ARIMA and Transfer Function model errors comparison of 1 day forecasting of entry flow of Santa Lucía 4

4.13 ARIMA and Transfer Function model errors comparison of 1 week forecasting of entry flow of Santa Lucía 4

4.14 ARIMA and Transfer Function model errors comparison of 1 day forecasting of entry flow of Santa Lucía 3

4.15 ARIMA and Transfer Function models MAPE comparison of 1 day and 1 week

4.17 Alerts generated for each model forecast and real new observation


Abstract

List of Figures

List of Tables

1 Chapter One: Introduction

1.1 Background

1.2 Motivation

1.3 Research Questions

1.4 Hypothesis

1.5 Objectives

1.6 Scopes and limitations

1.7 Contribution

2 Chapter Two: Theoretical Framework

2.1 Literature review

2.2 Data understanding

2.3 Statistics concepts

2.4 Time Series

3 Chapter Three: Methodology

3.1 Data Understanding and preparation

3.2 Data Modeling & Forecasting

3.3 Data evaluation and alerting

4 Chapter Four: Results

4.1 Data preparation

4.2 Data Modeling & Forecasting

4.3 Data evaluation

5 Chapter Five: Discussion & future work

5.1 Discussion of results

5.2 Future work

6 Chapter Six: Conclusions

Appendix A: Abbreviations and acronyms

Appendix C: Forecasts of models

Bibliography


Chapter 1: Introduction

This chapter explains the main problem that this project aims to help solve: water scarcity in Mexico City. It presents the background, including the main water challenges facing Mexico City, the state of its water resources, the current state of the water network, and its resilience.

The main motivations are also described, including the benefits and importance of the “Smart Water Grid” (SWG).

1.1 Background

Potable water is vital for life, and access to it is a human right, but not everyone has it. According to Mekonnen and Hoekstra, nearly two-thirds of the global population (about 4 billion people) experience severe water scarcity during at least one month of the year, and half a billion people face severe water scarcity all year round. The increasing global demand for water has become a threat to the sustainable development of human society, driven mainly by the increasing world population, the improvement in living standards, the change of consumption patterns, and the expansion of irrigated agriculture (Mekonnen and Hoekstra, 2016).

Several international agreements and government sustainability plans and programs include ensuring access to water among their main objectives. Goal 6 of the Sustainable Development Goals of the United Nations is “Ensure availability and sustainable management of water and sanitation for all”. Goal 11 states: “Make cities and human settlements inclusive, safe, resilient and sustainable” (United Nations, 2018).

In Mexico, 20 million people suffer from severe water scarcity (Mekonnen and Hoekstra, 2016). Among the initiatives of the Mexican authorities, “Plan Verde”, the “Sustainable water management program for Mexico City”, the “National Water Program” and the “National Development Plan” are some examples of the efforts and investments to strengthen integrated and sustainable water management. Despite these efforts, there is still considerable room for improving water management and ensuring availability for everyone.

Mexico City Metropolitan Area (MCMA) is the largest human settlement in Mexico. The city itself has an area of 1,485 km², and the MCMA is formed by Mexico City, Mexico State, and Hidalgo, with the vast majority of people concentrated in Mexico City. The MCMA had 21.157 million inhabitants in 2016, which represents 16.5% of the total country population and 20.6% of the urban population. The average annual rate of change is 0.9%, and the population is projected to grow to 24.111 million by 2030 (United Nations, 2018). The increasing population represents an increasing demand for basic water services. According to the 2010 census, drinking water coverage for the MCMA is 96.79%, which means that approximately 700 thousand people need to obtain water by buying trucked water (called “pipas”) or by other means (CONAGUA, 2015).


Figure 1.1 Mexico City urban extension 1950 - 2010. Data source: SACMEX.

The history of water management in Mexico City has been long and complex. Since its foundation in 1324 in the middle of Lake Texcoco, the Aztec city of Tenochtitlán (as it was first named) was destined to a future dependent on climate and water risks such as flooding and sinking (St. George Freeman et al., 2020). Since 1940, large hydraulic infrastructure projects have been developed to import water from surrounding basins and aquifers in order to reduce the impact of the extraction of water from the subsoil; these projects are part of the current water supply system (Aguirre & Espinoza, 2012).

In Mexico City, transformations in climate have meant an increase in mean rainfall as well as an increase in the frequency and intensity of extreme events such as floods, droughts, and heatwaves. The profound transformation of the hydrological cycle by the engineered systems has created irreversible changes in the regional water balance as well as changes in the basin’s climate. Mexico City is now over-exploiting its water resources by between 19.1 and 22.2 cubic meters per second, depending on the calculations. This creates two kinds of vulnerability. Problems of water availability (scarcity), created by human actions, make water users vulnerable to the changes in the availability of water that are expected from climate change. According to projections where no consideration is given to global warming, between 2005 and 2030 the population of Mexico City will increase by 17.5%, while between 2007 and 2030 available water will diminish by 11.2%. The situation might get worse if, as expected, climate change brings lower precipitation. Those water users who already face recurrent shortages during the dry season or when droughts hit Mexico City will be especially affected (Romero Lankao, 2010).

MCMA’s present-day water supply comes from two main sources: locally derived groundwater from beneath the city and imported surface and groundwater from distant basins (Valek et al., 2017). Comisión Nacional del Agua (CONAGUA) and Sistema de Aguas de la Ciudad de México (SACMEX) are the agencies currently responsible for the management and distribution of water in Mexico City.

Table 1.1 Mexico City’s monthly water supply from 2005 to 2016 and the institutions responsible for it. Data source: SACMEX.

The drinking water supply for the MCMA amounts to 31.2 m³/s, of which 9 m³/s come from the Cutzamala system, 4 m³/s from Lerma, 1 m³/s from the city wellsprings, and 13.6 m³/s from wells that draw water from the upper aquifer of Mexico City. However, there is a deficit of 1.7 m³/s, due to population growth, the conditions of the hydraulic infrastructure, and the geographical or legal situation of some settlements (Aguirre & Espinoza, 2012).

Figure 1.2 Sources of Mexico City water supply, 2012. Data source: SACMEX.

Mexico City's water is distributed through one of the largest and most complex hydraulic infrastructures in the world, which continuously supplies 67,000 liters per second to MCMA inhabitants through 2,087 kilometers of primary aqueducts, 10,237 kilometers of secondary distribution lines, and nearly 1,700 wells. The system also has 168 kilometers of deep drainage tunnels, 23 small- and medium-size dams, approximately 100 wastewater treatment facilities, and 91 pumping plants that keep the water flowing in lower sections of the city. Up to 25,000 employees work at different points in the system (Cohen & Mancera, 2019).

Despite this extensive infrastructure, the system is inefficiently managed and insufficiently maintained. It is estimated that approximately 40% of water is lost from the distribution networks due to leakage from old pipes, absence of proper maintenance over prolonged periods, poor construction and management practices, and continuing land subsidence in the metropolitan area (Aguirre & Espinoza, 2012).


1.2 Motivation

In order to reduce the threat posed by water scarcity to biodiversity and human welfare, many authors suggest increasing water-use efficiencies and better sharing of the limited freshwater resources (Mekonnen & Hoekstra, 2016). Water resources management is a critical element in mitigating water scarcity in cities, but it requires an information system. Information is a key requirement for adaptive water management. This information is the result of measuring each of the key steps of the urban water management cycle: supply, distribution, use and consumption, and collection. For such information to be useful, it must be valid, systematic, and reliable. In addition, it must be processed and analyzed in order to support decision-making. This process generally involves the construction of indicators. As a whole, this system of information and indicators constitutes a control panel that shows the behavior of the supply, the physical control of water losses, and urban consumption patterns (Pineda-Pablos et al., 2016).

Approximately 25% to 50% of all distributed water globally is lost or never invoiced due to leakages, deteriorating infrastructure, incorrect water pressure management, inaccurate billing systems, inaccurate metering and illegal connections. An economically reasonable level of water loss is approximately 8% to 10% or 5% to 6%, depending on the water source.

Real-time monitoring and data collection are key in the development of reliable and useful information systems. The collected data has to be as complete and accurate as possible in order to obtain useful conclusions, models and insights. Nevertheless, in practice there are many physical and infrastructural limitations, environmental effects and human errors that complicate the collection of data.

One of the main challenges for water management is the prevention, identification and repair of leakages. Many techniques and methodologies have been developed for leak detection in pipelines but, because of the complexity and extent of the water system in modern cities, a strong investment in infrastructure for locations, communications and data analysis is required.

Some intervention strategies of leakage management are pressure management, active leakage control, infrastructure and asset management and speed and quality of repairs.

Many cities around the world have suffered water scarcity for many years and have therefore been forced to invest heavily in infrastructure and research to improve their water distribution systems and minimize water losses.

In most parts of Mexico, water supply had not been a problem until recent years, but Mexico City has had a special relationship with water since its foundation. Hundreds of years ago, when the city was founded over a lake, water scarcity seemed an unlikely problem, yet it is happening today.

The large and complex water network is also old and in constant deterioration, which, along with natural disasters, frequent earthquakes, and human intervention, generates many leakages.

A methodology for an anomaly detection system is presented in this work in order to improve water management and reduce water losses. This methodology is based on several statistical methods and other techniques, explained later, in order to obtain a robust system that uses several inputs to compare and validate anomalies on a water distribution grid.


1.3 Research Question

1. How can a statistical methodology support decision-making for water distribution?

1.4 Hypothesis

Based on the previous research question, the following hypothesis is proposed: a methodology for the detection of anomalies will support decision-making for the water distribution network by identifying anomalies and possible leakages, the main loss points, and the expected water demand. With these concrete actions, leakages and other anomalies could be identified faster, allowing a faster response to the problem, which will help to reduce the losses of the current water distribution network.

1.5 Objectives

General

The main objective is to develop a methodology for the detection of anomalies in the water distribution system.

Particular

The particular objective of this work is to develop a methodology for early detection of anomalies in the flows of a water distribution system, capable of:

● Identifying errors in input sensor data.

● Forecasting water flows.

● Identifying anomalies and possible leakages.

1.6 Scopes and limitations

Since the water network in Mexico City is one of the biggest in the world, the main limitation comes from coverage and data accessibility. The required equipment, installation works, and other assets represent a considerable investment of time and money, so Mexican authorities’ investments of this nature include only certain zones and not the whole network. Most of the available technologies are intrusive and require pipeline operations to be disrupted. This means that condition assessment can only be applied at a small number of selected sites. Currently, only some variables and characteristics are measured at specific points.

This work combines several methodologies and statistical techniques applied to hydraulic variables. The data used for this work was provided by the company VWC Services with permission to use it for data analysis. The data consists of 18 signals of water tank level, water entry flow and water exit flow, corresponding to variables from 6 tanks of an important pipeline known as “Branch Santa Lucía”.


1.7 Contribution

The main contribution of this work is a robust methodology that integrates several methods and statistical techniques focused on water variables, for practical use with real data collected from several sensors in the city. It is expected that the technological tool resulting from this work will be useful for the corresponding authority to improve water distribution efficiency in a specific sector of the city, helping to conserve water in the zone and improve the quality of life of the people living there. This methodology could also be replicated for other applications and types of variables, since it addresses problems common to many types of data.


Chapter 2: Theoretical framework

This chapter presents a deeper view of SWG in the literature, together with the main features, tools, and technologies required for this work, in order to give the reader a wider and clearer theoretical framework for understanding it.

2.1 Concepts

Water Grid (WG)

The WG concept is a well-known approach that has been introduced gradually to water distribution networks in order to ensure capacity and increase the security of customers. The basic idea is to use several resources that are interconnected to the network. The approach was promoted with the gradual extension of the networks and with the necessity to overcome the risk of failure. The use of a single resource has several risks such as the water supply distance and water quality. Therefore, the idea of combining different resources has gradually emerged. In current networks, the system has been improved with resources that combine fresh surface water (river water, lake water, etc.), groundwater, and potable water produced from the sea through desalination plants or even recycled wastewater. At the same time, the diversification of the resources reduces the mobilization of external resources, which are located outside of the catchment and are mobilized at a high cost due to the need for network infrastructure (Byeon et al., 2015).

Smart Water Grid (SWG)

No clear definition or scope of the SWG has been established so far, and there is no consensus on the definition of the term. However, summarizing the concept as it has been proposed in the previous literature, SWG can be defined as “a next-generation water management system that introduces Information and Communication Technologies (ICT) in order to increase the management efficiency for water resources and water supply and drainage” (Kim et al., 2014). The possibilities offered by ICT, and especially by the new generation of sensors combined with the communication network, make it possible to monitor the quality of the resources in real time, to optimize the distribution operations, and to ensure the security of consumers. The introduction of ICT solutions to the water domain represents an opportunity both for improving the management of the water grids and for enlarging the mobilization of various resources for more sustainable management. The concept of water grid is thus extended to the smart water grid approach, where ICT solutions have a key role to play: the concept of grid combined with the usage of ICT for water management defines the new concept of “Smart Water Grid”. The new approach could be used both to ensure the security of water distribution and to increase reliability with a combination of several resources (Byeon et al., 2015). A smart grid is a self-healing system that predicts looming failures and takes corrective action to avoid problems. A smart grid continually optimizes the use of its capital assets while minimizing operational costs (Farhangi, 2010).

Leak Detection Methods


Water distribution network (WDN) leak-detection methods fall into three main categories: water balance, acoustic leak-detection, and non-acoustic leak-detection. Water balance methods typically involve municipalities introducing district metered areas (DMAs) and real-time acquisition of basic hydraulic parameters, in order to study the flow of water in and out of a system or subsystem. Pressure monitoring for burst-detection has been established for over a decade, both in laboratory and field settings. Transient pressure events due to bursts (water-main breaks are major contributors to transient pressure events that lead to significant water loss) are relatively easy to detect using system pressures or locally employed pressure gauges. However, continuously-running small leaks can be masked in the operational noise environment, where more sophisticated instrumentation (e.g., hydrophones) and algorithms may be necessary for detection (Cody & Narasimhan, 2020).
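The water balance idea described above can be illustrated with a minimal Python sketch. It assumes a single district metered area with one metered inflow and one aggregate metered outflow per interval; the function name, the sample readings, and the 10% review threshold are hypothetical and are not taken from the cited works.

```python
def water_balance(inflow, outflow):
    """Return per-interval unaccounted-for volume and the overall loss fraction.

    inflow, outflow: sequences of volumes (e.g. m³) metered over the same intervals.
    """
    if len(inflow) != len(outflow):
        raise ValueError("inflow and outflow must cover the same intervals")
    # Volume that entered the area but was not metered leaving it.
    losses = [q_in - q_out for q_in, q_out in zip(inflow, outflow)]
    total_in = sum(inflow)
    loss_fraction = sum(losses) / total_in if total_in else 0.0
    return losses, loss_fraction


# Hypothetical volumes (m³) entering and leaving a DMA over four intervals.
inflow = [100.0, 120.0, 110.0, 90.0]
outflow = [95.0, 100.0, 104.0, 86.0]

losses, loss_fraction = water_balance(inflow, outflow)
THRESHOLD = 0.10  # assumed review level, not a value from the source
needs_review = loss_fraction > THRESHOLD
```

A persistent gap between inflow and outflow does not by itself locate a leak; it only indicates that the subsystem deserves closer inspection with the acoustic or pressure-based methods mentioned above.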

The concept of smart and sustainable cities was introduced to overcome the great challenges facing urban development. Some cities in the world have implemented smart water grids in their distribution networks, and many studies have been published on the factors that drive urban water consumption and on models that provide reliable estimates to predict it at the residential level.

Walker presents a model for predicting domestic water consumption based on neural networks, using real data collected from smart meters in real time. The work includes the analysis of three main elements to identify the behavior of domestic water consumers: behaviors of end-use; sociodemographic and property characteristics; and psychosocial constructions such as attitudes and beliefs (Walker et al., 2015). Chen proposes a benchmarking model for domestic water consumption based on Adaptive Logic Networks. The authors present a work based on Deep Learning (DL) and artificial neural networks where they show the applications for the simulation, optimization, and control of the operation of water distribution systems (Chen et al., 2015). Sanz and Pérez describe a methodology for the placement of sensors based on the analysis of pressure and flow sensitivity using the Singular Value Decomposition method (Sanz Estapé & Pérez Magrané, 2015). Candelieri et al. propose a fully adaptive, data-driven Support Vector Machine-based self-learning algorithm to forecast short-term, hourly water demand based on the availability of Automatic Metering Readers (AMR) (Candelieri et al., 2015).

Support Vector Machines (SVM) and Artificial Neural Networks have recently been applied by Nasir et al. (2014). The experiments were performed in order to estimate both the position and size of water leakages. The EPANET tool was adopted to simulate a residential network with two pressure sensors, two differential pressure sensors, and two flow sensors, and to acquire the data in the simulation circuit. About 1800 scenarios were used to train the models, and an additional 1800 scenarios were used to test them. The performance was evaluated in terms of MSE and the squared correlation coefficient (R-squared). The proposed quasi-static analysis confirmed the good behavior of the SVM and its resilience to sensor measurement errors (Fagiani et al., 2016).
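For reference, the two evaluation metrics mentioned above can be computed as in the following sketch. It uses the common definition of R-squared as one minus the ratio of the residual to the total sum of squares, which is an assumption here, since the cited study may define it as the squared Pearson correlation; the sample values are made up for illustration.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up leak sizes (actual) and model estimates (predicted).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

error = mse(actual, predicted)
fit = r_squared(actual, predicted)
```

A lower MSE and an R-squared closer to 1 both indicate estimates that track the simulated leak parameters more closely.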

In Singapore, the implementation of a Smart Water Grid system supports the mission of the Public Utilities Board (PUB) to supply good water 24/7 to its customers. With sensors and analytic tools deployed island-wide to provide a real-time monitoring and decision support system, the Smart Water Grid system enables PUB to manage the water supply network efficiently, ensuring that all Singaporeans will continue to enjoy a reliable and sustainable water supply for generations to come. The Water Supply Network Department manages the 5,490 km of potable mains, 573 km of NEWater (high-grade reclaimed water) mains, and 42 km of industrial water mains that deliver the water to more than 1.4 million customers. Sensors, meters, digital controls, and analytic tools are used to automate, monitor, and control the transmission and distribution of water, ensuring that water is efficiently delivered only when and where it is needed and with good quality. Through an R&D project, PUB has worked with Aqleo (previously a spin-off from the National University of Singapore) to develop a Pipeline Failure Analysis Model to predict hotspots of pipe failures. Using Bayesian statistics, the model is able to identify the 25% of the total pipes in the network where failures are most likely to occur (Singapore & Public Utilities Board Singapore, 2016).

Table 2.1 presents previous work toward SWG technologies. For each work, the table gives the name, region and scope of the project, as well as a brief description of the methodology used and the main results obtained. Previous works are those developed or published before 2016.

| Reference | Project | Region | Scope | Methodology | Result |
|---|---|---|---|---|---|
| (Singapore & Public Utilities Board Singapore, 2016) | Water Supply Network Department (PUB) | Singapore | 300 multiparameter probes around the city. | Bayesian computed algorithms identify the pressure drop signatures associated with pipe bursts. Identify leak spots. Electronic alerts to operators. | Detection of 25% of the total pipes where leaks are most likely to occur. |
| (Lee et al., 2015) | South East Queensland (SEQ) | Australia | 12 dams, 13 water treatment plants, 1 desalination plant, 28 water reservoirs, and 22 water pumping stations. | Move water to where it is needed most by using bi-directional water pipelines, and purified recycled water can be introduced to the system at several points. | Rainfall dependency dropped from 95% to 75% in four years. |
| (Lee et al., 2015) | National Smart Water Grid (NSWG) | USA | Estimated construction of over 5,000 km of pipelines around Lake Powell, Utah. | The pumping of freshwater via pipelines from areas of overabundance/flood to areas of drought or high demand. | Captured water will be distributed to destinations near. |
| (Keeling & Sullivan, 2012) | IBM Smarter City Sustainability Model | USA | Dubuque, Iowa; 23,000 households. | A platform that monitors water consumption every 15 minutes. Notifies households of potential leaks and anomalies and water usage information expressed in dollars, gallon, and carbon savings. | Increased water leak detection of 8%. |
| (Keeling & Sullivan, 2012) | IBM: Rio de Janeiro | Brazil | Rio de Janeiro | Implementation of a high-resolution weather forecasting and hydrological modeling system can now predict heavy rains up to 48 hours in advance. | Reduce the reaction times to emergency situations. |
| (Byeon et al., 2015) | Smart Water Grid concept on Yeongjongdo Island | Korea | Pipe 2.3 km long with a capacity of 155,000 m³/day. | Measure and analysis of data on the water to provide a water balance with the total water resources available and the total water demand. | Water consumption calculation for optimal distribution. |
| (Lee et al., 2015) | SWG research | Korea | Korean cities and watersheds | Component technology development. Optimized parts and materials development. Development of ICT-based integrated management system for SWG. | Work in progress. |
| (Keeling & Sullivan, 2012) | Ireland’s Marine Institute | Ireland | Galway Bay | Real-time sensors monitoring and transmitting key data on water quality and ocean conditions. Analyzed data display alarms to notify when certain conditions arise, such as an increase in pollution or potential flooding. | Improved public safety. |

Table 2.1 Works toward SWG technologies before 2016.

Table 2.2 presents the main characteristics of recent global approaches toward Smart Water Grids, water grid management, and leak detection, together with a brief description of each methodology and its main results.

| Reference | Project | Region | Scope | Methodology | Result |
| --- | --- | --- | --- | --- | --- |
| (Ochoa & Ruíz, 2017) | Experiences in the Evaluation of Drinking Water Leaks of Hydrometric Sectors in Mexico City. | Mexico City | Nine hydrometric sectors | Using the International Water Association method, which is based on records of continuous and simultaneous measurement of the hydraulic pressure and the flow rate supplied to its network for one day. | Identification of leakage levels. |
| (López & Ochoa, 2017) | Model for quantifying leakage in District Metered Area | Álvaro Obregón, sector SH-5 - “Las Águilas” | 1,075 household intakes, four pressure reducing valves (PRVs) and 9.6 km of pipes. | Inverse analysis for quantification of leakage flows at nodes and pipes of hydrometric sectors, using the EPANET software for the hydraulic simulation model in combination with MATLAB software for running the mathematical algorithm to calibrate the coefficient C and estimate leakage values. | Identification of nodes and pipes with higher leakage flows, which did not coincide with the actual leaks reported in the network. |
| (Morelos & Ramírez, 2017) | Hydraulic modeling of the drinking water distribution network in a Mexican city with EPANET | San Luis Río Colorado, Sonora, Mexico | 25,000 inhabitants | Analysis and design by computer of a potable water distribution network using EPANET. | Design, analysis and hydraulic modeling of the network, indicating that the pressures and flow rates in pipes are adequate to meet demand in the study area. |
| (Njepu et al., 2019) | Optimal tank sizing and operation of energy-water supply systems in residences | South Africa | Durban, KwaZulu, simulated residential house. | To minimize the water consumption and operational cost of the system at a residential level. Use of a rainwater harvesting system, greywater recycling system, water storage, and gravity-fed distribution system. | Water saving of 20.5% and energy cost saving of 62.54% in the simulation. |
| (Xue et al., 2020) | Machine learning-based leakage fault detection for district heating networks | China | 40 supply pipes and 40 return pipes | Using the hydraulic simulation model, all possible leakage faults are simulated. An XGBoost-based model is trained on the leakage data set. Once the delayed alert triggering algorithm issues a leakage signal, a variation rate vector of observation data corresponding to this leakage condition is collected. The vector is then input to the trained model, and the model outputs the name of the leaking pipe. | Mean values of accuracy and macro-F1 score of the model results are 85.85% and 0.99786, respectively. |
| (Cody & Narasimhan, 2020) | Field implementation of linear prediction for leak-monitoring in water distribution networks | Ontario (Guelph), Canada | Approximately 1500 m of PVC pipe | Linear prediction model for semi-supervised leak detection. Isolates the general region of the leak by first identifying the two or three closest hydrant locations, then finds a more exact location. | Error from 1% to 24.8% depending on the flow amount. |
| (Bermúdez et al., 2018) | Modeling and Simulation of a Hydraulic Network for Leak Diagnosis | Mexico | Prototype | Design of a hydraulic network that allows the development of leakage control and diagnosis algorithms for three different configurations: single ducts, ducts with branches and closed networks. | The proposed model guarantees turbulent flow in all its lines for the considered configurations, and its construction and use as a scale model of pressurized networks is feasible. |
| (Taghlabi et al., 2020) | Prelocalization and leak detection in drinking water distribution networks using modeling-based algorithms: a case study for the city of Casablanca (Morocco) | Casablanca, Morocco | Micro-modulated sector with a single critical pressure point that is continuously monitored. It has 3 inlets, 493 nodes, 42 km of pipes and around 24,000 inhabitants. | Two methods: 1. A simulation of artificial leaks on the MATLAB platform using the EPANET code to establish a database of pressures that describes the network’s behavior in the presence of leaks, which is then fed into a random forest machine learning algorithm to forecast the leakage rate and its location in the network; 2. Real simulation of artificial leaks by opening and closing hydrants at different locations, with leak sizes of 6 and 17 L/s. | The two methods converged to comparable results. The leak position is spotted within a 100 m radius of the actual leaks. |
| (Pérez-Pérez et al., 2021) | Leak diagnosis in pipelines using a combined artificial neural network approach. | Mexico | Experimental | Detect and locate water leaks in pipelines by using artificial neural network (ANN) techniques and online measurements of pressure and flow rate, estimating the friction factor of the pipe and using this information as an input to compute the leak position. | An average error of 0.629% was obtained for leak location in the experiments. |
| (Fabbiano et al., 2020) | Smart water grid: A smart methodology to detect leaks in water distribution networks | Italy | Experimental | Measuring the radial vibrational status of opportune pipes of the network. The idea arises from the consideration that the variation of the energy transmitted to the pipe walls by the radial component of the vibrations induced by the turbulence of the fluid may be related to the flow leak itself. | Proven that the parameter of the radial vibration signals that the turbulence of the flow transmits to the walls of the pipes is linearly dependent only on the variations of flow rate due to the leak. |
| (Farah & Shahrour, 2017) | Leakage Detection Using Smart Water System: Combination of Water Balance and Automated Minimum Night Flow | Lille, France | Scientific Campus of the University of Lille, which is the size of a small town. | Analysis of real-time data has allowed the verification of the water balance and the estimation of the level of water losses in the network. An improvement of the application of the minimum night flow method, which is based on the determination of flow thresholds. A leak alarm is generated if the night flow exceeds the thresholds. | Allowed the detection of 25 unreported leaks and decreased the Non-Revenue Water (NRW) level by 36%. |

Table 2.2 Recent global approaches toward SWG technologies.


The SWG has the following main technological components: water resources management technology that collects and stores a variety of water resources into water platforms and also integrates and manages the distribution and transportation of water. It includes an ICT-based integration management system that can support real-time monitoring for securing, transporting and utilizing water resources. It also supports integration management and decision making regarding water information. (Kim et al., 2014)

SWG tools can be categorized into the seven main areas listed below. The examples given for each area are not exclusive to it and may overlap with several others.

1. Data acquisition and integration (e.g. sensor networks, smart pipes, smart meters).

2. Data dissemination (e.g. radio transmitters, wireless fidelity (WiFi), Internet, GPRS).

3. Modeling and analytics (e.g. machine learning).

4. Data processing and storage (e.g. software as a service (SaaS), cloud computing).

5. Management and control (e.g. supervisory control and data acquisition (SCADA, WINCC), optimization tools).

6. Visualization and decision support (e.g. web-based communication and information systems tools).

7. Restitution of data and information to cities’ technical services and to the end-users (e.g. tools for sharing information on water and on services). (Lazreg, 2018)

Figure 2.1 Schematic representation of smart water management technologies and tools (Lazreg, 2018).

This work focuses mainly on establishing a methodology for the modeling and analytics stage of the smart water framework, as a basis for the development of a tool.

2.2 Data understanding

There is no universal algorithm or methodology that works best for all types of application; each model or algorithm has to be adjusted to its specific task. Nevertheless, no model will work properly if the input data used to train or build it is not accurate and of good quality. It is also very important to have a clear vision of the goal to be achieved and the expertise to understand the problem and the type of data being used, so that the data matches the requirements of the problem.


It is important to understand the data source and the extraction conditions, such as the format, number of samples, data range, etc. The types of data are categorical, ordinal, cardinal and continuous. The amount of data needed depends on the complexity of the model or algorithm, but in general it is better to have recent values related to the main problem to be analyzed or solved. The raw data obtained from the different sources has to be processed and converted to a form suitable for modeling. The format has to be standard and well-defined for the whole dataset.

Transformations can be numerical (such as binning or scaling) or non-numerical.

Data cleaning consists of detecting and handling missing values, outliers, duplicate data and irrelevant data, and of correcting formatting, typographical errors and sparse categories.

Outliers are data values that are much less likely to occur than most of the other values. Global outliers are data points that look strange overall and stand out in comparison to all the other points.

A local outlier looks strange next to its neighbors, but it fits the data overall. Outliers can be caused by issues in data collection, or by something new and unusual that actually occurred. Sometimes outliers can be easily identified by visual inspection of a box plot or scatter plot and then removed from the original data set. Outliers can also be identified by calculating the z-score, for parametric distributions, and flagging the scores far from the average, or by implementing outlier detection algorithms. An anomaly is an observation that does not fit the expected pattern and is out of the ordinary, but is explainable by some other feature.
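The z-score approach mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the function name, the threshold and the flow readings are hypothetical, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    if sigma == 0:
        return []  # all values identical: nothing can stand out
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

# Hypothetical flow readings in L/s with a single spike at index 6.
readings = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 120.0, 50.1, 49.7, 50.0]
outliers = zscore_outliers(readings, threshold=2.5)
for index, value, z in outliers:
    print(f"index {index}: value {value} has z-score {z:.2f}")
```

In a short sample a single extreme value also inflates the standard deviation, which limits how large its own z-score can become; this is why the example uses a threshold of 2.5 rather than the more common 3.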

Completeness is an important characteristic of data quality, so for partial data with missing values it is necessary to impute values. The types of missing data are missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). There exist many techniques for data imputation. One is imputation by a constant, where the imputed value is a fixed quantity, such as the mean or median, that does not depend on any other feature. Algorithms or models can also be used to predict missing values, as in deterministic regression imputation and stochastic regression imputation. For temporal data, options include forward fill (the last value is carried forward until the next observation), back fill (the next observed value is carried backward) and linear interpolation (the last and next observed values are connected with a line).
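The three options for temporal data can be illustrated with a short sketch. It assumes equally spaced samples with missing values marked as None; the function names and the tank-level values are hypothetical.

```python
def forward_fill(series):
    """Replace each None with the last observed value (leading gaps stay None)."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def back_fill(series):
    """Replace each None with the next observed value (trailing gaps stay None)."""
    return forward_fill(series[::-1])[::-1]

def linear_interpolate(series):
    """Fill interior gaps on a straight line between the neighbouring observations."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out

# Hypothetical tank levels with three missing samples.
levels = [2.0, None, None, 3.5, None, 4.0]
filled_forward = forward_fill(levels)
filled_backward = back_fill(levels)
filled_linear = linear_interpolate(levels)
```

Forward fill and back fill keep the series piecewise constant, while linear interpolation assumes the variable changed at a steady rate during the gap; which assumption is more reasonable depends on the variable being measured.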

Exploratory Data Analysis (EDA) consists of the visual inspection of data that has first been processed and displayed in graphs that convey its main insights. This first step helps to identify the main characteristics and behavior of the data, so that preliminary conclusions about possible issues can be drawn. EDA can include many types of graphs, both over time and of density. Some of the most common and useful types of graphs for data representation are the time series plot, density plot, bar plot, scatter plot, Q-Q (quantile-quantile) plot, cumulative distribution function (CDF) plot and box plot.

2.3 Statistics concepts

A dataset can be fitted to a distribution by comparing the empirical values of the data set with the theoretical values of the distribution. Fitting a parametric distribution to a data set consists of finding the parameter values with which the fitted distribution is most likely to have generated the observed data. For example, the normal distribution has two parameters (mean and variance); once these two parameters are known, the entire distribution is known.
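As a minimal sketch of this idea for the normal distribution, the maximum-likelihood estimates of the two parameters are the sample mean and the variance computed with a divisor of n. The function names and the sample values below are hypothetical and serve only as an illustration.

```python
import math

def fit_normal(data):
    """Maximum-likelihood estimates of the mean and variance of a normal distribution."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # divisor n, not n - 1
    return mu, var

def normal_log_likelihood(data, mu, var):
    """Log-likelihood of the data under a normal distribution N(mu, var)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)

# Hypothetical sample.
sample = [9.8, 10.1, 10.4, 9.9, 10.3, 9.5]
mu_hat, var_hat = fit_normal(sample)

# Any other parameter values give a lower log-likelihood than the fitted ones.
ll_fitted = normal_log_likelihood(sample, mu_hat, var_hat)
ll_shifted = normal_log_likelihood(sample, mu_hat + 0.5, var_hat)
ll_wider = normal_log_likelihood(sample, mu_hat, 2 * var_hat)
```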

2.4 Time series

This section presents the concepts used for the statistical and time series analysis of the data. They establish the basis for the subsequent analysis, such as the development of ARIMA models for forecasting using the Box-Jenkins methodology.

The methodology was originally proposed in 1970 and is named after the statisticians George Box and Gwilym Jenkins. It integrates an autoregressive component, established from the previous values of the series, and a moving average component. This kind of model allows us to incorporate seasonal analysis and to isolate the trend component, and it is applied exclusively to stationary series (series with constant mean and variance over time). These models are called ARIMA (Autoregressive Integrated Moving Average) models and are one of the most widely used approaches for short-term time series forecasting and for series with seasonal variations. (Fernandes et al., 2008)

This methodology allows us to systematize the previous methods for time series forecasting.

A stationary time series is one whose statistical properties, such as the mean and variance, do not change over time. In a time series, seasonality is a variation in the data that repeats in specific patterns at regular intervals.

In an ARIMA(p, d, q) model, p corresponds to the order of the autoregressive (AR) process, d is the number of differences and q corresponds to the order of the moving average (MA) process. ARIMA models are usually used in short-term forecasting. To capture seasonal behavior, they are extended as the product of two models, ARIMA(p, d, q)(P, D, Q), where the second part corresponds to the seasonal part.
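The role of the differencing orders can be illustrated with a short sketch: one regular difference (d = 1) removes a linear trend, and one seasonal difference (D = 1) removes a pattern that repeats every s samples. The season length s, the function name and both series are assumptions made for this illustration.

```python
def difference(series, lag=1):
    """Difference a series at the given lag: z[t] = y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A linear trend becomes constant after one regular difference (d = 1).
trend = [3 + 2 * t for t in range(8)]
regular_diff = difference(trend)

# A pattern repeating every s = 4 samples vanishes after one seasonal difference (D = 1).
season_length = 4  # assumed season length
seasonal = [10, 20, 30, 40] * 3
seasonal_diff = difference(seasonal, lag=season_length)
```

For data sampled every 15 minutes, a daily season would correspond to a lag of 96 samples.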

Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are useful for determining the order of an ARIMA model, which is chosen by minimizing these criteria. These metrics quantify how well a distribution fits the observed data. Both take into account the log-likelihood and add a penalty proportional to the number of distribution parameters (degrees of freedom). This makes it possible to compare the fit between distributions with different numbers of parameters since, in general terms, the more parameters a distribution has, the more easily it fits the data and the higher its log-likelihood. (Chakrabarti & Ghosh, 2011)

The difference between AIC and BIC is the severity with which they penalize the number of parameters of the distribution. For both, the lower the value, the better the fit. It is important to note that neither metric quantifies model quality in an absolute sense; they only compare relative quality between models or fits. If all candidate fits are bad, they give no indication of this. (Chakrabarti & Ghosh, 2011)
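The two criteria can be written out using their standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of parameters, n the number of observations and ln L the maximized log-likelihood. The candidate models and their log-likelihood values below are hypothetical, chosen only to show a case where the two criteria disagree.

```python
import math

def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits to n = 200 observations: (log-likelihood, number of parameters).
n = 200
candidates = {
    "simple (k=2)": (-310.0, 2),
    "complex (k=6)": (-304.0, 6),
}
scores = {
    name: (aic(ll, k), bic(ll, k, n))
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Here AIC prefers the model with six parameters, while BIC, whose penalty grows with ln(n), prefers the model with two parameters.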

According to the Box-Jenkins methodology, it is first necessary to identify the series and remove any non-stationarity, since the model works only on stationary time series (this can be done by transforming the original data). In general, a plot of a stationary time series is roughly horizontal, with no predictable patterns in the long term. Transforming the original data can be very helpful for achieving stationarity, and can have considerable effects such as altering the original scale, reducing asymmetries, and eliminating possible outliers. After the time series has been identified, a model is postulated and tentatively considered. Its parameters are estimated, and the tentative model is assessed and diagnosed. If necessary, a new model is proposed and tested in order to find the best model that describes the data. Finally, with the selected model, a forecast can be made.

A transfer function model combines the time series approach with a causal approach. The time series x_t influences the time series y_t through a transfer function, which distributes the impact of x_t over some period into the future. The resulting model connects the output series (y_t), the input series (x_t), and the noise (n_t).

The correlation between X and Y (transfer function) is also called the cross-correlation (Box & Jenkins, 2015). These models are useful for prediction, control and other applications, since they take into account the dynamic relationships between two time series. Sometimes, after an input is applied, there is an initial period of pure delay or dead time before the response variable begins to react.
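One simple way to look at the dead time between an input and an output series is to compute the sample cross-correlation at several lags and take the lag with the largest absolute value. The sketch below illustrates that idea only; it is not the identification procedure of Box and Jenkins, and the function names and both series are hypothetical.

```python
def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    norm_x = sum((v - mean_x) ** 2 for v in x) ** 0.5
    norm_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    return cov / (norm_x * norm_y)

def estimate_dead_time(x, y, max_lag):
    """Lag (in samples) at which |cross-correlation| is largest."""
    lags = range(max_lag + 1)
    return max(lags, key=lambda k: abs(cross_correlation(x, y, k)))

# Hypothetical input series, and an output that repeats it three samples later.
x = [0, 1, 0, 3, 1, 0, 2, 5, 1, 0, 4, 2, 0, 1, 3, 0]
y = [0, 0, 0] + x[:-3]
delay = estimate_dead_time(x, y, max_lag=6)
```

With real measurements the peak is less sharp, because the transfer function spreads the effect of the input over several samples and noise is added to the output.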

Even if a model were entirely adequate, in practice the output Y could not be expected to follow exactly the pattern determined by the transfer function model. Disturbances of various kinds other than X normally corrupt the system. A disturbance might originate at any point in the system, but it is often convenient to consider it in terms of its net effect on the output Y. (Box & Jenkins, 2015)

2.5 Water loss reduction

The International Water Association (IWA) has developed a standard international water balance structure and terminology that has been adopted by national associations in many countries across the world. The IWA divides the total system input volume into two main categories: authorized consumption and water losses.

Table 2.3 Water Balance according to IWA. (Source: Lambert, A. and W. Hirner, 2000)


A water loss is defined as the difference between the water pumped into a system and the billed water. Water loss reduction first requires an initial situational analysis to assess non-revenue water (NRW). The next steps are to formulate clear objectives and targets for the water distribution network and to set an action plan for the implementation phase. Real losses from pipe networks can be managed by using flow monitoring to identify the need for active leak detection.

Water losses are divided into two categories: apparent losses and real losses. Apparent losses are economic losses of water that is nevertheless used by someone; they include meter under-registration, theft, unleashed supply and data management errors. Real losses are all the water that nobody uses and that goes directly into the ground or the sewer system; they include leakages and overflows in tanks, connections and distribution lines.

Authorized consumption includes billed and unbilled authorized consumption. Billed authorized consumption, or revenue water, is the water that is consumed and billed. It includes metered consumption, such as water exported to other water grids and domestic and non-domestic use, and unmetered consumption, such as public parks or fixed-fee consumption. Unbilled authorized consumption includes metered consumption, such as fires and others, and unmetered consumption, such as evaporation in tanks and leak repairs.
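The water balance can be sketched numerically as follows. The sketch assumes the split in which the system input volume is divided into authorized consumption and water losses, and takes non-revenue water as the input volume minus the billed authorized consumption. The function name and all volumes are hypothetical.

```python
def water_balance(system_input, billed, unbilled, apparent_losses):
    """Split a system input volume (m3) into water balance components."""
    authorized = billed + unbilled
    water_losses = system_input - authorized
    real_losses = water_losses - apparent_losses
    non_revenue = system_input - billed  # everything that is not revenue water
    return {
        "authorized_consumption": authorized,
        "water_losses": water_losses,
        "apparent_losses": apparent_losses,
        "real_losses": real_losses,
        "non_revenue_water": non_revenue,
        "nrw_share": non_revenue / system_input,
    }

# Hypothetical volumes in m3 for one period.
balance = water_balance(
    system_input=1000.0, billed=600.0, unbilled=50.0, apparent_losses=100.0
)
```

In this example, 350 m3 are water losses, of which 250 m3 are real losses, and non-revenue water represents 40% of the input volume.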

Water flow can be measured as a volumetric flow rate, in units such as liters per second. The total volume can be expressed in m3 and is calculated by multiplying the volumetric flow rate by the period of time.
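As a small worked example of this calculation, the sketch below converts flow readings in liters per second into a total volume in m3. It assumes that each reading stays constant over its sampling interval and that the interval is 15 minutes; the function name and the readings are hypothetical.

```python
def volume_m3(flow_lps, interval_s):
    """Total volume in m3 from flow readings in L/s, each held for interval_s seconds."""
    litres = sum(q * interval_s for q in flow_lps)
    return litres / 1000.0  # 1 m3 = 1000 L

# One hour of hypothetical readings taken every 15 minutes.
readings_lps = [40.0, 42.0, 38.0, 40.0]
total_volume = volume_m3(readings_lps, interval_s=15 * 60)
print(f"Total volume: {total_volume} m3")
```

The four readings average 40 L/s over 3,600 seconds, which gives 144,000 L, or 144 m3.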


Chapter 3: Methodology

This chapter presents the methodology used for the statistical analysis and time series analysis of the data, which establishes the basis for the development of the forecasting models. To predict whether a new observation in a time series is anomalous or normal, a methodology for anomaly detection is presented that uses data analysis and two types of time series models. The general methodology for forecasting, evaluating and generating alerts is shown in Figure 3.1.

Figure 3.1. Alerting system methodology flowchart.

3.1 Data understanding and preparation.

The case study is composed of six tanks connected to each other by pipelines 48 inches in diameter. The six tanks have several variables corresponding to the water tank and flow rate, but not all the variables apply to all the tanks. In total there are 17 variables, and each variable is recorded every 15 minutes with a timestamp. Table 3.1 shows the summary of available variables by tank. NA corresponds to variables that are “Not Available” or not measured.