Cluster analysis of electricity and gas consumption profiles in the residential sector of the U.S.A


ESCUELA TECNICA SUPERIOR DE INGENIERIA (ICAI)

BACHELOR'S DEGREE IN ELECTROMECHANICAL ENGINEERING

Electrical Specialization

Cluster Analysis of Electricity and Gas Consumption Profiles in the Residential Sector of the U.S.A

Author: Laura GELABERT COSTACURTA

Director: Miguel Ángel Sanz Bobi

Madrid, June 2016


CLUSTER ANALYSIS OF ELECTRICITY AND GAS CONSUMPTION PROFILES IN THE RESIDENTIAL SECTOR OF THE U.S.A.

Author: Gelabert Costacurta, Laura

Director: Sanz Bobi, Miguel Ángel

Collaborating Institution: ICAI – Universidad Pontificia Comillas

ABSTRACT

The aim of this project is to study how well a particular combination of cluster analysis techniques performs when analysing energy-related data sets and extracting coherent archetypal profiles.

Figure 1

Improving energy efficiency and management is one of the most important challenges the electrical sector faces in its effort to achieve stable and sustainable power transmission systems. Moreover, in deregulated energy markets, correct analysis and interpretation of information is crucial for managing supply and demand. With this background in mind, periodic load data from the residential sector of several states of the U.S.A. are used as a case study.

Figure 2

The state of the art mainly concerns the development and usefulness of several clustering techniques, first in data mining processes and then in more specific applications related to the energy sector. For instance, load forecasting and load profiling (Fig. 2) are routinely used in the electrical sector, and their improvement through clustering techniques has been well investigated. Other uses in this sector, as well as studies of the residential sector, must also be mentioned. Finally, smart grids and the current interest in them, together with research involving neural networks and self-organized maps, are the most recent aspects to be considered in this context.


The objectives of the project are to discover archetypal consumption profiles and patterns in the data through clustering techniques, and to study the relationship between electricity and gas consumption. The ultimate objective is to come up with ideas for innovative decision-making processes based on the different types of profiles established. These ideas may support policies that, once implemented through instructions to customers, would make the operation of the system more efficient.

Figure 3

The project was developed using MATLAB as the main tool. The sources of information were public data sets from the U.S. Energy Information Administration (EIA) covering typical household electricity and gas use across the whole country. There is usually one data set for each of several cities within each state, containing consumption values for the 8760 hours of a whole year (Fig. 3). For each case study, once preliminary information about the state has been gathered, the relevant data sets are organized and put into the correct format for the subsequent application of the algorithms. Then an inter-city analysis, which compares the cities of the state with each other, and an intra-city analysis, which studies each of them individually, are performed. This is done for both electricity and gas consumption. Once the results have been analysed and compared, archetypal profiles and time logical models are established for the state. The last step consists of drawing conclusions and designing decision-making processes oriented to consumers.

All the data in the project were analysed with a combination of two clustering techniques: hierarchical clustering and Self-Organised Maps (SOMs).

The case studies describe in detail how the analysis was performed on energy data sets from four U.S. states: Illinois, Indiana, Florida and California. Each state constitutes a case study in which, after a brief introduction and an overview of the available data, hierarchical clustering and self-organised maps were applied to both electricity and gas consumption.

Hierarchical clustering (Fig. 4) was first used for the inter-city analysis. The resulting dendrogram showed which cities within the state were similar and whether that similarity was related to their geographical location. Hierarchical clustering was then used for the intra-city analysis of each city of the state, individually. There, a cophenetic correlation coefficient matrix showed which days of the week behaved similarly and how reliable the information obtained for each day was. A further dendrogram was used to analyse and establish weekly archetypal profiles.


Panels: inter-city dendrogram; cophenetic correlation coefficient matrix; intra-city dendrogram

Figure 4
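The two outputs of hierarchical clustering mentioned above, a dendrogram and a cophenetic correlation coefficient, can be illustrated with the following minimal Python sketch. It is not the project's MATLAB code. Average linkage and Euclidean distance are assumptions, and the function names are illustrative. The cophenetic distance between two objects is the height at which they first join in the dendrogram. The cophenetic correlation coefficient is the Pearson correlation between these heights and the original pairwise distances, so values close to 1 mean the dendrogram preserves the original dissimilarities well.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average linkage (assumed linkage method).

    Returns the merges as (cluster_a, cluster_b, height), the original pairwise
    distances and the cophenetic distances, both keyed by (i, j) with i < j.
    """
    n = len(points)
    dist = {(i, j): euclidean(points[i], points[j]) for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}  # cluster id -> indices of its members
    cophenetic, merges, next_id = {}, [], n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
            height = sum(dist[p] for p in pairs) / len(pairs)  # mean inter-cluster distance
            if best is None or height < best[2]:
                best = (a, b, height)
        a, b, height = best
        for i in clusters[a]:
            for j in clusters[b]:
                cophenetic[(min(i, j), max(i, j))] = height  # level where i and j first meet
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges, dist, cophenetic


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cophenetic_correlation(points):
    """Needs at least three objects so that the distances are not all equal."""
    _, dist, cophenetic = average_linkage(points)
    keys = sorted(dist)
    return pearson([dist[k] for k in keys], [cophenetic[k] for k in keys])
```

For example, the four one-dimensional values 0, 1, 10 and 11 are merged at heights 1, 1 and 10, and the resulting coefficient is about 0.99. In MATLAB, the same quantities are usually obtained with `pdist`, `linkage` and `cophenet` (Statistics and Machine Learning Toolbox).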

Self-Organized Maps (SOMs) were used to check and complement the results of hierarchical clustering, because they give a joint graphical view of the inter- and intra-city analyses of each case study. Three plots were used to illustrate the results (Fig. 5). The hits plot shows how the input data are distributed among the output neurons and makes it possible to evaluate the accuracy of the SOM: the more even the distribution, the more accurate and reliable the SOM. The bar graph shows relative information about all input cities within each output neuron. The weight plot represents the evolution of each city from the first to the last neuron, this time on an absolute energy scale.

Hits Plot Bar Graph Weight Plot

Figure 5
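The hits plot described above can be illustrated with a minimal one-dimensional SOM written in plain Python. This is a sketch built on several assumptions, none of which are taken from the project: neurons arranged on a line (MATLAB's `selforgmap` uses a two-dimensional grid by default), a Gaussian neighbourhood, and a learning rate and radius that both decay linearly. Each input vector could, for example, hold one hour's consumption for every city of a state. `hits` counts how many inputs are mapped to each neuron, which is what a hits plot displays.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda k: squared_distance(weights[k], x))


def train_som(data, n_neurons, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM (neurons on a line) with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]  # start from random inputs
    sigma0 = n_neurons / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            progress = step / total_steps
            lr = lr0 * (1 - progress)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - progress), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for k in range(n_neurons):
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights


def hits(weights, data):
    """Number of inputs mapped to each neuron (the content of a hits plot)."""
    counts = [0] * len(weights)
    for x in data:
        counts[best_matching_unit(weights, x)] += 1
    return counts
```

Plotting `weights[k][c]` against the neuron index `k`, one curve per input column `c`, would give a plot comparable to the weight plot described above.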

Finally, all case studies are compared and contrasted using a selection of the most relevant results and analyses, which allows a general evaluation of the whole project. The final conclusions establish the validity of the methodology and show how useful its application can be for obtaining a good approximation of the overall behaviour of typical households in various cities of U.S. states. This result helps to identify where energy is used efficiently and gives an idea of which measures could be implemented to help grid management and operation.

In the future, this project may help to expand the comparative basis started here to other states of the U.S.A., to research the interaction between electricity and gas consumption behaviours, or to see where and how smart grids could start to be installed.

Figure 4 (recovered panel content). Inter-city dendrogram: x-axis City Reference Number (leaf order 4, 7, 1, 8, 2, 5, 3, 6); y-axis Dissimilarity (%), 0 to 70. Intra-city dendrogram, Electricity, City 2 - Tuesday: x-axis Hours of the day (leaf order 3, 4, 2, 5, 1, 6, 7, 8, 9, 24, 10, 11, 12, 13, 14, 15, 16, 23, 17, 18, 22, 19, 20, 21); y-axis Dissimilarity (%), 0 to 25. Cophenetic correlation coefficient matrix:

       Mon    Tue    Wed    Thu    Fri    Sat    Sun
Mon  0.849  0.849  0.851  0.850  0.850  0.850  0.849
Tue  0.870  0.879  0.880  0.881  0.879  0.879  0.872
Wed  0.863  0.869  0.873  0.875  0.873  0.872  0.866
Thu  0.865  0.870  0.875  0.877  0.875  0.874  0.868
Fri  0.854  0.860  0.864  0.865  0.864  0.863  0.857
Sat  0.857  0.862  0.865  0.865  0.865  0.865  0.859
Sun  0.847  0.847  0.850  0.849  0.849  0.849  0.848

Figure 5 (recovered panel content). Hits plot: 24 output neurons whose hit counts add up to 8760. Weight plot: x-axis Neurons; y-axis kWh, 0 to 3.5; one curve per input series (data1 to data8).


STUDY OF ELECTRICITY AND GAS CONSUMPTION PROFILES IN THE RESIDENTIAL SECTOR OF THE U.S.A. USING CLUSTERING TECHNIQUES

Author: Gelabert Costacurta, Laura

Director: Sanz Bobi, Miguel Ángel

Collaborating Institution: ICAI – Universidad Pontificia Comillas

SUMMARY

This project sets out to study how a particular combination of clustering techniques can perform when analysing energy consumption databases and obtaining coherent archetypal profiles.

Figure 1

Improving energy management and reaching certain levels of energy efficiency is one of the most important challenges the electricity sector is facing in its aim to achieve stable and sustainable power transmission systems. Moreover, in deregulated energy markets, good data analysis and correct interpretation of the results are essential for managing both supply and demand in this sector. With this context in mind, periodic energy load data from the residential sector of several states of the U.S.A. have been used as case studies.

Figure 2

The state of the art essentially concerns the development and usefulness of various clustering techniques: first in data mining processes and then in more specific contexts related to the energy sector. For example, load forecasting and load profiling (Fig. 2) are constantly used in the electricity sector, and many studies have investigated how clustering techniques can improve them. These techniques have been included in many other applications in this sector, and studies related to the residential sector must also be mentioned. Furthermore, smart grids and the recent and growing interest in them, together with research related to neural networks and self-organized maps, are the most recent aspects considered.

The objectives of the project are to discover, by means of clustering techniques, whether archetypal profiles and consumption characteristics can be found in the data. The ultimate objective, however, is to generate ideas for innovative decision-making processes based on the different profiles previously defined. This would allow consumption policies to be drawn up which, once implemented through instructions to consumers, would improve and facilitate a more optimal operation of the system.

Figure 3

The project was carried out with MATLAB as the main tool. The sources of information are public data sets from the U.S. Energy Information Administration that include the typical electricity and gas use of households throughout the country. Usually, a data collection is available for several cities in each state, containing consumption values for the 8760 hours of a whole year (Fig. 3). For each case study, after preliminary information on the state has been obtained, the relevant data are sorted and put into the correct format so that the necessary algorithms can be applied. Next, an inter-city analysis, which compares all the cities of a state with each other, and an intra-city analysis, which studies each of them individually, are carried out. This process is applied to both electricity and gas consumption. Once the results have been analysed and compared, archetypal profiles and time logical models are established for each state. The last step consists of drawing conclusions and designing decision-making processes oriented to customers.

A particular combination of two clustering techniques was applied to analyse all the data in the project. The first technique is hierarchical clustering, while self-organized maps constitute the second.

The case studies set out in detail how the analysis of the energy consumption data was carried out in four different states of the U.S.A.: Illinois, Indiana, Florida and California. Each state constitutes an individual case study. In each one, after a brief introduction and a review of the available data, hierarchical clustering and self-organized maps are applied to analyse both electricity and gas consumption.

Hierarchical clustering (Fig. 4) is first used for the inter-city analysis, where the resulting dendrogram gives an insight into which cities are similar to each other and whether this relates to their geographical location. It is then used for the intra-city analysis of each city of the state, individually. There, the calculation of a cophenetic correlation coefficient matrix shows which days of the week behave similarly and how reliable the information obtained for each day is. In addition, another dendrogram allows weekly archetypal profiles to be established and analysed.

Inter-city dendrogram; cophenetic correlation coefficient matrix

Intra-city dendrogram

Figure 4

Self-organized maps are used to check and complement the results obtained with hierarchical clustering, since they allow the inter- and intra-city analyses of each case study to be viewed graphically and jointly. Three plots are used to show the results obtained (Fig. 5). The "hits plot" shows how the input information is distributed among the output neurons and allows the reliability of the map to be evaluated: the more homogeneous the distribution, the more reliable and accurate the self-organized map. The "bar graph" shows, in relative terms, the information of all input cities in each output neuron. The "weight plot" represents the evolution of each city from the first to the last neuron, this time on an absolute energy scale.

Hits Plot Bar Graph Weight Plot

Figure 5

To conclude, all case studies have been compared and contrasted after selecting the relevant results and analyses obtained. This has allowed a general evaluation of the whole project. The final conclusions establish that the working methodology is valid and show how useful its application is for obtaining a good approximation of the behaviour of typical houses in various cities of different states of the U.S.A. This result is very useful for identifying points or regions where energy is used efficiently. It can even suggest which of the plausible measures for improving the operation of the consumption network could be implemented.

Finally, this project could in the future help to build a broader comparative basis than the one achieved here. This could be done by studying other states of the U.S.A., by researching the interaction between electricity and gas consumption behaviours, or even by seeing where and how smart grids could start to be installed.


INDEX

CHAPTER 1: INTRODUCTION

CHAPTER 2: STATE OF THE ART

1. Clustering techniques in data mining: Evolution and successful applications

2. Load Forecasting and Load Profiling

a. Load Forecasting

b. Load Profiling

3. Clustering techniques in the electricity sector

a. General Uses

b. Residential Sector

4. Smart Grids

5. Neural networks and Self-Organised Maps

CHAPTER 3: MOTIVATION AND OBJECTIVES

1. Motivation

2. Objectives

CHAPTER 4: METHODOLOGY, TOOLS AND RESOURCES

1. Methodology

2. Resources: EIA Database

3. Tools: MATLAB

CHAPTER 5: DATA ANALYSIS TECHNIQUES

1. Hierarchical Clustering

2. Self-Organized Maps

CHAPTER 6: CASE STUDY 1 – Illinois

1. State Overview

2. Available Data

3. Hierarchical Clustering

a. Inter-City Analysis

i. Electricity

ii. Gas

b. Intra-City Analysis

i. Electricity

ii. Gas

iii. Comments and observations

4. Self-Organized Maps

a. Electricity

b. Gas

c. Comments and observations

5. Conclusions

CHAPTER 7: CASE STUDY 2 – Indiana

1. State Overview

2. Available Data

3. Hierarchical Clustering

a. Inter-City Analysis

i. Electricity

ii. Gas

iii. Comments and observations

b. Intra-City Analysis

i. Electricity

ii. Gas

iii. Comments and observations

4. Self-Organized Maps

a. Electricity

b. Gas

c. Comments and observations

5. Conclusions

CHAPTER 8: CASE STUDY 3 – Florida

1. State Overview

2. Available Data

3. Hierarchical Clustering

a. Inter-City Analysis

i. Electricity

ii. Comments and observations

b. Intra-City Analysis

i. Electricity

ii. Comments and observations

4. Self-Organized Maps

a. Electricity

b. Comments and observations

5. Conclusions

CHAPTER 9: CASE STUDY 4 – California

1. State Overview

2. Available Data

3. Hierarchical Clustering

a. Inter-City Analysis

i. Electricity

ii. Gas

iii. Comments and observations

b. Intra-City Analysis

i. Electricity

ii. Gas

iii. Comments and observations

4. Self-Organized Maps

a. Electricity

b. Gas

c. Comments and observations

5. Conclusions

CHAPTER 10: CONCLUSIONS

1. States Overview – Comparison

2. Available Data – Comparison

3. Hierarchical Clustering – Comparison

a. Inter-City Analysis

i. Electricity

ii. Gas

b. Intra-City Analysis

i. Electricity

c. Comments and observations

4. Self-Organized Maps – Comparison

a. Electricity

b. Gas

c. Comments and observations

5. Final Conclusions

REFERENCES

BIBLIOGRAPHY


CHAPTER 1: INTRODUCTION

Improving energy efficiency and management is one of the most important challenges the electrical sector faces. Knowing how residential customers use electrical energy is therefore fundamental when making decisions in areas such as market strategy, meeting demand or sizing grid capacity.

Optimizing the management of electricity supply and demand is complex, but it is indispensable for operating any network or grid efficiently and, consequently, for achieving a more stable and sustainable power transmission system.

Carrying out a good data analysis is a key aspect when it comes to searching for solutions to improve and facilitate the daily operations of the electrical grid.

This is even more important in deregulated energy markets, where both consumers and suppliers can intervene freely. Because consumers can keep track of their energy consumption, they can decide to take part in the retail market. Suppliers, based on the same information, can design electricity tariffs or make decisions regarding load management.

With this background in mind, this project aims to apply cluster analysis to energy-related data sets in order to obtain coherent electricity and gas consumption profiles.

Specifically, periodic load data from the residential sector of several states of the U.S.A. are used as case studies. The analysis of this database will attempt to establish weekly electricity and gas consumption profiles, and the influence of gas consumption on them, individually for each of the selected states.

Identifying such energy use patterns enables new efficient solutions to be assessed and helps to determine which types of consumers are actually using electrical energy efficiently.

The profiling results and ideas for new decision-making processes would then be reported to power grid operators. This may allow new management policies to be implemented that could improve the management and sustainability of the network.

Likewise, instructions leading to sensible use of energy at a reasonable cost are important for electric power companies and society as well as for individual customers.


CHAPTER 2: STATE OF THE ART

1. Clustering techniques in data mining: Evolution and successful applications

Clustering is the task of grouping abstract objects into meaningful sub-classes of similar objects. Once assigned to these sub-classes, also known as clusters, all data objects that belong to the same cluster can be treated alike. This technique plays an important role in data mining, which is needed to find comprehensible structure in large data sets. Unlike in classification, the classes are not predefined but must be discovered. Fig. 1.1 illustrates the process generically:

Figure 1.1

High-quality clusters are those in which the intra-cluster similarity is high while the inter-cluster similarity is low, as shown in Fig. 1.2:

Figure 1.2
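The quality criterion of Fig. 1.2 can be made measurable. The sketch below is a generic illustration, not a method from this project. It uses Euclidean distance as the dissimilarity (low distance means high similarity) and compares the mean distance between objects in the same cluster with the mean distance between objects in different clusters. A good partition has a small first value and a large second one. The function name is illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_inter_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumes there is at least one same-cluster pair and one cross-cluster pair.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)
```

For the values 0, 1, 10 and 11, grouping {0, 1} and {10, 11} gives (1.0, 10.0), while the poor grouping {0, 10} and {1, 11} gives (10.0, 5.5).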

A successful (that is, high-quality) clustering provides a good basis for further evaluation, interpretation or structuring of the data. More precisely, it allows patterns to be discovered in large data sets, which is what data mining consists of. Data mining is the key to extracting as much relevant information as possible from data sets.

The first known use of clustering is believed to date back to London in 1854, when the English physician John Snow used it to help end a cholera outbreak. He plotted reported cases of the disease on a map and found a link between the density of cases and the location of a particular water pump in the city. Once the pump was removed, the epidemic came to an end.


However, even though some applications of clustering and data mining in the electricity sector have been developed and new ones are being studied, much remains to be investigated. This is especially true of finding solutions to improve the operation of the sector's complex networks in a deregulated market.

2. Load Forecasting and Load Profiling

a. Load Forecasting

Decision-making in the electricity sector spans several levels that must be taken into account. From the day-to-day operation of power plants to the planning of facilities, decisions must follow the daily circumstances of the demand curve. This is a complex task, mainly because of the deregulation of energy markets.

Using artificial intelligence methods to reach a coherent approach and model for managing the power grid is therefore fundamental.

Over the past years, the prediction of future load demand, known as load forecasting, has established itself as a core process for addressing the daily tasks and challenges involved in operating the electrical system. Electric utilities and energy providers use it to predict the amount of power needed at every moment to keep supply and demand in balance [1].

However, even though various methods have been studied and developed, no global approach has yet been found. Because demand patterns are so variable, smaller groups and individual case studies are usually addressed instead.

b. Load Profiling

A load profile is a graph that represents the variation of the electrical load over time. Load profiles vary depending on the sector or customer they characterize, and also with external factors such as climate or the season of the year.

Many types of profile can be sought, depending on the profiling criteria, the type of application, innovative ways of operating the grid, etc.

Many studies have addressed the determination of daily load profiles to characterize household electricity use.

Fig. 1.3 illustrates an example of the load profile of a typical weekday in a 3-bedroom household in the Brig o’Doon community (Scotland, UK) [2].


Figure 1.3

There is no single method for load profiling. However, hierarchical clustering is usually preferred for grouping similar households together. This results in a reduced number of archetypal daily load profiles, each covering several households, which are later used as targets for interventions on consumer behaviour.

3. Clustering techniques in the electricity sector

a. General Uses

Since there are different types of clustering techniques, several of them have been assessed in different case studies, specifically in the electrical sector. Each technique relies on different mathematical algorithms and has different characteristics, so the size and type of the data strongly influence the choice of technique.

Hierarchical clustering, as stated before, has often been used to achieve load profiling and can also be combined with other algorithms.

Along these lines, a collaboration between power engineers and mathematicians from several European countries resulted in a study named “Hierarchical Spectral Clustering of Power Grids” [3]. The study aims to solve practical power engineering problems and to model substructures of a whole transmission network.

Several other techniques have also been applied in case studies in the electrical sector. The K-means clustering algorithm is a partitioning technique. Unlike hierarchical clustering, it produces a single partition of the data set rather than a hierarchy from which suitable clusters must later be selected. It was recently shown to be very useful for data mining in the power system in a case study by a student of North China Electric Power University (Beijing) [4]. The results of this analysis show that, in a context where power data are generated every second, the algorithm analysed customers’ power load very efficiently.
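The difference between partitioning and hierarchical clustering can be seen in a minimal sketch of the standard K-means iteration (Lloyd's algorithm) in plain Python. It is a generic textbook version, not the implementation used in the cited study [4]. Random initialisation from the input points and the iteration limit are assumptions. The result is a single set of labels, with no hierarchy to cut afterwards. MATLAB offers a built-in `kmeans` function for the same purpose.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k input points chosen at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids
```

The number of clusters `k` must be chosen beforehand, which is the main practical difference from reading clusters off a dendrogram.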


Last but not least, Fuzzy C-means is a well-known clustering algorithm for load profile determination. Several studies in the UK (University of Strathclyde) [5], Malaysia [6][7] and Bangladesh [8] have examined the validity of this method in a deregulated environment.
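Fuzzy C-means differs from K-means in that each object receives a degree of membership in every cluster instead of a single label. The following sketch shows the standard fuzzy C-means iteration, which alternates membership-weighted centre updates with membership updates. It is a generic formulation, not the implementation of the cited studies. The fuzzifier m = 2, the random initial memberships and the stopping tolerance are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns (memberships, centres).

    memberships[i][j] is the degree (0 to 1) to which points[i] belongs to
    cluster j; each row sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centres]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:  # point sits exactly on a centre: share membership among those centres
                new_u.append([1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1)) for k in range(c))
                              for j in range(c)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:  # memberships have stabilised
            break
    return u, centres
```

A crisp label can be recovered by taking, for each row, the cluster with the highest membership. The intermediate memberships indicate customers whose load profiles lie between two archetypes.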

b. Residential Sector

In the UK, several studies applying clustering to electricity load profiling in the residential sector deserve mention.

For instance, students at the University of Nottingham found that describing changes in consumer behaviour by means of patterns obtained with clustering algorithms leads to more consistent groupings of households. This was reported in their paper “Variability of Behaviour in Electricity Load Profile Clustering; Who Does Things at the Same Time Each Day?” [9].

Furthermore, these students are now researching how best to define representative load profiles for domestic electricity users in the UK. They have published the first results of the study in their article “The application of a data mining framework to energy usage profiling in domestic residences using UK data” [10].

In the past few years, new methods involving clustering techniques have been implemented and tested in the residential sector of other countries.

One particular example is “A New Proposal of Typification of Load Profiles to Support the Decision-Making in the Sector of Electric Energy Distribution” from the Polytechnic School of the Federal University of Bahia (Brazil) [11]. In this work, the selection, classification and clustering of load curves were combined to obtain the key features of the load curves of residential users, taking seasonal and temporal aspects into account.

A Brazilian electric company implemented this method as part of an energy efficiency program, and the implementation was a success. For instance, the typification method made it possible to analyse the impact of replacing refrigerators in some low-income cities in the state of Maranhão.

Further analysis showed that this new method yielded a greater diversity of patterns than more traditional methods such as Fuzzy C-Means.

4. Smart Grids

Results of load profiling in the residential sector may later contribute to the implementation of smart grids. Smart grids are technologically advanced grids that, by means of additional monitoring, control and communication, move electricity around the system more economically and efficiently. In essence, they maximize the throughput of the system while reducing energy consumption.


In this context, the USA has established support for the smart grid as federal policy, with the Energy Independence and Security Act of 2007.

5. Neural networks and Self-‐Organised Maps

More recently, other techniques related to a different kind of clustering have also been studied to determine whether they could be useful for decision-making processes in the electrical sector.

Amongst these, neural networks have shown interesting results in power grid simulation, as reflected in the study “The Application of Neural Networks to Electric Power grid Simulation” [12]. Self-Organizing Maps, for their part, were assessed in the paper “Electricity Load Forecasting using Self Organising Maps” [13].


CHAPTER 3: MOTIVATION AND OBJECTIVES

1. Motivation

The motivation of this project is to verify two things: whether obtaining weekly electricity consumption pattern profiles by means of hierarchical clustering is viable, and whether these profiles can later supply sufficient and useful information to improve the operation and management of the electrical network.

This implies applying hierarchical clustering to databases covering multiple large-scale subjects of study.

The scope of the study is therefore considerable, as it targets whole states individually, and within each state both inter-city and intra-city analyses are performed.

The results of this analysis may later contribute to upgrading the grid to a smart grid, which is regarded as an important advance for the future operation of any electrical system.

2. Objectives

The first objective of this analysis is to apply hierarchical clustering techniques in order to find possible profiles of electrical energy use by different types of customers within the residential sector of the U.S.A. Clustering will be applied individually to databases of several previously selected states, provided by the Energy Information Administration of the U.S.A. [14]. For each state, the data of the available cities will first be studied city by city (intra-city), and then common patterns amongst cities will be sought (inter-city).

Next, the analysis will seek a reduced number of typical weekly profiles that explain the consumption behaviour of the customers characterizing each state.

The influence of gas consumption on electricity consumption, and the relationship between the two, will also be studied with clustering techniques and information research.

Finally, the ultimate objective of the project is to come up with innovative decision-making processes based on the different types of profiles previously established. These processes may support policies that, once implemented through instructions to customers, would make the operation of the system more efficient.


CHAPTER 4: METHODOLOGY, TOOLS AND RESOURCES

1. Methodology

A. Preliminary Study

This stage covers the selection of the states on which the case study will focus. Once they have been chosen, general information on each of them is gathered to establish some background and introduce their energy profile. This information may be energy-related (e.g. importer/exporter status, dominant primary energy resources, whether or not the energy market is deregulated) or non-energy-related (e.g. location in the U.S.A., climate).

B. Selecting and organizing data sets

The selection step consists of choosing the appropriate data sets from the complete energy use database provided by the U.S. Energy Information Administration (EIA). The organization step involves arranging all data sets in the matrix format required by the clustering algorithms applied later.

C. Compare and study intra-city consumption: Electricity/Gas

This step deduces energy profiles for each city by grouping days. Clustering is applied to each city individually to establish a hierarchy of similarity amongst the twenty-four hours of each day of the week.
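The grouping of hours by day of the week can be sketched in Python as follows. A city's 8760-hour series is cut into 365 daily profiles. For a chosen weekday, these profiles are turned into a 24-row matrix (one row per hour, one column per occurrence of that weekday), so that the 24 hours become the objects to be clustered. This assumes a non-leap year and a known weekday for 1 January, supplied through the hypothetical parameter `first_weekday`. It is an illustration, not the author's code.

```python
from typing import List

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # 8760 hourly values = 365 days x 24 hours


def weekday_hour_matrix(hourly: List[float], weekday: int,
                        first_weekday: int = 0) -> List[List[float]]:
    """Return a 24 x n matrix for one weekday (0 = Monday, ..., 6 = Sunday).

    Row h holds the consumption at hour h of every occurrence of that weekday,
    so that each of the 24 hours becomes one object to be clustered.
    `first_weekday` is the assumed weekday of 1 January (hypothetical input).
    """
    if len(hourly) != HOURS_PER_DAY * DAYS_PER_YEAR:
        raise ValueError("expected 8760 hourly values")
    # Split the year into 365 daily profiles of 24 hours each.
    days = [hourly[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY]
            for d in range(DAYS_PER_YEAR)]
    # Keep only the days that fall on the requested weekday.
    selected = [day for d, day in enumerate(days)
                if (first_weekday + d) % 7 == weekday]
    # Transpose: rows = hours of the day, columns = selected days.
    return [list(column) for column in zip(*selected)]


if __name__ == "__main__":
    # Synthetic series in which each value is simply its hour-of-year index.
    series = [float(i) for i in range(8760)]
    mondays = weekday_hour_matrix(series, weekday=0, first_weekday=0)
    print(len(mondays), len(mondays[0]))  # 24 53
```

If 1 January is a Monday, a 365-day year contains 53 Mondays, which is why the matrix in the example has 53 columns.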

D. Compare and study inter-city consumption: Electricity/Gas

This step deduces energy profiles for the whole state by grouping the individual city profiles previously established.

E. Complete time logical models

Once steps C and D have been completed, all relevant information will be arranged into archetypal profiles (e.g. City C1, hours 1-6 lead to Consumption type A).
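An archetypal profile such as "hours 1-6 lead to Consumption type A" can be read off a clustering result. The hypothetical Python helper below illustrates this by collapsing the cluster label assigned to each of the 24 hours into runs of consecutive hours. The labels in the example are invented for illustration and are not results of the study.

```python
from typing import List, Tuple


def hour_ranges(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse per-hour cluster labels into (first_hour, last_hour, label) runs.

    Hours are numbered from 1, as in the example "hours 1-6 -> type A".
    """
    runs: List[Tuple[int, int, str]] = []
    for hour, label in enumerate(labels, start=1):
        if runs and runs[-1][2] == label:
            runs[-1] = (runs[-1][0], hour, label)  # extend the current run
        else:
            runs.append((hour, hour, label))       # start a new run
    return runs


if __name__ == "__main__":
    # Illustrative labels for one city and one day (not results of the study).
    labels = ["A"] * 6 + ["B"] * 12 + ["C"] * 4 + ["A"] * 2
    for first, last, label in hour_ranges(labels):
        print(f"City C1, hours {first}-{last} -> Consumption type {label}")
```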

F. Conclusions and decision-‐making process for clients

This part will analyse which archetypal profiles use energy efficiently and which could be improved. It will also consider what kind of instructions the system operator could give to customers in order to gain efficiency.

2. Resources: EIA Database

The data sets used for the case study are those published by the U.S. Energy Information Administration (EIA) [14]. They are provided as "Comma-Separated Values" (csv) files. The database contains csv files for a variable number of cities in every American state. Each csv file contains hourly energy consumption information for a whole year (8760 hours = 365 days x 24 hours). There are several files for each city, depending on the type of energy-related information (sector, type of energy, etc.). For this case study, the relevant files were the residential load data files for electricity and gas. It must also be pointed out that each data file contains the information of a typical residential house of the city, not of the whole city itself.


Fig. 1.4 below illustrates the csv file once converted into a standard matrix. The data sample corresponds to the city of Aurora (Illinois).

Figure 1.4

The columns are described below:

A. Date
B. Electricity Facility Total (kWh)
C. Gas Facility Total (kWh)
D. Heating Electricity (kWh)
E. Heating Gas (kWh)
F. Cooling Electricity (kWh)
G. HVAC Fan + Fans Electricity (kWh)
H. Electricity HVAC (kWh)
I. Fans Electricity (kWh)
J. General Interior Lights Electricity (kWh)
K. General Exterior Lights Electricity (kWh)
L. Appl Interior Equipment Electricity (kWh)
M. Miscellaneous Interior Equipment Electricity (kWh)
N. Water Heater Water Systems (kWh)

Columns A, B and C were the ones useful for the case studies.
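The following minimal Python sketch uses the standard `csv` module to show how columns A, B and C could be extracted from one of these files. It selects columns by position rather than by header name, because the exact header text of the EIA files is not reproduced here. The function name, the sample header and the date format in the example are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_abc_columns(f: TextIO) -> Tuple[List[str], List[float], List[float]]:
    """Read date (A), total electricity (B) and total gas (C) from a CSV stream.

    Columns are taken by position (0, 1, 2), so the header text does not matter.
    """
    reader = csv.reader(f)
    next(reader)  # skip the header row
    dates: List[str] = []
    electricity: List[float] = []
    gas: List[float] = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        dates.append(row[0].strip())
        electricity.append(float(row[1]))
        gas.append(float(row[2]))
    return dates, electricity, gas


if __name__ == "__main__":
    # Hypothetical two-hour sample with a fourth column that is ignored.
    sample = io.StringIO(
        "Date,Electricity,Gas,HeatingElectricity\n"
        "01/01 01:00:00,0.75,1.20,0.10\n"
        "01/01 02:00:00,0.68,1.35,0.09\n"
    )
    print(read_abc_columns(sample))
```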

3. Tools: MATLAB

The main tool used to carry out the project is MATLAB, a high-level language and interactive environment for numerical computation, visualization, and programming.

It allows data sets to be processed and analysed by means of built-in mathematical tools and algorithms as well as user-developed code.


CHAPTER 5: DATA ANALYSIS TECHNIQUES

Several mathematical techniques have been used to apply clustering and analyse data sets.

1. Hierarchical Clustering

Hierarchical clustering groups data into a multilevel hierarchy of clusters by means of a cluster tree, also called a dendrogram. The tree therefore provides clustering at several scales, since clusters at one level are joined with other clusters at the next level.

The following brief example explains what a dendrogram is and how to interpret it (Fig. 5.1).

Figure 5.1

• Bottom clusters are the most similar; dissimilarity increases going up the Y-axis of the graph.

• Objects are first grouped in pairs at the lowest level, and then pairs of clusters from the previous level are grouped in turn.

• The X-axis shows the indices associated with each of the studied objects (for instance, cities).

• The Y-axis represents relative dissimilarity.

• If a maximum dissimilarity of 55% is considered here, there are four clusters (each associated with a different colour in the graph).
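Reading the dendrogram at a given dissimilarity level can be sketched in Python. The distances are rescaled to a 0-100 relative scale and the tree is cut at the chosen percentage. The sketch relies on a known property of single linkage, the method applied in this project: cutting the tree at height h gives the connected components of the graph that joins every pair of objects at a distance of at most h. The function name and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def single_linkage_cut(objects: List[Sequence[float]], threshold: float) -> List[int]:
    """Cluster labels (1..k) from cutting a single-linkage tree at `threshold` %.

    Distances are rescaled to 0-100 (relative dissimilarity). For single
    linkage, the clusters below a cut height are exactly the connected
    components of the graph linking every pair at distance <= threshold.
    """
    n = len(objects)
    pairs = [(i, j, math.dist(objects[i], objects[j]))
             for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return [1] * n
    lo = min(d for _, _, d in pairs)
    hi = max(d for _, _, d in pairs)
    if hi == lo:
        raise ValueError("all distances are equal; relative scale undefined")
    parent = list(range(n))  # union-find forest over the objects

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, d in pairs:
        if (d - lo) * 100.0 / (hi - lo) <= threshold:
            parent[find(i)] = find(j)  # join the two components
    # Number the components 1..k in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(n)]


if __name__ == "__main__":
    toy = [[0.0], [1.0], [10.0], [11.0], [30.0], [31.0]]
    print(single_linkage_cut(toy, 55.0))  # [1, 1, 1, 1, 2, 2]
```

Lowering the threshold produces more, smaller clusters. For the same toy data, a cut at 20% separates the three pairs of close points.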

Applying this to the data sets of the project's case studies, the electricity and gas data for each city are 8760x1 matrices containing the energy consumption for each hour of the year.

In Matlab, the hierarchical clustering algorithm follows these steps:


i. Compute the Euclidean distances between the objects of the energy data matrix and store them in a dissimilarity matrix

The function pdist is used to compute the Euclidean distance between pairs of objects in an m-by-n data matrix X. Rows of X correspond to observations and columns to variables. The output D is a row vector of length m(m–1)/2, with one entry per pair of observations in X, and the distances are arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m–1). D is therefore the dissimilarity matrix in condensed (vector) form.
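The following minimal Python sketch (standard library only) reproduces the ordering described above. It builds the condensed distance vector in the same (2,1), (3,1), ..., (m,m–1) order that the text attributes to pdist, and adds a helper that expands the vector into the full symmetric matrix. The function names `pdist_euclidean` and `square_form` are hypothetical: they only mimic the behaviour described and are not Matlab functions.

```python
import math
from typing import List, Sequence


def pdist_euclidean(X: List[Sequence[float]]) -> List[float]:
    """Condensed Euclidean distance vector of the rows of X.

    Entries follow the 1-based pair order (2,1), (3,1), ..., (m,1), (3,2), ...,
    (m,m-1), giving m(m-1)/2 values.
    """
    m = len(X)
    D: List[float] = []
    for j in range(m - 1):           # smaller row index of the pair
        for i in range(j + 1, m):    # larger row index of the pair
            D.append(math.dist(X[i], X[j]))
    return D


def square_form(D: List[float], m: int) -> List[List[float]]:
    """Expand the condensed vector into the full symmetric m x m matrix."""
    S = [[0.0] * m for _ in range(m)]
    k = 0
    for j in range(m - 1):
        for i in range(j + 1, m):
            S[i][j] = S[j][i] = D[k]
            k += 1
    return S


if __name__ == "__main__":
    X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(pdist_euclidean(X))  # [5.0, 10.0, 5.0]
```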

ii. Group the objects into a binary, hierarchical cluster tree.

Then, the function linkage is used to obtain a matrix that encodes a tree of hierarchical clusters of the dissimilarity matrix.

Basically, it groups objects into the relevant clusters once the proximity between all of them has been established. Using the distance information obtained in the previous step (stored in the dissimilarity matrix), the linkage function links pairs of objects that are close together into binary clusters. These newly formed clusters are then linked to each other, again and again, to create bigger clusters until all the objects in the original data set are linked together in a hierarchical tree matrix.

In this case single linkage, also known as the nearest neighbor method, is applied; it is the default method of Matlab's linkage function. This method takes the smallest distance between objects of two clusters as the distance between those clusters, and merges the closest pair of clusters at each step.
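The agglomeration described above can be sketched in Python as a naive single-linkage loop. At each step, the two clusters whose closest members are nearest are merged, and the merge is recorded together with its height. The cluster numbering (leaves 1..m, new clusters m+1, m+2, ...) is an assumption chosen to resemble a linkage-style merge matrix. The loop is deliberately simple and meant for small examples only.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def single_linkage(X: List[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Agglomerative single-linkage clustering of the rows of X.

    Leaves are numbered 1..m and the k-th new cluster m+k. Each output row is
    (cluster_a, cluster_b, merge_height).
    """
    m = len(X)
    clusters: Dict[int, Set[int]] = {i + 1: {i} for i in range(m)}
    merges: List[Tuple[int, int, float]] = []
    next_id = m + 1
    while len(clusters) > 1:
        best: Optional[Tuple[int, int, float]] = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(X[p], X[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) | clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    print(single_linkage([[0.0], [1.0], [10.0], [11.0], [30.0]]))
```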

iii. Plot the dendrogram that illustrates the hierarchical tree matrix

The hierarchical, binary cluster tree created by the linkage function is most easily understood when viewed graphically. To do so, the dendrogram plot is generated from the hierarchical tree matrix with the dendrogram function.

iv. Check Dissimilarity

Finally, it is necessary to check how faithfully the dendrogram reflects the distance information computed by the algorithm. The height in the hierarchical tree at which two objects are first joined is called their cophenetic distance.

One way to see how well the cluster tree generated by the linkage function reflects the original data is to compare the cophenetic distances with the original distances generated by the pdist function. To do so, the cophenet function is used. It returns the cophenetic correlation coefficient, i.e. the linear correlation between the cophenetic distances of the tree and the original pairwise distances. The closer this coefficient is to 1, the more accurately the clustering solution represents the data.
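The Python sketch below illustrates the cophenetic check. It derives the cophenetic distances from a list of merges (cluster a, cluster b, height) and correlates them with the original distances using the Pearson coefficient. The merge-list format, the numbering convention (leaves 1..m, the k-th merge creates cluster m+k) and the example tree are assumptions for illustration.

```python
import math
from typing import Dict, List, Set, Tuple


def cophenetic_distances(merges: List[Tuple[int, int, float]], m: int) -> List[float]:
    """Condensed cophenetic distances from a merge list.

    The cophenetic distance of two leaves is the height at which they are first
    joined. Output order: (2,1), (3,1), ..., (m,m-1), like a condensed vector.
    """
    members: Dict[int, Set[int]] = {i + 1: {i} for i in range(m)}
    coph = [[0.0] * m for _ in range(m)]
    for k, (a, b, h) in enumerate(merges, start=1):
        for p in members[a]:
            for q in members[b]:
                coph[p][q] = coph[q][p] = h  # first time p and q are joined
        members[m + k] = members.pop(a) | members.pop(b)
    return [coph[i][j] for j in range(m - 1) for i in range(j + 1, m)]


def pearson(x: List[float], y: List[float]) -> float:
    """Linear (Pearson) correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pts = [0.0, 1.0, 10.0, 11.0, 30.0]
    m = len(pts)
    original = [abs(pts[i] - pts[j]) for j in range(m - 1) for i in range(j + 1, m)]
    # Single-linkage tree of pts, written out by hand for this example.
    merges = [(1, 2, 1.0), (3, 4, 1.0), (6, 7, 9.0), (5, 8, 19.0)]
    print(pearson(cophenetic_distances(merges, m), original))
```

For these five one-dimensional points the coefficient is about 0.94. The tree therefore represents the original distances reasonably well but not perfectly.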


The implementation of the algorithm in Matlab is done with the following code:

Figure 5.2

%X matrix: information of the 10 objects in columns
%(Col1, ..., Col10 are the 8760x1 consumption vectors of each object)
X = [Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10];
%Transposition so that each object becomes a row (observation), as pdist requires
X = X';
%Euclidean distances between all pairs of objects (condensed row vector)
Y = pdist(X, 'euclidean');
%Minimum and maximum distances (named so as not to shadow the built-in min and max)
minDist = min(Y);
maxDist = max(Y);
%Rescaling of the distances to a 0-100 relative scale before plotting the dendrogram
Y = ((Y - minDist) .* 100) / (maxDist - minDist);
%Linkage (single linkage, the default method) and generation of the dendrogram
Z = linkage(Y);
[H, T] = dendrogram(Z, 'colorthreshold', 'default');
set(H, 'LineWidth', 2);
%Checking dissimilarity with the cophenetic correlation coefficient
cophenetic = cophenet(Z, Y)

2. Self-Organized Maps

The Self-Organized Map (SOM) is one of the best-known self-organized neural network models.

Self-organized learning does not require supervision or the user's knowledge or intervention during the process. It consists of repeatedly modifying the synaptic weights of the network in response to activation patterns, according to predetermined rules, until a final weight configuration is reached. This final configuration is stable with respect to any kind of stimulus, so at this point the network has acquired behaviour patterns.

This idea is based on Turing's observation (1952): "global order can be achieved through local interactions".


Self-organization can be applied to the way the human brain works as well as to artificial neural networks. A network organizes itself at two different, interacting levels:

• Activity: Responses resulting after applying particular stimuli.

• Connectivity: The connection strengths of the network change depending on the type of stimulus pattern that causes its activity.

Neurobiological studies indicate that different sensory inputs (motor, visual, auditory, etc.) are mapped onto corresponding areas of the cerebral cortex in an orderly fashion.

Neural network models based on this principle therefore result in self-organized networks that behave in a neurologically inspired manner.

A SOM can be used to cluster data without much prior knowledge of the input data, or to detect patterns, which is useful for the purpose of this project.

In that case, all neurons organize themselves into a network that is represented in the form of a map: the SOM. Such a map has two important properties:

• At each stage of representation or processing, each piece of incoming information is kept in its proper context or neighborhood.

• Neurons dealing with closely related pieces of information are neighbor neurons in the map.

Regarding what neurons represent, each output neuron of the map corresponds to a particular feature drawn from the input space.

The following example illustrates a simple application of how self-organized maps work (Fig. 5.3) [15].


Suppose we have four data points (crosses) in our continuous 2D input space, and want to map this onto four points in a discrete 1D output space. The output nodes map to points in the input space (circles). Random initial weights start the circles at random positions in the centre of the input space.

We randomly pick one of the data points for training (cross in circle). The closest output point represents the winning neuron (solid diamond). That winning neuron is moved towards the data point by a certain amount, and the two neighbouring neurons move by smaller amounts (arrows).

Next we randomly pick another data point for training (cross in circle). The closest output point gives the new winning neuron (solid diamond). The winning neuron moves towards the data point by a certain amount, and the one neighbouring neuron moves by a smaller amount (arrows).

We carry on randomly picking data points for training (cross in circle). Each winning neuron moves towards the data point by a certain amount, and its neighbouring neuron(s) move by smaller amounts (arrows). Eventually the whole output grid unravels itself to represent the input space.

Figure 5.3
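The four steps of Fig. 5.3 can be imitated with a small Python sketch (standard library only). A chain of four neurons is trained online on four 2-D points: the winner moves towards the sampled point, and its chain neighbours move half as much while the neighbourhood is active. The step-shaped neighbourhood, the linear learning-rate decay, the initialization span and the function name `train_chain_som` are illustrative assumptions. This is a simplified Kohonen rule, not the exact algorithm used by Matlab's selforgmap.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def train_chain_som(data: List[Point], n_neurons: int = 4, steps: int = 500,
                    lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D chain of neurons on 2-D data with a simplified Kohonen rule."""
    rng = random.Random(seed)
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    span = 0.1 * min(max(xs) - min(xs), max(ys) - min(ys))
    # Random initial weights near the centre of the input space.
    w = [[cx + rng.uniform(-span, span), cy + rng.uniform(-span, span)]
         for _ in range(n_neurons)]
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)           # learning rate decays towards 0
        radius = 1 if frac < 0.5 else 0   # neighbours only move in the first half
        x = rng.choice(data)              # randomly picked training point
        win = min(range(n_neurons), key=lambda k: math.dist(w[k], x))
        for k in range(n_neurons):
            grid_dist = abs(k - win)      # distance along the 1-D output chain
            if grid_dist == 0:
                factor = lr               # winning neuron
            elif grid_dist <= radius:
                factor = lr / 2           # neighbouring neuron(s): smaller step
            else:
                continue
            w[k][0] += factor * (x[0] - w[k][0])
            w[k][1] += factor * (x[1] - w[k][1])
    return w


if __name__ == "__main__":
    square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for neuron in train_chain_som(square):
        print([round(v, 2) for v in neuron])
```

Each update moves a weight only part of the way towards a data point, so every neuron stays inside the region spanned by its starting position and the data.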


The following code and figure (Figs. 5.4 and 5.5) illustrate how this technique can be implemented in Matlab:

Figure 5.4

% Load data and determine the input matrix (rows: variables/cities, columns: hourly samples)
load('Cities.mat');
inputData = EnergyData';   % named so as not to shadow the built-in input function
% Dimensions of the map
dimension1 = 6; % dimension X
dimension2 = 4; % dimension Y
% Structure (SOM Toolbox map struct) that facilitates plotting
load('sm');
sm.topol.msize = [dimension2 dimension1];
% Definition of map parameters
% (selforgmap arguments: dimensions, coverSteps, initial neighborhood size, topology function)
iteraciones = 2000;
radio_v = 3;
top = 'gridtop';
net = selforgmap([dimension1 dimension2], iteraciones, radio_v, top);
% Learning factor
net.trainParam.lr = 0.6;
% Training the network
[net, tr] = train(net, inputData);
% Weight definition, axis and plot (one line per column of the weight matrix)
pesos = net.IW{1};
t = 1:dimension1*dimension2;
figure(10);
plot(t, pesos);
% Reordering weight positions for plotting purposes
pesos1 = pesos;
m = 0;
for j = 1:dimension2
    for i = 1:dimension1
        ff = pesos(i*dimension2 - m, :);
        pesos1(i + dimension1*m, :) = ff;
    end
    m = m + 1;
end
% Update of the map codebook with the reordered weights
sm.codebook = pesos1;
% Plotting functions (SOM Toolbox), all using the reordered weights
figure(6);
som_barplane(sm, pesos1, '', 'unitwise');
title('SOM\_bars');
figure(7);
som_pieplane(sm, pesos1);
title('SOM\_pie');
figure(8);
som_plotplane(sm, pesos1, 'b');
title('som\_representantes');


Figure 5.5


CHAPTER 6: CASE STUDY 1 - Illinois

1. State Overview

Illinois is a Midwestern American state that is also considered a microcosm of the U.S.A. It is the fifth most populated state of the U.S.A. and also ranks fifth in gross domestic product. Thanks to its location and its access to major waterways, rail and aviation hubs, Illinois is a major transportation hub. This is key for the transportation of crude oil and natural gas throughout North America. The state therefore contributes significantly to the nation's economy.

Regarding energy-related aspects, Illinois is an important energy consumer, even though its per capita energy consumption is slightly below the national average. Its electricity and natural gas markets are deregulated, which makes it a suitable state for this project.

In terms of electricity generation, Illinois leads the U.S.A. in nuclear power, producing around one-eighth of the nation's nuclear generation. Nuclear power represents half of Illinois' total generation, while the rest comes mostly from coal-fired power plants. Illinois generates considerably more electricity than it consumes and is served by two different grids. The first covers the northern part of the state and interconnects with the Mid-Atlantic states, while the second covers the southern part of the state and interconnects with the Mid-continent states.

Regarding natural gas, Illinois has few producing wells and minimal production. However, it is a major crossroads for more than a dozen interstate natural gas pipelines and has two natural gas market centers. It also has the second largest natural gas storage capacity in the U.S.A.


The residential sector is the most important consumer of natural gas, as around four-fifths of the homes in Illinois use it for heating.

2. Available Data

The EIA provides data for 19 cities across the state, all of which were included in the case study. To facilitate their identification when applying the clustering algorithms, the cities were sorted alphabetically and given reference numbers.

The following table enumerates the cities and gives their location on the map of Illinois.

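List of cities with their reference numbers (the accompanying map of Illinois shows the numbered cities, colour-coded by location as Northern or Southern cities):

1) Aurora (Northern)
2) Belleville-Scott (Southern)
3) Bloomington (Northern)
4) Cahokia (Southern)
5) Carbondale (Southern)
6) Chicago-Midway (Northern)
7) Chicago-Ohare (Northern)
8) Decatur (Northern)
9) DuPage (Northern)
10) Marion-Williamson (Southern)
11) Moline-Quad City (Northern)
12) Mount Vernon (Southern)
13) Peoria (Northern)
14) Quincy (Northern)
15) Rockford (Northern)
16) Springfield (Northern)
17) Sterling Rock (Northern)
18) University of Illinois (Northern)
19) Waukegan (Northern)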

Table 6.1


3. Hierarchical Clustering

The first objective of this analysis is to assess the similarities and dissimilarities in behaviour between all the cities of the state of Illinois: for instance, whether there are clear groups of cities that follow similar behaviour patterns, to what extent, and whether the cities in each group are geographically close within the state. This is the inter-city analysis, and its main result is a dendrogram that contains all cities of the state and groups them into subclusters according to their dissimilarity level.

Once this has been done, and bearing in mind the results of the inter-city analysis, each city is analysed individually in the intra-city analysis. The results of the intra-city analysis consist of a 7x7 matrix and seven dendrograms for each city. The 7x7 matrix is a cophenetic correlation coefficient matrix, and there is one dendrogram for each of the seven days of the week, weekend days included (i.e. one for all Mondays, one for all Tuesdays, etc.).

The cophenetic correlation matrix is used to check whether hierarchical clustering reliably describes the data of each day (diagonal terms with a value above 0.8) and to see how similar the days are to each other (off-diagonal terms; two days are considered to behave similarly when the value is above 0.8).
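The 0.8 criterion can be applied mechanically, as in the Python sketch below. It lists the days with a reliable clustering (diagonal terms) and the pairs of days considered similar (off-diagonal terms). The sketch assumes the matrix is symmetric, with the days ordered Monday to Sunday, so only the upper triangle is read. The function name is hypothetical, and the example matrix is a placeholder, not a result.

```python
from typing import List, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def read_cophenetic_matrix(C: List[List[float]], threshold: float = 0.8
                           ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Apply the 0.8 criterion to a symmetric 7x7 cophenetic correlation matrix.

    Returns the days whose diagonal term exceeds the threshold (clustering
    describes them reliably) and the pairs of days whose off-diagonal term
    exceeds it (days considered to behave similarly).
    """
    reliable = [DAYS[i] for i in range(7) if C[i][i] > threshold]
    similar = [(DAYS[i], DAYS[j]) for i in range(7) for j in range(i + 1, 7)
               if C[i][j] > threshold]
    return reliable, similar


if __name__ == "__main__":
    # Placeholder identity matrix, only to show the call.
    identity = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]
    print(read_cophenetic_matrix(identity))
```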

The dendrogram of each day groups the 24 hours of the day into subclusters according to their dissimilarity level, which allows a given day to be analysed and described (e.g. the typical Monday of a given city).

All of the previously described processes will be applied to electricity and gas consumption databases separately.

a. Inter-city Analysis

Hierarchical clustering was applied here to all 19 cities of the state of Illinois (an 8760x19 matrix), first to the electricity consumption data and then to the gas consumption data. The following dendrograms were obtained.

The X-axis indicates the reference number of each city as previously established, while the Y-axis indicates the relative dissimilarity between all binary clusters.

i. Electricity

In the following dendrogram (Fig. 6.1) two clusters corresponding to the blue and red colours of the figure can be observed. The cities of the red cluster (2,4,5,10,12) correspond to the Southern cities of Illinois while the Northern cities all belong to the blue cluster (the rest).

