Cluster analysis of electricity and gas consumption profiles in the residential sector of the U.S.A










Electrical Specialisation

Cluster Analysis of Electricity and Gas Consumption Profiles in the Residential Sector of the U.S.A.

Author: Laura GELABERT COSTACURTA
Director: Miguel Ángel Sanz Bobi

Madrid, June 2016











Author: Gelabert Costacurta, Laura

Director: Sanz Bobi, Miguel Ángel

Collaborating Institution: ICAI – Universidad Pontificia Comillas


The aim of this project is to study how well a particular combination of cluster analysis techniques performs when analysing energy-related data sets and whether it yields coherent archetypal profiles.

Figure 1

Improving energy efficiency and management is one of the most important challenges the electrical sector faces in its effort to achieve stable and sustainable power transmission systems. In the context of deregulated energy markets, moreover, correct analysis and interpretation of information is crucial for managing supply and demand. With this background in mind, periodic load data from the residential sector of several states of the U.S.A. are used as case studies.

Figure 2

The state of the art concerns the development and usefulness of several clustering techniques, first in data mining in general and then in applications specific to the energy sector. For instance, load forecasting and load profiling (Fig. 2) are routinely used in the electrical sector, and their improvement through clustering techniques has been well investigated. Other uses in this sector, as well as studies of the residential sector, are also covered. Finally, smart grids and the current interest in them, together with research involving neural networks and self-organised maps, are the most recent aspects considered in this context.

The objectives of the project are to discover archetypal consumption profiles and patterns in the data through clustering techniques and to study the relationship between electricity and gas consumption. The ultimate objective is to propose ideas for innovative decision-making processes based on the types of profile established. This may allow policies to be drawn up that, once implemented by means of instructions to customers, would improve the operation of the system.


Figure 3

The project was developed using MATLAB as the main tool. The sources of information were public data sets from the U.S. Energy Information Administration that describe typical household use of electricity and gas across the whole nation. There is usually one data set for each of several cities within a state, containing consumption values for the 8760 hours of a whole year (Fig. 3). For each case study, after preliminary information on the state has been gathered, the relevant data sets are organised and put into the correct format for the algorithms. Then an inter-city analysis, which compares the cities within the state with one another, and an intra-city analysis, which studies each city individually, are performed. This is done for electricity as well as gas consumption. Once the results have been analysed and compared, archetypal profiles and temporal logical models are established for the state. The last step consists of drawing conclusions and proposing decision-making processes oriented to consumers.

A combination of two clustering techniques was used for all the data analysis in the project: hierarchical clustering and Self-Organised Maps (SOMs).

The case studies describe in detail how the analysis was performed on energy data sets from four states of the U.S.A.: Illinois, Indiana, Florida and California. Each state constitutes a case study in which, after a brief introduction and an overview of the available data, hierarchical clustering and self-organised maps are applied to both electricity and gas consumption.

Hierarchical clustering (Fig. 4) was first used for the inter-city analysis, where the resulting dendrogram showed which cities within the state are similar and whether that similarity is related to their geographical location. It was then used for the intra-city analysis of each city individually. There, a matrix of cophenetic correlation coefficients showed which days of the week behave similarly and how reliable the information obtained for each day is. A further dendrogram was used to analyse and establish weekly archetypal profiles.


Inter-city dendrogram Cophenetic correlation coefficient matrix Intra-city dendrogram

Figure 4

Self-Organised Maps (SOMs) were used to check and complement the results obtained with hierarchical clustering, as they give a joint graphical view of the inter- and intra-city analysis of each case study. Three plots were used to illustrate the results (Fig. 5). The hits plot, which shows how the input data are distributed among the output neurons, allows the accuracy of the SOM to be evaluated: the more even the distribution, the more accurate and reliable the SOM. The bar graph shows, for each output neuron, the relative values of all input cities, and the weight plot represents the evolution of each city from the first to the last neuron, this time on an absolute energy scale.

Hits Plot Bar Graph Weight Plot

Figure 5

To conclude, all case studies are compared and contrasted using a selection of the relevant results and analyses, which allows a general evaluation of the whole project. The final conclusions establish the validity of the methodology and show how useful it is for obtaining a good approximation of the overall behaviour of typical households in various cities within states of the U.S.A. This result helps to identify where energy is used efficiently and to suggest which measures could be implemented to help grid management and operation.

Finally, this project may later serve to extend the comparative basis to other states of the U.S.A., to investigate the interaction between electricity and gas consumption behaviours, or to see where and how smart grids could start to be installed.

[Figures 4 and 5. Inter-city dendrogram: city reference number against dissimilarity (%). Intra-city dendrogram: hours of the day against dissimilarity (%). Panel title: "Electricity, City 2 - Tuesday". Weight plot: energy in kWh.]

Cophenetic correlation coefficient matrix:

       Mon    Tue    Wed    Thu    Fri    Sat    Sun
Mon  0.849  0.849  0.851  0.850  0.850  0.850  0.849
Tue  0.870  0.879  0.880  0.881  0.879  0.879  0.872
Wed  0.863  0.869  0.873  0.875  0.873  0.872  0.866
Thu  0.865  0.870  0.875  0.877  0.875  0.874  0.868
Fri  0.854  0.860  0.864  0.865  0.864  0.863  0.857
Sat  0.857  0.862  0.865  0.865  0.865  0.865  0.859
Sun  0.847  0.847  0.850  0.849  0.849  0.849  0.848








CHAPTER 2: STATE OF THE ART
  1. Clustering techniques in data mining: Evolution and successful applications
  2. Load Forecasting and Load Profiling
    a. Load Forecasting
    b. Load Profiling
  3. Clustering techniques in the electricity sector
    a. General Uses
    b. Residential Sector
  4. Smart Grids
  5. Neural networks and Self-Organised Maps

  1. Motivation
  2. Objectives

  1. Methodology
  2. Resources: EIA Database
  3. Tools: MATLAB

  1. Hierarchical Clustering
  2. Self-Organised Maps

CHAPTER 6: CASE STUDY 1 - Illinois
  1. State Overview
  2. Available Data
  3. Hierarchical Clustering
    a. Inter-City Analysis
      i. Electricity
      ii. Gas
    b. Intra-City Analysis
      i. Electricity
      ii. Gas
      iii. Comments and observations
  4. Self-Organised Maps
    a. Electricity
    b. Gas
    c. Comments and observations
  5. Conclusions

CHAPTER 7: CASE STUDY 2 - Indiana
  1. State Overview
  2. Available Data
  3. Hierarchical Clustering
    a. Inter-City Analysis
      i. Electricity
      ii. Gas
      iii. Comments and observations

      i. Electricity
      ii. Gas
      iii. Comments and observations
  4. Self-Organised Maps
    a. Electricity
    b. Gas
    c. Comments and observations
  5. Conclusions

CHAPTER 8: CASE STUDY 3 - Florida
  1. State Overview
  2. Available Data
  3. Hierarchical Clustering
    a. Inter-City Analysis
      i. Electricity
      ii. Comments and observations
    b. Intra-City Analysis
      ii. Comments and observations
  4. Self-Organised Maps
    a. Electricity
    b. Comments and observations
  5. Conclusions

CHAPTER 9: CASE STUDY 4 - California
  1. State Overview
  2. Available Data
  3. Hierarchical Clustering
    a. Inter-City Analysis
      i. Electricity
      ii. Gas
      iii. Comments and observations
    b. Intra-City Analysis
      i. Electricity
      ii. Gas
      iii. Comments and observations
  4. Self-Organised Maps
    a. Electricity
    b. Gas
    c. Comments and observations
  5. Conclusions

CHAPTER 10: CONCLUSIONS
  1. States Overview - Comparison
  2. Available Data - Comparison
  3. Hierarchical Clustering - Comparison
    a. Inter-City Analysis
      i. Electricity
      ii. Gas
    b. Intra-City Analysis
      i. Electricity

    c. Comments and observations
  4. Self-Organised Maps - Comparison
    a. Electricity
    b. Gas
    c. Comments and observations
  5. Final Conclusions

REFERENCES
BIBLIOGRAPHY



Improving energy efficiency and management is one of the most important challenges the electrical sector faces. Knowing how residential customers use electrical energy is therefore fundamental to decisions in areas such as market strategy, satisfaction of demand and the sizing of grid capacity, amongst others.

Optimising the management of electricity supply and demand is a complex matter, but it is indispensable for operating any network or grid efficiently and, consequently, for achieving a more stable and sustainable power transmission system.

Good data analysis is a key part of the search for solutions that improve and facilitate the daily operation of the electrical grid.

This is even more important in the context of deregulated energy markets, where both consumers and suppliers can intervene freely. Because they are able to keep track of their energy consumption, consumers can decide to take part in the retail market while, based on the same information, suppliers can develop electricity tariffs or make decisions regarding load management.

With this background in mind, this project applies cluster analysis to energy-related data sets in order to obtain coherent electricity and gas consumption profiles.

Specifically, periodic load data from the residential sector of several states of the U.S.A. are used as case studies. The analysis of these data seeks to establish weekly electricity and gas consumption profiles, and the influence of gas consumption in the process, individually for each of the selected states.

Identifying such energy use patterns makes it possible to assess new efficient solutions and to determine which types of consumer are actually using electrical energy efficiently.

The profiling results, together with ideas for new decision-making processes, would therefore be reported to power grid operators. This may allow new management policies to be implemented that improve the management and sustainability of the network.

Likewise, instructions that lead to a sensible use of energy at a reasonable cost are important for electric power companies and society as well as for individual customers.



1. Clustering techniques in data mining: Evolution and successful applications

Clustering is the task of grouping abstract objects into meaningful sub-classes of similar objects. Once assigned to these sub-classes, also known as clusters, all the data objects that belong to the same cluster can be treated alike. The technique plays an important role in data mining, where comprehensible structures must be found in large data sets. Unlike in classification, the classes are not pre-defined and have to be discovered. Fig. 1.1 illustrates the process generically:


Figure 1.1

High-quality clusters are those in which the intra-cluster similarity is high while the inter-cluster similarity is low, as shown in Fig. 1.2:


Figure 1.2
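The quality criterion above can be checked numerically. The following is a minimal sketch in Python (not the MATLAB code used in the project) that uses Euclidean distance as the dissimilarity measure, so a good clustering shows a small mean intra-cluster distance and a large mean inter-cluster distance. The function name `mean_intra_inter` and the toy points are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist

def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # Pairs sharing a label count as intra-cluster, all others as inter-cluster.
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy data: two well-separated groups of 2-D points (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

intra, inter = mean_intra_inter(points, labels)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

For these toy points the intra-cluster distances are far smaller than the inter-cluster ones, which is the situation described as high quality.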

A successful (that is, high-quality) application of clustering provides a good basis for further evaluation, interpretation or structuring of the data. More precisely, it allows patterns to be discovered in large data sets, which is what data mining consists of. Data mining is the key to extracting as much relevant information as possible from the data sets.

The first known use of clustering is believed to have taken place in London in 1854, when the English physician John Snow helped to put an end to a cholera outbreak. He plotted the reported cases of the disease on a map and found a link between the density of cases and the location of a particular water pump in the city. Once the pump was taken out of service, the epidemic came to an end.

However, even though some applications of clustering and data mining in the electricity sector have been developed and new ones are being studied, there is still much to investigate. This is especially true of solutions to improve the operation of the complex networks that this sector comprises in the presence of a deregulated market.

2. Load Forecasting and Load Profiling

a. Load Forecasting

Decision-making in the electricity sector spans several levels that must all be taken into account. From the day-to-day operation of power plants to the planning of facilities, decisions must be taken according to the daily circumstances of the demand curve. This is a complex task, mainly because of the deregulation of energy markets.

Using artificial intelligence methods to arrive at a coherent approach and model for managing the power grid is therefore fundamental.

Over the past years, the prediction of future load demand, known as load forecasting, has established itself as a core process for addressing the daily tasks and challenges involved in operating the electrical system. Electric utilities and energy providers use it to predict the amount of power needed at every moment to keep supply and demand in equilibrium [1].

However, even though various methods have been studied and developed, a global approach has not yet been found. Because demand patterns are so variable, the usual practice is to address smaller groups and individual case studies.


b. Load Profiling

A load profile is a graph that represents the variation of the electrical load over time. Load profiles vary with the sector or customer they characterise and with external factors such as climate or the season of the year.

Many types of profile can be sought, depending on the profiling criteria, the type of application, innovative ways of operating the grid, etc.

Many studies have addressed the determination of daily load profiles to characterise the electricity usage of households.

Fig. 1.3 shows an example of the load profile of a typical weekday in a 3-bedroom household in the Brig o’Doon community (Scotland, UK) [2].


  Figure 1.3
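A daily load profile of this kind can be derived from a series of hourly readings by averaging, for each hour of the day, the values recorded on all available days. The sketch below is written in Python for illustration (it is not the project's MATLAB code); the function name `mean_daily_profile` and the synthetic readings are assumptions.

```python
def mean_daily_profile(hourly_load):
    """Average an hourly series into 24 values, one mean per hour of the day."""
    if len(hourly_load) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly readings")
    n_days = len(hourly_load) // 24
    return [
        sum(hourly_load[day * 24 + hour] for day in range(n_days)) / n_days
        for hour in range(24)
    ]

# Synthetic example (assumed values): three identical days with a flat
# 0.5 kWh base load and an evening peak of 2.0 kWh at 19:00.
one_day = [0.5] * 24
one_day[19] = 2.0
series = one_day * 3

profile = mean_daily_profile(series)
peak_hour = max(range(24), key=lambda h: profile[h])
print("peak hour:", peak_hour, "load at peak (kWh):", profile[peak_hour])
```

The same averaging could be restricted to a single weekday (for example, all Tuesdays) to obtain one profile per day of the week.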


There is no single method for load profiling. However, hierarchical clustering is usually preferred for grouping similar households together. The result is a reduced number of archetypal daily load profiles, each covering several households, which are later used as targets for interventions in consumers’ behaviour.
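The grouping of households into archetypal profiles can be sketched as follows. This is a simplified agglomerative (bottom-up) hierarchical clustering in Python with average linkage and Euclidean distance, both chosen here as assumptions; it is not the implementation used in the cited studies or in this project. The household profiles are hypothetical four-value summaries of a day.

```python
from math import dist

def average_linkage(a, b, profiles):
    """Mean pairwise Euclidean distance between two clusters of profile indices."""
    return sum(dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def archetype(cluster, profiles):
    """Archetypal profile of a cluster: element-wise mean of its members."""
    n = len(cluster)
    return [sum(profiles[i][h] for i in cluster) / n for h in range(len(profiles[0]))]

# Hypothetical daily profiles (kWh per period: morning, midday, evening, night).
profiles = [
    [0.2, 0.3, 1.5, 0.4],  # households with an evening peak
    [0.3, 0.3, 1.6, 0.5],
    [0.2, 0.4, 1.4, 0.4],
    [1.2, 0.3, 0.3, 0.2],  # households with a morning peak
    [1.3, 0.2, 0.4, 0.3],
]

clusters = agglomerate(profiles, n_clusters=2)
for members in clusters:
    print(sorted(members), [round(v, 2) for v in archetype(members, profiles)])
```

Stopping at a fixed number of clusters stands in for cutting a dendrogram at a chosen level; recording the distance at each merge would give the heights of that dendrogram.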


3. Clustering techniques in the electricity sector

a. General Uses

As there are different types of clustering technique, several of them have been assessed in different case studies in the electrical sector. Each technique involves different mathematical algorithms with different characteristics, so the size and type of the data greatly influence the choice of one technique or another.

Hierarchical clustering, as stated above, has often been used for load profiling and can also be used in combination with other algorithms.

In that direction, a collaboration between power engineers and mathematicians from several European countries resulted in a study named “Hierarchical Spectral Clustering of Power Grids” [3], which aims to solve practical power engineering problems and to model substructures of a whole transmission network.

However, several other techniques have been applied to case studies in the electrical sector. The K-means clustering algorithm is a partitioning technique: as opposed to hierarchical clustering, it produces a single partition of the data set instead of a hierarchy from which adequate clusters must later be selected. It recently proved very useful for data mining in the power system in a case study by a student at North China Electric Power University (Beijing) [4]. The results of that analysis show that, in a context where power data are generated every second, the algorithm was very efficient at analysing the power load of customers.
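To make the contrast with hierarchical clustering concrete, the sketch below shows a plain K-means (Lloyd's algorithm) in Python: it returns one flat partition for a chosen number of clusters. It is an illustrative sketch, not the code of the cited study; the helper names, the fixed starting centroids and the two-feature toy data are assumptions.

```python
from math import dist

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))

def kmeans(points, initial_centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points], centroids

# Toy customers described by two assumed features, e.g. (mean load, peak load) in kWh.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print([[round(v, 2) for v in c] for c in centroids])
```

The number of clusters has to be fixed in advance through the starting centroids, whereas a hierarchy allows that choice to be postponed until the dendrogram has been inspected.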


Last but not least, Fuzzy C-means is a well-known clustering algorithm for load profile determination. Several studies in the UK (University of Strathclyde) [5], Malaysia [6][7] and Bangladesh [8] have examined its validity in a deregulated environment.
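What distinguishes Fuzzy C-means from K-means is that each data point belongs to every cluster with a degree of membership between 0 and 1, instead of being assigned to exactly one cluster. The sketch below, in Python, computes these memberships for one point using the standard Fuzzy C-means formula with fuzzifier m; the cluster centres are given rather than fitted, and the function name and values are assumptions for illustration.

```python
from math import dist

def fcm_memberships(point, centers, m=2.0):
    """Membership degrees of one point in each cluster (standard FCM formula)."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with a centre: it belongs fully to that cluster.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in d) for d_j in d]

# One point lying at distance 1 from the first centre and 3 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print([round(x, 3) for x in u])
```

With m = 2 the example point receives memberships of about 0.9 and 0.1, which always sum to 1. In the full algorithm these memberships and the cluster centres are updated alternately until they stabilise.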

b. Residential Sector

In the UK, several studies on the application of clustering to electricity load profiling in the residential sector must be highlighted.

For instance, students at the University of Nottingham found that describing the changes in consumers’ behaviour by means of patterns obtained with clustering algorithms leads to more consistent groupings of households. This was reported in their paper “Variability of Behaviour in Electricity Load Profile Clustering; Who Does Things at the Same Time Each Day?” [9].

Furthermore, these students are now researching how best to define representative load profiles for domestic electricity users in the UK and have published the first results of the study in their article “The application of a data mining framework to energy usage profiling in domestic residences using UK data” [10].

In the past few years, new methods involving clustering techniques have been implemented and tested in the residential sector of other countries.

One particular example is “A New Proposal of Typification of Load Profiles to Support the Decision-Making in the Sector of Electric Energy Distribution”, from the Polytechnic School of the Federal University of Bahia (Brazil) [11]. In this work, combining the selection, classification and clustering of load curves yielded the crucial features of a residential load curve, with seasonal and temporal aspects also taken into account.

The implementation of this method by a Brazilian electric company in the context of an energy efficiency programme was a success: for instance, the typification method made it possible to analyse the impact of replacing refrigerators in some low-income cities in the state of Maranhão.

A further analysis of the study showed that this new method resulted in a greater diversity of patterns than more traditional methods such as Fuzzy C-means.

4. Smart Grids

The results of load profiling in the residential sector may later contribute to the implementation of smart grids. Smart grids are technologically advanced grids that, by means of additional monitoring, control and communication, make it possible to move electricity around the system in a more economical and efficient way. In essence, they aim to maximise the throughput of the system while reducing energy consumption.

In this context, the USA established support for the smart grid as federal policy with the Energy Independence and Security Act of 2007.


5. Neural networks and Self-Organised Maps

More recently, other techniques related to a different kind of clustering have also been studied to find out whether they could be useful for decision-making processes in the electrical sector.

Amongst these, neural networks have shown interesting results in power grid simulation, as reflected in the study “The Application of Neural Networks to Electric Power grid Simulation” [12], while Self-Organising Map techniques were assessed in the paper “Electricity Load Forecasting using Self Organising Maps” [13].



1. Motivation    

The motivation of this project is to verify whether obtaining weekly electricity consumption profiles by means of hierarchical clustering is viable and whether, later on, these profiles can supply sufficient and useful information to improve the operation and management of the electrical network.


This implies applying hierarchical clustering to databases that cover multiple large-scale subjects of study.


Thus the extent of the study is considerable, as it targets whole states individually, within which both inter-city and intra-city analyses are contemplated.


Results of this analysis may, later on, contribute to the upgrade of the grid to a smart grid, which is regarded as an important advance for the future operation of any electrical system.


2. Objectives  


The first objective of this analysis is to apply hierarchical clustering techniques in order to find possible profiles of electrical energy use by different types of customers within the residential sector of the U.S.A. Clustering will be applied individually to the databases of several previously selected states, provided by the U.S. Energy Information Administration [14]. For each state, the data of the available cities will first be studied city by city (intra-city), and then common patterns amongst cities will be looked for (inter-city).


Then, a reduced number of typical weekly profiles that explain the consumption behaviour of the customers that characterize a state will be identified.


The influence of, and relation between, gas and electricity consumption will also be studied with clustering techniques and information research.


Finally, the ultimate objective of the project is to come up with innovative decision-making processes based on the different types of profiles previously established. This may allow the establishment of policies that, once implemented by giving instructions to customers, would facilitate a more optimal operation of the system.



1. Methodology  


A. Preliminary  Study.  

This stage covers the selection of the states that the case study will focus on. Once they have been chosen, general information on each of them is gathered in order to establish some background and introduce their energy profile. This information may be energy-related (e.g. importer/exporter, dominant primary energy resources, regulated or deregulated energy market) or non-energy-related (e.g. location in the U.S.A, climate).


B. Selecting  and  organizing  data  sets  

The selection part consists in choosing the adequate data sets from the whole energy use database provided by the U.S. Energy Information Administration (EIA). The organization part involves putting all data sets in the matrix format that the subsequent application of clustering algorithms requires.


C. Compare  and  study  intra-­‐city  consumption:  Electricity/Gas  

This consists in deducing energy profiles for each city by grouping days. Clustering will be applied individually to each city in order to establish a hierarchy of similarity amongst the twenty-four hours of each day of the week.
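As an illustration of the data arrangement this step relies on, the following sketch reshapes one year of hourly values into a 24-by-365 matrix and extracts every occurrence of a given weekday. The load vector is synthetic, and the assumption that the first day of the data is weekday 1 is hypothetical; neither comes from the EIA files.

```matlab
% Minimal sketch with synthetic data (not the EIA data).
hoursPerDay = 24;
nDays = 365;
load8760 = (1:hoursPerDay*nDays)';      % placeholder hourly load, 8760x1

% Each column of H is one day; each row is one hour of the day.
H = reshape(load8760, hoursPerDay, nDays);

% Assumption: day 1 of the year is weekday 1, so weekday k occupies
% columns k, k+7, k+14, ...
k = 1;
sameWeekday = H(:, k:7:nDays);          % 24 x 53 for k = 1, 24 x 52 otherwise

% Mean hourly profile of that weekday (24x1).
meanProfile = mean(sameWeekday, 2);
```

With real data, the starting weekday of the year would have to be checked against the date column before selecting the columns of H.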

D. Compare  and  study  inter-­‐city  consumption:  Electricity/Gas  

This consists in deducing energy profiles for the whole state by grouping the individual cities' profiles previously established.


E. Complete  time  logical  models  

Once steps C and D have been completed, all relevant information will be arranged in archetypal profiles (e.g. City C1, hours 1-6 leads to Consumption type A).

F. Conclusions  and  decision-­‐making  process  for  clients  

This  part  will  analyse  which  archetypal  profiles  efficiently  use  energy  and  which   may  have  to  improve.  It  will  also  consider  what  kind  of  instructions  the  operator  of   the  system  could  give  to  customers  in  order  to  gain  efficiency.  


2. Resources:  EIA  Database    

The data sets used for the case study are the ones published by the U.S. Energy Information Administration (EIA) [14]. They are presented as comma-separated values (csv) files. The database contains csv files for a variable number of cities in every American state. Each csv file contains hourly energy consumption information for a whole year (8760 hours). There are several files for each city, depending on the type of energy-related information (sector, type of energy, etc.). In this case, the residential load data files for electricity and gas were the useful ones for the case study. It must also be pointed out that each data file contains the information of a typical residential house of the city and not of the whole city itself.



Fig.  1.4  below  illustrates  the  csv  file  once  turned  into  a  standard  matrix  file.  The   data  sample  corresponds  to  the  city  of  Aurora  (Illinois).  


  Figure 1.4

Columns are described here below:

A. Date
B. Electricity Facility Total (kWh)
C. Gas Facility Total (kWh)
D. Heating Electricity (kWh)
E. Heating Gas (kWh)
F. Cooling Electricity
G. HVAC Fan + Fans Electricity (kWh)
H. Electricity HVAC (kWh)
I. Fans Electricity (kWh)
J. General Interior Lights Electricity (kWh)
K. General Exterior Lights Electricity (kWh)
L. Apl Interior Equipment Electricity (kWh)
M. Miscellaneous Interior Equipment Electricity (kWh)
N. Water Heater Water Systems (kWh)


Columns  A,  B  and  C  were  the  ones  useful  for  the  case  studies.    

3. Tools:  MATLAB    

The  main  tool  used  to  carry  out  the  project  is  MATLAB,  a  high-­‐level  language  and   interactive  environment  for  numerical  computation,  visualization,  and  programming.  

This   software   basically   allows   processing   and   analysing   data   sets   by   means   of   mathematical  tools,  algorithms  and  developed  codes.  




Several  mathematical  techniques  have  been  used  to  apply  clustering  and  analyse   data  sets.    


1. Hierarchical  Clustering    

Hierarchical clustering groups data into a multilevel hierarchy of clusters by means of a cluster tree, also called a dendrogram. There are therefore several scales of clustering, as clusters are joined with other clusters at the next level.


Below, a brief example explains what a dendrogram is and how to interpret it (Fig. 5.1).


  Figure 5.1

• Bottom clusters are the most similar, and dissimilarity increases while going up the Y-axis of the graph

• Objects are grouped in pairs at the lowest level, then in pairs of clusters of the previous level

• The X axis describes the indices associated to each of the studied objects (for instance cities)

• The Y axis represents relative dissimilarity

• Here, if a maximum dissimilarity of 55% is considered, there would be four clusters (each associated to a different colour in the graph)
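To make the reading of a dissimilarity threshold concrete, the sketch below cuts a single-linkage tree at 55% relative dissimilarity and counts the resulting clusters. The eight 2-D objects are invented for illustration and are not the objects of Fig. 5.1; the code assumes that MATLAB's Statistics and Machine Learning Toolbox (pdist, linkage, cluster) is available.

```matlab
% Minimal sketch; requires the Statistics and Machine Learning Toolbox.
% Synthetic objects: four tight pairs placed near the corners of a square.
X = [0 0; 0 1; 100 0; 100 1; 0 100; 0 101; 100 100; 100 101];

Y = pdist(X, 'euclidean');                       % pairwise distances
Yrel = (Y - min(Y)) * 100 / (max(Y) - min(Y));   % relative dissimilarity, 0-100 %

Z = linkage(Yrel, 'single');                     % nearest-neighbour tree

% Cut the tree at 55 % relative dissimilarity.
T = cluster(Z, 'cutoff', 55, 'criterion', 'distance');
nClusters = numel(unique(T));                    % 4 for this data
```

For this data the pairs merge at 0% and the groups of pairs only merge at about 69% and 70%, so a 55% threshold leaves the four pairs as separate clusters.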

Applying this to the data sets of the case studies of the project, the electricity and gas data sets of each city are 8760x1 matrices containing the energy consumption for each hour of the year.


The hierarchical clustering algorithm is carried out in Matlab through the following steps:



i. Compute the Euclidean distance between objects of the Energy Data Matrix and store it in a dissimilarity matrix

To compute the Euclidean distance between pairs of objects in an m-by-n data matrix X, the function pdist is used. Rows of X correspond to observations and columns correspond to variables. The output D is a row vector of length m(m-1)/2, corresponding to the pairs of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m-1), with D being the dissimilarity matrix in vector form.
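The ordering of the distance vector can be checked by hand. The following sketch reproduces it with plain loops on a small invented matrix, without calling pdist, so that the pair order and the length m(m-1)/2 are explicit.

```matlab
% Minimal sketch in base MATLAB: pairwise Euclidean distances in pdist order.
X = [0 0; 3 4; 6 8];          % m = 3 observations (rows), n = 2 variables
m = size(X, 1);

D = zeros(1, m*(m-1)/2);      % one entry per pair of observations
k = 0;
for j = 1:m-1                 % second index of the pair
    for i = j+1:m             % first index of the pair
        k = k + 1;
        D(k) = sqrt(sum((X(i,:) - X(j,:)).^2));
    end
end
% D = [5 10 5], i.e. the pairs (2,1), (3,1), (3,2)
```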


ii. Group  the  objects  into  a  binary,  hierarchical  cluster  tree.  

Then, the function linkage is used to obtain a matrix that encodes a tree of hierarchical clusters built from the dissimilarity matrix.

Basically, it groups objects into the relevant clusters once the proximity between all of them has been established. Using the distance information obtained in the previous step (stored in the dissimilarity matrix), the linkage function links pairs of objects that are close together into binary clusters. These newly formed clusters are then linked to each other again and again to create bigger clusters, until all the objects in the original data set are linked together in a hierarchical tree matrix.

In this case the single linkage method, also known as the nearest neighbor method, is applied. This method takes the smallest distance between objects in the two clusters as the distance between those clusters when forming the next cluster.
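The nearest neighbor criterion can be illustrated on two small clusters. The points below are hypothetical; the sketch only shows how the single-linkage distance between two clusters is obtained as the minimum over all cross-cluster pairs.

```matlab
% Two hypothetical clusters of 2-D objects (one object per row).
A = [0 0; 1 0];
B = [4 0; 9 0];

% Single linkage: smallest distance over all cross-cluster pairs.
dSingle = inf;
for i = 1:size(A, 1)
    for j = 1:size(B, 1)
        dij = sqrt(sum((A(i,:) - B(j,:)).^2));
        dSingle = min(dSingle, dij);
    end
end
% dSingle is 3: the pair (1,0)-(4,0) is the closest one
```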



iii. Plot the dendrogram that illustrates the hierarchical tree matrix

The hierarchical, binary cluster tree created by the linkage function is most easily understood when viewed graphically. To do so, the dendrogram plot is generated from the hierarchical tree matrix with the dendrogram function.

iv. Check  Dissimilarity  

Finally, it is necessary to check how well the obtained dendrogram reflects the information contained in the matrices of the algorithm. The heights at which objects are linked in the hierarchical tree are called cophenetic distances.

One way to see how well the cluster tree generated by the linkage function reflects the original data is to compare the cophenetic distances with the original distance data generated by the pdist function. To do so, the cophenet function is used; it returns the cophenetic correlation coefficient between the cophenetic distances of the hierarchical tree matrix and the original distances. The closer the value of the cophenetic correlation coefficient is to 1, the more accurate the clustering solution.
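For reference, the coefficient can be written as an ordinary linear correlation between the two sets of distances. In the notation assumed here (not taken from the original text), Y_ij is the distance between objects i and j given by pdist, Z_ij is their cophenetic distance (the height of the link at which i and j are first joined), and the barred symbols are the averages of Y and Z over all pairs:

```latex
c = \frac{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)\left(Z_{ij} - \bar{z}\right)}
         {\sqrt{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)^{2} \, \sum_{i<j} \left(Z_{ij} - \bar{z}\right)^{2}}}
```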


The  implementation  of  the  algorithm  in  Matlab  is  done  with  the  following  code:  

Figure 5.2


2. Self  Organized  Maps  


The   Self-­‐Organized   Map   (SOM)   is   one   of   the   best-­‐known   self-­‐organized   neural   network  models.  


Self-organized learning does not require supervision or the user's knowledge or intervention during the process. It consists in repeatedly modifying the synaptic weights of the network in response to activation patterns and according to predetermined rules until a final weight configuration is reached. This final configuration will be stable in the face of any kind of stimulus. Therefore, at this point, behavior patterns have been acquired.


This idea is based on Turing's observation (1952): “global order can be achieved through local interactions”.

% Hierarchical clustering implementation (Figure 5.2)

% X matrix: information of the 10 objects in columns
X = [Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10];

% Transposition of the matrix to apply the function (pdist compares rows)
X = X';

% Function that calculates Euclidean distances
Y = pdist(X, 'euclidean');

% Column vector that sorts distances from smallest to biggest
V = sort(Y(:));

% First value = minimum distance
% (named dmin/dmax so that the built-in min and max are not shadowed)
dmin = V(1);

% Dimension of the distance matrix
s = size(Y);

% Last value of the ordered distance matrix = maximum distance
dmax = V(s(2));

% Correction of the Y matrix before plotting the dendrogram
% (rescales distances to a relative dissimilarity between 0 and 100)
Y = ((Y - dmin).*100)/(dmax - dmin);

% Linkage and generation of the dendrogram
Z = linkage(Y);
[H, T] = dendrogram(Z, 'colorthreshold', 'default');
set(H, 'LineWidth', 2);

% Checking dissimilarity with the cophenetic coefficient
cophenetic = cophenet(Z, Y)


Self-­‐organization  can  be  applied  to  the  way  the  human  brain  works  as  well  as  to   artificial   neural   networks.   A   network   organizes   itself   at   two   different,   interacting   levels:    

• Activity:  Responses  resulting  after  applying  particular  stimuli.  

• Connectivity:  Connection  forces  of  the  network  will  change  depending  on  the   type  of  model  that  causes  its  activity.  


Neurobiological studies indicate that different sensory inputs (motor, visual, auditory, etc.) are mapped onto corresponding areas of the cerebral cortex in an orderly fashion.

Self-organized neural network models are therefore built to behave in this neurologically inspired manner.


SOM  can  be  used  to  cluster  data  without  knowing  much  about  the  input  data  or   to  detect  patterns,  which  is  useful  for  the  purpose  of  this  project.  


In such a case, all neurons organize themselves into a network that is represented in the form of a map: the SOM. Such a map has two important properties:

• At   each   stage   of   representation,   or   processing,   each   piece   of   incoming   information  is  kept  in  its  proper  context/neighborhood.    

• Neurons   dealing   with   closely   related   pieces   of   information   are   neighbor   neurons  in  the  map  


Regarding  what  neurons  represent,  it  can  be  said  that  an  output  neuron  of  the   map  corresponds  to  a  particular  feature  drawn  from  the  input  space.  


The   following   example   illustrates   a   simple   application   of   how   self-­‐organized   maps  work  (Fig  5.3)  [15]    



Suppose we have four data points (crosses) in our continuous 2D input space, and want to map this onto four points in a discrete 1D output space. The output nodes map to points in the input space (circles). Random initial weights start the circles at random positions in the centre of the input space.

We randomly pick one of the data points for training (cross in circle). The closest output point represents the winning neuron (solid diamond). That winning neuron is moved towards the data point by a certain amount, and the two neighbouring neurons move by smaller amounts (arrows).

Next we randomly pick another data point for training (cross in circle). The closest output point gives the new winning neuron (solid diamond). The winning neuron moves towards the data point by a certain amount, and the one neighbouring neuron moves by a smaller amount (arrows).

We carry on randomly picking data points for training (cross in circle). Each winning neuron moves towards the data point by a certain amount, and its neighbouring neuron(s) move by smaller amounts (arrows). Eventually the whole output grid unravels itself to represent the input space.

Figure 5.3  
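A single training step of the example above can be sketched in a few lines of base MATLAB. The initial weights, the training point and the two step sizes (etaWinner, etaNeighbour) are illustrative assumptions and are not values taken from [15].

```matlab
% Minimal sketch of one SOM training step (illustrative values only).
data = [0 0; 0 1; 1 0; 1 1];               % four data points in 2-D
W = [0.4 0.4; 0.5 0.5; 0.6 0.6; 0.5 0.7];  % four output nodes on a 1-D chain

etaWinner = 0.5;       % step size for the winning neuron (assumed)
etaNeighbour = 0.25;   % smaller step size for its chain neighbours (assumed)

x = data(4, :);        % training point picked for this step

% Winner: output node whose weight vector is closest to x.
diffs = W - repmat(x, size(W, 1), 1);
[~, winner] = min(sum(diffs.^2, 2));

% Neighbours on the chain (nodes at the ends have only one).
neighbours = [winner - 1, winner + 1];
neighbours = neighbours(neighbours >= 1 & neighbours <= size(W, 1));

% Move the winner, then its neighbours, towards the data point.
W(winner, :) = W(winner, :) + etaWinner * (x - W(winner, :));
for n = neighbours
    W(n, :) = W(n, :) + etaNeighbour * (x - W(n, :));
end
```

Here node 3 wins and moves to (0.8, 0.8), nodes 2 and 4 move by the smaller amount, and node 1 stays where it was. Repeating this step with randomly picked data points, usually with decreasing step sizes, is what lets the chain unravel over the input space.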


The  following  code  and  figure  (Figs.  5.4  and  5.5)  illustrate  how  this  technique  can  be   implemented  in  Matlab:  

Figure 5.4


% Load data and determine the input matrix
load('Cities.mat');
input = EnergyData';

% Dimensions of the map
dimension1 = 6; % dimension X
dimension2 = 4; % dimension Y

% Structure that facilitates plotting
sm.topol.msize = [dimension2 dimension1];

% Definition of map parameters
iteraciones = 2000; % number of iterations
radio_v = 3;        % initial neighbourhood radius
% ASSUMPTION: 'top' is not defined in the extracted listing;
% 'hextop' is the default topology of selforgmap
top = 'hextop';

net = selforgmap([dimension1 dimension2], iteraciones, radio_v, top);

% Learning factor = 0.6;

% Training the network
[net, tr] = train(net, input);

% Weight definition, axis and plot
pesos = net.IW{1};
% ASSUMPTION: 't' is not defined in the extracted listing;
% the neuron index is used here as the x-axis
t = 1:size(pesos, 1);

figure(10);
plot(t, pesos(:, 1:19)); % one curve per weight column (19 columns)

% Reordering weight positions for plotting purposes
pesos1 = pesos;
m = 0;
for j = 1:dimension2
    for i = 1:dimension1
        ff = pesos(i*dimension2 - m, :, :);
        pesos1(i + dimension1*m, :, :) = ff;
    end
    % Reconstructed: the loop endings and the increment of m are
    % missing from the extracted listing
    m = m + 1;
end

% Update of the reordered weights
sm.codebook = pesos1;

% Plotting functions
figure(6);
T = pesos1;
som_barplane(sm, pesos, '', 'unitwise');
title('SOM\_bars')

figure(7)
som_pieplane(sm, pesos1);
title('SOM\_pie')

som_plotplane(sm, pesos1, 'b');



  Figure 5.5


CHAPTER  6:  CASE  STUDY  1  -­‐  Illinois    

1. State  Overview  



Illinois is a Midwestern American state that is also considered a microcosm of the U.S.A. It is the fifth most populated state of the U.S.A and also ranks fifth in terms of gross domestic product. Thanks to its location and its access to major waterways, rail and aviation networks, Illinois is a major transportation hub. This is key when it comes to the transportation of crude oil and natural gas throughout North America. Thus this state contributes highly to the nation's economy.


Regarding  Energy-­‐related  aspects,  Illinois  is  an  important  energy  consumer  even   though  its  per  capita  energy  consumption  is  slightly  below  the  national  average.  Its   electricity  and  gas  markets  are  deregulated  which  makes  it  a  suitable  state  for  this   project.  


In terms of electricity generation, Illinois is the leading nuclear power producer in the U.S.A, with around one eighth of the country's nuclear generation. Nuclear power represents about half of Illinois' total generation, while the rest mostly comes from coal-fired power plants. Illinois generates considerably more electricity than it consumes and is served by two different grids. The first one covers the northern part of the state and interconnects with the Mid-Atlantic states, while the second one covers the southern part of the state and interconnects with the Mid-continent states.


Regarding natural gas, Illinois has few producing wells and minimal production. However, it is a major crossroads, with more than a dozen interstate natural gas pipelines and two natural gas market centers. It also has the second largest natural gas storage capacity in the U.S.A.


The  residential  sector  is  the  most  important  consumer  of  natural  gas  as  around   four-­‐fifths  of  the  homes  in  Illinois  use  it  for  heating.    

2. Available  Data  


The   EIA   provided   information   for   19   cities   in   the   whole   state,   which   were   all   taken  into  account  for  the  case  study.  In  order  to  facilitate  the  identification  of  those   cities   when   applying   mathematical   algorithms   they   were   classified   in   alphabetical   order  and  given  a  reference  number.  


The  following  table  enumerates  the  cities  and  gives  their  location  on  the  map  of   Illinois.  

List of cities and reference number (shown next to a map of Illinois with numbered cities):

1) Aurora
2) Belleville-Scott
3) Bloomington
4) Cahokia
5) Carbondale
6) Chicago-Midway
7) Chicago-Ohare
8) Decatur
9) DuPage
11) Moline-Quad City
12) Mount Vernon
13) Peoria
14) Quincy
16) Springfield
17) Sterling Rock
18) University of

Colour code according to location: Northern Cities, Southern Cities

Table 6.1


3. Hierarchical  Clustering      

The first objective of this analysis is to get an idea of the behavioural similarities or dissimilarities between all cities of the State of Illinois: for instance, whether there are clear groups of cities that follow similar behaviour patterns, to what extent they do so, and whether or not they are geographically close together within the state. This is the inter-city analysis, and its relevant result is a dendrogram that contains all cities of the state and groups them into subclusters according to their dissimilarity level.


Once this has been done, and bearing in mind the results of the inter-city analysis, each city is analysed individually with the intra-city analysis. The intra-city results consist of a 7x7 matrix and seven dendrograms for each city. The 7x7 matrix is a cophenetic correlation coefficient matrix, and there is one dendrogram for each of the seven days of the week, weekend days included (i.e. one for all Mondays, one for all Tuesdays, etc.).


The cophenetic correlation matrix is used to check whether the hierarchical clustering technique reliably describes the data of each day (diagonal terms with a value above 0.8) and to see how similar days are to one another (non-diagonal terms; two days are considered to behave in a similar way when the value is above 0.8).


The dendrogram of each day groups the 24 hours of the day into subclusters according to their dissimilarity level and makes it possible to analyse and describe a given day (e.g. a typical Monday of a given city).
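One possible way to assemble such a 7x7 matrix is sketched below. It is an interpretation, not the code used in the project: it assumes that the term (a,b) is obtained by comparing the cluster tree of day a with the hour-to-hour distances of day b, that the first day of the data is weekday 1, and that the Statistics and Machine Learning Toolbox is available. The hourly load is random placeholder data.

```matlab
% Hedged sketch; requires the Statistics and Machine Learning Toolbox.
% Synthetic hourly load for one city (placeholder for an EIA column).
rng(0);
load8760 = rand(8760, 1);
H = reshape(load8760, 24, 365);        % column = day, row = hour

Y = cell(7, 1);
Z = cell(7, 1);
for d = 1:7
    Xd = H(:, d:7:end);                % 24 hours x all occurrences of day d
    Y{d} = pdist(Xd, 'euclidean');     % distances between the 24 hours
    Z{d} = linkage(Y{d}, 'single');    % single-linkage tree of day d
end

% Assumed reading of the 7x7 matrix: tree of day a against distances of day b.
C = zeros(7);
for a = 1:7
    for b = 1:7
        C(a, b) = cophenet(Z{a}, Y{b});
    end
end
```

Under this reading, the diagonal terms C(a,a) are the usual cophenetic correlation coefficients of each day, and the off-diagonal terms indicate how well the tree of one day also describes another day.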


All   of   the   previously   described   processes   will   be   applied   to   electricity   and   gas   consumption  databases  separately.  


a. Inter-­‐city  Analysis  


Here hierarchical clustering was applied to all 19 cities of the state of Illinois (8760x19 matrix), first to the electricity consumption data and then to the gas consumption data. The following dendrograms were obtained.


The  X-­‐axis  indicates  the  reference  number  of  each  city  as  previously  established   while  the  Y-­‐axis  indicates  the  relative  dissimilarity  between  all  binary  clusters.  


i. Electricity    

In   the   following   dendrogram   (Fig.   6.1)   two   clusters   corresponding   to   the   blue   and   red   colours   of   the   figure   can   be   observed.   The   cities   of   the   red   cluster   (2,4,5,10,12)  correspond  to  the  Southern  cities  of  Illinois  while  the  Northern  cities  all   belong  to  the  blue  cluster  (the  rest).  




