
Five suggestions for future research are made:

1. Chapter 4 has demonstrated how an auto-encoder network can be used to find a low-dimensional representation of the state vector of an electric water heater. The simulation results have indicated that using such a compact representation can improve the quality of the control policy and that the optimal state representation depends on the number of observations in the batch. To this end it would be interesting to develop a method for selecting an appropriate state representation during the learning process.

A promising route is to construct experts, where each expert combines a learning algorithm and a different feature representation. A metric based on the performance of each expert, as presented in [38], could then be used to select the expert with the highest metric as described in [164].

2. Chapter 6 has presented the results of the proposed control approach applied to an electric water heater in a lab environment. This electric water heater was a standard unit equipped with eight temperature sensors along the hull of its water buffer. However, adding eight temperature sensors to an existing electric water heater could be rather costly. To reduce this cost, it would be interesting to investigate a low-cost alternative with one temperature sensor and one flow meter, which measure, respectively, the temperature and the flow rate of the water exiting the water buffer.

3. This dissertation has ignored the possible conflict between the market-based objective of the demand response aggregator and the technical objectives of the distribution grid operator. It should be noted that an aggregator is a market player and is assumed to have no technical information about the underlying distribution grid. In this context, it would be interesting to add a load flow simulation to Chapter 7 and to assess the impact of voltage deviations and congestion on the flexibility of the aggregator.

4. In this work, an ensemble of extremely randomized trees was used as a function approximator to estimate the Q-function. Extremely randomized trees are relatively robust to changes in their parameter setting and computationally efficient. A drawback of extremely randomized trees is that they cannot extrapolate beyond the range of their training set.

Therefore, it would be interesting to compare the performance of extremely randomized trees with other approximation architectures, such as neural networks [98].

5. In most real-world control applications a part of the state of the environment cannot be measured and remains hidden from the agent.

For example, an agent applied to an HVAC system for climate control can measure the air temperature but not the temperature of the building envelope. In this work, the temperature dynamics of the building envelope were captured by including past observations of the air temperature in the state vector. A promising alternative approach would be to capture hidden state information by using a Long Short-Term Memory (LSTM) recurrent network [97].
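The history-based state representation mentioned in suggestion 5 can be illustrated with a minimal Python sketch. The class name `HistoryState`, the parameter `n_past`, and the choice to pre-fill the window with the first measurement are assumptions made for illustration only; they are not taken from the implementation used in this work.

```python
from collections import deque
from typing import Deque, List


class HistoryState:
    """Builds a state vector from the current and past air temperatures.

    Hypothetical helper: the state is the newest measurement followed by
    the n_past previous ones, so that slow, unmeasured dynamics (such as
    the building envelope temperature) leave a trace in the state.
    """

    def __init__(self, n_past: int, initial_temperature: float) -> None:
        if n_past < 0:
            raise ValueError("n_past must be non-negative")
        # Assumption: pre-fill the window with the initial measurement so
        # the state vector has a fixed length from the first step onwards.
        self._window: Deque[float] = deque(
            [initial_temperature] * (n_past + 1), maxlen=n_past + 1
        )

    def update(self, air_temperature: float) -> List[float]:
        """Adds a new measurement and returns the state, newest first."""
        # The deque drops the oldest entry automatically (maxlen).
        self._window.append(air_temperature)
        return list(reversed(self._window))


history = HistoryState(n_past=2, initial_temperature=20.0)
for measurement in (20.5, 21.0, 21.5):
    state = history.update(measurement)
print(state)  # [21.5, 21.0, 20.5]
```

The window length trades off state dimensionality against the amount of hidden information that can be inferred; a recurrent network such as an LSTM would instead learn which part of the history to retain.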


Bibliography

[1] R. Pachauri and L. Meyer, Eds., “Climate change 2014: Synthesis report. Contribution of working groups I, II and III to the 5th assessment report of the Intergovernmental Panel on Climate Change,” Geneva, Switzerland, Tech. Rep., 2014.

[2] European Commission (EC), “A practical guide to a prosperous, low-carbon Europe,” http://www.roadmap2050.eu/attachments/files/Volume1_fullreport_PressPack.pdf, Tech. Rep., [Online: accessed December 11, 2015].

[3] P. Pinson, H. Madsen, H. Nielsen, G. Papaefthymiou, and B. Klöckl, “From probabilistic forecasts to statistical scenarios of short-term wind power production,” Wind Energy, vol. 12, no. 1, pp. 51–62, 2009.

[4] E. Lorenz, J. Hurka, D. Heinemann, and H. G. Beyer, “Irradiance forecasting for the power prediction of grid-connected photovoltaic systems,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 2, no. 1, pp. 2–10, 2009.

[5] C. Ferreira, J. Gama, L. Matias, A. Botterud, and J. Wang, “A survey on wind power ramp forecasting.” Argonne National Laboratory (ANL), Tech. Rep., 2011.

[6] S. Weckx and J. Driesen, “Load balancing with EV chargers and PV inverters in unbalanced distribution grids,” IEEE Trans. on Sustainable Energy, vol. 6, no. 2, pp. 635–643, 2015.

[7] C. Gonzalez, J. Geuns, S. Weckx, T. Wijnhoven, P. Vingerhoets, T. De Rybel, and J. Driesen, “LV distribution network feeders in Belgium and power quality issues due to increasing PV penetration levels,” in Proc. 3rd IEEE PES Innov. Smart Grid Technol. Conf. (ISGT Europe), 2012, pp. 1–8.

[8] K. Dyke, N. Schofield, and M. Barnes, “The impact of transport electrification on electrical networks,” IEEE Trans. on Industrial Electronics, vol. 57, no. 12, pp. 3917–3926, Dec. 2010.

[9] Eurelectric, “A Eurelectric policy paper: Electrification of heating and cooling,” Tech. Rep. [Online]. Available: http://www.eurelectric.org/

[10] P. Pinson, H. Madsen et al., “Benefits and challenges of electrical demand response: A critical review,” Renewable and Sustainable Energy Reviews, vol. 39, pp. 686–699, 2014.


[11] G. Strbac, “Demand side management: Benefits and challenges,” Energy policy, vol. 36, no. 12, pp. 4419–4426, 2008.

[12] Department of Energy, “Benefits of demand response in electricity markets and recommendations for achieving them,” Department of Energy Report to the US Congress, Tech. Rep., 2006. [Online]. Available: http://www.oe.energy.gov/DocumentsandMedia/congress_1252d.pdf.

[13] M. Albadi and E. El-Saadany, “Demand response in electricity markets: An overview,” in IEEE Proc. Power Engineering Society General Meeting, June 2007, pp. 1–5.

[14] Federal Energy Regulatory Commission, “Assessment of demand response and advanced metering,” Tech. Rep., 2006.

[15] E. Peeters, D. Six, M. Hommelberg, R. Belhomme, and F. Bouffard, “The ADDRESS project: An architecture and markets to enable active demand,” in Proc. 6th IEEE Int. Conf. on the European Energy Market (EEM), Leuven, Belgium, May 2009, pp. 1–5.

[16] R. Bessa and M. Matos, “Global against divided optimization for the participation of an EV aggregator in the day-ahead electricity market. Part I: Theory,” Electric Power Systems Research, vol. 95, pp. 309–318, 2013.

[17] S. Weckx, R. D’Hulst, and J. Driesen, “Primary and secondary frequency support by a multi-agent demand control system,” IEEE Trans. on Power Systems, vol. 30, no. 3, pp. 1394–1404, May 2015.

[18] P. Siano, “Demand response and smart grids—a survey,” Renewable and Sustainable Energy Reviews, vol. 30, pp. 461–478, 2014.

[19] R. Belmans, J. Driesen, G. Deconinck, G. Vekemans, G. Schaeffer, E. Peeters, P. De Meester, J. Poortmans, D. Vanderzande, J. Declercq et al., “Naar wereldleiderschap in hernieuwbare energietechnologie en elektrische infrastructuur,” Strategische voorstellen aan de Vlaamse regering, 2009.

[20] W. Labeeuw, “Characterisation and modelling of residential electricity demand,” Ph.D. dissertation, KU Leuven, Leuven, Belgium, 2014.

[21] S. Vandael, B. J. Claessens, D. Ernst, T. Holvoet, and G. Deconinck, “Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market,” IEEE Trans. on Smart Grid, vol. 6, no. 4, pp. 1795–1805, July 2015.


[22] K. De Craemer, S. Vandael, B. J. Claessens, and G. Deconinck, “An event-driven dual coordination mechanism for demand side management of PHEVs,” IEEE Trans. on Smart Grid, vol. 5, no. 2, pp. 751–760, March 2014.

[23] C. Quinn, D. Zimmerle, and T. H. Bradley, “The effect of communication architecture on the availability, reliability, and economics of plug-in hybrid electric vehicle-to-grid ancillary services,” Journal of Power Sources, vol. 195, no. 5, pp. 1500–1509, 2010.

[24] F. Ruelens, S. Vandael, W. Leterme, B. J. Claessens, M. Hommelberg, T. Holvoet, and R. Belmans, “Demand side management of electric vehicles with uncertainty on arrival and departure times,” in Proc. 3rd IEEE Innov. Smart Grid Technol. Conf. (ISGT Europe), Berlin, Germany, Oct. 2012.

[25] M. Petersen, K. Edlund, L. Hansen, J. Bendtsen, and J. Stoustrup, “A taxonomy for modeling flexibility and a computationally efficient algorithm for dispatch in smart grids,” in American Control Conference (ACC), 2013, June 2013, pp. 1150–1156.

[26] B. Dupont, P. Vingerhoets, P. Tant, K. Vanthournout, W. Cardinaels, T. De Rybel, E. Peeters, and R. Belmans, “LINEAR breakthrough project: Large-scale implementation of smart grid technologies in distribution grids,” in Proc. 3rd IEEE PES Innov. Smart Grid Technol. Conf. (ISGT Europe), Berlin, Germany, Oct. 2012, pp. 1–8.

[27] J. Mathieu, M. Dyson, D. Callaway, and A. Rosenfeld, “Using residential electric loads for fast demand response: The potential resource and revenues, the costs, and policy recommendations,” Proc. of the ACEEE Summer Study on Buildings, Pacific Grove, CA, 2012.

[28] F. Schweppe, M. Caramanis, and R. Bohn, “Optimal spot pricing: practice and theory,” IEEE Trans. on Power Apparatus and Systems, vol. 101, no. 9, pp. 3234–3245, 1982.

[29] B. J. Claessens, S. Vandael, F. Ruelens, and M. Hommelberg, “Self-learning demand side management for a heterogeneous cluster of devices with binary control actions,” in Proc. 3rd IEEE Innov. Smart Grid Technol. Conf. (ISGT Europe), Berlin, Germany, Oct. 2012, pp. 1–8.

[30] B. J. Claessens, S. Vandael, F. Ruelens, K. De Craemer, and B. Beusen, “Peak shaving of a heterogeneous cluster of residential flexibility carriers using reinforcement learning,” in Proc. 4th IEEE Innov. Smart Grid Technol. Conf. (ISGT Europe), Copenhagen, Denmark, Oct. 2013, pp. 1–5.


[31] S. Iacovella, F. Geth, F. Ruelens, N. Leemput, P. Vingerhoets, G. Deconinck, and B. J. Claessens, “Double-layered control methodology combining price objective and grid constraints,” in Proc. IEEE Int. Conf. on Smart Grid Commun. (SmartGridComm), Vancouver, BC, Canada, Oct. 2013, pp. 25–30.

[32] F. Ruelens, S. Weckx, W. Leterme, S. Vandael, B. J. Claessens, and R. Belmans, “Stochastic portfolio management of an electric vehicles aggregator under price uncertainty,” in Proc. 4th IEEE Innov. Smart Grid Technol. Conf. (ISGT Europe), Copenhagen, Denmark, Oct 2013, pp. 1–5.

[33] F. Ruelens, B. J. Claessens, S. Vandael, B. De Schutter, R. Babuska, and R. Belmans, “Residential demand response of thermostatically controlled loads using batch reinforcement learning,” IEEE Trans. on Smart Grid, vol. PP, no. 99, pp. 1–11, 2016.

[34] W. Leterme, F. Ruelens, B. J. Claessens, and R. Belmans, “A flexible stochastic optimization method for wind power balancing with PHEVs,” IEEE Trans. on Smart Grid, vol. 5, no. 3, pp. 1238–1245, May 2014.

[35] S. Iacovella, F. Ruelens, P. Vingerhoets, B. J. Claessens, and G. Deconinck, “Cluster control of heterogeneous thermostatically controlled loads using tracer devices,” IEEE Trans. on Smart Grid, vol. PP, no. 99, pp. 1–9, 2015.

[36] G. Costanzo, S. Iacovella, F. Ruelens, T. Leurs, and B. Claessens, “Experimental analysis of data-driven control for a building heating system,” Sustainable Energy, Grids and Networks, vol. 6, pp. 81–90, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2352467716000138

[37] D. Ernst, P. Geurts, and L. Wehenkel, “Tree-based batch mode reinforcement learning,” Journal of Machine Learning Research, pp. 503–556, 2005.

[38] R. Fonteneau, S. A. Murphy, L. Wehenkel, and D. Ernst, “Batch mode reinforcement learning based on the synthesis of artificial trajectories,” Annals of Operations Research, vol. 208, no. 1, pp. 383–416, 2013.

[39] L. Busoniu, D. Ernst, R. Babuška, and B. De Schutter, “Exploiting policy knowledge in online least-squares policy iteration: An empirical study,” Automation, Computers, Applied Mathematics, vol. 19, no. 4, 2010.

[40] S. Lange and M. Riedmiller, “Deep auto-encoder neural networks in reinforcement learning,” in Proc. IEEE 2010 Int. Joint Conf. on Neural Networks (IJCNN), Barcelona, Spain, July 2010, pp. 1–8.


[41] D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel, “Reinforcement learning versus model predictive control: a comparison on a power system problem,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 39, no. 2, pp. 517–529, 2009.

[42] R. De Coninck, R. Baetens, D. Saelens, A. Woyte, and L. Helsen, “Rule-based demand-side management of domestic hot water production with heat pumps in zero energy neighbourhoods,” Journal of Building Performance Simulation, vol. 7, no. 4, 2014.

[43] F. Oldewurtel, “Stochastic Model Predictive Control for Energy Efficient Building Climate Control,” Ph.D. dissertation, ETH Zurich, Zurich, Switzerland, 2011.

[44] L. Busoniu, D. Ernst, B. De Schutter, and R. Babuška, “Online least-squares policy iteration for reinforcement learning control,” in Proc. IEEE American Control Conference (ACC), 2010, pp. 486–491.

[45] E. F. Camacho and C. Bordons, Model Predictive Control, 2nd ed. London, UK: Springer London, 2004.

[46] C. Verhelst, “Model predictive control of ground coupled heat pump systems for office buildings,” Ph.D. dissertation, KU Leuven, Leuven, Belgium, 2015.

[47] R. De Coninck, “Grey-box based optimal control for thermal systems in buildings,” Ph.D. dissertation, KU Leuven, Leuven, Belgium, 2015.

[48] L. Ljung, System Identification. New-York City, US: Springer, 1998.

[49] S. Koch, J. L. Mathieu, and D. S. Callaway, “Modeling and control of aggregated heterogeneous thermostatically controlled loads for ancillary services,” in Proc. 17th IEEE Power Sys. Comput. Conf. (PSCC), Stockholm, Sweden, Aug. 2011, pp. 1–7.

[50] J. Mathieu and D. Callaway, “State estimation and control of heterogeneous thermostatically controlled loads for load following,” in Proc. 45th Int. Conf. on System Science, Maui, HI, US, Jan. 2012, pp. 2002–2011.

[51] J. Cigler, D. Gyalistras, J. Široký, V. Tiet, and L. Ferkl, “Beyond theory: the challenge of implementing model predictive control in buildings,” in Proc. 11th REHVA World Congress, Czech Republic, Prague, 2013.

[52] Y. Zhu, Multivariable System Identification for Process Control. Oxford, UK: Elsevier, 2001.


[53] R. Bellman, Dynamic Programming. New York, NY: Dover Publications, 1957.

[54] W. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Hoboken, NJ: Wiley-Blackwell, 2011.

[55] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Nashua, NH: Athena Scientific, 1996.

[56] Neurobat, “Neurobat interior climate technologies,” http://www.neurobat.net/de/home/, [Online: accessed March 21, 2015].

[57] M. Deisenroth and C. E. Rasmussen, “PILCO: A model-based and data-efficient approach to policy search,” in Proc. of the 28th International Conference on machine learning (ICML-11), Bellevue, WA, US, 2011, pp. 465–472.

[58] D. Urieli and P. Stone, “A learning agent for heat-pump thermostat control,” in Proc. 12th Int. Conf. on Autonomous Agents and Multi-agent Systems (AAMAS), Saint Paul, MN, US, May 2013, pp. 1093–1100.

[59] N. Morel, M. Bauer, M. El-Khoury, and J. Krauss, “Neurobat, a predictive and adaptive heating control system using artificial neural networks,” International Journal of Solar Energy, vol. 21, no. 2-3, pp. 161–201, 2001.

[60] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[61] G. Rummery and M. Niranjan, Online Q-learning Using Connectionists Systems. Cambridge, UK: Cambridge University, 1994.

[62] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.

[63] D. O’Neill, M. Levorato, A. Goldsmith, and U. Mitra, “Residential demand response using reinforcement learning,” in Proc. 1st IEEE Int. Conf. on Smart Grid Commun. (SmartGridComm), Gaithersburg, Maryland, US, Oct. 2010, pp. 409–414.

[64] G. P. Henze and J. Schoenmann, “Evaluation of reinforcement learning control for thermal energy storage systems,” HVAC&R Research, vol. 9, no. 3, pp. 259–275, 2003.

[65] E. C. Kara, M. Berges, B. Krogh, and S. Kar, “Using smart devices for system-level management and control in the smart grid: A reinforcement learning framework,” in Proc. 3rd IEEE Int. Conf. on Smart Grid Commun. (SmartGridComm), Tainan, Taiwan, Nov. 2012, pp. 85–90.


[66] D. Lee and W. B. Powell, “An intelligent battery controller using bias-corrected Q-learning,” in Association for the Advancement of Artificial Intelligence, J. Hoffmann and B. Selman, Eds. Palo Alto, CA: AAAI Press, 2012.

[67] Y. Liang, L. He, X. Cao, and Z.-J. Shen, “Stochastic control for smart grid users with flexible demand,” IEEE Trans. on Smart Grid, vol. 4, no. 4, pp. 2296–2308, Dec. 2013.

[68] H. V. Hasselt, “Double Q-learning,” in Proc. 24th Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, 2010, pp. 2613–2621. [Online]. Available: http://papers.nips.cc/paper/3964-double-q-learning.pdf

[69] S. Adam, L. Busoniu, and R. Babuška, “Experience replay for real-time reinforcement learning control,” IEEE Trans. on Syst., Man, and Cybern., Part C: Applications and Reviews, vol. 42, no. 2, pp. 201–212, 2012.

[70] D. Ormoneit and S. Sen, “Kernel-based reinforcement learning,” Machine Learning, vol. 49, no. 2-3, pp. 161–178, 2002.

[71] S. Lange, T. Gabel, and M. Riedmiller, “Batch reinforcement learning,” in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds. New York, NYC: Springer, 2012, pp. 45–73.

[72] L.-J. Lin, “Self-improving reactive agents based on reinforcement learning, planning and teaching,” Machine Learning, vol. 8, no. 3-4, pp. 293–321, 1992.

[73] L. Busoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, FL: CRC Press, 2010.

[74] Z. Wen, D. O’Neill, and H. Maei, “Optimal demand response using device-based reinforcement learning,” [Online: accessed March 21, 2015]. Available: http://web.stanford.edu/class/ee292k/reports/ZhengWen.pdf, Stanford University, Yahoo Labs, Stanford, CA, Tech. Rep., 2015.

[75] F. Ruelens, B. J. Claessens, S. Vandael, S. Iacovella, P. Vingerhoets, and R. Belmans, “Demand response of a heterogeneous cluster of electric water heaters using batch reinforcement learning,” in Proc. 18th IEEE Power Sys. Comput. Conf. (PSCC), Wrocław, Poland, 2014, pp. 1–8.

[76] R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Proc. of the 7th international conference on machine learning, 1990, pp. 216–224.


[77] T. Lampe and M. Riedmiller, “Approximate model-assisted neural fitted Q-iteration,” in Proc. 2014 International Joint Conference on Neural Networks (IJCNN), July 2014, pp. 2698–2704.

[78] F. Oldewurtel, A. Parisio, C. N. Jones, M. Morari, D. Gyalistras, M. Gwerder, V. Stauch, B. Lehmann, and K. Wirth, “Energy efficient building climate control using stochastic model predictive control and weather predictions,” in Proc. IEEE American Control Conference (ACC), 2010, pp. 5100–5105.

[79] J. L. Mathieu, M. Kamgarpour, J. Lygeros, and D. S. Callaway, “Energy arbitrage with thermostatically controlled loads,” in Proc. 2013 European Control Conference (ECC). IEEE, 2013, pp. 2519–2526.

[80] M. Maasoumy, M. Razmara, M. Shahbakhti, and A. Sangiovanni-Vincentelli, “Selecting building predictive control based on model uncertainty,” in Proc. American Control Conference (ACC), Portland, OR, June 2014, pp. 404–411.

[81] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. New York, NY: Wiley-Interscience, 1994.

[82] R. Howard, Dynamic Programming and Markov Processes. The MIT Press, Cambridge, Massachusetts., 1960.

[83] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial intelligence, vol. 101, no. 1, pp. 99–134, 1998.

[84] D. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA, US: Athena Scientific, 1995.

[85] D. P. Bertsekas, “Dynamic programming and optimal control 3rd edition, volume II: Chapter 6 approximate dynamic programming,” 2011. [Online]. Available: http://web.mit.edu/dimitrib/www/dpchapter.pdf

[86] L. Busoniu, D. Ernst, B. De Schutter, and R. Babuška, “Cross-entropy optimization of control policies with adaptive basis functions,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 41, no. 1, pp. 196–209, 2011.

[87] S. P. Singh and R. S. Sutton, “Reinforcement learning with replacing eligibility traces,” Machine Learning, vol. 22, no. 1-3, pp. 123–158, 1996.

[88] S. Thrun, “The role of exploration in learning control,” in Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, D. White and D. Sofge, Eds. Florence, Kentucky 41022: Van Nostrand Reinhold, 1992.


[89] M. A. Wiering, “Explorations in efficient reinforcement learning,” Ph.D. dissertation, University of Amsterdam, Amsterdam, The Netherlands, 1999.

[90] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” Journal of Artificial Intelligence Research, pp. 237–285, 1996.

[91] J. N. Tsitsiklis and B. Van Roy, “An analysis of temporal-difference learning with function approximation,” IEEE Trans. on Automatic Control, vol. 42, no. 5, pp. 674–690, 1997.

[92] S. J. Bradtke and A. G. Barto, “Linear least-squares algorithms for temporal difference learning,” Machine Learning, vol. 22, no. 1-3, pp. 33–57, 1996.

[93] R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, “Fast gradient-descent methods for temporal-difference learning with linear function approximation,” in Proc. 26th Annual International Conference on Machine Learning (ACM), 2009, pp. 993–1000.

[94] D. E. Rumelhart, J. L. McClelland, and PDP Research Group and others, Parallel Distributed Processing. MA, US: MIT Press, 1986.

[95] C. M. Bishop, Neural networks for pattern recognition. Oxford university press, 1995.

[96] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” The handbook of brain theory and neural networks, vol. 3361, no. 10, p. 1995, 1995.

[97] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov 1997.

[98] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[99] M. Riedmiller, “Neural fitted Q-iteration–first experiences with a data efficient neural reinforcement learning method,” in Proc. 16th European Conference on Machine Learning (ECML), vol. 3720. Porto, Portugal: Springer, Oct. 2005, p. 317.

[100] Y. Engel, S. Mannor, and R. Meir, “Reinforcement learning with gaussian processes,” in Proceedings of the 22nd international conference on Machine learning. ACM, 2005, pp. 201–208.


[101] C. E. Rasmussen, “Gaussian processes for machine learning,” 2006.

[102] L. Breiman, J. Friedman, R. Olshen, C. Stone, D. Steinberg, and P. Colla, “CART: Classification and regression trees,” Wadsworth: Belmont, CA, vol. 156, 1983.

[103] L. Breiman, “Bagging predictors,” Machine learning, vol. 24, no. 2, pp. 123–140, 1996.

[104] ——, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[105] P. Geurts, “Regression tree package.” [Online]. Available: http://www.montefiore.ulg.ac.be/~geurts/Software.html

[106] G. J. Gordon, “Stable function approximation in dynamic programming,” in Proceedings of the twelfth international conference on machine learning, 1995, pp. 261–268.

[107] D. Ormoneit and P. Glynn, “Kernel-based reinforcement learning in average-cost problems,” IEEE Trans. on Automatic Control, vol. 47, no. 10, pp. 1624–1636, 2002.

[108] J. N. Tsitsiklis and B. Van Roy, “Feature-based methods for large scale dynamic programming,” Machine Learning, vol. 22, no. 1-3, pp. 59–94, 1996.

[109] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and intelligent laboratory systems, vol. 2, no. 1-3, pp. 37–52, 1987.

[110] W. Curran, T. Brys, M. Taylor, and W. Smart, “Using PCA to efficiently represent state spaces,” in The 12th European Workshop on Reinforcement Learning (EWRL 2015), Lille, France, 2015.

[111] C. H. K. Goh and J. Apt, “Consumer strategies for controlling electric water heaters under dynamic pricing,” Proc. Carnegie Mellon Elect. Ind. Center Working Paper CEIC-04, pp. 1–8, 2004.

[112] D. P. Bertsekas, “Approximate dynamic programming,” in Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA: Athena Scientific, 2011, pp. 324–552.

[113] I. Richardson, M. Thomson, and D. Infield, “A high-resolution domestic building occupancy model for energy demand simulations,” Energy and Buildings, vol. 40, no. 8, pp. 1560–1566, 2008.


[114] U.S. Department of Energy, “Limitations for homes with heat pumps, electric resistance heating, steam heat, and radiant floor heating,” http://energy.gov/energysaver/articles/thermostats, [Online: accessed March 21, 2015].

[115] R. Halvgaard, J. B. Jørgensen, and L. Vandenberghe, “Dual decomposition for large-scale power balancing,” in 18th Nordic Process Control Workshop, 2013.

[116] A. Molderink, V. Bakker, M. Bosman, J. Hurink, and G. Smit, “Management and control of domestic smart grid technology,” IEEE Trans. on Smart Grid, vol. 1, no. 2, pp. 109–119, Sept. 2010.

[117] Gurobi Optimization, “Gurobi optimizer reference manual,” http://www.gurobi.com/, [Online: accessed March 21, 2015].

[118] ILOG, Inc, “ILOG CPLEX: High-performance software for mathematical programming and optimization,” 2006, see http://www.ilog.com/products/cplex/.

[119] M. Scholz and R. Vigário, “Nonlinear PCA: a new hierarchical approach,” in ESANN, 2002, pp. 439–444.

[120] “Belpex - Belgian power exchange,” http://www.belpex.be/, [Online:
