European Journal of Operational Research
Journal homepage: www.elsevier.com/locate/ejor
Analytics, Computational Intelligence and Information Management
Prescriptive selection of machine learning hyperparameters with applications in power markets: Retailer’s optimal trading
Alberto Corredera, Carlos Ruiz
∗Department of Statistics & UC3M-BS Institute for Financial Big Data (IFiBiD), University Carlos III de Madrid, Avda. de la Universidad, 30, Leganés 28911, Spain
Article info
Article history:
Received 22 November 2021
Accepted 9 November 2022
Available online 15 November 2022
Keywords: OR in energy; Data-driven; Electricity retailer; Hyperparameter selection; Machine learning
Abstract
We present a data-driven framework for optimal scenario selection in stochastic optimization with applications in power markets. The proposed methodology relies on the existence of auxiliary information and the use of machine learning techniques to narrow the set of possible realizations (scenarios) of the variables of interest. In particular, we implement a novel validation algorithm that allows optimizing each machine learning hyperparameter to further improve the prescriptive power of the resulting set of scenarios. Supervised machine learning techniques are examined, including kNN and decision trees, and the validation process is adapted to work with time-dependent datasets. Moreover, we extend the proposed methodology to work with unsupervised techniques with promising results. We test the proposed methodology in a realistic power market application: optimal trading strategy in forward and spot markets for an electricity retailer under uncertain spot prices. The results indicate that the retailer can greatly benefit from the proposed data-driven methodology and improve its market performance. Moreover, we perform an extensive set of numerical simulations to analyze under which conditions the best machine learning hyperparameters, in terms of prescriptive performance, differ from those that provide the best predictive accuracy.
© 2022 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
1. Introduction
We are entering a fourth technological revolution characterized by the automation and digitization of industrial processes, and by a more efficient and sustainable allocation of resources. There are various technologies that drive this development (e.g. the internet of things, cyber-physical systems, smart sensors, cloud computing, etc.) and that enable the generation, the efficient collection and processing, and the analysis of large volumes of different types of data (Big Data). In this context, the field of decision-making under uncertainty has the opportunity to leverage this data availability to face important challenges (e.g. pandemic management, production allocation, investment in renewable technologies, personalized medical treatments, demand response in power systems, etc.). However, traditional decision-making techniques, based on stochastic or robust optimization problems, are not designed to take advantage of the full potential that these new datasets offer. Therefore, it is necessary to improve and adapt these techniques to benefit from a richer empirical characterization of the uncertainty associated with the model.
∗ Corresponding author.
E-mail address: [email protected] (C. Ruiz).
Traditionally, deterministic optimization techniques have been used to tackle complex decision-making problems in different fields of application (Murty, 1994), e.g. allocation of schedules, management of production systems, organization of airlines, pricing systems, etc. These techniques encompass linear, nonlinear, integer optimization, or a combination of these, and are based on a fundamental hypothesis: the input parameters of the problem are known with complete certainty. However, this hypothesis is rarely fulfilled in real contexts. To solve this problem, new techniques have been developed that incorporate the uncertainty associated with the problem parameters. One of the most used techniques is stochastic programming, where the model incorporates the estimated probability distribution of the uncertain parameters, either analytically or through a scenario-type discretization (Birge & Louveaux, 2011). It is a very versatile technique for modeling realistic problems; however, it has some drawbacks, such as: (i) a high sensitivity of the solution to the chosen probability distribution (Römisch, 2003), (ii) a computational complexity that increases exponentially with the size of the models (Shapiro & Nemirovski, 2005) and (iii) it does not typically take into account the information provided by auxiliary variables (Bertsimas & Kallus, 2020).
https://doi.org/10.1016/j.ejor.2022.11.011
0377-2217/© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Specifically, stochastic programming considers the problem:

$$\min_{z \in \mathcal{Z}} \; \mathbb{E}\left[c(z; Y)\right] \tag{1}$$

where $z \in \mathcal{Z} \subset \mathbb{R}^{d_z}$ are the decision variables, $Y \in \mathcal{Y} \subset \mathbb{R}^{d_y}$ are the parameters that characterize the problem, $c(z; Y): \mathbb{R}^{d_z} \times \mathbb{R}^{d_y} \to \mathbb{R}$ is the cost function and $\mathbb{E}[\cdot]$ represents the expected value over the distribution of $Y$. In general, the probability distribution of $Y$ is unknown (Nemirovski & Shapiro, 2007), although historical observations of this variable, $Y_1, Y_2, \ldots, Y_N$, are available, and thus its empirical distribution can be reconstructed. For this reason, the Sample Average Approximation (SAA) approach is usually used with the following formulation (Shapiro & Nemirovski, 2005):
$$\min_{z \in \mathcal{Z}} \; \frac{1}{N} \sum_{i=1}^{N} c(z; Y_i). \tag{2}$$

In this formulation, the theoretical expected value is replaced by the mean calculated on the empirical distribution, where each realization (scenario) $Y_i$ of the parameter is assigned a probability of $1/N$.
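As a minimal sketch of formulation (2), and not part of the original methodology, the following Python snippet computes an SAA decision over a finite grid of candidate decisions. The newsvendor-style cost function, the scenario values and the candidate grid are hypothetical and chosen only for illustration.

```python
def saa_decision(cost, scenarios, candidates):
    """Return the candidate z minimizing (1/N) * sum_i c(z; Y_i)."""
    n = len(scenarios)
    return min(candidates, key=lambda z: sum(cost(z, y) for y in scenarios) / n)


def newsvendor_cost(z, y):
    """Hypothetical cost: a shortage costs 3 per unit, a surplus costs 1 per unit."""
    return 3.0 * max(y - z, 0.0) + 1.0 * max(z - y, 0.0)


# Hypothetical historical observations Y_1, ..., Y_N, each weighted 1/N.
scenarios = [10.0, 12.0, 15.0, 20.0, 30.0]
# Finite grid standing in for the feasible set Z (an assumption of this sketch).
candidates = [float(z) for z in range(0, 41)]

z_saa = saa_decision(newsvendor_cost, scenarios, candidates)
print(z_saa)
```

Restricting the search to a grid keeps the sketch solver-free; a real application would pass the scenarios to an optimization model instead.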
However, in practice it is possible to have historical series of the parameters of interest (Y), together with auxiliary parameters (X), i.e., covariates, that can help to improve their probabilistic characterization. For this reason, for some years now, it has been proposed to address the optimization problem from a data-driven perspective, where ideas of statistics and Machine Learning (ML) are combined with mathematical optimization (Keith & Ahner, 2021). Hence, we may now consider this setting:

$$z(x) \in \arg\min_{z \in \mathcal{Z}} \; \mathbb{E}\left[c(z; Y) \mid X = x\right]$$

where optimal decisions $z(x)$ depend on auxiliary information $x$, which is assumed to be known at the time of decision making, and which can have a high impact on the uncertainty associated with $Y$.
Most practical implementations of this problem are approached by two disjoint stages: (i) Predict: use available databases $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ to train predictive models $y = f(x)$, based on ML or traditional time series techniques (James, Witten, Hastie, & Tibshirani, 2013), for the parameters that the decision-making model needs, and (ii) Optimize: use the predictive input to obtain optimal solutions. However, it has been observed that this approach may be sub-optimal, as it does not adequately quantify how the uncertainty of the predictions can impact the objective function of the decision-making problem.
Motivated by this fact, a series of research works have recently addressed these difficulties in order to efficiently integrate prediction and prescription in the context of stochastic programming (Mundru, 2019). The problem to be treated can be generalized as:

$$z(f, x) \in \arg\min_{z \in \mathcal{Z}} \; \mathbb{E}\left[c(z; Y) \mid f, X = x\right] \tag{3}$$

where the decision variables ($z$) and the estimated probability distribution of $Y$ depend both on the auxiliary information $x$ and on the chosen predictive function $f(x)$. Now, the choice of $f(x)$ can be assessed by the improvement of the decision-making process, and not only by minimizing a prediction error.
Various authors have addressed different versions of problem (3) or its surrogate via empirical SAA. In particular, we can differentiate those approaches where prediction and prescription are developed in two differentiated stages from those with an integrated perspective. Regarding the former, Tulabandhula & Rudin (2013) propose a two-step procedure to select a predictive model with good prescriptive performance. First, the predictive model is selected by including a regularization term in the loss function of the learning problem, accounting for the operational cost. Second, the optimal policies (minimum operational costs) are derived by using the previous predictive model. Bertsimas & Kallus (2020) propose first to use supervised non-parametric ML tools (kNN, Random Forest, etc.) to select potential realizations (scenarios) of the response variables, given imperfect observations (auxiliary data). Then, a conditional stochastic optimization problem is used to derive the best prescriptive policy.
Considering integrated approaches, Donti, Amos, & Kolter (2017) propose to find the optimal parameters of the predictive model based on a prescriptive-based loss function. Due to the potential non-convex nature of this function, an iterative stochastic gradient descent approach is proposed to find local solutions. For problems where uncertainty is present in the (linear) objective function parameters, Elmachtoub & Grigas (2021) propose an integrated predictive-prescriptive framework (Smart Predict then Optimize) where the loss function used to train the predictive model explicitly accounts for the prescription error. To overcome computational challenges due to non-convexities, a tractable algorithm is proposed based on a related convex loss function. Ban & Rudin (2019) propose integrated algorithms based on empirical risk minimization (ERM) and kernel-weights optimization (KO), and decision rules that directly link, via a predefined functional model, the covariates with the decision variables. These algorithms are tested in the newsvendor problem. In the same application, Huber, Müller, Fleischmann, & Stuckenschmidt (2019) evaluate numerically, and under different ML and time series techniques, when integrating prediction and prescription outperforms traditional first-predict-then-optimize approaches. Mundru (2019) considers decision models with auxiliary covariate data where the ML model is trained to improve the prescriptive performance while penalizing the uncertainty associated with the predictions. Gupta & Rusmevichientong (2021) introduce an approach focused on linear optimization problems whose data availability is small but yet adequate to describe the uncertainty. Muñoz, Pineda, & Morales (2022) introduce a bilevel approach by obtaining a parametric model, based on decision rules, that integrates the ML problem characterization into the target optimization problem.
Moreover, due to their interpretability and scalability properties, some recent works have focused on developing related integrated frameworks for tree-based algorithms. In particular, Bertsimas, Dunn, & Mundru (2019) generalize previous works on tree-based algorithms so that training is performed with a loss function that balances both the predictive and prescriptive performance. The methodology is adapted to generate both constant (mean outcomes in each leaf) and linear (elastic net model in each leaf) predictions. In the same vein, Stratigakos, Camal, Michiorri, & Kariniotakis (2022) propose another tree-based methodology which focuses on learning a policy conditioned on covariate data. The aim is to use this policy to take optimal decisions based on a weighted SAA framework, similar to Bertsimas & Kallus (2020).
Our work can be viewed as a bridge between these two types of approaches. In particular, we address the conditional stochastic optimization problem (3) with an ML-based scenario selection, which is based on the work of Bertsimas & Kallus (2020). However, given an ML predictive function $f$, we acknowledge its dependence on some hyperparameters (e.g., number of neighbors in kNN, depth of a decision tree, number of centroids in K-means, etc.) that need to be fixed beforehand. These are traditionally tuned based on predictive performance and validation techniques (James et al., 2013). However, in this work we propose to extend this setting from a prescriptive point of view in order to select the best learning model over a validation set. The main contributions of this work are fivefold:
(i) To propose a prescriptive-based validation scheme to select optimal ML hyperparameters for scenario-weighted conditional stochastic problems.
(ii) To study how this scheme can result in substantially different hyperparameter values with a better prescriptive performance, compared to the traditional predictive approach.
(iii) To illustrate, through an extensive numerical analysis, which are the main factors driving these differences: sample sizes, market conditions, performance metrics and ML techniques.
(iv) To extend this validation approach to unsupervised ML techniques and to time series datasets.
(v) To test the proposed framework in a real-world data-driven problem in the context of electricity markets.
As indicated, we test the performance of the proposed methodology in a real-world application based on the medium-term retailer problem defined by Conejo, Carrión, & Morales (2010). In particular, we analyze the problem faced by an electricity retailer that seeks to derive its optimal procurement strategy via futures and spot markets, together with the appropriate selection of a retail price tariff for its clients (consumers). The retailer has access to several years of historical records of hourly data, including spot prices and demand loads. Some of these variables have a direct impact on its decision problem (i.e., Y), while others can be used as auxiliary information (i.e., X) which is known at the time the decision making takes place. We will evaluate different ML techniques under different market conditions to explore the main factors driving the retailer's profit.
2. Prescriptive algorithm description
Recent data-driven methods leverage the solution process on the data itself to account for an adequate algorithm setup and find an optimal solution of the stochastic problem (3). They also try to overcome the limitations of traditional approaches such as SAA and point prediction methods in common operations/management settings. The former does not guarantee, under finite sample conditions, an adequate asymptotic performance and tractability, and the latter has poor performance when the sample size increases, reaching sub-optimal decisions. In particular, Bertsimas & Kallus (2020) propose to estimate the conditional stochastic problem (3) by a SAA-based formulation, where the weights assigned to each sample are derived from a predictive ML method:

$$\hat{z}_N(x) \in \arg\min_{z \in \mathcal{Z}} \; \sum_{i=1}^{N} w_{N,i}(x)\, c(z, y_i), \tag{4}$$

where $\hat{z}_N$ corresponds to the optimal decisions to be made based on the information available at a specific point in time, i.e., $S_N = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, given that only a subset of the covariates $x = \{x_1, \ldots, x_N\} \mid X \in \mathcal{X} \subset \mathbb{R}^{d_x}$ and target uncertain variables $y = \{y_1, \ldots, y_N\} \mid Y \in \mathcal{Y} \subset \mathbb{R}^{d_y}$ are available at scenario $i$. Scenario weights $w_{N,i}$ are derived from the ML algorithm and used in combination with the cost function $c(z, y_i)$ to approach the optimal decision $z^*$. Note that in this case, $z \mid Z \in \mathcal{Z} \subset \mathbb{R}^{d_z}$ corresponds to the decision variables that are conditioned on some information on $x$.

Our main contribution is to address the critical ML issue of selecting the appropriate hyperparameter layout from a prescriptive point of view, rather than from traditional predictive approaches. Moreover, we seek to test the usefulness of the prescriptive method described above in a real-world setting. In particular, we consider the problem:
$$\hat{z}_N(x; k) \in \arg\min_{z \in \mathcal{Z}} \; \sum_{i=1}^{N} w_{N,i}(x; k)\, c(z, y_i), \tag{5}$$

where we explicitly account for the impact of the ML hyperparameters $k \in \mathcal{K}$ on the optimal decision $\hat{z}_N$. We seek to find the value of $k$ that renders the best prescriptive performance on a validation set, different from the set used to train the ML model that determines the weights $w_{N,i}$. However, for many relevant ML techniques, the functional relationship between these weights and $k$ is highly nonlinear and nonconvex, so this task cannot be addressed analytically. Hence, we propose Algorithm 1 as a new problem validation framework. Moreover, the proposed methodology is also applicable to data that present time-dependent patterns and that require a special treatment, compared to traditional validation approaches.
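To make the role of the weights in (5) concrete, the following Python sketch builds kNN-style weights (1/k on the k closest training covariates, 0 elsewhere) and minimizes the weighted cost over a finite grid. It is only an illustration under stated assumptions: the scalar covariates, the squared-error cost, the data values and the candidate grid are hypothetical and do not come from the paper.

```python
def knn_weights(x_train, x, k):
    """w_{N,i}(x; k): 1/k for the k training covariates closest to x, 0 otherwise."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(x_train))]


def weighted_decision(weights, y_train, cost, candidates):
    """Minimize sum_i w_i * c(z, y_i) over a finite candidate set."""
    return min(
        candidates,
        key=lambda z: sum(w * cost(z, y) for w, y in zip(weights, y_train)),
    )


def squared_cost(z, y):
    """Hypothetical cost function used only for this illustration."""
    return (z - y) ** 2


# Hypothetical training sample S_N with scalar covariates.
x_train = [1.0, 2.0, 3.0, 10.0, 11.0]
y_train = [5.0, 6.0, 7.0, 20.0, 22.0]
candidates = [0.5 * i for i in range(61)]  # grid 0.0, 0.5, ..., 30.0

# k = 3 keeps only the scenarios whose covariates resemble x = 2.0.
w_local = knn_weights(x_train, 2.0, k=3)
z_local = weighted_decision(w_local, y_train, squared_cost, candidates)

# k = N gives every scenario weight 1/N, which is the plain SAA decision.
w_saa = knn_weights(x_train, 2.0, k=len(x_train))
z_saa = weighted_decision(w_saa, y_train, squared_cost, candidates)
print(z_local, z_saa)
```

The contrast between the two decisions shows why the choice of k matters: a small k conditions on the covariate, while k = N discards it.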
To illustrate the proposed methodology, let us assume that we work with kNN (k-Nearest Neighbor) as the ML technique to select meaningful scenarios based on contextual information. We seek to obtain the appropriate value of $k$, i.e., number of neighbors, based on a validation procedure. Hence, we propose the implementation of the following data-driven algorithm:
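Algorithm 1: Data Driven Optimization Algorithm.
Input: $X$, $Y$, $\mathcal{K}$
Output: $\hat{z}^*_N$, $k^*$
for $k \in \mathcal{K}$ do
    Fit ML for $S_N = \{X, Y\}$ and obtain regions $R(x; k)$
    for $j = 1$ to $N_v$ do
        Get $R(x; k)$ for $\{x_j \in S_{N_v}$ where $R(x; k) = R(x_j; k)\}$; set $w_{N,i}(x; k) = \frac{1}{k}\, \mathbb{I}[x_i \text{ is in region } R(x; k)]$
        Solve $\hat{z}_N(k) \in \arg\min_{z \in \mathcal{Z}} \sum_{i=1}^{N} w_{N,i}(x; k)\, c(z, y_i)$
    end for
    $MAE_k = \frac{1}{N_v} \sum_{j=1}^{N_v} \left[ \min_{z \in \mathcal{Z}} \left( c(z, y_j) \right) - c(\hat{z}_N(x_j; k), y_j) \right]$
end for
Pick $k^* \Rightarrow MAE_{k^*} \leq MAE_k \;\; \forall k \in \mathcal{K}$
Pick $\hat{z}^*_N \Rightarrow k = k^*$
return $\hat{z}^*_N$, $k^*$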
where $S_{N_v} = \{(x_1, y_1), \ldots, (x_{N_v}, y_{N_v})\}$ is a validation set so that $S_N \cap S_{N_v} = \emptyset$, $\mathcal{K}$ is the domain of the hyperparameter $k$ under exploration, and $R$ is a map proposed by the ML algorithm so that $\mathcal{X} = \bigcup_{m=1}^{M} R^{-1}(m)$.

The proposed Algorithm 1 follows 2 steps. First, we pick a hyperparameter value $k \in \mathcal{K}$ and generate a map $R(x; k)$ of the training set $S_N$ by fitting the ML algorithm. Then we assign weights $w_{N,i}(x; k)$ to every point of the region to which the ML algorithm assigns the target point ($x_j$) of the validation set $S_{N_v}$, and solve the problem according to (4). In the second step, the MAE for every $k \in \mathcal{K}$ is calculated and evaluated to select the optimal $k^*$ as the one rendering the lowest MAE, with the associated optimal decision $\hat{z}^*_N$. Note that the domain $k \in \mathcal{K}$ can be explored by a grid-search technique.
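The two steps of Algorithm 1 can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' implementation: it assumes kNN with Euclidean distance as the ML technique, replaces the optimization solver with a search over a finite candidate grid, takes the absolute value of each validation error, and uses a hypothetical asymmetric cost and synthetic data. For brevity it returns only $k^*$ and the MAE per $k$.

```python
def knn_indices(x_train, x, k):
    """Indices of the k training covariate vectors closest to x (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    order = sorted(range(len(x_train)), key=lambda i: sq_dist(x_train[i], x))
    return order[:k]


def prescribe(x_train, y_train, x, k, cost, candidates):
    """Weighted SAA decision: each of the k selected scenarios has weight 1/k."""
    idx = knn_indices(x_train, x, k)
    return min(candidates, key=lambda z: sum(cost(z, y_train[i]) for i in idx) / k)


def prescriptive_validation(train, valid, k_grid, cost, candidates):
    """Grid search over k using the prescriptive MAE on the validation set."""
    x_train, y_train = train
    x_valid, y_valid = valid
    mae = {}
    for k in k_grid:
        errors = []
        for x_j, y_j in zip(x_valid, y_valid):
            z_hat = prescribe(x_train, y_train, x_j, k, cost, candidates)
            perfect = min(cost(z, y_j) for z in candidates)  # perfect-foresight cost
            errors.append(abs(perfect - cost(z_hat, y_j)))   # absolute value assumed
        mae[k] = sum(errors) / len(errors)
    k_star = min(k_grid, key=lambda k: mae[k])
    return k_star, mae


def asymmetric_cost(z, y):
    """Hypothetical cost: a shortage costs 3 per unit, a surplus costs 1 per unit."""
    return 3.0 * max(y - z, 0.0) + 1.0 * max(z - y, 0.0)


# Synthetic, noise-free data (an assumption): the target equals the covariate.
x_train = [(float(i),) for i in range(10)]
y_train = [float(i) for i in range(10)]
x_valid, y_valid = [(2.0,), (7.0,)], [2.0, 7.0]
candidates = [float(z) for z in range(10)]

k_star, mae = prescriptive_validation(
    (x_train, y_train), (x_valid, y_valid), [1, 3], asymmetric_cost, candidates
)
print(k_star, mae)
```

Because the loop scores each k by the cost of the resulting decision rather than by a forecast error, the selected k can differ from the one a purely predictive validation would choose.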
Regarding performance metrics, Bertsimas & Kallus (2020) propose to focus on the final output of the optimization itself, using a loss function of the optimization problem that they denote "Coefficient of Prescriptiveness" ($P$). In particular, the cost of the perfect-forecast solution is used as a reference to determine the distance of the proposed prescriptive method to the perfect information solution, and it is compared to the distance of the SAA solution cost in the form of a ratio (6):

$$P = 1 - \frac{\hat{R}_{N_v}(\hat{z}_N) - \hat{R}^*_{N_v}}{\hat{R}_{N_v}(\hat{z}^{SAA}_N) - \hat{R}^*_{N_v}} \tag{6}$$

where $\hat{R}_{N_v}(\hat{z}_N)$ is the expected cost under the prescription algorithm approach, $\hat{R}_{N_v}(\hat{z}^{SAA}_N)$ is the estimated expected cost using SAA and $\hat{R}^*_{N_v}$ is the perfect-foresight expected cost. It should be noticed that $\hat{z}^{SAA}_N$ is computed following the original definition of the problem and solved similarly to (2) for a given sample $\hat{S}_N$ of size $N$:
$$\hat{z}^{SAA}_N \in \arg\min_{z \in \mathcal{Z}} \; \frac{1}{N} \sum_{i=1}^{N} c(z; y_i) \tag{7}$$

To interpret this measure, we first have to consider that it is bounded above by 1. Values of $P$ close to 1 can be interpreted as an increase in the quality of the solution with respect to the standard SAA approach. This indicates that the scenario weights provided by the algorithm, i.e., the information transference from the ML algorithm to solve the optimization problem, improve with the prescriptive approach. Low values of this measure should be interpreted as poor information transference, with $\lim_{N \to \infty} P = 0$.
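As a small worked illustration of the ratio in (6), the following Python function computes $P$ from three already-evaluated expected costs. The function name, argument names and numerical values are hypothetical; only the formula itself follows (6).

```python
def coefficient_of_prescriptiveness(cost_prescriptive, cost_saa, cost_perfect):
    """P = 1 - (R(z_hat) - R*) / (R(z_SAA) - R*), as in (6)."""
    gap_saa = cost_saa - cost_perfect
    if gap_saa == 0:
        # Guard added for this sketch: the ratio is undefined in this case.
        raise ValueError("SAA attains the perfect-foresight cost; P is undefined.")
    return 1.0 - (cost_prescriptive - cost_perfect) / gap_saa


# Hypothetical expected costs on a validation set.
p_good = coefficient_of_prescriptiveness(12.0, 20.0, 10.0)  # closes 80% of the SAA gap
p_best = coefficient_of_prescriptiveness(10.0, 20.0, 10.0)  # matches perfect foresight
p_none = coefficient_of_prescriptiveness(20.0, 20.0, 10.0)  # no better than SAA
print(p_good, p_best, p_none)
```

A value of 1 means the prescription is as good as perfect foresight, and a value of 0 means it does no better than SAA.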
As indicated, we extend the Bertsimas & Kallus (2020) approach by forcing the ML hyperparameters to be selected according to the prescriptive performance, rather than the predictive one. Then, we compare the errors obtained by both validation processes. From Algorithm 1, we can observe that the loss function we propose as an alternative to (6) is the MAE (Mean Absolute Error), which is defined by (8):

$$MAE = \frac{1}{N_v} \sum_{j=1}^{N_v} \left[ \min_{z \in \mathcal{Z}} \left( c(z, y_j) \right) - c(\hat{z}_N(x_j), y_j) \right] \tag{8}$$

The first term within the summation is the cost of the perfect-foresight information problem, whereas the second term is the estimated cost of the proposed prescription. Thus, we seek to use an absolute measure of the prescriptive error rather than a relative measure with respect to the SAA cost.
The use of MAE seeks to directly compare the performance of the proposed algorithm with respect to a deterministic approach, as we will further explain in the analysis of the algorithm setup. This is an important feature, since during the hyperparameter selection process, and under a traditional validation approach, the deterministic predictive solution error determines the best parameter layout. Thus, we believe that conventional error measures in the ML field are also appropriate to make the above comparison. Furthermore, since MAE was also employed to feed the validation process over the estimation step, we also incorporated the same metric to compute the prescription error, avoiding comparisons with loss functions that hold different properties. Nevertheless, for comparison purposes we have also considered the metric $P$ to assess the results in the numerical case study (Section 4). In particular, we have observed that while both $P$ and MAE provide similar results in terms of optimal hyperparameter selection (see Table in Appendix A), the computation of the former implies a much higher computational burden.
To ensure the validity of this prescriptive procedure, some properties must be fulfilled by both the optimization problem and the machine learning algorithm. Regarding the optimization problem, the perfect information solution must exist and be asymptotically optimal. Thereby, Bertsimas & Kallus (2020) gave three basic assumptions for the prescriptive procedure:
1. Existence. $\mathbb{E}[|c(z; y)|] < \infty$ for every $z \in \mathcal{Z}$, and $\mathcal{Z}^*(x) \neq \emptyset$ for almost every $x$.
2. Continuity. For any $z \in \mathcal{Z}$ and $\epsilon > 0$ there exists $\delta > 0$ such that $|c(z; y) - c(z'; y)| \leq \epsilon$ for all $z'$ with $\|z - z'\| \leq \delta$ and $y \in \mathcal{Y}$.
3. Regularity. $\mathcal{Z}$ is closed, nonempty and either:
• $\mathcal{Z}$ is bounded or
• $\liminf_{\|z\| \to \infty} \inf_{y \in \mathcal{Y}} c(z; y) > -\infty$ and for every $x \in \mathcal{X}$, there exists $D_x \subset \mathcal{Y}$ such that $\lim_{\|z\| \to \infty} c(z; y) \to \infty$ uniformly over $y \in D_x$ and $P(y \in D_x \mid X = x) > 0$.

Theoretical proof of these assumptions is only given by Bertsimas & Kallus (2020) for the kNN approach, although justification for other supervised learning methods not used here, such as Kernel Methods or Local Linear Methods, is also provided. In addition to the assumptions previously stated, one of the conditions assumed to fulfill these requirements is that the optimization problem has to be convex.
Besides these three assumptions related to the asymptotic optimality property, two other issues must be considered. The first one is the fundamental problem of causal inference, since the decisions $z$ could affect the cost function and not every possible outcome is observable, such as in price-demand decision problems, resulting in unobservable cost functions $c(z; y_i(z_i))$ that could differ from the observed ones. The second one deals with the possibility that the problem is still ill-defined, since there may be unobserved data in the counterfactual. To overcome these two issues, Bertsimas & Kallus (2020) propose the following two additional assumptions:
4. Decomposition of Decision. For some decomposition $z = (z_1, z_2)$ only $z_1 \in \mathbb{R}^{d_{z_1}}$ affects uncertainty, that is, $Y(z_1, z_2) = Y(z_1, z_2') \;\; \forall (z_1, z_2), (z_1, z_2') \in \mathcal{Z}$.
5. Ignorability. For every $z \in \mathcal{Z}$, $Y(z)$ is independent of $Z$ conditioned on $X$.
Considering this, the prescriptive problem generalizes as:

$$z^*(x) \in \mathcal{Z}^*(x) = \arg\min_{z \in \mathcal{Z}} \; \mathbb{E}\left[c(z; Y(z)) \mid X = x\right]$$

In other words, as long as we include all aspects that affect the decision $z$ to be taken under the umbrella of observable circumstances $X$, there is sufficient guarantee to assume that the identifiability of causal effects condition is fulfilled (Rosenbaum & Rubin, 1983). It is relevant to note that we also adopt these same five assumptions in this work. In particular, the first four assumptions are still valid under our approach, as we only affect the scenario weights $w_{N,i}(x; k)$ and not the cost function $c(z, y_i)$. Moreover, if $k$ effectively contains information related to $Y(z)$, and the algorithm introduces this information by assessing $R(x; k) \;\; \forall k \in \mathcal{K}$, we can assume that Assumption 5 holds under our approach.

2.1. Illustrative example: understanding the algorithm behavior
Although we will use a full definition of the retailer problem to compare the behavior of Algorithm 1 under different ML approaches, we illustrate in this section some key aspects of the algorithm performance under a simplified version of the problem. We assume a retailer that needs to decide the optimal amount of energy (electricity) to buy to supply the demand of its consumers. This can be done through a wholesale spot/pool market, or through forward/futures contracts. Let us assume that spot and forward prices are exogenous and that consumers' demand is inelastic to price variations, so that the ignorability condition is fulfilled. In this example, we fix the retail price $\bar{\lambda}^{R}$ to 80 €/MWh, although it will be considered as another decision variable in the extended model in Section 4. Then, the problem faced by this retailer can be formulated as follows:

$$\begin{aligned}
\underset{Q^F,\, E^P_{t\omega}}{\text{maximize}} \quad & \sum_{\omega=1}^{N} \pi_\omega \sum_{t=1}^{N_T} \Big( \bar{\lambda}^{R} \bar{E}^{R}_{t\omega} - \lambda^{P}_{t\omega} E^{P}_{t\omega} - \sum_{f \in F_t} \lambda^{F} Q^{F} d_t \Big) && \text{(9a)} \\
\text{s.t.} \quad & 0 \leq Q^{F} \leq \bar{Q} && \text{(9b)} \\
& \bar{E}^{R}_{t\omega} = E^{P}_{t\omega} + Q^{F} d_t + E^{PC}_{t}, \quad \forall t, \forall \omega && \text{(9c)}
\end{aligned}$$
(9c)The objectivefunction (9a)corresponds to theexpectedprofit obtained by the retailer, where
π
ω≡ wN,i is the probability as- signed toevery scenarioω
=1,...,N andt=1,...,T is theset ofperiodsoverwhichtheprofitismaximized.Thepricesarerep- resentedbyλ
beingλ
¯R,λ
P andλ
F the retailer electricity sellingTable 1
Spanish market daily average spot prices and demand, first week of August 2020.
Spot Prices Real Demand
Day [ € / MWh] [MWh]
1 33.0176 28,830.9
2 28.6514 26,437.0
3 36.6316 29,291.4
4 35.5433 29,570.8
5 36.7857 30,342.1
6 37.4010 30,701.9
7 38.9991 30,657.4
price,purchasedelectricitypool/spotpriceandpurchasedelectric- ity forward price, respectively. Energy quantities are represented by E,beingE¯R andEP theenergysoldtothefinalconsumersand purchasedinthepoolmarket,respectively.Forwardquantitiesare represented by QF, being dt the time range covered by the for- wardcontract. Restriction(9b)determinesthemaximumquantity Q¯ allowedtobepurchasedintheforwardcontract,andrestriction (9c) representsthe energy equilibrium, wheretotal sold electric- ity must be equal to all the available (purchased) electricity per scenario
ω
andtime t, beingEtPC anyadditional electricity avail- ableforperiodt.Itshouldbenotedthatbyconstraining(9b)short salesintheforwardmarketarerestricted,since0≤ QF.Forsimplicity,onlyonemonthofdatafromtheSpanishpower market is considered. Although further detail about the kNN al- gorithm willbe explainedin the next section, we use it hereto illustrate generalproperties ofour approach.Taking August 2020 hourly data for demand and spot prices from ESIOS (2021), we compare not only the performance of the solution to retailer’s problem (9), buthow it is affected by the selection ofhyperpa- rameterk(numberofneighbours).AsdetailedinAlgorithm1,and then applicable to any of the different ML techniques, the train dataispartitionedintoM differentregions, andwe selectthere- gionatwhichcovariateswouldsituatethepossiblespotprice,tak- ing into account the spot price structure of that day. Here, we shouldhighlightthatthealgorithmmakesuseofamultiple-output structure, andthus, the 24 hours of the dayare mapped into a single region.Therefore, giventhe realizationof the covariates x, inthiscasethe24hourSpanishsystemoveralldemand,thekNN algorithm will identify the region R(xj) withthe k closest days in terms ofsimilar 24 hourdemand profiles. Then each scenario
ω
=1,...,k will be matched with one of these k days, and the correspondingpoolpricesλ
Ptωwillbefixedtotheobservedhourly prices(t=1,...,24) inthoseparticulardays.Then,thestochastic problem(9)willbesolvedbytakingeachscenariothatbelongsto regionR(xj)withweightπ
ω=1k.Ifweareinthevalidationstage, thisprocess willbe repeateduntiltheoptimum k∗is reached,so thatMAEk∗≤ MAEk∀
k∈K.Compared topoint-prediction andSAAmethods, theprescrip- tive approachhas somepeculiarities thatwe willbriefly describe through thefollowingexample.Consideringthe firstweek ofAu- gust 2020, with growing average demand steadily increasing, as easily observed in Table 1, we will take then the 7thdayas the basescenarioforwhichwestill don’thaveinformationaboutthe spotprices.Althoughdemand isstill unknown,thedemand fore- castgivenbyRedEléctricade España(ESIOS, 2021),thecompany in chargeofthe powersystemmaintenance andoperationinthe Spanish System, is quite accurate withlessthan 2% of errorrate overrealdemand.Therefore,wewilluserealdemandasthecur- rentforecasteddemandforday7th.
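The scenario-selection step described in this subsection (pick the k days whose demand profiles are closest to the target day, attach their observed price profiles, and weight each by 1/k) can be sketched as follows. This is an illustrative sketch only: the function name, the day labels and the three-period profiles are hypothetical stand-ins for the 24-hour ESIOS profiles, and Euclidean distance is an assumed similarity measure.

```python
import math


def select_scenarios(demand_profiles, price_profiles, target_demand, k):
    """Return (day, price_profile, weight) for the k days closest in demand profile."""
    ranked = sorted(
        demand_profiles,
        key=lambda day: math.dist(demand_profiles[day], target_demand),
    )
    return [(day, price_profiles[day], 1.0 / k) for day in ranked[:k]]


# Hypothetical profiles with 3 periods per day instead of 24 hours, for brevity.
demand_profiles = {
    "d1": [100.0, 120.0, 110.0],
    "d2": [200.0, 220.0, 210.0],
    "d3": [105.0, 118.0, 112.0],
}
price_profiles = {
    "d1": [30.0, 35.0, 33.0],
    "d2": [45.0, 50.0, 48.0],
    "d3": [31.0, 36.0, 32.0],
}
target_demand = [104.0, 119.0, 111.0]  # forecast demand profile of the target day

scenarios = select_scenarios(demand_profiles, price_profiles, target_demand, k=2)
for day, prices, weight in scenarios:
    print(day, prices, weight)
```

Treating the whole daily profile as one covariate vector mirrors the multiple-output structure mentioned above, where all hours of a day are mapped to the same region.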
Features (covariates) that indicate the day, month and year of the data are not employed in this example but will be considered in the case study section. Thus, only demands per hour are used as covariates to explain spot prices variability. Considering the 7th day as a test scenario, and days 1st to 6th as training data, we solve problem (9) for different hyperparameters $k$, with only one forward contract with a unique price of $\lambda^{F} = 36.8$ and a maximum available capacity of $\bar{Q} = 5000$. The optimum $k$ is reached under this approach for $k = 1$ and $k = 2$, as logic would suggest upfront. Since hourly demand and prices continuously increase (Table 1 summarizes their average values), and the 7th day presents the highest ones over the sample, it is expected that the closest points in terms of demand are the day before and, if we increase $k$, the preceding days that match the hyperparameter $k$, since they exhibit the lowest distances with respect to the 7th day. However, why do values of $k \geq 3$ increase the error with respect to the perfect information ratio? The main reason arises from the intrinsic behavior of the stochastic problem. Since the scenario selected as the 3rd closest to the 7th day has an average daily spot price below the price of the forward, the algorithm considers that for this point the forward contracted should be close to 0, which is a sub-optimal solution compared to lower values of $k < 3$. However, in this scenario setting, what seems logical is that the preferences about hyperparameter selection between prediction-error and prescription-error are the same, i.e., $\{k^*_{ML} = k^*_{DD} \mid MAE^{ML}_{k^*_{ML}} \leq MAE^{ML}_{k} \wedge MAE^{DD}_{k^*_{DD}} \leq MAE^{DD}_{k} \;\; \forall k \in \mathcal{K}\}$, with $MAE^{ML}_{k}$ and $k^*_{ML}$ being the Mean Absolute Error and optimal $k$ for the ML algorithm focused on the spot price prediction, respectively, and $MAE^{DD}_{k}$ and $k^*_{DD}$ being the Mean Absolute Error and optimal $k$ for the prescriptive data-driven method, respectively.

Table 2
Spanish market daily average spot prices and power demand, from August 15, 2020 to August 25, 2020.

Day   Spot price [€/MWh]   Real demand [MWh]
15    30.9996              24,535.0
16    29.1421              23,046.5
17    36.4651              26,620.1
18    38.6398              27,877.7
19    34.4472              28,577.1
20    34.6433              28,726.2
21    33.3708              29,061.3
22    32.7462              26,370.0
23    28.6548              24,369.0
24    39.6154              29,008.6
25    39.0749              30,566.7
Nevertheless, what if we encounter a sample where there exist "jumps", or rather a changing market context for which power price patterns differ even under the same demand and weather conditions, as can usually be observed in historical data series? To illustrate this case, we take spot prices and demand from August 15, 2020 to August 25, 2020, and test the algorithm behavior using the last day of the sample as the test data (Table 2).
In this setting we have days with demand similar to that of August 25 but very different prices, e.g., August 21 and August 20, and others that are extremely close in terms of spot price but with a completely different demand profile, e.g., August 18. If we solve the problem and compare the solutions between the prescriptive algorithm and the point prediction approach in terms of hyperparameter selection, assuming three different forward prices $\lambda^{F}$ and maximum available capacity, the gap in the process of selecting the optimal $k^*$ is now clearer.
Inordertoanalyzetheresults,wewillmakeuseofFig.1,that plots MAEDD (prescriptive error) on the red left axis andMAEML (predictive error) on the blue left axis, for a range of k values.
We also identify the optimal $k^*_{DD}$ and $k^*_{ML}$, which render the lowest prescriptive and predictive MAE, respectively. By doing so, we can compare the differences between these two hyperparameter selection approaches and how they are affected by the problem structure. Thus, comparing the results shown in Fig. 1, it is evident that the optimal $k^*$ varies depending on the forward price and the adopted approach. If we observe the three graphs, the prescriptive approach learns faster about this particular market situation, mainly because it is still solved as a stochastic problem and, as such, takes into consideration worst-case market conditions, acquiring energy in the forward market to reduce profit uncertainty. Taking the first case in Fig. 1a, where the forward price is the lowest and equal to $\lambda^F = 33.4$ €/MWh, the closest scenario is always August 21, whose average spot price is 33.3708 €/MWh, a value that is far from the test average price of 39.0749 €/MWh and not a good estimator of the futures price, as indicated by the blue line. Since the current forward price is above 33.3708 €/MWh, the obvious solution at the first stage of the problem is not to acquire anything from the forward contract in order to maximize profit, which, however, implies the lowest possible profit with respect to the perfect solution and renders the biggest $MAE_{ML}$. The $MAE_{ML}$ decreases slowly, so it does not reach an optimal $k^*$ until it converges at $k = 5$, whereas the optimal decision is already reached when there are two scenarios in the stochastic problem. The stochastic solution is obvious, since the distance between $\lambda^F$ and $\lambda^P_{t\omega}$ is much lower for day 21 than between days 21 and 25. The here-and-now solution is to buy the maximum amount forward in the first stage, since both scenarios have the same weight $w_i = 1/2$. Leveraging the information provided by the covariates X allows the algorithm to weight each scenario based on the auxiliary information, improving the retailer's decision with respect to the one provided by SAA.

Fig. 1. Spot price estimation MAE vs. data-driven MAE, $\bar{Q} = 5000$: (a) $\lambda^F = 33.4$; (b) $\lambda^F = 34.8$; (c) $\lambda^F = 36$.

In Fig. 1, the case with sample size N = 10 could be considered as the SAA solution, since this is the number of scenarios in the training set. The decision taken by the retailer according to it is not adequate, as equal weight is given to scenarios that do not represent the current conditions. There exist multiple techniques to correct and assign different probabilities (weights) to these scenarios based on empirical distribution approximations, but all still give some weight to data points that are far from potential scenario realizations, contributing to increase the bias of the retailer's decision. We can also observe that the algorithm learns faster than point predictions in terms of hyperparameter selection, since the loss function of the prescriptive algorithm during the validation process focuses on the error of the optimization problem solution and not on the estimation bias of the target variable, which also leads to better hyperparameter selection (Fig. 1a, b). In the following, we further study whether this behavior persists with greater sample sizes and numbers of features, including uncertain price-quota curves, in a more realistic and complex problem setting.
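The first-stage logic discussed for Fig. 1a can be illustrated with a stylized stand-in for the simplified problem. The sketch below is not the paper's model: it assumes a risk-neutral buyer whose cost is linear in the forward quantity, so the decision sits at one of the bounds. The function names and the price of the second scenario are hypothetical; the forward price, the August 21 average spot price and the capacity are taken from the text and the caption of Fig. 1.

```python
def expected_spot(prices, weights):
    """Weighted average spot price over the selected scenarios."""
    return sum(w * p for w, p in zip(weights, prices))


def forward_purchase(prices, weights, forward_price, q_max):
    """First-stage purchase under the linear-cost assumption.

    Buying q units forward changes the expected cost by
    q * (forward_price - expected spot), so the optimum is 0 or q_max.
    """
    return q_max if forward_price < expected_spot(prices, weights) else 0.0


Q_MAX = 5000.0         # capacity from the caption of Fig. 1
FORWARD_PRICE = 33.4   # EUR/MWh, case (a)
AUG_21 = 33.3708       # average spot price of the closest scenario
SECOND = 36.0          # hypothetical price of the second-closest scenario

# One scenario: the forward price exceeds the scenario price, so nothing is bought.
q_k1 = forward_purchase([AUG_21], [1.0], FORWARD_PRICE, Q_MAX)
# Two equally weighted scenarios: the expected spot price now exceeds the forward price.
q_k2 = forward_purchase([AUG_21, SECOND], [0.5, 0.5], FORWARD_PRICE, Q_MAX)
print(q_k1, q_k2)
```

Under these assumptions the decision flips from buying nothing to buying the full capacity as soon as a second, higher-priced scenario enters with weight 1/2, which is the mechanism the paragraph above describes.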
3. Prescriptive procedure applied to the power retailer problem

3.1. Power retailer problem description
In this section, we extend the simplified version of problem (9) to incorporate more realistic features. In particular, the new model is based on the formulation presented in Chapter 8 of Conejo et al. (2010), where an electricity retailer aims to maximize profit by participating in the electricity market, with no capacity to affect day-ahead market prices (price-taker) but able to impact futures contract prices (price-maker).
Traditional stochastic approaches for this problem make use of the CVaR (Conditional Value at Risk) as a way to introduce risk aversion in the decision-making process. In this work we do not consider the CVaR because, in the validation process, different values of k lead to different sample shapes (and sizes), and hence to different empirical distributions of the uncertain parameters. Therefore, the tails of those distributions (and their expectations) are not comparable. Furthermore, considering that the problem objective is to maximize expected profit, a certain degree of risk control is already present in our approach since, given x, data realizations (scenarios) far from $E[Y \mid X = x]$ have weight 0 in the solution process. As we will observe, the number of scenarios we account for (or discard) in the optimization problem is directly related to the value of the ML hyperparameter.

The formulation of the retailer optimization problem is as follows:
\begin{align}
\underset{Q^F_{fj},\,\lambda^R_{ei},\,v_{ei},\,E^P_{t\omega}}{\text{maximize}} \quad & \sum_{\omega=1}^{N_\Omega} \pi_\omega \sum_{t=1}^{N_T} \left[ \sum_{e=1}^{N_E} \sum_{i=1}^{N_I} \lambda^R_{ei} \bar{E}^R_{eti\omega} - \lambda^P_{t\omega} E^P_{t\omega} - \sum_{f \in F_t} \sum_{j=1}^{N_J} \lambda^F_{fj} Q^F_{fj} d_t \right] \tag{10a} \\
\text{s.t.} \quad & 0 \le Q^F_{fj} \le \bar{Q}_{fj}, \quad \forall f, \forall j \tag{10b} \\
& \bar{\lambda}^R_{e,i-1} v_{ei} \le \lambda^R_{ei} \le \bar{\lambda}^R_{ei} v_{ei}, \quad \forall e, \forall i \tag{10c} \\
& \sum_{i=1}^{N_I} v_{ei} = 1, \quad \forall e \tag{10d} \\
& \sum_{e=1}^{N_E} \sum_{i=1}^{N_I} \bar{E}^R_{eti\omega} v_{ei} = E^P_{t\omega} + \sum_{f \in F_t} Q^F_f d_t + E^{PC}_t, \quad \forall t, \forall \omega \tag{10e} \\
& \sum_{j=1}^{N_J} Q^F_{fj} = Q^F_f, \quad \forall f \tag{10f} \\
& v_{ei} \in \{0, 1\}, \quad \forall e, \forall i \tag{10g}
\end{align}

The objective function focuses on expected profit maximization, considering a sequence of scenarios $\omega \in \Omega$ and periods $t \in T$. The retailer also accounts for different types of clients $e \in E$ and price-quota blocks $i \in I$. In this problem, the uncertainty emerges when the retailer has to decide how much of every forward contract $Q^F_f$ must be signed at time $t_0$ and delivered at times $t = 1, \ldots, N_T$, while spot prices $\lambda^P_{t\omega}$ are not known in advance. Therefore, this is a two-stage stochastic problem (Birge & Louveaux, 2011).

The first restriction, (10b), builds each forward curve $F_t$ available at time t as an increasing piecewise-linear function, where the total purchased energy for each forward contract is given by restriction (10f). Eq. (10e) represents the energy balance between the available energy (right-hand side) and the energy committed for delivery by the retailer (left-hand side). Constraints (10c), (10d) and (10g) define the price-quota curve as a decreasing piecewise-linear function. The price-quota curve is our main source of uncertainty, and is only determined once pool prices are revealed. A more detailed representation and description of the problem can be found in Conejo et al. (2010), as mentioned earlier in this section.
In this work, the covariates used to estimate day-ahead spot prices are demand and the point in time, considering as differentiated features each hourly demand (24 variables), day (7 dichotomous variables), month (12 dichotomous variables) and year (one dichotomous variable per year in the training and/or validation sets) for which those prices are estimated. Thus, the estimation itself can accurately differentiate between peak and base hours, together with other daily patterns that could indicate a change in the demand-price relationship (Karakatsani & Bunn, 2008). Alternative autoregressive models (Conejo, Contreras, Espínola, & Plazas, 2005) or factor models (Liebl et al., 2013) propose many other potential covariates to be considered. However, we did not include any other auxiliary information, since demand already implicitly includes information such as weather and socio-economic factors, and our target is to analyze the behavior of the algorithm itself, not to increase the electricity price forecasting accuracy to the point of incurring undesirable overfitting. Additionally, much of the information that could be used is normally proprietary to each market participant, such as the status of each of the electricity generation units.
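A minimal sketch of how such a covariate vector could be assembled for one day is given below. The function name, the ordering of the blocks, the demand values and the list of years are illustrative assumptions, not taken from the paper; only the block sizes (24 hourly demands, 7 day indicators, 12 month indicators, one indicator per year) follow the description above.

```python
from datetime import date


def build_features(day, hourly_demand, years):
    """Return 24 demand values followed by day, month and year indicators."""
    if len(hourly_demand) != 24:
        raise ValueError("expected one demand value per hour (24 values)")
    weekday = [1.0 if day.weekday() == d else 0.0 for d in range(7)]
    month = [1.0 if day.month == m else 0.0 for m in range(1, 13)]
    year = [1.0 if day.year == y else 0.0 for y in years]
    return [float(v) for v in hourly_demand] + weekday + month + year


# Hypothetical day, flat demand profile and six-year sample, for illustration only.
x = build_features(date(2019, 8, 25), [25000.0] * 24, years=range(2014, 2020))
print(len(x))  # 24 + 7 + 12 + 6 = 49
```

Keeping the calendar variables as indicators rather than ordinal numbers means that a distance-based method treats all days of the week, or all months, symmetrically.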
As previously described in Algorithm 1, the problem is solved in two steps. First, we run the ML algorithm with the chosen hyperparameter k, and then we address the two-stage stochastic retailer problem. The main question that arises is how to select the best k. Traditional approaches would use cross-validation (Ripley, 2007; Stone, 1977), overcoming the problem of not having a fixed rule when the sample size is small to medium. Otherwise, the rule given by Cover & Hart (1967) could be used, as long as $n \to \infty$. However, little attention is usually paid to the relationship between the ML set-up and the adequacy of this hyperparameter selection for the prescription. Furthermore, since our sample consists of 6 years of hourly spot prices and power demand data, with time-dependent covariates, we make use of cross-validation approaches focused on time-series hyperparameter selection, such as the one proposed in Makridakis (1990) and referred to as “sliding simulation”.
The motivation for employing this technique is that, first, we want to avoid high bias in the estimates and, second, we want to avoid inconsistency in the selected hyperparameter optimization method. Regarding this last issue, traditional cross-validation approaches cannot be applied directly, as there is a certain time dependency and seasonal effect in demand and power data, as shown in Fig. 2. In traditional settings, k-fold cross-validation selects data folds randomly, without preserving the temporal pattern, so that for many of the selected folds part of the past data behavior is explained with a future sample, making the validation method theoretically and empirically inconsistent (Tashman, 2000).

The sliding window approach proposed here evaluates the accuracy of the prescriptions using one sixth of the sample, i.e., two months per year of sample, taking a lead time, or forecasting horizon as it is commonly known, of size equal to 2 periods, and thus splitting the validation sample into folds of size two that are incrementally incorporated into the training sample. A complete description of this approach can be found in Tashman (2000). The error generated along each one of the lead times used during the validation step is then incorporated to compute both the MAE of the estimation and the MAE of the prescription, although only the latter is used during the validation phase.
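The fold construction described above can be sketched as a generator of chronological train/test index pairs. This is a simplified illustration under stated assumptions: the validation observations are taken as a single block at the end of the sample (the paper draws two months per year), and the function name and arguments are hypothetical.

```python
def sliding_splits(n_obs, n_validation, horizon=2):
    """Yield (train_indices, test_indices) pairs in chronological order.

    The last `n_validation` observations are walked through in blocks of
    `horizon`; once a block has been scored it joins the training window.
    """
    start = n_obs - n_validation
    for origin in range(start, n_obs, horizon):
        stop = min(origin + horizon, n_obs)
        yield list(range(origin)), list(range(origin, stop))


splits = list(sliding_splits(n_obs=12, n_validation=4, horizon=2))
for train, test in splits:
    print(len(train), test)
```

Every test block lies strictly after its training window, so no future observation is used to explain past behavior, which is the property that random k-fold selection fails to preserve.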
As previously indicated, instead of directly applying the “coefficient of prescriptiveness” P as the loss function, we make use of the “Mean Absolute Error” (MAE) between the perfect-foresight solution and the solution obtained through the prescriptive method, i.e., the prescription error. This allows us to compare, in a more direct manner, the solutions and hyperparameter composition of the deterministic approach, normally called point prediction, and the data-driven prescription. Besides, we avoid some undesirable results when comparing outcomes for different sample sizes, since the root mean squared error (RMSE) and similar measures tend to have higher upper limits with increasing sample sizes. We then compare not only the final results obtained, but also the differences in k between applying sliding simulation directly to the ML algorithm loss function and to the prescription error itself. By doing so, we want to explore, in a realistic setting, whether there is any significant difference in the problem parameter set-up depending on whether we focus our attention on the prescription or the prediction errors.
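The comparison between the two selection criteria can be sketched as follows. The helper names and all numbers are hypothetical and serve only to show that the value of k minimizing the prediction MAE need not coincide with the one minimizing the prescription MAE; the scalar used to summarize each solution (here a profit figure) is likewise an assumption.

```python
def mae(reference, estimate):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def select_k(reference, candidates):
    """Return the k with the lowest MAE and the score of every candidate.

    `candidates` maps each k to a sequence aligned with `reference`.
    """
    scores = {k: mae(reference, values) for k, values in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical validation outputs for k in {1, 2, 5}.
observed_price = [40.0, 38.0]
predicted_price = {1: [33.0, 34.0], 2: [36.0, 36.0], 5: [39.0, 39.0]}
perfect_profit = [100.0, 90.0]
prescribed_profit = {1: [60.0, 50.0], 2: [98.0, 88.0], 5: [95.0, 85.0]}

k_ml, ml_scores = select_k(observed_price, predicted_price)
k_dd, dd_scores = select_k(perfect_profit, prescribed_profit)
print(k_ml, k_dd)
```

In this toy data the prediction error keeps falling up to k = 5, while the prescription error is already lowest at k = 2.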
In the following, we describe how the different ML techniques considered in this work are adapted to be used within the prescriptive Algorithm 1.
Fig. 2. Spanish market hourly power spot prices and electricity demand for the time period 2014 to 2020.
3.2. kNN algorithm approach
The k-Nearest Neighbors algorithm is a distance-based non-parametric approach that relies on finding the “closest” points or group of points in a certain set with respect to a given point. Its simplicity and consistency have made this algorithm one of the most commonly used in different ML fields such as clustering (Henley & Hand, 1996; Sibson, 1973; Wong & Lane, 1983) and supervised learning (Cover & Hart, 1967; Weinberger & Saul, 2009).
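As a rough sketch of the neighbor search, the function below gives each of the k training points closest to a query point a weight of 1/k and every other point a weight of 0, using the Euclidean distance. The function name and the sample points are hypothetical, and ties are broken by training order, which is an implementation choice not discussed in the paper.

```python
import math


def knn_weights(x, train_x, k):
    """Weight 1/k for the k training points closest to x, 0 for the rest."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must lie between 1 and the number of training points")
    # sorted() is stable, so equidistant points keep their original order.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(train_x))]


train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.5, 0.0)]
weights = knn_weights((0.0, 0.0), train_x, k=2)
print(weights)
```

The resulting weights sum to one and can be used directly as scenario probabilities in the second-stage optimization.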
What we use in the current setting is the basic version of kNN, in which no adjustment of the importance of each of the scenarios used is applied, as long as they are part of the k closest to the target point. In the second stage, we weight every scenario according to Eq. (11), where $d(\cdot,\cdot)$ is the Euclidean distance. Therefore, the function used to determine the scenario weight is the following:

\begin{equation}
w^{kNN}_{N,i}(x) = \frac{1}{k}\, \mathbb{I}\left[ x_i \in R(x) : d(x, x_i) \le d(x, x_j) \;\; \forall i \ne j,\; x_j \notin R(x) \right] \tag{11}
\end{equation}

3.3. Trees algorithm approach
Another type of prominent algorithm in ML history comprises those based on the construction of mappings of a training set into subgroups of data, for classification or regression purposes. There is a plethora of techniques based on this approach, such as ID3 (Quinlan, 1986), CART (Breiman, Friedman, Stone, & Olshen, 1984) or C4.5 (Quinlan, 1992), among many others, each one with its advantages and disadvantages. The most recent advancements for these types of techniques are motivated by the important speedup of the algorithms for solving mixed-integer optimization problems. For instance, in Bertsimas & Dunn (2017), instead of adopting the traditional heuristic top-down approach for splits, an exact MIO formulation is proposed to derive the optimal decision tree for both axis-aligned and multivariate hyperplane splits.
In this work we focus on the standard CART algorithm because it maintains desirable asymptotic optimality properties, whereas ID3 does not handle numeric values and C4.5 could create empty final leaves, which may lead to some inconsistency or a lack of empirical optimality guarantees. The weights assigned to each one of the scenarios of the training sample N follow the map obtained by applying CART to that training sample, giving each point not belonging to x's region a weight of $w_{N,i}(x) = 0$, according to (12).
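\begin{equation}
w^{CART}_{N,i}(x) = \frac{\mathbb{I}\left( R(x) = R(x_i) \right)}{\left| \{ j : R(x_j) = R(x) \} \right|} \tag{12}
\end{equation}

where R is a partitioning of the sample N into M subsets with $\bigcap_{r=1}^{M} R^{-1}(r) = \emptyset$. Since CART presents a large number of hyperparameters to be adjusted, for clarity we focus only on the “tree depth”.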
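The weighting in Eq. (12) can be sketched once the leaf (region) of every training point and of the query point is known. The function below takes those leaf identifiers as given; the name `leaf_weights` and the identifiers used in the example are hypothetical. In practice the identifiers could be obtained from a fitted tree, for instance through the `apply` method of scikit-learn's `DecisionTreeRegressor`, with `max_depth` playing the role of the tree-depth hyperparameter; that choice of library is an assumption, not something stated in the paper.

```python
def leaf_weights(leaf_of_x, train_leaves):
    """Uniform weights over the training points that share the leaf of x."""
    same = [leaf == leaf_of_x for leaf in train_leaves]
    count = sum(same)
    if count == 0:
        raise ValueError("no training point falls in the leaf of x")
    return [1.0 / count if s else 0.0 for s in same]


# Hypothetical leaf identifiers of five training points; x falls in leaf 5.
train_leaves = [2, 2, 5, 7, 5]
weights = leaf_weights(5, train_leaves)
print(weights)
```

A deeper tree produces smaller leaves, so fewer scenarios receive a positive weight, which is how the tree depth controls the size of the scenario set passed to the optimization problem.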
This methodology can also be adapted to tree-based ensemble methods, e.g., Random Forest, where the weights (12) can be computed by combining several trees (Bertsimas & Kallus, 2020). However, we have not included these methods in our numerical case study (Section 4) because we observed significantly worse performance than with CART. This can be explained by the relatively small size of the dataset considered, compared with other large-scale applications where ensembles and randomization exhibit higher accuracy.
3.4. K-means algorithm approach
Apart from supervised approaches, whose theoretical properties and justification are well established (Bertsimas & Kallus, 2020), we also propose an alternative method based on unsupervised algorithms. In particular, we consider one of the most studied algorithms in data science: K-means. Developed by many authors, the algorithm proposed by Lloyd (1982) can be considered one of the most prominent versions. Typically used to identify patterns or segment groups, the algorithm divides a data sample into subgroups that share similar characteristics. With the unsupervised algorithm, as in the previous ML approaches, we seek to select those scenarios from the data sample that share some characteristics, leveraging the information provided by the covariates X. The main difference comes from the way the regions' segmentation and scenario weights are obtained. Since no feedback is provided by the dependent variable (unsupervised), there is no real fit of the response variable (spot prices). However, the data-driven optimization algorithm effectively provides information to the regions in the same way as the previous ML techniques do, i.e., leveraging the MAE. Centroids are used to provide a comparison between the