Prescriptive selection of machine learning hyperparameters with applications in power markets: Retailer’s optimal trading


Contents lists available at ScienceDirect

European Journal of Operational Research

journal homepage: www.elsevier.com/locate/ejor

Analytics, Computational Intelligence and Information Management


Alberto Corredera, Carlos Ruiz∗

Department of Statistics & UC3M-BS Institute for Financial Big Data (IFiBiD), University Carlos III de Madrid, Avda. de la Universidad, 30, Leganés 28911, Spain

Article info

Article history:

Received 22 November 2021; Accepted 9 November 2022; Available online 15 November 2022

Keywords:

OR in energy; Data-driven; Electricity retailer; Hyperparameter selection; Machine learning

Abstract

We present a data-driven framework for optimal scenario selection in stochastic optimization with applications in power markets. The proposed methodology relies on the existence of auxiliary information and the use of machine learning techniques to narrow the set of possible realizations (scenarios) of the variables of interest. In particular, we implement a novel validation algorithm that allows optimizing each machine learning hyperparameter to further improve the prescriptive power of the resulting set of scenarios. Supervised machine learning techniques are examined, including kNN and decision trees, and the validation process is adapted to work with time-dependent datasets. Moreover, we extend the proposed methodology to work with unsupervised techniques with promising results. We test the proposed methodology in a realistic power market application: optimal trading strategy in forward and spot markets for an electricity retailer under uncertain spot prices. The results indicate that the retailer can greatly benefit from the proposed data-driven methodology and improve its market performance. Moreover, we perform an extensive set of numerical simulations to analyze under which conditions the best machine learning hyperparameters, in terms of prescriptive performance, differ from those that provide the best predictive accuracy.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

1. Introduction

We are entering a fourth technological revolution characterized by the automation and digitization of industrial processes, and by a more efficient and sustainable allocation of resources. There are various technologies that drive this development (e.g., the Internet of Things, cyber-physical systems, smart sensors, cloud computing, etc.) and that enable the generation, the efficient collection and processing, and the analysis of large volumes of different types of data (Big Data). In this context, the field of decision-making under uncertainty has the opportunity to leverage this data availability to face important challenges (e.g., pandemic management, production allocation, investment in renewable technologies, personalized medical treatments, demand response in power systems, etc.). However, traditional decision-making techniques, based on stochastic or robust optimization problems, are not designed to take advantage of the full potential that these new datasets offer.

Therefore, it is necessary to improve and adapt these techniques to benefit from a richer empirical characterization of the uncertainty associated with the model.

∗Corresponding author.

E-mail address: [email protected] (C. Ruiz).

Traditionally, deterministic optimization techniques have been used to tackle complex decision-making problems in different fields of application (Murty, 1994), e.g., allocation of schedules, management of production systems, organization of airlines, pricing systems, etc. These techniques encompass linear, nonlinear, and integer optimization, or a combination of these, and are based on a fundamental hypothesis: the input parameters of the problem are known with complete certainty. However, this hypothesis is rarely fulfilled in real contexts. To solve this problem, new techniques have been developed that incorporate the uncertainty associated with the problem parameters. One of the most used techniques is stochastic programming, where the model incorporates the estimated probability distribution of the uncertain parameters, either analytically or through a scenario-type discretization (Birge & Louveaux, 2011). It is a very versatile technique for modeling realistic problems; however, it has some drawbacks, such as: (i) a high sensitivity of the solution to the chosen probability distribution (Römisch, 2003), (ii) a computational complexity that increases exponentially with the size of the models (Shapiro & Nemirovski, 2005), and (iii) it does not typically take into account the information provided by auxiliary variables (Bertsimas & Kallus, 2020).

https://doi.org/10.1016/j.ejor.2022.11.011

Specifically, stochastic programming considers the problem:

$$\min_{z \in \mathcal{Z}} \mathbb{E}\left[c(z; Y)\right] \tag{1}$$

where $z \in \mathcal{Z} \subset \mathbb{R}^{d_z}$ are the decision variables, $Y \in \mathcal{Y} \subset \mathbb{R}^{d_y}$ are the parameters that characterize the problem, $c(z; Y): \mathbb{R}^{d_z} \times \mathbb{R}^{d_y} \to \mathbb{R}$ is the cost function, and $\mathbb{E}[\cdot]$ represents the expected value over the distribution of $Y$. In general, the probability distribution of $Y$ is unknown (Nemirovski & Shapiro, 2007), although historical observations of this variable, $Y^1, Y^2, \ldots, Y^N$, are available, and thus its empirical distribution can be reconstructed. For this reason, the Sample Average Approximation (SAA) approach is usually used with the following formulation (Shapiro & Nemirovski, 2005):

$$\min_{z \in \mathcal{Z}} \frac{1}{N} \sum_{i=1}^{N} c\left(z; Y^i\right) \tag{2}$$

In this formulation, the theoretical expected value is replaced by the mean calculated on the empirical distribution, where each realization (scenario) of the parameter $Y^i$ is assigned a probability of $1/N$.
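To make formulation (2) concrete, the following minimal Python sketch solves an SAA problem for an illustrative newsvendor-type cost $c(z; y) = h\max(z - y, 0) + b\max(y - z, 0)$. This cost, its parameters `h` and `b`, and the helper names are assumptions chosen for illustration; they are not part of the retailer problem studied in this paper. Every observation $Y^i$ receives the same weight $1/N$, and because the objective is convex and piecewise linear with breakpoints at the samples, enumerating the samples as candidate decisions is exact.

```python
import numpy as np

def newsvendor_cost(z, y, h=1.0, b=3.0):
    # Hypothetical illustrative cost c(z; y): h per unit of excess (z > y),
    # b per unit of shortage (z < y).
    return h * np.maximum(z - y, 0.0) + b * np.maximum(y - z, 0.0)

def saa_decision(y_samples, cost=newsvendor_cost):
    """Solve (2): min_z (1/N) sum_i c(z; Y^i), enumerating the samples as candidates.

    The objective is convex and piecewise linear with breakpoints at the samples,
    so an optimum is attained at one of them.
    """
    y = np.asarray(y_samples, dtype=float)
    objective = [float(cost(z, y).mean()) for z in y]   # mean = equal weights 1/N
    best = int(np.argmin(objective))
    return float(y[best]), objective[best]

z_saa, value = saa_decision(range(1, 11))   # observations Y^i = 1, ..., 10
```

With `h = 1` and `b = 3`, the SAA decision is an empirical $b/(b+h) = 0.75$ quantile of the samples (here 8), a well-known property of this cost.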

However, in practice it is possible to have historical series of the parameters of interest ($Y$), together with auxiliary parameters ($X$), i.e., covariates, that can help to improve their probabilistic characterization. For this reason, for some years now, it has been proposed to address the optimization problem from a data-driven perspective, where ideas of statistics and Machine Learning (ML) are combined with mathematical optimization (Keith & Ahner, 2021).

Hence, we may now consider this setting:

$$z(x) \in \arg\min_{z \in \mathcal{Z}} \mathbb{E}\left[c(z; Y) \mid X = x\right]$$

where optimal decisions $z(x)$ depend on auxiliary information $x$, which is assumed to be known at the time of decision making, and which can have a high impact on the uncertainty associated with $Y$.

Most practical implementations of this problem are approached in two disjoint stages: (i) Predict: use available databases ($(x^1, y^1), \ldots, (x^N, y^N)$) to train predictive models $y = f(x)$, based on ML or traditional time series techniques (James, Witten, Hastie, & Tibshirani, 2013), for the parameters that the decision-making model needs; and (ii) Optimize: use the predictive input to obtain optimal solutions. However, it has been observed that this approach may be sub-optimal, as it does not adequately quantify how the uncertainty of the predictions can impact the objective function of the decision-making problem.

Motivated by this fact, a series of research works have recently addressed these difficulties in order to efficiently integrate prediction and prescription in the context of stochastic programming (Mundru, 2019). The problem to be treated can be generalized as:

$$z(f, x) \in \arg\min_{z \in \mathcal{Z}} \mathbb{E}\left[c(z; Y) \mid f,\, X = x\right] \tag{3}$$

where the decision variables ($z$) and the estimated probability distribution of $Y$ depend both on the auxiliary information $x$ and on the chosen predictive function $f(x)$. Now, the choice of $f(x)$ can be assessed from the improvement of the decision-making process, and not only by minimizing a prediction error.

Various authors have addressed different versions of problem (3) or its surrogate via empirical SAA. In particular, we can differentiate those approaches where prediction and prescription are developed in two differentiated stages, and those with an integrated perspective. Regarding the former, Tulabandhula & Rudin (2013) propose a two-step procedure to select a predictive model with good prescriptive performance. First, the predictive model is selected by including a regularization term in the loss function of the learning problem, accounting for the operational cost. Second, the optimal policies (minimum operational costs) are derived by using the previous predictive model. Bertsimas & Kallus (2020) propose first to use supervised non-parametric ML tools (kNN, Random Forest, etc.) to select potential realizations (scenarios) of the response variables, given imperfect observations (auxiliary data). Then, a conditional stochastic optimization problem is used to derive the best prescriptive policy.

Considering integrated approaches, Donti, Amos, & Kolter (2017) propose to find the optimal values of the model parameters based on a prescriptive loss function. Due to the potential non-convex nature of this function, an iterative stochastic gradient descent approach is proposed to find local solutions. For problems where uncertainty is present in the (linear) objective function parameters, Elmachtoub & Grigas (2021) propose an integrated predictive-prescriptive framework (Smart Predict then Optimize) where the loss function used to train the predictive model explicitly accounts for the prescription error. To overcome computational challenges due to non-convexities, a tractable algorithm is proposed based on a related convex loss function. Ban & Rudin (2019) propose integrated algorithms based on empirical risk minimization (ERM) and kernel-weights optimization (KO), and decision rules that directly link, via a predefined functional model, the covariates with the decision variables. These algorithms are tested in the newsvendor problem. In the same application, Huber, Müller, Fleischmann, & Stuckenschmidt (2019) evaluate numerically, and under different ML and time series techniques, when integrating prediction and prescription outperforms traditional first-predict-then-optimize approaches. Mundru (2019) considers decision models with auxiliary covariate data where the ML model is trained to improve the prescriptive performance while penalizing the uncertainty associated with the predictions. Gupta & Rusmevichientong (2021) introduce an approach focused on linear optimization problems whose available data are scarce but still adequate to describe the uncertainty. Muñoz, Pineda, & Morales (2022) introduce a bilevel approach by obtaining a parametric model, based on decision rules, that integrates the ML problem characterization into the target optimization problem.

Moreover, due to their interpretability and scalability properties, some recent works have focused on developing related integrated frameworks for tree-based algorithms. In particular, Bertsimas, Dunn, & Mundru (2019) generalize previous works on tree-based algorithms so that training is performed with a loss function that balances both the predictive and prescriptive performance. The methodology is adapted to generate both constant (mean outcomes in each leaf) and linear (elastic net model in each leaf) predictions. In the same vein, Stratigakos, Camal, Michiorri, & Kariniotakis (2022) propose another tree-based methodology which focuses on learning a policy conditioned on covariate data. The aim is to use this policy to take optimal decisions based on a weighted SAA framework, similar to Bertsimas & Kallus (2020).

Our work can be viewed as a bridge between these two types of approaches. In particular, we address the conditional stochastic optimization problem (3) with an ML-based scenario selection, which is based on the work of Bertsimas & Kallus (2020). However, given an ML predictive function $f$, we acknowledge its dependence on some hyperparameters (e.g., number of neighbors in kNN, depth of a decision tree, number of centroids in K-means, etc.) that need to be fixed beforehand. These are traditionally tuned based on predictive performance and validation techniques (James et al., 2013). However, in this work we propose to extend this setting from a prescriptive point of view in order to select the best learning model over a validation set. The main contributions of this work are fivefold:


(i) To propose a prescriptive-based validation scheme to select optimal ML hyperparameters for scenario-weighted conditional stochastic problems.

(ii) To study how this scheme can result in substantially different hyperparameter values with a better prescriptive performance, compared to the traditional predictive approach.

(iii) To illustrate, through an extensive numerical analysis, the main factors driving these differences: sample sizes, market conditions, performance metrics and ML techniques.

(iv) To extend this validation approach to unsupervised ML techniques and to time series datasets.

(v) To test the proposed framework in a real-world data-driven problem in the context of electricity markets.

As indicated, we test the performance of the proposed methodology in a real-world application based on the medium-term retailer problem defined by Conejo, Carrión, & Morales (2010). In particular, we analyze the problem faced by an electricity retailer that seeks to derive its optimal procurement strategy via futures and spot markets, together with the appropriate selection of a retail price tariff for its clients (consumers). The retailer has access to several years of historical records of hourly data, including spot prices and demand loads. Some of these variables have a direct impact on its decision problem (i.e., $Y$), while others can be used as auxiliary information (i.e., $X$), which is known at the time the decision making takes place. We will evaluate different ML techniques under different market conditions to explore the main factors driving the retailer’s profit.

2. Prescriptive algorithm description

Recent data-driven methods leverage the solution process on the data itself to account for an adequate algorithm setup and find an optimal solution of the stochastic problem (3). They also try to overcome the limitations of traditional approaches, such as SAA and point prediction methods, in common operations/management settings. The former does not guarantee, under finite sample conditions, an adequate asymptotic performance and tractability, and the latter performs poorly when the sample size increases, reaching sub-optimal decisions. In particular, Bertsimas & Kallus (2020) propose to estimate the conditional stochastic problem (3) by an SAA-based formulation, where the weights assigned to each sample are derived from a predictive ML method:

$$\hat{z}_N(x) \in \arg\min_{z \in \mathcal{Z}} \sum_{i=1}^{N} w_{N,i}(x)\, c\left(z, y^i\right) \tag{4}$$

where $\hat{z}_N$ corresponds to the optimal decisions to be made based on the information available at a specific point in time, i.e., $S_N = \{(x^1, y^1), \ldots, (x^N, y^N)\}$, given that only a subset of the covariates $x = \{x^1, \ldots, x^N\} \mid X \in \mathcal{X} \subset \mathbb{R}^{d_x}$ and target uncertain variables $y = \{y^1, \ldots, y^N\} \mid Y \in \mathcal{Y} \subset \mathbb{R}^{d_y}$ are available at scenario $i$. Scenario weights $w_{N,i}$ are derived from the ML algorithm and used in combination with the cost function $c(z, y^i)$ to approach the optimal decision $z^*$. Note that in this case, $z \mid Z \in \mathcal{Z} \subset \mathbb{R}^{d_z}$ corresponds to the decision variables that are conditioned on some information on $x$.
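A minimal sketch of the weighted formulation (4) with kNN weights, in the spirit of Bertsimas & Kallus (2020), is given below. It assumes Euclidean distances on the covariates and searches the decision over a finite candidate set; the function names and the toy data are hypothetical. The only change with respect to SAA is that the uniform weights $1/N$ are replaced by $w_{N,i}(x) = 1/k$ for the $k$ training points closest to $x$, and 0 otherwise.

```python
import numpy as np

def knn_weights(X_train, x, k):
    """w_{N,i}(x) = 1/k for the k training covariates closest to x (Euclidean), 0 otherwise."""
    X_train = np.asarray(X_train, dtype=float)
    dist = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    w = np.zeros(len(X_train))
    w[np.argsort(dist, kind="stable")[:k]] = 1.0 / k
    return w

def weighted_saa(weights, y_train, cost, candidates):
    """Approximate (4): minimise sum_i w_{N,i}(x) c(z, y^i) over a finite set of candidate z."""
    y_train = np.asarray(y_train, dtype=float)
    values = [float(np.sum(weights * cost(z, y_train))) for z in candidates]
    best = int(np.argmin(values))
    return candidates[best], values[best]

# Toy usage with an absolute-deviation cost, so the weighted optimum is a weighted median.
X_train = [[0.0], [1.0], [2.0], [10.0], [11.0]]
y_train = [5.0, 6.0, 7.0, 50.0, 55.0]
w = knn_weights(X_train, [1.2], k=3)          # neighbours of x = 1.2: x = 1, 2, 0
z_hat, obj = weighted_saa(w, y_train, lambda z, y: np.abs(z - y), candidates=y_train)
```

Points far from $x$ in covariate space (here $x = 10, 11$) receive zero weight, so their large outcomes do not pull the decision upwards as they would under equal SAA weights.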

Our main contribution is to address the critical ML issue of selecting the appropriate hyperparameter layout from a prescriptive point of view, rather than from the traditional predictive one. Moreover, we seek to test the usefulness of the prescriptive method described above in a real-world setting. In particular, we consider the problem:

$$\hat{z}_N(x; k) \in \arg\min_{z \in \mathcal{Z}} \sum_{i=1}^{N} w_{N,i}(x; k)\, c\left(z, y^i\right) \tag{5}$$

where we explicitly account for the impact of the ML hyperparameters $k \in \mathcal{K}$ on the optimal decision $\hat{z}_N$. We seek to find the value of $k$ that renders the best prescriptive performance on a validation set, different from the set used to train the ML model that determines the weights $w_{N,i}$. However, for many relevant ML techniques, the functional relationship between these weights and $k$ is highly nonlinear and nonconvex, so this task cannot be addressed analytically. Hence, we propose Algorithm 1 as a new problem validation framework. Moreover, the proposed methodology is also applicable to data that present time-dependent patterns and that require special treatment compared to traditional validation approaches.

To illustrate the proposed methodology, let us assume that we work with kNN (k-Nearest Neighbors) as the ML technique to select meaningful scenarios based on contextual information. We seek to obtain the appropriate value of $k$, i.e., the number of neighbors, based on a validation procedure. Hence, we propose the implementation of the following data-driven algorithm:

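Algorithm 1: Data-Driven Optimization Algorithm.

Input: $X$, $Y$, $\mathcal{K}$. Output: $\hat{z}^*_N$, $k^*$.

1 for $k \in \mathcal{K}$ do
2   Fit ML $f$ on $S_N = \{X, Y\}$ and obtain regions $R(x; k)$
3   for $j = 1$ to $N_v$ do
4     Get $R(x; k)$ for $\{x^j \in S_{N_v} : R(x; k) = R(x^j; k)\}$; set $w_{N,i}(x; k) = \frac{1}{k}\,\mathbb{I}\left[x^i \text{ is in region } R(x; k)\right]$
5     Solve $\hat{z}_N(x^j; k) \in \arg\min_{z \in \mathcal{Z}} \sum_{i=1}^{N} w_{N,i}(x; k)\, c(z, y^i)$
6   end for
7   $\text{MAE}^k = \frac{1}{N_v} \sum_{j=1}^{N_v} \left| \min_{z \in \mathcal{Z}} c(z, y^j) - c\left(\hat{z}_N(x^j; k), y^j\right) \right|$
8 end for
9 Pick $k^*$ such that $\text{MAE}^{k^*} \le \text{MAE}^k \ \forall k \in \mathcal{K}$
10 Pick $\hat{z}^*_N$ with $k = k^*$
11 return $\hat{z}^*_N$, $k^*$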

where $S_{N_v} = \{(x^1, y^1), \ldots, (x^{N_v}, y^{N_v})\}$ is a validation set such that $S_N \cap S_{N_v} = \emptyset$, $\mathcal{K}$ is the domain of the hyperparameter $k$ under exploration, and $R$ is a map proposed by the ML algorithm such that $\mathcal{X} = \bigcup_{m=1}^{M} R^{-1}(m)$. The proposed Algorithm 1 follows two steps. First, we pick a hyperparameter value $k \in \mathcal{K}$ and generate a map $R(x; k)$ of the training set $S_N$ by fitting the ML algorithm. Then, we assign weights $w_{N,i}(x; k)$ to every point of the region to which the ML algorithm assigns the target point ($x^j$) of the validation set $S_{N_v}$, and solve the problem according to (4). In the second step, the MAE for every $k \in \mathcal{K}$ is calculated and evaluated to select the optimal $k^*$ as the one rendering the lowest MAE, with the associated optimal decision $\hat{z}^*_N$. Note that the domain $k \in \mathcal{K}$ can be explored by a grid-search technique.
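The structure of Algorithm 1 can be sketched in Python as a grid search over $k$, assuming kNN weights, a hypothetical newsvendor-type stand-in for $c(z, y)$, and a synthetic dataset. All names, the cost parameters, and the data-generating process are illustrative assumptions, not the retailer problem of this paper. For this particular cost, the perfect-foresight term $\min_z c(z, y^j)$ equals 0, because $z = y^j$ is optimal.

```python
import numpy as np

def newsvendor_cost(z, y, h=1.0, b=3.0):
    # Hypothetical stand-in for c(z, y): h per unit of excess, b per unit of shortage.
    return h * np.maximum(z - y, 0.0) + b * np.maximum(y - z, 0.0)

def knn_weights(X_train, x, k):
    # Step 4: weight 1/k for the k training points closest to x, 0 otherwise.
    dist = np.linalg.norm(X_train - x, axis=1)
    w = np.zeros(len(X_train))
    w[np.argsort(dist, kind="stable")[:k]] = 1.0 / k
    return w

def solve_weighted(w, y_train, cost):
    # Step 5: argmin_z sum_i w_i c(z, y^i); for this convex piecewise-linear
    # cost an optimum lies at one of the training samples.
    values = [np.sum(w * cost(z, y_train)) for z in y_train]
    return y_train[int(np.argmin(values))]

def prescriptive_validation(X_train, y_train, X_val, y_val, k_grid, cost=newsvendor_cost):
    """Grid search over k mirroring Algorithm 1, scored by the prescriptive MAE."""
    mae = {}
    for k in k_grid:                                   # step 1 (kNN needs no explicit fit in step 2)
        errors = []
        for xj, yj in zip(X_val, y_val):               # step 3
            z_hat = solve_weighted(knn_weights(X_train, xj, k), y_train, cost)
            perfect = cost(yj, yj)                     # min_z c(z, y^j): z = y^j is optimal here
            errors.append(abs(perfect - cost(z_hat, yj)))
        mae[k] = float(np.mean(errors))                # step 7
    k_star = min(mae, key=mae.get)                     # step 9: lowest prescriptive MAE
    return k_star, mae

# Usage on synthetic data (hypothetical data-generating process).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(150, 1))
y = 100.0 * X[:, 0] + rng.normal(0.0, 5.0, size=150)
k_star, mae = prescriptive_validation(X[:100], y[:100], X[100:], y[100:], k_grid=range(1, 21))
z_new = solve_weighted(knn_weights(X[:100], np.array([0.5]), k_star), y[:100], newsvendor_cost)  # step 10
```

The train/validation split in the usage lines is a plain holdout on the last 50 observations; the adaptations for time-dependent data mentioned in this section are not reproduced in this sketch.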

Regarding performance metrics, Bertsimas & Kallus (2020) propose to focus on the final output of the optimization itself, using a loss function of the optimization problem that they denote the “Coefficient of Prescriptiveness” ($P$). In particular, the cost of the perfect-forecast solution is used as a reference to determine the distance between the proposed prescriptive method and the perfect-information solution, and this distance is compared to that of the SAA solution cost in the form of a ratio (6).

$$P = 1 - \frac{\hat{R}_{N_v}(\hat{z}_N) - \hat{R}^*_{N_v}}{\hat{R}_{N_v}(\hat{z}^{SAA}_N) - \hat{R}^*_{N_v}} \tag{6}$$

where $\hat{R}_{N_v}(\hat{z}_N)$ is the expected cost under the prescription algorithm approach, $\hat{R}_{N_v}(\hat{z}^{SAA}_N)$ is the estimated expected cost using SAA, and $\hat{R}^*_{N_v}$ is the perfect-foresight expected cost. It should be noted that $\hat{z}^{SAA}_N$ is computed following the original definition of the problem and solved similarly to (2) for a given sample $\hat{S}_N$ of size $N$:

$$\hat{z}^{SAA}_N \in \arg\min_{z \in \mathcal{Z}} \frac{1}{N} \sum_{i=1}^{N} c\left(z; y^i\right) \tag{7}$$

To interpret this measure, we first have to consider that it is bounded above by 1. Values of $P$ close to 1 can be interpreted as an increase in the quality of the solution with respect to the standard SAA approach. This indicates that the scenario weights provided by the algorithm, i.e., the information transferred from the ML algorithm to solve the optimization problem, improve with the prescriptive approach. Low values of this measure should be interpreted as poor information transference, with $\lim_{N \to \infty} P = 0$.
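The coefficient of prescriptiveness (6) only needs three averaged costs on the validation set. A minimal sketch follows; the function name is hypothetical, and the three cost arguments are assumed to be already estimated (e.g., as sample means over $S_{N_v}$).

```python
def coefficient_of_prescriptiveness(cost_prescriptive, cost_saa, cost_perfect):
    """P from (6): 1 - (R(z_hat) - R*) / (R(z_SAA) - R*), using validation-set average costs."""
    gap_saa = cost_saa - cost_perfect
    if gap_saa <= 0:
        # SAA already matches perfect foresight, so the ratio in (6) is undefined.
        raise ValueError("P is undefined when the SAA cost equals the perfect-foresight cost")
    return 1.0 - (cost_prescriptive - cost_perfect) / gap_saa

p = coefficient_of_prescriptiveness(cost_prescriptive=110.0, cost_saa=140.0, cost_perfect=100.0)
```

Here $P = 0.75$. $P$ equals 1 when the prescription attains the perfect-foresight cost, 0 when it is no better than SAA, and becomes negative when it is worse than SAA, which is consistent with $P$ being bounded above by 1.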

As indicated, we extend the Bertsimas & Kallus (2020) approach by forcing the ML hyperparameters to be selected according to the prescriptive performance rather than the predictive one. Then, we compare the errors obtained by both validation processes. From Algorithm 1, we can observe that the loss function we propose as an alternative to (6) is the MAE (Mean Absolute Error), which is defined by (8):

$$\text{MAE} = \frac{1}{N_v} \sum_{j=1}^{N_v} \left| \min_{z \in \mathcal{Z}} c\left(z, y^j\right) - c\left(\hat{z}_N(x^j), y^j\right) \right| \tag{8}$$

The first term within the summation is the cost of the perfect-foresight information problem, whereas the second term is the estimated cost of the proposed prescription. Thus, we seek to use an absolute measure of the prescriptive error rather than a relative measure with respect to the SAA cost.

The use of MAE seeks to directly compare the performance of the proposed algorithm with respect to a deterministic approach, as we will further explain in the analysis of the algorithm setup. This is an important feature, since during the hyperparameter selection process, and under a traditional validation approach, the deterministic predictive solution error determines the best parameter layout. Thus, we believe that conventional error measures in the ML field are also appropriate to make the above comparison. Furthermore, since MAE was also employed to feed the validation process over the estimation step, we also incorporated the same metric to compute the prescription error, avoiding comparisons with loss functions that hold different properties. Nevertheless, for comparison purposes we have also considered the metric $P$ to assess the results in the numerical case study (Section 4).

In particular, we have observed that while both $P$ and MAE provide similar results in terms of optimal hyperparameter selection (see the table in Appendix A), the computation of the former implies a much higher computational burden.

To ensure the validity of this prescriptive procedure, some properties must be fulfilled by both the optimization problem and the machine learning algorithm. Regarding the optimization problem, the perfect-information solution must exist and be asymptotically optimal. Accordingly, Bertsimas & Kallus (2020) gave three basic assumptions for the prescriptive procedure:

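1. Existence. $\mathbb{E}\left[|c(z; Y)|\right] < \infty$ for every $z \in \mathcal{Z}$, and $\mathcal{Z}^*(x) \neq \emptyset$ for almost every $x$.

2. Continuity. For any $z \in \mathcal{Z}$ and $\epsilon > 0$ there exists $\delta > 0$ such that $|c(z; y) - c(z'; y)| \le \epsilon$ for all $z'$ with $\|z - z'\| \le \delta$ and $y \in \mathcal{Y}$.

3. Regularity. $\mathcal{Z}$ is closed, nonempty, and either:

• $\mathcal{Z}$ is bounded, or

• $\liminf_{\|z\| \to \infty} \inf_{y \in \mathcal{Y}} c(z; y) > -\infty$ and, for every $x \in \mathcal{X}$, there exists $D_x \subset \mathcal{Y}$ such that $\lim_{\|z\| \to \infty} c(z; y) = \infty$ uniformly over $y \in D_x$ and $\mathbb{P}(y \in D_x \mid X = x) > 0$.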

Theoretical proof of these assumptions is only given by Bertsimas & Kallus (2020) for the kNN approach, although justification for other supervised learning methods not used here, such as Kernel Methods or Local Linear Methods, is also provided. In addition to the assumptions previously stated, one of the conditions assumed to fulfill these requirements is that the optimization problem has to be convex.

Besides these three assumptions related to the asymptotic optimality property, two other issues must be considered. The first one is the fundamental problem of causal inference: the decisions $z$ could affect the cost function and not every possible outcome is observable, such as in price-demand decision problems, resulting in unobservable cost functions $c\left(z; y^i(z^i)\right)$ that could differ from the observed ones. The second one deals with the possibility that the problem is still ill-defined, since there may be unobserved data in the counterfactual. To overcome these two issues, Bertsimas & Kallus (2020) propose the following two additional assumptions:

4. Decomposition of Decision. For some decomposition $z = (z_1, z_2)$, only $z_1 \in \mathbb{R}^{d_{z_1}}$ affects uncertainty, that is,

$$Y(z_1, z_2) = Y(z_1, z_2') \quad \forall (z_1, z_2), (z_1, z_2') \in \mathcal{Z}$$

5. Ignorability. For every $z \in \mathcal{Z}$, $Y(z)$ is independent of $Z$ conditioned on $X$.

Considering this, the prescriptive problem generalizes as:

$$z^*(x) \in \mathcal{Z}^*(x) = \arg\min_{z \in \mathcal{Z}} \mathbb{E}\left[c\left(z; Y(z)\right) \mid X = x\right]$$

In other words, as long as we include all aspects that affect the decision $z$ under the umbrella of observable circumstances $X$, there is sufficient guarantee to assume that the identifiability condition of causal effects is fulfilled (Rosenbaum & Rubin, 1983). It is relevant to note that we also adopt these same five assumptions in this work. In particular, the first four assumptions are still valid under our approach, as we only affect the scenario weights $w_{N,i}(x; k)$ and not the cost function $c(z, y^i)$. Moreover, if $k$ effectively contains information related to $Y(z)$, and the algorithm introduces this information by assessing $R(x; k)\ \forall k \in \mathcal{K}$, we can assume that Assumption 5 holds under our approach.

2.1. Illustrative example: understanding the algorithm behavior

Although we will use a full definition of the retailer problem to compare the behavior of Algorithm 1 under different ML approaches, in this section we illustrate some key aspects of the algorithm performance on a simplified version of the problem. We assume a retailer that needs to decide the optimal amount of energy (electricity) to buy to supply the demand of its consumers. This can be done through a wholesale spot/pool market, or through forward/futures contracts. Let us assume that spot and forward prices are exogenous and that consumer demand is inelastic to price variations, so that the ignorability condition is fulfilled. In this example, we fix the retail price $\bar{\lambda}^R$ to 80 €/MWh, although it will be considered as another decision variable in the extended model in Section 4. Then, the problem faced by this retailer can be formulated as follows:

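$$\max_{Q^F,\, E^P_{t\omega}} \; \sum_{\omega=1}^{N} \pi_\omega \sum_{t=1}^{T} \left( \bar{\lambda}^R \bar{E}^R_{t\omega} - \lambda^P_{t\omega} E^P_{t\omega} - \sum_{f \in F_t} \lambda^F Q^F d_t \right) \tag{9a}$$

$$\text{s.t.} \quad 0 \le Q^F \le \bar{Q} \tag{9b}$$

$$\bar{E}^R_{t\omega} = E^P_{t\omega} + Q^F d_t + E^{PC}_t, \quad \forall t, \ \forall \omega \tag{9c}$$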

The objective function (9a) corresponds to the expected profit obtained by the retailer, where $\pi_\omega \equiv w_{N,i}$ is the probability assigned to every scenario $\omega = 1, \ldots, N$, and $t = 1, \ldots, T$ is the set of periods over which the profit is maximized. Prices are represented by $\lambda$, with $\bar{\lambda}^R$, $\lambda^P$ and $\lambda^F$ being the retailer electricity selling price, the purchased electricity pool/spot price, and the purchased electricity forward price, respectively. Energy quantities are represented by $E$, with $\bar{E}^R$ and $E^P$ being the energy sold to the final consumers and purchased in the pool market, respectively. Forward quantities are represented by $Q^F$, with $d_t$ being the time range covered by the forward contract. Constraint (9b) determines the maximum quantity $\bar{Q}$ allowed to be purchased in the forward contract, and constraint (9c) represents the energy balance, where the total sold electricity must be equal to all the available (purchased) electricity per scenario $\omega$ and time $t$, with $E^{PC}_t$ being any additional electricity available for period $t$. It should be noted that, by constraint (9b), short sales in the forward market are restricted, since $0 \le Q^F$.

Table 1

Spanish market daily average spot prices and demand, first week of August 2020.

Day | Spot price [€/MWh] | Real demand [MWh]
1 | 33.0176 | 28,830.9
2 | 28.6514 | 26,437.0
3 | 36.6316 | 29,291.4
4 | 35.5433 | 29,570.8
5 | 36.7857 | 30,342.1
6 | 37.4010 | 30,701.9
7 | 38.9991 | 30,657.4
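Because problem (9) places no bounds on $E^P_{t\omega}$, constraint (9c) can be used to eliminate it: substituting $E^P_{t\omega} = \bar{E}^R_{t\omega} - Q^F d_t - E^{PC}_t$ into (9a) leaves a profit that is affine in $Q^F$ with slope $\sum_{\omega} \pi_\omega \sum_t d_t (\lambda^P_{t\omega} - \lambda^F)$. Hence, with a single forward contract, the optimum sits at a bound of (9b): $Q^F = \bar{Q}$ if the slope is positive and $Q^F = 0$ if it is negative. The sketch below encodes this observation; the function name and argument layout are hypothetical, and the result would not carry over unchanged if additional constraints (e.g., $E^P_{t\omega} \ge 0$) were imposed.

```python
import numpy as np

def forward_decision(pi, lam_P, lam_F, Q_bar, d=None):
    """Optimal Q^F of problem (9) with a single forward contract (hypothetical helper).

    pi:    (n_scenarios,) scenario probabilities pi_omega (e.g., kNN weights 1/k)
    lam_P: (n_scenarios, T) pool prices lambda^P_{t,omega}
    lam_F: forward price lambda^F
    Q_bar: forward capacity limit of (9b)
    d:     (T,) coverage d_t of the forward contract (all ones if None)
    """
    lam_P = np.asarray(lam_P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    d = np.ones(lam_P.shape[1]) if d is None else np.asarray(d, dtype=float)
    # After eliminating E^P via (9c), the profit is affine in Q^F with this slope.
    slope = float(np.sum(pi[:, None] * d[None, :] * (lam_P - lam_F)))
    # Positive slope: buy the full capacity; otherwise buy nothing (any Q^F is optimal if slope == 0).
    return (Q_bar if slope > 0 else 0.0), slope

flat = [[37.0] * 24, [35.0] * 24]                 # two flat-price scenarios over 24 hours
q_mixed, slope_mixed = forward_decision([0.5, 0.5], flat, lam_F=36.8, Q_bar=5000.0)
q_high, slope_high = forward_decision([1.0, 0.0], flat, lam_F=36.8, Q_bar=5000.0)
```

This bang-bang structure is what drives the forward purchase to 0 whenever the weighted average spot price over the covered hours falls below $\lambda^F$.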

For simplicity, only one month of data from the Spanish power market is considered. Although further detail about the kNN algorithm will be given in the next section, we use it here to illustrate general properties of our approach. Taking August 2020 hourly data for demand and spot prices from ESIOS (2021), we compare not only the performance of the solution to the retailer’s problem (9), but also how it is affected by the selection of the hyperparameter $k$ (number of neighbors). As detailed in Algorithm 1, and thus applicable to any of the different ML techniques, the training data are partitioned into $M$ different regions, and we select the region in which the covariates would situate the possible spot price, taking into account the spot price structure of that day. Here, we should highlight that the algorithm makes use of a multiple-output structure, and thus the 24 hours of the day are mapped into a single region. Therefore, given the realization of the covariates $x$, in this case the 24-hour overall demand of the Spanish system, the kNN algorithm will identify the region $R(x^j)$ with the $k$ closest days in terms of similar 24-hour demand profiles. Then, each scenario $\omega = 1, \ldots, k$ will be matched with one of these $k$ days, and the corresponding pool prices $\lambda^P_{t\omega}$ will be fixed to the observed hourly prices ($t = 1, \ldots, 24$) on those particular days. Then, the stochastic problem (9) will be solved by taking each scenario that belongs to region $R(x^j)$ with weight $\pi_\omega = 1/k$. If we are in the validation stage, this process will be repeated until the optimum $k^*$ is reached, so that $\text{MAE}^{k^*} \le \text{MAE}^k \ \forall k \in \mathcal{K}$.

Compared to point-prediction and SAA methods, the prescriptive approach has some peculiarities that we will briefly describe through the following example. Considering the first week of August 2020, with average demand generally increasing, as can be observed in Table 1, we take the 7th day as the base scenario, for which we still do not have information about the spot prices. Although demand is still unknown, the demand forecast given by Red Eléctrica de España (ESIOS, 2021), the company in charge of power system maintenance and operation in the Spanish system, is quite accurate, with an error rate below 2% with respect to real demand. Therefore, we will use real demand as the current forecasted demand for day 7.

Table 2

Spanish market daily average spot prices and power demand, from August 15, 2020 to August 25, 2020.

Day | Spot price [€/MWh] | Real demand [MWh]
15 | 30.9996 | 24,535.0
16 | 29.1421 | 23,046.5
17 | 36.4651 | 26,620.1
18 | 38.6398 | 27,877.7
19 | 34.4472 | 28,577.1
20 | 34.6433 | 28,726.2
21 | 33.3708 | 29,061.3
22 | 32.7462 | 26,370.0
23 | 28.6548 | 24,369.0
24 | 39.6154 | 29,008.6
25 | 39.0749 | 30,566.7

Features (covariates) that indicate the day, month and year of the data are not employed in this example but will be considered in the case study section. Thus, only hourly demands are used as covariates to explain spot price variability. Considering day 7 as the test scenario, and days 1 to 6 as training data, we solve problem (9) for different hyperparameters $k$, with only one forward contract with a unique price of $\lambda^F = 36.8$ €/MWh and a maximum available capacity of $\bar{Q} = 5000$. Under this approach, the optimum is reached at $k = 1$ and $k = 2$, as one might expect upfront. Since hourly demand and prices generally increase over the week (Table 1 summarizes their average values), and day 7 has the highest average spot price and one of the highest demands in the sample, it is expected that the closest point in terms of demand is the day before and, as $k$ increases, the preceding days, since they exhibit the lowest distances with respect to day 7. However, why do values of $k \ge 3$ increase the error with respect to the perfect-information solution? The main reason arises from the intrinsic behavior of the stochastic problem. Since the scenario selected as the 3rd closest to day 7 has an average daily spot price below the forward price, the algorithm considers that for this point the forward quantity contracted should be close to 0, which is a sub-optimal solution compared to those obtained with $k < 3$. However, in this scenario setting, it seems logical that the preferences about hyperparameter selection under the prediction error and under the prescription error coincide, i.e.,

$$\left\{ k^*_{ML} = k^*_{DD} \;\middle|\; \text{MAE}^{k^*_{ML}}_{ML} \le \text{MAE}^{k}_{ML} \,\wedge\, \text{MAE}^{k^*_{DD}}_{DD} \le \text{MAE}^{k}_{DD} \quad \forall k \in \mathcal{K} \right\}$$

where $\text{MAE}^k_{ML}$ and $k^*_{ML}$ are the Mean Absolute Error and optimal $k$ for the ML algorithm focused on spot price prediction, respectively, and $\text{MAE}^k_{DD}$ and $k^*_{DD}$ are the Mean Absolute Error and optimal $k$ for the prescriptive data-driven method, respectively.
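The day-7 experiment can be roughly reproduced from the daily averages of Table 1 alone, under two simplifying assumptions that differ from the computation in this paper: neighbours are chosen by the distance between daily average demands rather than 24-hour demand profiles, and the forward contract covers all 24 hours ($d_t = 1$). Under the second assumption, the forward decision of (9) depends on the spot prices only through their daily average, since $\sum_{t=1}^{24} (\lambda^P_{t\omega} - \lambda^F) = 24(\bar{\lambda}^P_\omega - \lambda^F)$. Only the $Q^F$-dependent part of the profit is kept, because the remaining terms cancel in the prescriptive error; the variable and function names are hypothetical.

```python
import numpy as np

# Daily averages from Table 1 (August 1-7, 2020).
demand = np.array([28830.9, 26437.0, 29291.4, 29570.8, 30342.1, 30701.9, 30657.4])  # MWh
price = np.array([33.0176, 28.6514, 36.6316, 35.5433, 36.7857, 37.4010, 38.9991])   # EUR/MWh
LAM_F, Q_BAR, HOURS = 36.8, 5000.0, 24

train_d, train_p = demand[:6], price[:6]   # days 1-6: training data
test_d, test_p = demand[6], price[6]       # day 7: test day

def forward_gain(q, avg_price):
    # Q^F-dependent part of profit (9a) for one day when d_t = 1 in all 24 hours.
    return q * HOURS * (avg_price - LAM_F)

def knn_forward(k):
    # k closest training days in daily average demand, each with weight pi = 1/k;
    # the affine profit makes the decision either Q_BAR or 0.
    nearest = np.argsort(np.abs(train_d - test_d), kind="stable")[:k]
    return Q_BAR if train_p[nearest].mean() > LAM_F else 0.0

# Perfect foresight: the gain is affine in q, so the best q is 0 or Q_BAR.
perfect = max(forward_gain(q, test_p) for q in (0.0, Q_BAR))
decisions = {k: knn_forward(k) for k in range(1, 7)}
regret = {k: perfect - forward_gain(q, test_p) for k, q in decisions.items()}
```

With this proxy, $k = 1$ and $k = 2$ yield the maximum forward purchase and zero regret, while $k = 3$ adds day 4, whose average price (35.5433 €/MWh) pulls the weighted average below $\lambda^F = 36.8$ €/MWh, so the purchase drops to 0.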

Nevertheless, what if we encounter a sample where there exist “jumps”, or rather a changing market context for which power price patterns differ even under the same demand and weather conditions, as can usually be observed in historical data series? To illustrate this case, we take spot prices and demand from August 15, 2020 to August 25, 2020, and test the algorithm behavior using the last day of the sample as the test data (Table 2).

In this setting, we have days with demand similar to that of August 25 but very different prices, e.g., August 21 and August 20, and others that are extremely close to each other in terms of spot price but with a completely different demand profile, e.g., August 18. If we solve the problem and compare the solutions of the prescriptive algorithm and the point prediction approach in terms of hyperparameter selection, assuming three different forward prices $\lambda^F$ and a maximum available capacity, the gap in the process of selecting the optimal $k^*$ becomes clearer.

Inordertoanalyzetheresults,wewillmakeuseofFig.1,that plots MAE_DD (prescriptive error) on the red left axis andMAE_ML (predictive error) on the blue left axis, for a range of k values.

We also identify the optimal $k^*_{DD}$ and $k^*_{ML}$, rendering the lowest prescriptive and predictive MAE, respectively. By doing so, we can compare these two hyperparameter selection approaches and how they are affected by the problem structure. Comparing the results shown in Fig. 1, it is evident that the optimal $k^*$ varies depending on the forward price and the adopted approach. Looking at the three graphs, the prescriptive approach learns faster about this particular market situation, mainly because it is still solved as a stochastic problem and, as such, takes worst-case market conditions into consideration, acquiring energy in the forward market to reduce profit uncertainty. Taking the first case in Fig. 1a, where the forward price is the lowest and equal to $\lambda^F = 33.4$ €/MWh, the closest scenario is always August 21, whose average spot price is 33.3708 €/MWh, a value that is far from the test average price of 39.0749 €/MWh and not a good estimator of the future price, as indicated by the blue line. Since the current forward price is above 33.3708 €/MWh, the obvious first-stage solution is to acquire nothing from the forward contract in order to maximize the expected profit. This, however, implies the lowest possible profit with respect to the perfect-foresight solution, which renders the largest $\mathrm{MAE}_{DD}$. The $\mathrm{MAE}_{DD}$ decreases slowly and does not reach the optimal $k^*$ until it converges to $k = 2$, where the optimal decision is reached with two scenarios in the stochastic problem. The stochastic solution is then obvious, since the distance between $\lambda^F$ and $\lambda^P_{t\omega}$ is much lower for day 21 than between days 21 and 25. The here-and-now solution is to buy the maximum forward quantity in the first stage, since both scenarios have the same weight $w_i = 1/2$. Leveraging the information provided by the covariates $X$ allows the algorithm to weight each scenario based on the auxiliary information, improving the retailer decision with respect to the one provided by SAA.

Fig. 1. Spot Price Estimation MAE vs Data-Driven MAE, $\bar{Q} = 5000$. (a) $\lambda^F = 33.4$. (b) $\lambda^F = 34.8$. (c) $\lambda^F = 36$.
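The first-stage reasoning for Fig. 1a can be checked with a short calculation. The sketch below assumes a single forward block at a constant price and a retailer whose expected profit is linear in the forward quantity (no price-quota binaries, with pool purchases absorbing any mismatch), so that buying the whole block pays off exactly when $\lambda^F$ lies below the weighted expected spot price. The function `first_stage_forward` is hypothetical; the prices 33.4, 33.3708 and 39.0749 €/MWh and the capacity $\bar{Q} = 5000$ are taken from the text and from Fig. 1.

```python
def first_stage_forward(lambda_f, weights, scenario_spot, q_max):
    """Hypothetical simplified first-stage rule (not the full model).

    Assumes expected profit is linear in the forward quantity Q, so the
    marginal value of one forward MWh is E_w[spot] - lambda_f. The optimum
    is then a corner solution: buy q_max if forward energy is cheaper than
    the weighted expected spot price, otherwise buy nothing.
    """
    expected_spot = sum(w * p for w, p in zip(weights, scenario_spot))
    return q_max if lambda_f < expected_spot else 0.0


Q_MAX = 5000.0    # maximum forward capacity used in Fig. 1
LAMBDA_F = 33.4   # forward price of Fig. 1a (EUR/MWh)

# k = 1: only August 21 is selected (average spot price 33.3708 EUR/MWh).
q_knn = first_stage_forward(LAMBDA_F, [1.0], [33.3708], Q_MAX)
# Perfect foresight: realized test-day average of 39.0749 EUR/MWh.
q_perfect = first_stage_forward(LAMBDA_F, [1.0], [39.0749], Q_MAX)
print(q_knn, q_perfect)  # 0.0 5000.0
```

Under these assumptions, any pair of equally weighted scenarios ($w_i = 1/2$) whose mean spot price exceeds $\lambda^F$ leads the rule to buy the full block, which mirrors the here-and-now decision described above.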

In Fig. 1, the case where $k$ equals the sample size, $N = 10$, could be considered the SAA solution, since this is the number of scenarios in the training set. The decision taken by the retailer according to it is not adequate, as equal weight is given to scenarios that do not represent the current conditions. There exist multiple techniques to correct this and assign different probabilities (weights) to these scenarios based on empirical distribution approximations, but all of them still give some weight to data points that are far from potential scenario realizations, which increases the bias of the retailer decision. We can also observe that, in terms of hyperparameter selection, the prescriptive algorithm learns faster than point predictions, since its loss function during the validation process is focused on the error of the optimization problem solution and not on the estimation bias of the target variable, which also leads to a better hyperparameter selection (Fig. 1a, b). In the following, we further study whether this behavior remains with greater sample sizes and numbers of features, including uncertain price-quota curves, in a more realistic and complex problem setting.

3. Prescriptive procedure applied to the power retailer problem

3.1. Power retailer problem description

In this section, we extend the simplified version of problem (9) to incorporate more realistic features. In particular, the new model is based on the formulation presented in Chapter 8 of Conejo et al. (2010), where an electricity retailer aims to maximize profit by participating in the electricity market, with no capacity to affect day-ahead market prices (price-taker), but able to impact futures contract prices (price-maker).

Traditional stochastic approaches for this problem make use of the CVaR (Conditional Value at Risk) as a way to introduce risk aversion in the decision-making process. In this work we do not consider the CVaR because, in the validation process, different values of $k$ lead to different sample shapes (and sizes), and hence to different empirical distributions of the uncertain parameters. Therefore, the tails of those distributions (and their expectations) are not comparable. Furthermore, although the problem objective is to maximize expected profit, a certain degree of risk control is already present in our approach since, given $x$, data realizations (scenarios) far from $E[Y \mid X = x]$ have weight 0 in the solution process. As we will observe, the number of scenarios we account for (or discard) in the optimization problem is directly related to the value of the ML hyperparameter.

The formulation of the retailer optimization problem is as follows:

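\[
\begin{aligned}
\max_{Q^F_{fj},\,\lambda^R_{ei},\,v_{ei},\,E^P_{t\omega}} \quad & \sum_{\omega=1}^{N_\Omega} \pi_\omega \sum_{t=1}^{N_T} \left[ \sum_{e=1}^{N_E} \sum_{i=1}^{N_I} \lambda^R_{ei}\, \bar{E}^R_{eti\omega} - \lambda^P_{t\omega}\, E^P_{t\omega} - \sum_{f \in F_t} \sum_{j=1}^{N_J} \lambda^F_{fj}\, Q^F_{fj}\, d_t \right] && \text{(10a)}\\
\text{s.t.} \quad & 0 \le Q^F_{fj} \le \bar{Q}_{fj}, \quad \forall f,\ \forall j && \text{(10b)}\\
& \bar{\lambda}^R_{e,i-1}\, v_{ei} \le \lambda^R_{ei} \le \bar{\lambda}^R_{ei}\, v_{ei}, \quad \forall e,\ \forall i && \text{(10c)}\\
& \sum_{i=1}^{N_I} v_{ei} = 1, \quad \forall e && \text{(10d)}\\
& \sum_{e=1}^{N_E} \sum_{i=1}^{N_I} \bar{E}^R_{eti\omega}\, v_{ei} = E^P_{t\omega} + \sum_{f \in F_t} Q^F_f\, d_t + E^{PC}_t, \quad \forall t,\ \forall \omega && \text{(10e)}\\
& \sum_{j=1}^{N_J} Q^F_{fj} = Q^F_f, \quad \forall f && \text{(10f)}\\
& v_{ei} \in \{0, 1\}, \quad \forall e,\ \forall i && \text{(10g)}
\end{aligned}
\]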

The objective function focuses on expected profit maximization, considering a set of scenarios $\omega \in \Omega$ and periods $t \in T$. The retailer also accounts for different types of clients $e \in E$ and price-quota blocks $i \in I$. In this problem, uncertainty emerges because the retailer has to decide how much of every forward contract $Q^F_f$ must be signed at time $t_0$ and delivered at times $t = 1, \dots, N_T$, while spot prices $\lambda^P_{t\omega}$ are not known in advance. Therefore, this is a two-stage stochastic problem (Birge & Louveaux, 2011).

The first restriction (10b) builds each forward curve $F_t$ available at time $t$ as an increasing piecewise-linear function, where the total purchased energy for each forward contract is given by restriction (10f). Eq. (10e) represents the energy balance between the available energy (right-hand side) and the committed energy to be delivered by the retailer (left-hand side). Constraints (10c), (10d) and (10g) define the price-quota curve as a decreasing piecewise-linear function. The price-quota curve is our main source of uncertainty, and it is only determined once pool prices are revealed. A more detailed representation and description of the problem can be found in Conejo et al. (2010), as mentioned earlier in this section.
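To make the role of the scenario probabilities $\pi_\omega$ in (10a) explicit, the sketch below evaluates the expected profit of a given first-stage decision in a reduced version of the model. It assumes a single client type with a fixed selling price (so that (10c), (10d) and (10g) collapse to one active block), a single forward contract delivered in every period with blocks $j$ priced at $\lambda^F_{fj}$, $d_t = 1$ h, $E^{PC}_t = 0$, and pool energy $E^P_{t\omega}$ obtained from the balance (10e). The function name and all numbers are illustrative only.

```python
def expected_profit(q_blocks, forward_prices, sell_price, demand, spot, probs):
    """Expected profit (10a) of a fixed forward purchase in a reduced model.

    q_blocks[j]       : forward power bought in block j (MW, first stage)
    forward_prices[j] : price of forward block j (EUR/MWh)
    sell_price        : fixed retail price (a single active price-quota block)
    demand[w][t]      : energy sold to clients in scenario w, period t (MWh)
    spot[w][t]        : pool price in scenario w, period t (EUR/MWh)
    probs[w]          : scenario weight pi_w (should sum to 1)
    Assumes d_t = 1 h and E^PC_t = 0, so balance (10e) gives the pool energy
    E^P = demand - sum(q_blocks); a negative value is read as a pool sale.
    """
    q_total = sum(q_blocks)
    forward_cost = sum(q * p for q, p in zip(q_blocks, forward_prices))
    total = 0.0
    for pi, demand_w, spot_w in zip(probs, demand, spot):
        profit_w = 0.0
        for dem, price in zip(demand_w, spot_w):
            pool_energy = dem - q_total  # second-stage variable from (10e)
            profit_w += sell_price * dem - price * pool_energy - forward_cost
        total += pi * profit_w
    return total


# Toy numbers: one period, two equally likely scenarios.
print(expected_profit([10.0], [30.0], 50.0, [[100.0], [100.0]],
                      [[40.0], [20.0]], [0.5, 0.5]))  # 2000.0
```

In the prescriptive setting, `probs` would be replaced by the data-driven scenario weights of the chosen ML method, which is how the hyperparameter ends up shaping the first-stage decision.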

In this work, the covariates used to estimate day-ahead spot prices are demand and the point in time, considering each hourly demand (24 variables), day (7 dummy variables), month (12 dummy variables) and year (one dummy variable per year in the training and/or validation sample) as differentiated features for the day whose prices are estimated. Thus, the estimation itself can accurately differentiate between peak and base hours, together with other daily patterns that could indicate a change in the demand-price relationship (Karakatsani & Bunn, 2008). Alternative autoregressive models (Conejo, Contreras, Espínola, & Plazas, 2005) or factor models (Liebl et al., 2013) propose many other potential covariates. However, we did not include any other auxiliary information, since demand already implicitly includes information such as weather and socio-economic factors, and our target is to analyze the behavior of the algorithm itself, not to increase the electricity price forecasting accuracy to the point of incurring undesirable overfitting. Additionally, much of the information that could be used is normally proprietary to each market participant, such as the status of each of the electricity generation units.
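A possible encoding of this covariate vector is sketched below. The function `build_features` is hypothetical, and the text does not state whether demand is rescaled before distances are computed; unscaled demand values (in MW) would dominate Euclidean distances relative to the 0/1 dummies, so in practice some standardization may be needed.

```python
from datetime import date


def build_features(day, hourly_demand, years):
    """Covariates for one day: 24 hourly demands, then day-of-week (7),
    month (12) and year (one per year in `years`) dummy variables."""
    if len(hourly_demand) != 24:
        raise ValueError("expected 24 hourly demand values")
    weekday = [1.0 if day.weekday() == d else 0.0 for d in range(7)]
    month = [1.0 if day.month == m else 0.0 for m in range(1, 13)]
    year = [1.0 if day.year == y else 0.0 for y in years]
    return [float(v) for v in hourly_demand] + weekday + month + year


# Placeholder demand profile (MW); not real market data.
x = build_features(date(2020, 8, 25), [25000.0] * 24, years=range(2014, 2021))
print(len(x))  # 50
```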

As previously disclosed in Algorithm 1, the problem is solved in two steps. First, we solve the ML algorithm, indicating the hyperparameter $k$ to be used along its solution process, and then we address the two-stage stochastic retailer problem. The main question that arises is how to select the best $k$. Traditional approaches would use cross-validation (Ripley, 2007; Stone, 1977), overcoming the problem of not having a fixed rule when the sample size is medium to small. Otherwise, the rule given by Cover & Hart (1967) could be used, as long as $n \to \infty$. However, little attention is usually paid to the relationship between the ML set-up and the adequacy of this hyperparameter selection for the prescription. Furthermore, since our sample consists of 6 years of hourly spot prices and power demand data, with time-dependent covariates, we make use of cross-validation approaches focused on time-series hyperparameter selection, such as the one proposed in Makridakis (1990) and referred to as “sliding simulation”.

The motivation to employ this technique is twofold: first, we want to avoid a high bias of the estimates and, second, we want to avoid inconsistency in the selected hyperparameter optimization method. Regarding this last issue, traditional cross-validation approaches cannot be applied directly, as there is a certain time dependency and seasonal effect in demand and power price data, as shown in Fig. 2. In traditional settings, k-fold cross-validation selects data folds randomly without preserving the temporal pattern and thus, for many of the selected folds, explains part of the past data behavior with a future sample, which makes the validation method theoretically and empirically inconsistent (Tashman, 2000). The sliding window approach proposed here evaluates the accuracy of the prescriptions using one sixth of the sample, i.e., two months per year of sample, with a lead time (commonly known as the forecasting horizon) of 2 periods, thus splitting the validation sample into folds of size two that are incrementally incorporated into the training sample. A complete description of this approach can be found in Tashman (2000). The error generated along each of the lead times used during the validation step is then used to compute both the MAE of the estimation and the MAE of the prescription, although only the latter is used during the validation phase.
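The splitting scheme described above can be sketched as follows. This is a simplified, hypothetical helper: it assumes the validation block is the final sixth of a time-ordered sample (the text speaks of two months per year, whose exact placement is not specified here) and that the horizon of 2 is counted in the same periods as the data index.

```python
def sliding_simulation_folds(n_periods, val_divisor=6, horizon=2):
    """Rolling-origin (sliding simulation) splits for time-ordered data.

    The last n_periods // val_divisor observations form the validation
    block. It is split into consecutive folds of `horizon` periods; each
    fold is evaluated with a model trained on all earlier periods, and is
    then appended to the training window before the next fold.
    """
    n_val = n_periods // val_divisor
    start = n_periods - n_val
    folds = []
    for origin in range(start, n_periods, horizon):
        train_idx = list(range(origin))  # only data before the forecast origin
        val_idx = list(range(origin, min(origin + horizon, n_periods)))
        folds.append((train_idx, val_idx))
    return folds


for train_idx, val_idx in sliding_simulation_folds(24):
    print(len(train_idx), val_idx)
# 20 [20, 21]
# 22 [22, 23]
```

Unlike random k-fold splits, every training index precedes every validation index in each fold, so no future observation is used to explain the past.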

As previously indicated, instead of directly applying the “coefficient of prescriptiveness” $P$ as the loss function, we make use of the “Mean Absolute Error” (MAE) between the perfect-foresight solution and the solution obtained through the prescriptive method, i.e., the prescription error. This allows us to compare the solutions and hyperparameter choices of the deterministic approach, normally called point prediction, and of the data-driven prescription in a more direct manner. Besides, we avoid some undesirable effects when comparing results across sample sizes, since the root mean squared error (RMSE) and similar measures tend to have higher upper limits with increasing sample sizes. We then compare not only the final results obtained, but also the differences in $k$ between applying sliding simulation directly to the ML algorithm loss function and to the prescription error itself. By doing so, we want to explore, in a realistic setting, whether the problem parameter set-up differs significantly depending on whether we focus our attention on the prescription or on the prediction errors.
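The two selection criteria differ only in which per-fold errors are averaged. The sketch below, with a hypothetical helper `select_k` and made-up error values, returns $k^*$ as the minimizer of the MAE over validation folds; feeding it spot price errors gives $k^*_{ML}$, while feeding it profit gaps between the perfect-foresight and the prescribed decisions gives $k^*_{DD}$.

```python
from statistics import mean


def select_k(errors_by_k):
    """Return (k*, MAE) with the lowest mean absolute error across folds.

    errors_by_k maps each candidate k to its per-fold errors: spot price
    errors for the predictive criterion (MAE_ML), or profit gaps between
    the perfect-foresight and the prescribed decisions for the
    prescriptive criterion (MAE_DD).
    """
    mae = {k: mean(abs(e) for e in errs) for k, errs in errors_by_k.items()}
    k_star = min(mae, key=mae.get)  # ties resolved in favour of the first k listed
    return k_star, mae[k_star]


# Made-up validation errors, for illustration only.
price_errors = {1: [5.7, -4.0], 2: [2.0, 1.0], 5: [1.5, 2.5]}
profit_gaps = {1: [900.0, 700.0], 2: [100.0, 300.0], 5: [50.0, 150.0]}
print(select_k(price_errors))  # (2, 1.5)
print(select_k(profit_gaps))   # (5, 100.0)
```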

In the following, we describe how the different ML techniques considered in this work are adapted to be used within the prescriptive Algorithm 1.


Fig. 2. Spanish market hourly power spot prices and electricity demand for the time period 2014 to 2020.

3.2. kNN algorithm approach

The k-Nearest Neighbors (kNN) is a distance-based non-parametric approach that relies on finding the “closest” points or group of points in a certain set with respect to a given point. Its simplicity and consistency have made it one of the most commonly used algorithms in different ML fields, such as clustering (Henley & Hand, 1996; Sibson, 1973; Wong & Lane, 1983) and supervised learning (Cover & Hart, 1967; Weinberger & Saul, 2009).

In the current setting we use the basic kNN version, in which no adjustment of the importance of each scenario is applied, as long as it is among the $k$ closest to the target point. In the second stage, we weight every scenario according to Eq. (11), where $d(\cdot,\cdot)$ is the Euclidean distance. Therefore, the function used to determine the scenario weight is the following:

\[
w^{kNN}_{N,i}(x) = \frac{1}{k}\,\mathbb{I}\left[x^i \in R(x) : d(x, x^i) \le d(x, x^j)\ \ \forall\, i \neq j,\ x^j \notin R(x)\right] \qquad (11)
\]
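Eq. (11) can be sketched in a few lines of Python. The helper `knn_weights` is hypothetical, and the tie-breaking rule (training-sample order) is an assumption, since Eq. (11) does not fix one. With $k = N$ every weight equals $1/N$, i.e., the weights reduce to SAA.

```python
import math


def knn_weights(x, train_x, k):
    """Scenario weights of Eq. (11): 1/k for each of the k training points
    closest to x in Euclidean distance, 0 for all others.

    Ties at the k-th distance are broken by training-sample order (an
    assumed convention).
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the training sample size")
    dist = [math.dist(x, xi) for xi in train_x]
    neighbours = sorted(range(len(train_x)), key=lambda i: dist[i])[:k]
    weights = [0.0] * len(train_x)
    for i in neighbours:
        weights[i] = 1.0 / k
    return weights


# Toy one-dimensional covariates; not real data.
w = knn_weights([35.5], [[28.0], [30.0], [35.0], [36.0]], k=2)
print(w)  # [0.0, 0.0, 0.5, 0.5]
```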

3.3. Trees algorithm approach

Another type of prominent algorithms in ML history comprises those based on mapping a training set into subgroups of data, for classification or regression purposes. There is a plethora of techniques based on this approach, such as ID3 (Quinlan, 1986), CART (Breiman, Friedman, Stone, & Olshen, 1984) or C4.5 (Quinlan, 1992), among many others, each one with its advantages and disadvantages. The most recent advancements for this type of technique are motivated by the significant speedup of algorithms for solving mixed-integer optimization problems. For instance, in Bertsimas & Dunn (2017), instead of adopting the traditional heuristic top-down approach for splits, an exact MIO formulation is proposed to derive the optimal decision tree for both axis-aligned and multivariate hyperplane splits.

In this work we focus on the standard CART algorithm, since it retains desirable asymptotic optimality properties, whereas ID3 does not handle numeric values and C4.5 could create empty final leaves, which may lead to some inconsistency or to a lack of empirical optimality guarantees. The weights assigned to each of the scenarios of the training sample follow the map obtained by applying CART to that training sample, giving each point not belonging to $x$'s region a weight of $w_{N,i}(x) = 0$, according to (12).

\[
w^{CART}_{N,i}(x) = \frac{\mathbb{I}\left[R(x) = R(x^i)\right]}{\left|\{\, j : R(x^j) = R(x) \,\}\right|} \qquad (12)
\]

where $R$ is a partitioning of the sample of size $N$ into $M$ subsets, with $\bigcap_{r=1}^{M} R^{-1}(r) = \emptyset$. Since CART presents a large number of hyperparameters to be adjusted, for clarity we focus only on the “tree depth”.
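Given the leaf (region) of every training point and the leaf reached by a new covariate vector $x$, Eq. (12) is a simple count. The sketch below assumes these leaf indices are already available; for instance, scikit-learn's `DecisionTreeRegressor` (with the tree depth set through `max_depth`) exposes them via its `apply` method. The helper `cart_weights` is hypothetical.

```python
from collections import Counter


def cart_weights(leaf_of_x, train_leaves):
    """Scenario weights of Eq. (12) from a fitted tree's leaf assignments.

    train_leaves[i] is the leaf (region R(x^i)) of training point i and
    leaf_of_x is the leaf reached by the new covariate vector x. Points in
    the same leaf as x share the weight equally; all others get 0.
    """
    size = Counter(train_leaves)[leaf_of_x]
    if size == 0:
        raise ValueError("leaf of x contains no training points")
    return [1.0 / size if leaf == leaf_of_x else 0.0 for leaf in train_leaves]


print(cart_weights(3, [3, 5, 3, 4, 3]))
```

A deeper tree produces smaller leaves and therefore fewer, more heavily weighted scenarios, which plays a role analogous to a small $k$ in kNN.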

This methodology can also be adapted to tree-based ensemble methods, e.g., Random Forest, where the weights (12) can be computed by combining several trees (Bertsimas & Kallus, 2020). However, we have not included these methods in our numerical case study (Section 4) due to their observed significantly worse performance compared with CART. This can be explained by the relatively small size of the dataset considered, compared with other large-scale applications where ensembles and randomization exhibit higher accuracy.
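One common way of combining trees is to average the per-tree weights of (12). The sketch below assumes each tree assigns every training point to a leaf and ignores bootstrap resampling; per-tree leaf indices could come, for example, from scikit-learn's `RandomForestRegressor.apply`, which returns one column per tree. The helper `forest_weights` is hypothetical.

```python
def forest_weights(leaf_of_x_per_tree, train_leaves_per_tree):
    """Average of the per-tree weights of Eq. (12) over all trees.

    leaf_of_x_per_tree[t]    : leaf reached by x in tree t
    train_leaves_per_tree[t] : leaf of every training point in tree t
    """
    n_trees = len(train_leaves_per_tree)
    weights = [0.0] * len(train_leaves_per_tree[0])
    for leaf_x, leaves in zip(leaf_of_x_per_tree, train_leaves_per_tree):
        size = sum(1 for leaf in leaves if leaf == leaf_x)
        if size == 0:
            raise ValueError("leaf of x contains no training points")
        for i, leaf in enumerate(leaves):
            if leaf == leaf_x:
                weights[i] += 1.0 / (size * n_trees)
    return weights


# Two toy trees over four training points.
w = forest_weights([0, 1], [[0, 0, 1, 1], [0, 1, 1, 1]])
print(w)
```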

3.4. K-means algorithm approach

Apart from supervised approaches, whose theoretical properties and justification are well established (Bertsimas & Kallus, 2020), we also propose an alternative method based on unsupervised algorithms. In particular, we consider one of the most studied algorithms in data science: K-means. Although it was developed by many authors, the algorithm proposed by Lloyd (1982) can be considered one of the most prominent versions. Typically used to identify patterns or segment groups, the algorithm divides a data sample into subgroups that share similar characteristics. With the unsupervised algorithm, as in the previous ML approaches, we seek to select those scenarios from the data sample that share some characteristics, leveraging the information provided by the covariates $X$. The main difference lies in the way the regions and scenario weights are obtained. Since no feedback is provided by the dependent variable (unsupervised), there is no real fit of the response variable (spot prices). However, the data-driven optimization algorithm effectively provides information to the regions in the same way as the previous ML techniques do, i.e., by leveraging the MAE. Centroids are used to provide a comparison between the
