Fig. 14. Dependency of the relative regret $\overline{R}_T$ on $N$.

18 From this experiment it is not clear whether $\overline{R}_T = \tilde{O}(q_{\min}^{-1})$, thus implying that $R_T$ does not depend on $q_{\min}$ at all, or $\overline{R}_T$ is sublinear in $q_{\min}$, which would correspond to a dependency $R_T = \tilde{O}(q_{\min}^{-z})$ with $0 < z < 1$.

In this paper, we studied the problem of learning the CTRs of ads in sponsored search auctions with truthful mechanisms. This problem is highly challenging since it requires the combination of online learning tools (i.e., regret minimization algorithms) and economic tools (i.e., truthful mechanisms). While almost all the literature focused on single-slot scenarios, here we focused on multi-slot scenarios. With multiple slots it is necessary to adopt a user model to characterize how the CTR of an ad varies as the allocation of displayed ads varies. Here, we adopted the cascade model, which is the most common model used in the literature. In the paper, we studied a number of scenarios, each with a specific information setting of unknown parameters. For each scenario, we designed a truthful learning mechanism, studied its economic properties, derived an upper bound over the regret, and, for some mechanisms, also a lower bound. We considered both the regret over the auctioneer's revenue and the SW.
We showed that for the cascade model with only position-dependent externalities it is possible to design a truthful no-regret learning mechanism for the general case in which all the parameters are unknown. Our mechanism presents a regret $\tilde{O}(T^{2/3})$ and it is DSIC in expectation w.r.t. the realization of the random component of the mechanism. However, it remains open whether or not it is possible to obtain a regret $\tilde{O}(T^{1/2})$. For specific cases, in which some parameters are known to the auctioneer, we obtained better results in terms of either incentive compatibility, obtaining dominant strategy truthfulness, or regret, obtaining a regret of zero. We showed that for the cascade model with position- and ad-dependent externalities it is possible to design a DSIC a posteriori mechanism with a regret $\tilde{O}(T^{2/3})$ when only the quality is unknown. Instead, even when the cascade model is only with ad-dependent externalities and no parameter is known, it is not possible to obtain a no-regret DSIC a posteriori mechanism. The proof of this result would seem to suggest that the same result holds also when a random mechanism is adopted and the truthfulness is in expectation w.r.t. its realizations. However, we did not produce any proof for that, leaving it for future work. Finally, we empirically evaluated the bounds we provided, showing that the dependency of the regret on the parameters is mostly correct in a worst-case scenario.
Two main questions deserve future investigation. The first question concerns the study of a lower bound for the case in which there are only position-dependent externalities and truthfulness is in expectation w.r.t. only the realizations of the random component of the mechanism or also w.r.t. the click realizations. Furthermore, it is open whether the separation of exploration and exploitation phases is necessary and, in the negative case, whether it is possible to obtain a regret $\tilde{O}(T^{1/2})$. The second question concerns a similar study related to the case with only ad-dependent externalities.
Appendix A. Vickrey–Clarke–Groves mechanism
Consider a generic direct-revelation mechanism $M = (\mathcal{N}, V, \Theta, f, \{p_i\}_{i \in \mathcal{N}})$ as defined in Section 3.2. Differently from the SSA case, in general the type of an agent, denoted by $v_i$ for consistency with the rest of the paper, is a vector of parameters.
We define a function $\mathrm{val}_i : \Theta \times V \to \mathbb{R}^+$, which returns the value obtained by agent $a_i$ when its type is $v_i$ and the allocation chosen by the mechanism is $\theta$.
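The VCG mechanism is obtained by coupling the two following functions:
• the allocation function $f$, which returns the allocation maximizing the social welfare, i.e.,
$$f(\hat{v}) = \arg\max_{\theta \in \Theta} SW(\theta, \hat{v}) = \arg\max_{\theta \in \Theta} \sum_{i \in \mathcal{N}} \mathrm{val}_i(\theta, \hat{v}_i);$$
• the payment rule $p_i$, which defines the payment required from agent $a_i$, i.e.,
$$p_i(\hat{v}) = SW(f(\hat{v}_{-i}), \hat{v}_{-i}) - SW_{-i}(f(\hat{v}), \hat{v}) = \sum_{j \in \mathcal{N}, j \neq i} \mathrm{val}_j(f(\hat{v}_{-i}), \hat{v}_j) - \sum_{j \in \mathcal{N}, j \neq i} \mathrm{val}_j(f(\hat{v}), \hat{v}_j),$$
where we denote by $f(\hat{v}_{-i})$ the allocation returned by $f$ when agent $i$ does not participate in the auction.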
In this quasi-linear environment, when there are no interdependencies among the types of the agents and the no-single-agent effect [3] holds, the VCG mechanism is AE, DSIC a posteriori, IR a posteriori, and WBB a posteriori.
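As an illustration of the allocation function and the payment rule above, the following is a minimal Python sketch that computes both by exhaustive search. The helper names (`vcg`, `social_welfare`, `slot_allocations`, `slot_value`) and the two-slot example with prominences `LAM` are assumptions made for this sketch only; they are not taken from the paper, and exhaustive search is practical only for tiny instances.

```python
from itertools import permutations

def social_welfare(theta, bids, val, exclude=None):
    """Sum of reported values under allocation theta, optionally skipping one agent."""
    return sum(val(i, theta, b) for i, b in bids.items() if i != exclude)

def vcg(allocations, bids, val):
    """Return (allocation, payments) for the VCG mechanism by exhaustive search.

    allocations: callable mapping a list of participating agents to candidate allocations
    bids:        dict agent -> reported type
    val:         val(i, theta, reported_type) -> value of agent i under theta
    """
    agents = sorted(bids)
    # Allocation function f: maximize the reported social welfare.
    best = max(allocations(agents), key=lambda th: social_welfare(th, bids, val))
    payments = {}
    for i in agents:
        others = {j: b for j, b in bids.items() if j != i}
        # f(v_{-i}): best allocation when agent i does not participate.
        best_without_i = max(allocations(sorted(others)),
                             key=lambda th: social_welfare(th, others, val))
        # p_i = SW(f(v_{-i}), v_{-i}) - SW_{-i}(f(v), v)
        payments[i] = (social_welfare(best_without_i, others, val)
                       - social_welfare(best, bids, val, exclude=i))
    return best, payments

# Hypothetical example: two slots whose value scales with an assumed prominence.
LAM = (1.0, 0.5)

def slot_allocations(agents):
    """All ordered assignments of agents to the available slots."""
    return permutations(agents, min(len(LAM), len(agents)))

def slot_value(i, theta, bid):
    """Value of agent i: prominence of its slot times its bid, 0 if not displayed."""
    return LAM[theta.index(i)] * bid if i in theta else 0.0

best, payments = vcg(slot_allocations, {"a": 10.0, "b": 6.0, "c": 2.0}, slot_value)
```

In this toy instance each displayed agent pays the welfare loss it imposes on the others, and the agent left without a slot pays nothing.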
Appendix B. Monotonicity and Myerson's payments
Consider a generic direct-revelation mechanism $M = (\mathcal{N}, V, \Theta, f, \{p_i\}_{i \in \mathcal{N}})$ as defined in Section 3.2. A single-parameter linear environment is such that
• the type of each agent $a_i$ is a scalar $v_i$ (single-parameter assumption),
• the utility function of agent $a_i$ is $u_i(\hat{v}) = z_i$ […]
An allocation function $f$ is monotonic in a single-parameter linear environment if for any $\hat{v}_{-i}$ […] $z_i$ […] possible to design a DSIC mechanism imposing the following payments [35]:
$p_i(\hat{v}) = h_i(\hat{v}_{-i}) + z_i$ […]

Appendix C. Proof of revenue regret in Theorem 2

We start by reporting the proof of Proposition 1.
Proof of Proposition 1. The derivation is a simple application of Hoeffding's bound. We first notice that each of the terms in the empirical average $\tilde{q}_i$ (Eq. (11)) is bounded in $[0, 1/\Lambda_{\pi(i;\theta_t)}]$. Thus we obtain […]
By reordering the terms in the previous expression we have
$\eta =$ […]
which guarantees that all the empirical estimates $\tilde{q}_i$ are within $\eta$ of $q_i$ for all the ads with probability at least $1 - \delta$. □

Before stating the main result of this section, we need the following lemma.

Lemma 1. For any slot $s_m$ with $m \in \mathcal{K}$, with probability $1 - \delta$, […]
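Proof. The proof is a straightforward application of Proposition 1. We consider the optimal allocation $\theta^*$ defined in Eq. (2) and the optimal allocation $\tilde{\theta}$ when estimates $\tilde{q}^+$ are adopted, defined in Eq. (16). We denote $h = \alpha(m;\theta^*) \in \arg\max_{i \in \mathcal{N}}(q_i \hat{v}_i; m)$, i.e., the index of the ad allocated in a generic slot in position $m$. There are two possible scenarios:
• If $\pi(h;\tilde{\theta}) < m$ (the ad is displayed into a higher slot in the approximated allocation $\tilde{\theta}$), then $\exists j \in \mathcal{N}$ s.t. $\pi(j;\theta^*) < m \wedge \pi(j;\tilde{\theta}) \geq m$. Thus
$$\max_{i \in \mathcal{N}}(\tilde{q}^+_i \hat{v}_i; m) \geq \tilde{q}^+_j \hat{v}_j \geq q_j \hat{v}_j \geq q_h \hat{v}_h = \max_{i \in \mathcal{N}}(q_i \hat{v}_i; m),$$
where the second inequality holds with probability $1 - \delta$;
• If $\pi(h;\tilde{\theta}) \geq m$ (the ad is displayed into a lower or equal slot in the approximated allocation $\tilde{\theta}$), then
$$\max_{i \in \mathcal{N}}(\tilde{q}^+_i \hat{v}_i; m) \geq \tilde{q}^+_h \hat{v}_h \geq q_h \hat{v}_h = \max_{i \in \mathcal{N}}(q_i \hat{v}_i; m),$$
where the second inequality holds with probability $1 - \delta$.
In both cases, the statement follows. □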
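Both cases of the proof end with the same comparison: the $m$-th largest product computed with the optimistic estimates is at least the $m$-th largest product computed with the true qualities. The following minimal Python sketch checks this on one small instance. All numbers and names (`mth_largest`, `q`, `q_tilde`, `eta`) are hypothetical, and the sketch assumes the high-probability event in which every estimate is within `eta` of the true quality.

```python
def mth_largest(values, m):
    """Return the m-th largest entry of values (m = 1 is the maximum)."""
    return sorted(values, reverse=True)[m - 1]

# Hypothetical instance: true qualities, bids, and estimates within eta of the truth.
q = [0.30, 0.10, 0.25, 0.05]
v_hat = [1.0, 4.0, 2.0, 3.0]
eta = 0.04
q_tilde = [0.27, 0.13, 0.22, 0.08]

# Optimistic (upper-confidence) estimates: q_plus[i] >= q[i] whenever |q_tilde[i] - q[i]| <= eta.
q_plus = [qt + eta for qt in q_tilde]

true_products = [qi * vi for qi, vi in zip(q, v_hat)]
optimistic_products = [qp * vi for qp, vi in zip(q_plus, v_hat)]

# Compare the m-th largest products for every slot position m.
dominates = [
    mth_largest(optimistic_products, m) >= mth_largest(true_products, m)
    for m in range(1, len(q) + 1)
]
```

The comparison holds for every position because raising each entry of a list can never lower any of its order statistics, which is the fact the two cases above establish through the explicit chains of inequalities.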
Proof of Theorem 2.

Step 1: expected payments. The proof follows steps similar to those in the proofs in [20]. We first recall that, since the mechanism is DSIC in expectation w.r.t. the clicks, we can directly focus on the regret when the actual values $v$ are bid. For any ad $a_i$ such that $\pi(i;\theta^*) \leq K$, the expected payments of the VCG mechanism in this case reduce to Eq. (9): […]
while, given the definition of A-VCG1 reported in Section 4.1, the expected payments for the $t$-th iteration of the auction are […]
Step 2: per-step exploration regret. Since for any $1 \leq t \leq \tau$, A-VCG1 sets all the payments to 0, the per-step regret is $r_t =$ […]
Step 3: per-step exploitation regret. Now we focus on the expected (w.r.t. click realizations) per-step regret during the exploitation phase. According to the definition of payments, at each step $t \in \{\tau + 1, \ldots, T\}$ of the exploitation phase we bound the per-step regret $r_t$ as
$r_t =$ […]
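By definition of the max operator, since $l + 1 > m$, it follows that $\max$ […]
with probability at least $1 - \delta$. Notice that, by definition of $\Delta_l$, $\sum_{l=m}^{K} \Delta_l = \Lambda_m - \Lambda_{K+1} = \Lambda_m$. Furthermore, from the definition of $\tilde{q}^+_i$ and using Eq. (14) we have that for any ad $a_i$: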
$$\tilde{q}^+_i - q_i = \tilde{q}_i - q_i + \eta \leq 2\eta,$$
with probability at least $1 - \delta$. Thus, the difference between the payments becomes
$r_t \leq 2 v_{\max}$ […]

Step 4: cumulative regret. We first consider the (low-probability) event in which the bound on $\tilde{q}^+_i$ derived in Proposition 1 does not hold.
In this case, we cannot guarantee anything about the behavior of the mechanism, since the payments are very inaccurate estimates of the CTRs, and thus the largest possible regret is suffered. In particular, we consider the worst-case loss of $v_{\max}$ for each slot for each step, leading to a total regret of $v_{\max} \left(\sum_{m=1}^{K} \Lambda_m\right) T$ with probability $\delta$. By summing up the regrets reported in Eq. (C.3) during the exploration phase and Eq. (C.6) during the exploitation phase, and by considering that these bounds hold with probability at least $1 - \delta$ (upper-bounded by 1 in the following), we obtain an expected regret
$R_T \leq v_{\max}$ […]
where $R_{ei}$ is the upper bound on the regret suffered during the exploitation phase (which holds with probability at least $1 - \delta$), $R_{er}$ is the upper bound on the regret suffered during the exploration phase (which holds with probability at least $1 - \delta$), and $R_{\delta}$ is the upper bound on the regret when the bounds do not hold (with probability at most $\delta$). This bound can be further simplified, given that $\sum_{m=1}^{K} \Lambda_m \leq K$, as […]
Step 5: parameters optimization. Besides describing the performance of A-VCG1, the previous bound also provides guidance for the optimization of the parameters $\tau$ and $\delta$. We first simplify the bound in Eq. (C.7) as
$R_T \leq v_{\max} K$ […]
We take the derivative of the previous bound w.r.t. $\tau$, set it to zero and obtain
$v_{\max} K$ […]
Substituting this value of $\tau$ into Eq. (C.8) leads to the optimized bound
$R_T \leq v_{\max} K$ […]

19 Notice that in the logarithmic term the factor of 2 we have in Proposition 1 disappears, since in this proof we only need the one-sided version of the bound.
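We are now left with the choice of the confidence parameter $\delta \in (0,1)$, which can be easily set to optimize the asymptotic rate (i.e., ignoring constants and logarithmic factors) as
$$\delta = K^{-1/3} T^{-1/3} N^{1/3}.$$
We thus obtain the final bound
$$R_T \leq 4 v_{\max} \Lambda_{\min}^{-2/3} K^{2/3} T^{2/3} N^{1/3} \left(\log K^{1/3} T^{1/3} N^{2/3}\right)^{1/3}.$$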
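To give a feeling for how the choice of $\delta$ and the final bound behave numerically, the following is a minimal Python sketch that evaluates both. It assumes that the factor carrying the "min" subscript in the bound is the smallest slot prominence (called `lam_min` here) and that the logarithm is natural; the function names `confidence_delta` and `regret_bound` are hypothetical.

```python
import math

def confidence_delta(K, T, N):
    """delta = K^(-1/3) * T^(-1/3) * N^(1/3), the choice made in Step 5."""
    return (N / (K * T)) ** (1.0 / 3.0)

def regret_bound(v_max, lam_min, K, T, N):
    """Evaluate 4 v_max lam_min^(-2/3) K^(2/3) T^(2/3) N^(1/3) (log(N / delta))^(1/3).

    Assumption: natural logarithm; lam_min stands for the smallest slot prominence.
    """
    delta = confidence_delta(K, T, N)
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1), i.e. T > N / K")
    # With this delta, N / delta equals K^(1/3) T^(1/3) N^(2/3), the argument of the log.
    log_term = math.log(N / delta)
    return (4.0 * v_max * lam_min ** (-2.0 / 3.0)
            * K ** (2.0 / 3.0) * T ** (2.0 / 3.0) * N ** (1.0 / 3.0)
            * log_term ** (1.0 / 3.0))

# Hypothetical setting: 5 slots, 10 ads, v_max = 1, smallest prominence 0.5.
bound_small = regret_bound(1.0, 0.5, K=5, T=1_000_000, N=10)
bound_large = regret_bound(1.0, 0.5, K=5, T=8_000_000, N=10)
growth = bound_large / bound_small
```

Multiplying $T$ by 8 multiplies the bound by slightly more than $8^{2/3} = 4$, the excess coming from the logarithmic factor, which is the sublinear $T^{2/3}$ growth stated above.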
We have to impose the constraints that $T > N/K$ (given by $\delta < 1$) and that $T > \tau$, i.e., $T > \frac{N}{K \Lambda_{\min}^2} \log \frac{N}{\delta}$. The two constraints imply:
T