

In this paper, we studied the problem of learning the CTRs of ads in sponsored search auctions with truthful mechanisms. This problem is highly challenging since it requires the combination of online learning tools (i.e., regret minimization algorithms) and economic tools (i.e., truthful mechanisms). While almost all the literature focused on single-slot scenarios, here we focused on multi-slot scenarios. With multiple slots it is necessary to adopt a user model to characterize how the CTR of an ad varies as the allocation of displayed ads varies. Here, we adopted the cascade model, which is the most common model used in the literature. In the paper, we studied a number of scenarios, each with a specific information setting of unknown parameters. For each scenario, we designed a truthful learning mechanism, studied its economic properties, derived an upper bound on the regret and, for some mechanisms, also a lower bound. We considered both the regret over the auctioneer's revenue and the regret over the SW.

Fig. 14. Dependency of the relative regret $\overline{R}_T$ on $N$.

18 From this experiment it is not clear whether $\overline{R}_T = \tilde{O}(q_{\min}^{1})$, thus implying that $R_T$ does not depend on $q_{\min}$ at all, or $\overline{R}_T$ is sublinear in $q_{\min}$, which would correspond to a dependency $R_T = \tilde{O}(q_{\min}^{z})$ with $0 < z < 1$.

We showed that for the cascade model with only position-dependent externalities it is possible to design a truthful no-regret learning mechanism for the general case in which all the parameters are unknown. Our mechanism has a regret of $\tilde{O}(T^{2/3})$ and is DSIC in expectation w.r.t. the realization of the random component of the mechanism. However, it remains open whether or not it is possible to obtain a regret of $\tilde{O}(T^{1/2})$. For specific cases, in which some parameters are known to the auctioneer, we obtained better results in terms of either incentive compatibility, obtaining dominant strategy truthfulness, or regret, obtaining a regret of zero. We showed that for the cascade model with position- and ad-dependent externalities it is possible to design a DSIC a posteriori mechanism with a regret of $\tilde{O}(T^{2/3})$ when only the quality is unknown. Instead, even when the cascade model has only ad-dependent externalities and no parameter is known, it is not possible to obtain a no-regret DSIC a posteriori mechanism. The proof of this result seems to suggest that the same result also holds when a random mechanism is adopted and truthfulness is in expectation w.r.t. its realizations. However, we did not produce any proof of that, leaving it for future work. Finally, we empirically evaluated the bounds we provided, showing that the dependency of the regret on the parameters is mostly correct in a worst-case scenario.

Two main questions deserve future investigation. The first question concerns the study of a lower bound for the case in which there are only position-dependent externalities and truthfulness is in expectation w.r.t. only the realizations of the random component of the mechanism or also w.r.t. the click realizations. Furthermore, it is open whether the separation of exploration and exploitation phases is necessary and, in the negative case, whether it is possible to obtain a regret of $\tilde{O}(T^{1/2})$. The second question concerns a similar study related to the case with only ad-dependent externalities.

Appendix A. Vickrey–Clarke–Groves mechanism

Consider a generic direct-revelation mechanism $M = (\mathcal{N}, V, \Theta, f, \{p_i\}_{i \in \mathcal{N}})$ as defined in Section 3.2. Differently from the SSA case, in general the type of an agent, denoted by $v_i$ for consistency with the rest of the paper, is a vector of parameters.

We define a function $\mathrm{val}_i : \Theta \times V \to \mathbb{R}^+$, which returns the value obtained by agent $a_i$ when its type is $v_i$ and the allocation chosen by the mechanism is $\theta$.

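The VCG mechanism is obtained by coupling the two following functions:

the allocation function $f$, which returns the allocation maximizing the social welfare, i.e.,

$$f(\hat{v}) = \arg\max_{\theta \in \Theta} \mathrm{SW}(\theta, \hat{v}) = \arg\max_{\theta \in \Theta} \sum_{i \in \mathcal{N}} \mathrm{val}_i(\theta, \hat{v}_i);$$

the payment rule $p_i$, which defines the payment required from agent $a_i$, i.e.,

$$p_i(\hat{v}) = \mathrm{SW}(f(\hat{v}_{-i}), \hat{v}_{-i}) - \mathrm{SW}_{-i}(f(\hat{v}), \hat{v}) = \sum_{j \in \mathcal{N}, j \neq i} \mathrm{val}_j(f(\hat{v}_{-i}), \hat{v}_j) - \sum_{j \in \mathcal{N}, j \neq i} \mathrm{val}_j(f(\hat{v}), \hat{v}_j),$$

where we denote by $f(\hat{v}_{-i})$ the allocation returned by $f$ when agent $i$ does not participate in the auction.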

In this quasi-linear environment, when there are no interdependencies among the types of the agents and the no-single-agent effect [3] holds, the VCG mechanism is AE, DSIC a posteriori, IR a posteriori, and WBB a posteriori.
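The allocation and payment rules of this appendix can be illustrated with a minimal brute-force sketch. It is not the paper's implementation: the two-slot setting, the `PROMINENCE` weights, the bids, and the helper names `slot_value`, `social_welfare`, and `vcg` are all hypothetical, chosen only to make the two formulas concrete. The welfare "without agent $i$" is computed by maximizing the others' welfare over the full allocation set, which matches $f(\hat{v}_{-i})$ only under the no-single-agent effect assumed above.

```python
from itertools import permutations

PROMINENCE = [1.0, 0.5]  # hypothetical per-slot click weights (assumption)

def slot_value(i, theta, bid):
    """Reported value of agent i under allocation theta (tuple of agent ids, one per slot)."""
    return bid * PROMINENCE[theta.index(i)] if i in theta else 0.0

def social_welfare(theta, bids, val, exclude=None):
    """Sum of reported values under theta, optionally skipping one agent."""
    return sum(val(i, theta, b) for i, b in enumerate(bids) if i != exclude)

def vcg(allocations, bids, val):
    """Return the welfare-maximizing allocation and the list of VCG payments."""
    best = max(allocations, key=lambda th: social_welfare(th, bids, val))
    payments = []
    for i in range(len(bids)):
        # Best welfare the other agents could get if agent i were absent
        # (valid under the no-single-agent effect) ...
        without_i = max(social_welfare(th, bids, val, exclude=i) for th in allocations)
        # ... minus the welfare the others actually get under the chosen allocation.
        with_i = social_welfare(best, bids, val, exclude=i)
        payments.append(without_i - with_i)
    return best, payments

bids = [10.0, 6.0, 2.0]                          # illustrative reported values
allocations = list(permutations(range(len(bids)), len(PROMINENCE)))
best, payments = vcg(allocations, bids, slot_value)
print(best, payments)
```

Each agent pays the welfare loss its presence imposes on the others, so the agent left without a slot pays nothing.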

Appendix B. Monotonicity and Myerson's payments

Consider a generic direct-revelation mechanism $M = (\mathcal{N}, V, \Theta, f, \{p_i\}_{i \in \mathcal{N}})$ as defined in Section 3.2. A single-parameter linear environment is such that

the type of each agent $a_i$ is a scalar $v_i$ (single-parameter assumption),

the utility function of agent $a_i$ is $u_i(\hat{v}) = z_i$

An allocation function $f$ is monotonic in a single-parameter linear environment if for any $\hat{v}_{-i}$

$z_i$ possible to design a DSIC mechanism imposing the following payments [35]:

$p_i(\hat{v}) = h_i(\hat{v}_{-i}) + z_i$

Appendix C. Proof of revenue regret in Theorem 2

We start by reporting the proof of Proposition 1.

Proof of Proposition 1. The derivation is a simple application of Hoeffding's bound. We first notice that each of the terms in the empirical average $\tilde{q}_i$ (Eq. (11)) is bounded in $[0, 1/\Lambda_{\pi(i;\theta_t)}]$. Thus we obtain

By reordering the terms in the previous expression we have

$\eta =$

which guarantees that all the empirical estimates $\tilde{q}_i$ are within $\eta$ of $q_i$ for all the ads with probability at least $1 - \delta$. □

Before stating the main result of this section, we need the following lemma.

Lemma 1. For any slot $s_m$ with $m \le K$, with probability $1 - \delta$,

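Proof. The proof is a straightforward application of Proposition 1. We consider the optimal allocation $\theta^*$ defined in Eq. (2) and the optimal allocation $\tilde{\theta}$ when the estimates $\tilde{q}^+$ are adopted, defined in Eq. (16). We denote $h = \alpha(m; \theta^*) = \arg\max_{i \in \mathcal{N}} (q_i \hat{v}_i; m)$, i.e., the index of the ad allocated in a generic slot in position $m$. There are two possible scenarios:

If $\pi(h; \tilde{\theta}) < m$ (the ad is displayed in a higher slot in the approximated allocation $\tilde{\theta}$), then $\exists j \in \mathcal{N}$ s.t. $\pi(j; \theta^*) < m \wedge \pi(j; \tilde{\theta}) \ge m$. Thus

$$\max_{i \in \mathcal{N}} (\tilde{q}^+_i \hat{v}_i; m) \ge \tilde{q}^+_j \hat{v}_j \ge q_j \hat{v}_j \ge q_h \hat{v}_h = \max_{i \in \mathcal{N}} (q_i \hat{v}_i; m),$$

where the second inequality holds with probability $1 - \delta$;

If $\pi(h; \tilde{\theta}) \ge m$ (the ad is displayed in a lower or equal slot in the approximated allocation $\tilde{\theta}$), then

$$\max_{i \in \mathcal{N}} (\tilde{q}^+_i \hat{v}_i; m) \ge \tilde{q}^+_h \hat{v}_h \ge q_h \hat{v}_h = \max_{i \in \mathcal{N}} (q_i \hat{v}_i; m),$$

where the second inequality holds with probability $1 - \delta$. In both cases, the statement follows. □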
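The two results above (a Hoeffding confidence radius, then dominance of the $m$-th largest optimistic product) can be sketched numerically. This is only an illustration under stated assumptions: `hoeffding_radius` is the generic two-sided Hoeffding radius with a union bound over the ads, not necessarily the exact $\eta$ of Proposition 1; the parameter `upper` stands for the range of each term in the empirical average; and the qualities, bids, and estimates below are made-up numbers.

```python
import math

def hoeffding_radius(n_samples, n_ads, delta, upper=1.0):
    """Two-sided Hoeffding radius for the mean of n_samples terms in [0, upper],
    with a union bound over n_ads estimates (total failure probability delta)."""
    return upper * math.sqrt(math.log(2 * n_ads / delta) / (2 * n_samples))

def mth_largest(values, m):
    """The max(.; m) operator: the m-th largest element (m = 1 is the maximum)."""
    return sorted(values, reverse=True)[m - 1]

# Hypothetical true qualities, bids, and empirical estimates (assumptions).
q = [0.5, 0.3, 0.2]
v_hat = [1.0, 2.0, 4.0]
q_tilde = [0.45, 0.35, 0.15]

eta = hoeffding_radius(n_samples=200, n_ads=len(q), delta=0.05)
q_plus = [qt + eta for qt in q_tilde]           # optimistic estimates

# On the event that every optimistic estimate dominates the true quality ...
dominated = all(qp >= qi for qp, qi in zip(q_plus, q))
# ... the m-th largest optimistic product dominates the m-th largest true product.
lemma_holds = all(
    mth_largest([qp * v for qp, v in zip(q_plus, v_hat)], m)
    >= mth_largest([qi * v for qi, v in zip(q, v_hat)], m)
    for m in range(1, len(q) + 1)
)
print(round(eta, 4), dominated, lemma_holds)
```

The dominance step is deterministic once every optimistic estimate lies above the true quality; the probability $1 - \delta$ enters only through that event.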

Proof of Theorem 2.

Step 1: expected payments. The proof follows steps similar to those in the proofs in [20]. We first recall that since the mechanism is DSIC in expectation w.r.t. the clicks, we can directly focus on the regret when the actual values $v$ are bid. For any ad $a_i$ such that $\pi(i; \theta^*) \le K$, the expected payments of the VCG mechanism in this case reduce to Eq. (9):

while, given the definition of A-VCG1 reported in Section 4.1, the expected payments for the $t$-th iteration of the auction are

Step 2: per-step exploration regret. Since for any $1 \le t \le \tau$, A-VCG1 sets all the payments to 0, the per-step regret is

$r_t =$

Step 3: per-step exploitation regret. Now we focus on the expected (w.r.t. click realizations) per-step regret during the exploitation phase. According to the definition of payments, at each step $t \in \{\tau + 1, \ldots, T\}$ of the exploitation phase we bound the per-step regret $r_t$ as

$r_t =$

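By definition of the max operator, since $l + 1 > m$, it follows that

$\max$

with probability at least $1 - \delta$. Notice that, by definition of $\Delta_l$, $\sum_{l=m}^{K} \Delta_l = \Lambda_m - \Lambda_{K+1} = \Lambda_m$. Furthermore, from the definition of $\tilde{q}^+_i$ and using Eq. (14) we have that for any ad $a_i$:

$$\tilde{q}^+_i - q_i = \tilde{q}_i - q_i + \eta \le 2\eta,$$

with probability at least $1 - \delta$. Thus, the difference between the payments becomes

$r_t \le 2 v_{\max}$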

Step 4: cumulative regret. We first consider the (low-probability) event in which the bound on $\tilde{q}^+_i$ derived in Proposition 1 does not hold.

In this case, we cannot guarantee anything about the behavior of the mechanism, since the payments are very inaccurate estimates of the CTRs, and thus the largest possible regret is suffered. In particular, we consider the worst-case loss of $v_{\max}$ for each slot for each step, leading to a total regret of $v_{\max} \sum_{m=1}^{K} \Lambda_m T$ with probability $\delta$. By summing up the regrets reported in Eq. (C.3) during the exploration phase and Eq. (C.6) during the exploitation phase and by considering that these bounds hold with probability at least $1 - \delta$ (upper-bounded by 1 in the following), we obtain an expected regret

$R_T \le v_{\max}$

where $R_{ei}$ is the upper bound on the regret suffered during the exploitation phase (which holds with probability at least $1 - \delta$), $R_{er}$ is the upper bound on the regret suffered during the exploration phase (which holds with probability at least $1 - \delta$) and $R_\delta$ is the upper bound on the regret when the bounds do not hold (with probability at most $\delta$). This bound can be further simplified, given that $\sum_{m=1}^{K} \Lambda_m \le K$, as

Step 5: parameters optimization. Besides describing the performance of A-VCG1, the previous bound also provides guidance for the optimization of the parameters $\tau$ and $\delta$. We first simplify the bound in Eq. (C.7) as

$R_T \le v_{\max} K$

We take the derivative of the previous bound w.r.t. $\tau$, set it to zero and obtain

$v_{\max} K$

Substituting this value of $\tau$ into Eq. (C.8) leads to the optimized bound

$R_T \le v_{\max} K$

19 Notice that in the logarithmic term the factor of 2 we have in Proposition 1 disappears since in this proof we only need the one-sided version of the bound.
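The trade-off optimized in Step 5 can be sketched numerically. The exact constants of Eq. (C.8) are not reproduced here, so this sketch assumes a generic bound of the shape $a\tau + bT/\sqrt{\tau}$ (an exploration cost linear in $\tau$ plus an exploitation cost driven by a confidence radius shrinking as $1/\sqrt{\tau}$). The constants `a` and `b`, and the function names, are hypothetical placeholders, not values from the paper.

```python
def regret_bound(tau, T, a, b):
    """Assumed bound shape: exploration cost a*tau plus exploitation cost b*T/sqrt(tau)."""
    return a * tau + b * T / tau ** 0.5

def optimal_tau(T, a, b):
    """Stationary point in tau: a - (b*T/2) * tau**(-3/2) = 0."""
    return (b * T / (2 * a)) ** (2 / 3)

T, a, b = 10_000, 1.0, 2.0                      # illustrative values (assumptions)
tau_star = optimal_tau(T, a, b)
best = regret_bound(tau_star, T, a, b)

# At the optimum the exploitation term equals twice the exploration term,
# so the bound is 3*a*tau_star and grows as T**(2/3).
ratio = regret_bound(optimal_tau(8 * T, a, b), 8 * T, a, b) / best
print(round(tau_star, 2), round(best, 2), round(ratio, 6))
```

Under this assumed shape, multiplying $T$ by 8 multiplies the optimized bound by $8^{2/3} = 4$, which is the $T^{2/3}$ rate stated in the conclusions.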

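We are now left with the choice of the confidence parameter $\delta \in (0, 1)$, which can be easily set to optimize the asymptotic rate (i.e., ignoring constants and logarithmic factors) as

$$\delta = K^{-1/3} T^{-1/3} N^{1/3}.$$

We thus obtain the final bound

$$R_T \le 4 v_{\max} \Lambda_{\min}^{-2/3} K^{2/3} T^{2/3} N^{1/3} \left( \log\left( K^{1/3} T^{1/3} N^{2/3} \right) \right)^{1/3}.$$

We have to impose the constraints that $T > N/K$ (given by $\delta < 1$) and that $T > \tau$, i.e., $T > \frac{N}{K \Lambda_{\min}^2} \log \frac{N}{\delta}$. The two constraints imply:

$T > N$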