$V_C$ being a function from $\mathbb{R}^n$ ($n$ being the dimension of $U$) to $\mathbb{R}$. Suppose that the algorithm being tested provides information about the process from which a suitable function $V_C$ can be constructed. This has the merit of being potentially much more powerful than the previous approach, particularly if the algorithm being tested is not severely suboptimal. Furthermore, if a suitable function can be derived from a single simulation trial, then the controlled output of each trial is independent, and so the problems in estimating the variance of the estimate of the mean of the attribute of interest do not arise.
Derivation of a suitable functional form for $V_C$ depends on the model of the stochastic process within which the algorithms being tested operate, and on the choice of control variates, $U$. This is explored for the particular application being considered, namely the simulation of a finite time period Markov process, after a more detailed description of its structure in Section 3.2. Natural restrictions to the class of admissible functional forms of $V_C$ are introduced and discussed in Section 3.3. It is shown in Section 3.4 that with these restrictions, the problem of constructing $V_C$ reduces to one of modelling the expected future contribution to the attribute of interest as a function of the next time-period state space realisation.
3.2. A Symbolic Representation of the Underlying Process.
Hereafter the concern of this chapter is with the evaluation of approximate smoothing algorithms which operate within a discrete time, finite time-period Markov model of the environment, as discussed in Section 1. The notation introduced here should be sufficiently general to cover all such applications.
The process naturally divides into $T$ distinct time periods: $1, 2, \ldots, T$. The subscript $t$ will be used to denote that some quantity pertains to time period $t$. Pertaining to each time period are the principal attribute of interest, $V_t$, hereafter called revenue or profit (considered to be a scalar), the state variable at the end of the time period, $Q_t$ (this is considered to be a vector and might represent, for example, stock levels and a weighted average of previous sales), the
controls applied, $X_t$ (also considered to be a vector; it is an output from the algorithm being studied, perhaps production levels), and the stochastic input, $Y_t$. Again this is considered to be a vector and might represent demand for products or availability of raw materials.
Now suppose that a particular algorithm, assuming a particular model of the underlying process, is being studied. Then the sequence of state variables, $\{Q_0, Q_1, \ldots, Q_T\}$, forms a Markov process. The revenue $V_t$ is a random variable and the total revenue from a simulation trial is
$$V = \sum_{t=1}^{T} V_t.$$
Furthermore, it can be assumed that $V_t$ is a function of $Q_{t-1}$ and $Q_t$. For, in the last resort, the state space could be expanded, with $V_t$ set to zero for all $t < T$ and $V_T$ set to $V$.
As far as possible random variables will be denoted by upper case letters and their outcomes by lower case letters.
Consider the progress of a simulation trial. At the start of time period $t$, $(Y_1, \ldots, Y_{t-1})$ have been realised by $(y_1, \ldots, y_{t-1})$ and $(Q_1, \ldots, Q_{t-1})$ by $(q_1, \ldots, q_{t-1})$. The algorithm is run for time periods $t, t+1, \ldots, T$. The $t$-th time-period controls are then determined and the distribution of $Q_t$ given that $Q_{t-1} = q_{t-1}$ is completely known.
Define the "one-step-ahead" expected revenue:
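$$V_{Ht} = V_{Ht}(Q_{t-1}), \quad \text{where} \quad V_{Ht}(q_{t-1}) = E_{Q_t}\big[V_t(q_{t-1}, Q_t)\big] = \int V_t(q_{t-1}, \xi)\, dF_{Q_t \mid q_{t-1}}(\xi),$$
where $F_{Q_t \mid q_{t-1}}(\cdot)$ is the distribution function of $Q_t$ given that $Q_{t-1} = q_{t-1}$.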
So $V_{Ht}$ is the expected revenue in time period $t$ given that $Q_{t-1} = q_{t-1}$. Because of the remark in the above paragraph, $V_{Ht}$ can be calculated exactly from the output of the scheduling algorithm.
Set $V_H = \sum_{t=1}^{T} V_{Ht}$. Then $EV_H = EV$.
$EV$ might be estimated by averaging the realised values of $V$ or $V_H$. When the martingale control statistic derived in Sections 3.3 and 3.4 is used, however, it will make no difference, as will be shown.
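The identity $EV_H = EV$ can be checked by brute force on a small example. The sketch below is illustrative only: the three-state chain, its transition matrix `P`, the horizon `T`, the start state `Q0` and the `revenue` function are all invented for the purpose and are not taken from the application studied here.

```python
from itertools import product

# Hypothetical toy chain (not from the text): three states, T = 3 periods.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
STATES = range(3)
T = 3
Q0 = 0  # assumed fixed initial state


def revenue(q_prev, q_next):
    """Illustrative per-period revenue V_t as a function of (Q_{t-1}, Q_t)."""
    return 2.0 * q_next - 0.5 * abs(q_next - q_prev)


def one_step_ahead(q_prev):
    """V_Ht(q_{t-1}): expected revenue in period t given Q_{t-1} = q_{t-1}."""
    return sum(P[q_prev][q] * revenue(q_prev, q) for q in STATES)


def trials():
    """Yield (probability, (q_0, ..., q_T)) for every possible trial."""
    for tail in product(STATES, repeat=T):
        states = (Q0,) + tail
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= P[a][b]
        yield prob, states


def expected_totals():
    """Return (E V, E V_H) by exact enumeration over all trials."""
    ev = evh = 0.0
    for prob, states in trials():
        ev += prob * sum(revenue(a, b) for a, b in zip(states, states[1:]))
        evh += prob * sum(one_step_ahead(a) for a in states[:-1])
    return ev, evh


EV, EVH = expected_totals()
print(EV, EVH)  # the two values agree up to rounding error
```

Exact enumeration is used instead of random sampling so that the equality can be seen without sampling noise; it is feasible only because the toy state space is tiny.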
3.3. The Martingale Control Variates
A particular form of control variate function is now proposed. It will be shown that, for each trial, the partial sums of the proposed functions of the control variates form a martingale. Although this puts a restriction on the class of functions of the control variates which will be considered, there is no consequent suboptimality. Subsequent sections deal with optimality criteria for the martingale and suggest ways in which it might be constructed from information provided by the scheduling algorithm.
Advantage is taken of the time-periodic structure of the process being simulated. Decompose the total control for each trial, $V_C(U)$, into parts associated with each time period, i.e. let
$$V_C(U) = \sum_{t=1}^{T} V_{Ct}(U),$$
where $V_{Ct}(U)$ is the control pertaining to the $t$-th time period. It is now proposed that $V_{Ct}$ be a function of the state variable at the start of the $t$-th time period, $Q_{t-1}$, and the difference between the state variable at the end of the $t$-th time period and its expected value one time period before, $Q_t - E(Q_t \mid Q_{t-1})$, i.e.
$$V_{Ct} = V_{Ct}\big(Q_{t-1},\; Q_t - E(Q_t \mid Q_{t-1})\big).$$
Since the total revenue can be decomposed into the sum of revenues accrued in each time period, which are, moreover, functions of the state variable at the start and end of that time period, this is a natural restriction. That, potentially, all the variability in the estimate of $EV$ can be eliminated is shown at the end of Section 3.4. Also the functions $V_{Ct}$ are restricted to be those for which
$$E(V_{Ct} \mid Q_{t-1}) = 0.$$
Again, nothing is lost by this restriction, since $EV_{Ct}$ must be known and may, without loss of generality, be set to zero. Therefore $E(V_{Ct} \mid Q_{t-1})$ must be taken to be zero, since it cannot be assumed that the distribution of $Q_{t-1}$ is known.
The $t$-th time period control is regarded as a control of the $t$-th time period revenue. The total controlled attribute of interest, $V^*$, can be expressed as:
$$V^* = \sum_{t=1}^{T} (V_t - V_{Ct}).$$
The rationale behind the use of the random state variables rather than, say, the random input, is that in each trial the realised value of the attribute of interest (revenue) is, by assumption, a computable function of the sequence of state space realisations and is only a function of the random input through its effect on the state space realisations. Moreover, the $t$-th time period revenue, $V_t$, is a directly computable function of $Q_{t-1}$ and $Q_t$.
Intuitively the construction of $V_{Ct}$ may be regarded as follows. For a particular trial, suppose that the start of time period $t$ has been reached, so that $(Q_1, \ldots, Q_{t-1})$ are realised by $(q_1, \ldots, q_{t-1})$, and $q_{t-1}$ contains all the information necessary to describe the state of the process at the start of time period $t$. Given this information, $V_{Ct}\big(q_{t-1},\; Q_t - E(Q_t \mid Q_{t-1} = q_{t-1})\big)$ is a measure of the "luck" associated with the next position on the state space.
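The "luck" interpretation can be made concrete on a toy chain. The sketch below is not the construction derived in this chapter; it assumes a hypothetical three-state chain whose transition matrix `P` and `revenue` function are invented, and it assumes that the expected future revenue `W[t][q]` is known exactly, which is only possible because the example is so small. The control for period $t$ is taken to be the realised value of (revenue plus expected future revenue) minus its expectation given $Q_{t-1}$. Since $Q_t$ can be recovered from $Q_{t-1}$ and $Q_t - E(Q_t \mid Q_{t-1})$, this is an admissible form.

```python
from itertools import product

# Hypothetical toy chain (not from the text): three states, T = 3 periods.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
STATES = range(3)
T = 3
Q0 = 0  # assumed fixed initial state


def revenue(q_prev, q_next):
    """Illustrative per-period revenue V_t as a function of (Q_{t-1}, Q_t)."""
    return 2.0 * q_next - 0.5 * abs(q_next - q_prev)


# W[t][q]: expected revenue over periods t+1..T given Q_t = q,
# computed by backward recursion (W[T][q] = 0).
W = [[0.0] * len(STATES) for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for q in STATES:
        W[t][q] = sum(P[q][r] * (revenue(q, r) + W[t + 1][r]) for r in STATES)


def control(t, q_prev, q_next):
    """V_Ct: realised minus expected (V_t + future revenue), given Q_{t-1}."""
    return revenue(q_prev, q_next) + W[t][q_next] - W[t - 1][q_prev]


def controlled_total(states):
    """V* = sum over t of (V_t - V_Ct) for one trial (q_0, ..., q_T)."""
    total = 0.0
    for t in range(1, T + 1):
        q_prev, q_next = states[t - 1], states[t]
        total += revenue(q_prev, q_next) - control(t, q_prev, q_next)
    return total


ALL_TRIALS = [(Q0,) + tail for tail in product(STATES, repeat=T)]
CONTROLLED = [controlled_total(s) for s in ALL_TRIALS]

# E(V_Ct | Q_{t-1} = q) for every period t and state q; each should be zero.
COND_MEANS = [sum(P[q][r] * control(t, q, r) for r in STATES)
              for t in range(1, T + 1) for q in STATES]

print(min(CONTROLLED), max(CONTROLLED), W[0][Q0])
```

With this idealised control every trial returns the same controlled value, namely the expected total revenue, so no variability remains. In practice the expected future revenue would have to be approximated, and the reduction in variance would be correspondingly smaller.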
Now, a martingale is defined to be a sequence of random variables, $\{M_t\}$ say, with the property that
$$E(M_t \mid M_0, M_1, \ldots, M_{t-1}) = M_{t-1} \quad \text{for all } t \ge 1.$$
The term $M_t - M_{t-1}$ is called the $t$-th martingale difference. The martingale property is therefore equivalent to the property that the expected value of each martingale difference given all the previous martingale differences is zero, i.e.
$$E(M_{Dt} \mid M_{D1}, M_{D2}, \ldots, M_{D,t-1}) = 0 \quad \text{for all } t \ge 1, \quad \text{where } M_{Dt} = M_t - M_{t-1}.$$
Theorem 3.1.
(a) $EV_{Ct} = 0$ for all $t$, and
(b) the $V_{Ct}$'s are martingale differences,
i.e. $(V_{C1},\; V_{C1} + V_{C2},\; V_{C1} + V_{C2} + V_{C3},\; \ldots)$ is a martingale.
Proof
(a) Let $F_{Q_{t-1}}(\cdot)$ and $F_{Q_t \mid q_{t-1}}(\cdot)$ be the distribution functions of $Q_{t-1}$ and of $Q_t$ given that $Q_{t-1} = q_{t-1}$, respectively. Now $V_{Ct}$ is a function of $Q_{t-1}$ and $Q_t$. Therefore
$$EV_{Ct} = \int E(V_{Ct} \mid Q_{t-1} = \xi)\, dF_{Q_{t-1}}(\xi) = 0,$$
because $E(V_{Ct} \mid Q_{t-1} = \xi) = 0$ for all $\xi$.
(b) To prove the second part of the theorem it is sufficient to show that
$$E(V_{Ct} \mid V_{C1}, V_{C2}, \ldots, V_{C,t-1}) = 0,$$
i.e.
$$E(V_{Ct} \mid V_{Cu} = v_{Cu},\; u = 1, \ldots, t-1) = 0.$$
Now let $F_{Q_{t-1}}(\cdot \mid v_{C1}, v_{C2}, \ldots, v_{C,t-1})$ be the distribution function of $Q_{t-1}$ given that $V_{Cu} = v_{Cu}$ for $u = 1, 2, \ldots, t-1$. Also let $S$ be the set of possible realisations of $Q_{t-1}$ given $V_{Cu} = v_{Cu}$, $u = 1, 2, \ldots, t-1$. Now, by the Markov property (each $V_{Cu}$ with $u < t$ depends only on states up to $Q_{t-1}$),
$$E[V_{Ct} \mid Q_{t-1} = \xi;\; V_{Cu} = v_{Cu};\; u = 1, \ldots, t-1] = E[V_{Ct} \mid Q_{t-1} = \xi] \quad \text{for all } \xi \in S.$$
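Hence
$$E[V_{Ct} \mid V_{Cu} = v_{Cu},\; u = 1, \ldots, t-1] = \int_{\xi \in S} E[V_{Ct} \mid Q_{t-1} = \xi]\, dF_{Q_{t-1}}(\xi \mid v_{C1}, v_{C2}, \ldots, v_{C,t-1}) = 0,$$
because the integrand is zero for all $\xi \in S$.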
Therefore the $V_{Ct}$'s are martingale differences and will hereafter be referred to as such.
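Both parts of Theorem 3.1 can be checked numerically on a small example. The sketch below again uses an invented three-state chain, and the particular `control` function (a weight depending on $Q_{t-1}$ multiplied by the surprise $Q_t - E(Q_t \mid Q_{t-1})$) is just one arbitrary admissible choice made for illustration. The conditional expectation in part (b) is evaluated by grouping all trials according to the realised values of the earlier differences.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy chain (not from the text): three states, T = 3 periods.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
STATES = range(3)
T = 3
Q0 = 0  # assumed fixed initial state


def control(q_prev, q_next):
    """An arbitrary admissible V_Ct with E(V_Ct | Q_{t-1}) = 0."""
    mean_next = sum(P[q_prev][q] * q for q in STATES)  # E(Q_t | Q_{t-1})
    return (1.0 + q_prev) * (q_next - mean_next)


def trials():
    """Yield (probability, (q_0, ..., q_T)) for every possible trial."""
    for tail in product(STATES, repeat=T):
        states = (Q0,) + tail
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= P[a][b]
        yield prob, states


def theorem_check():
    """Return (max |E V_Ct|, max |E(V_Ct | V_C1, ..., V_C,t-1)|)."""
    means = [0.0] * T
    # cond[t][history] accumulates [probability, probability * V_Ct].
    cond = [defaultdict(lambda: [0.0, 0.0]) for _ in range(T)]
    for prob, states in trials():
        diffs = [control(a, b) for a, b in zip(states, states[1:])]
        for t in range(T):
            means[t] += prob * diffs[t]
            history = tuple(round(d, 9) for d in diffs[:t])
            cond[t][history][0] += prob
            cond[t][history][1] += prob * diffs[t]
    worst_mean = max(abs(m) for m in means)
    worst_cond = max(abs(num / den)
                     for table in cond for den, num in table.values())
    return worst_mean, worst_cond


WORST_MEAN, WORST_COND = theorem_check()
print(WORST_MEAN, WORST_COND)  # both zero up to rounding error
```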
3.4. The Optimality Criterion for the Martingale Difference Functions
It is desired to choose the martingale differences to minimise the variance of the controlled revenue
$$V^* = \sum_{t=1}^{T} (V_t - V_{Ct}).$$
Now
$$\operatorname{var} V^* = \sum_{t=1}^{T} \sum_{u=1}^{T} \operatorname{Cov}(V_t - V_{Ct},\; V_u - V_{Cu}) = \sum_{t=1}^{T} \sum_{u=1}^{T} E\{(V_t - V_{Ct})(V_u - V_{Cu})\} - (EV)^2,$$
since $EV_{Ct} = EV_{Cu} = 0$.
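Because the summand is symmetric in $t$ and $u$, $\operatorname{var} V^*$ can be expressed as
$$E\Big\{\sum_{t=1}^{T} \big(V_{Ct}^2 - 2V_{Ct}V_t\big)\Big\} + 2E\Big\{\sum_{t=1}^{T} \sum_{u>t} \big(V_{Ct}V_{Cu} - V_{Ct}V_u - V_tV_{Cu}\big)\Big\} + \operatorname{var} V,$$
where $V = \sum_{t=1}^{T} V_t$.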
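Now $\operatorname{var} V$ is independent of the $V_{Ct}$'s, so it is desired to minimise the sum of
$$S_1 = E\Big\{\sum_{t=1}^{T} \big(V_{Ct}^2 - 2V_{Ct}V_t\big)\Big\}$$
and
$$S_2 = 2E\Big\{\sum_{t=1}^{T} \sum_{u>t} \big(V_{Ct}V_{Cu} - V_{Ct}V_u - V_tV_{Cu}\big)\Big\}.$$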
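The decomposition of $\operatorname{var} V^*$ into a diagonal term, a cross term and $\operatorname{var} V$ can be verified by exact enumeration. The sketch below assumes an invented three-state chain, an invented `revenue` function and an arbitrary linear `control`; none of these come from the application studied here. The cross term is summed over pairs with $u > t$, which by symmetry of the summand is equivalent to summing over $u < t$.

```python
from itertools import product

# Hypothetical toy chain (not from the text): three states, T = 3 periods.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
STATES = range(3)
T = 3
Q0 = 0  # assumed fixed initial state


def revenue(q_prev, q_next):
    """Illustrative per-period revenue V_t as a function of (Q_{t-1}, Q_t)."""
    return 2.0 * q_next - 0.5 * abs(q_next - q_prev)


def control(q_prev, q_next):
    """An arbitrary admissible V_Ct: a multiple of Q_t - E(Q_t | Q_{t-1})."""
    mean_next = sum(P[q_prev][q] * q for q in STATES)
    return 0.8 * (q_next - mean_next)


def trials():
    """Yield (probability, (q_0, ..., q_T)) for every possible trial."""
    for tail in product(STATES, repeat=T):
        states = (Q0,) + tail
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= P[a][b]
        yield prob, states


def decomposition():
    """Return (var V*, S1, S2, var V) by exact enumeration."""
    e_v = e_v2 = e_star = e_star2 = s1 = s2 = 0.0
    for prob, states in trials():
        steps = list(zip(states, states[1:]))
        v = [revenue(a, b) for a, b in steps]
        c = [control(a, b) for a, b in steps]
        total = sum(v)
        star = total - sum(c)
        e_v += prob * total
        e_v2 += prob * total ** 2
        e_star += prob * star
        e_star2 += prob * star ** 2
        s1 += prob * sum(ct * ct - 2.0 * ct * vt for ct, vt in zip(c, v))
        s2 += 2.0 * prob * sum(c[t] * c[u] - c[t] * v[u] - v[t] * c[u]
                               for t in range(T) for u in range(t + 1, T))
    return e_star2 - e_star ** 2, s1, s2, e_v2 - e_v ** 2


VAR_STAR, S1, S2, VAR_V = decomposition()
print(VAR_STAR, S1 + S2 + VAR_V)  # the two values agree up to rounding error
```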
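Conditioning with respect to $Q_{t-1}$, $S_1$ can be expressed as
$$S_1 = E\Big\{\sum_{t=1}^{T} E\big(V_{Ct}^2 - 2V_{Ct}V_t \mid Q_{t-1}\big)\Big\} = E\Big\{\sum_{t=1}^{T} \big[\operatorname{var}(V_{Ct} \mid Q_{t-1}) - 2\operatorname{cov}(V_{Ct}, V_t \mid Q_{t-1})\big]\Big\}.$$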
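Conditioning with respect to $Q_{t-1}$, $S_2$ can be expressed as
$$S_2 = 2E\Big\{\sum_{t=1}^{T} \sum_{u>t} E\big(V_{Ct}V_{Cu} - V_{Ct}V_u - V_tV_{Cu} \mid Q_{t-1}\big)\Big\} = E\Big\{2\sum_{t=1}^{T} E\Big[V_{Ct} \sum_{u>t} E(V_{Cu} - V_u \mid Q_{u-1}) - V_t \sum_{u>t} E(V_{Cu} \mid Q_{u-1}) \,\Big|\, Q_{t-1}\Big]\Big\},$$
conditioning with respect to $Q_{u-1}$ also. Now the expectation of $V_{Cu}$ given $Q_{u-1}$ is zero, therefore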