6.1 Tactical Engine: PRIZE-PARANOID

6.2 Tactical Engine: ORIGINAL-DT

6.3 Tactical Engine: PRIZE-DT

6.4 Dynamic Monitoring: Small-DMS

6.A Appendices to Chapter 6

- FLAP'S LAW

Tactical planning is the systematic analysis of a problem in terms of the decision of determining which prizes to target. Such planning is reasonably coarse in comparison to the fine planning used when selecting a one-step move, as for the two prize problem. Small problems can be defined as those for which this tactical planning approach is computationally feasible. In this chapter we develop tactical engines for small problems and a hierarchical dynamic monitoring system, thus designing a number of complete strategies.

6.0 Introduction

Section 5.6 introduced a tactical approach to the Three Prize Problem which approximates the initial tactical decision faced by a player, namely which prize to target initially. The objective of this chapter is to extend this tactical planning approach to problems involving up to approximately ten prizes and to design an implementation of the SPA/DMS with two frames.

146 Tactical Planning for Small Problems

Chapter 5 has provided the step-frame and STEP-MONITOR components of the SPA/DMS. This chapter contributes the design of a number of scenario engines and a monitor for the global-frame.

6.0.1 Tactical Planning Problem

Determining tactically robust movements requires representative coverage of possible scenarios at as fine a level of detail as computational tractability permits. Therefore, in order to analyse problems consisting of more than two prizes, we restrict consideration to tactical scenarios. The tactical planning problem is to determine which prize to target for a range of prize-target scenarios of the opponent.

Solving the tactical planning problem necessarily implies contingent planning, since an implicit response is required for each tactical scenario of the opponent. A systematic treatment of contingent planning must also consider recursively contingent scenarios, in which contingent sequences of prize targets beyond the initial prize targets of each player are examined in order to evaluate the initial scenarios.

A natural formulation for such a planning problem is a multiple stage game with observed actions and simultaneous moves: all players know the actions chosen at all previous stages when choosing their current actions, and all players move simultaneously in each stage (each player chooses his or her action at stage k without knowing the stage-k action of any other player). A game tree is an explicit representation of all possible plays of the game (Pearl [174]). The game tree corresponds to a contingent strategy since, in selecting an initial target-prize, the player assumes that a subsequent tactical decision, involving the selection of a subsequent target-prize, can be resolved by considering the particular game position encountered at that time.

From the initial game position we propose a set of scenarios in which each player selects a target. With each particular scenario we associate a representative projected game position.

To evaluate this projected game position, we recursively propose a further set of scenarios in which each player selects a target, associate with each one a representative projected game position, and so on. In this way we systematically generate a multistage game in which each stage corresponds to a tactical decision. Such a game tree is called a tactical game tree since it approximates the decisions of a tactical planning problem at each stage. The root node corresponds to the initial game position, its children correspond to the representative game positions derived from the set of tactical scenarios, and so on. Terminal nodes correspond to game positions in which at most one prize remains.
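The recursive construction of a tactical game tree can be sketched in Python. This is a minimal sketch under strong simplifying assumptions: a game position is reduced to its set of remaining prizes, and the projection is a caller-supplied function. The toy `simultaneous_claims` projection is hypothetical and ignores locations, times and deadlines. What the sketch does show is the termination argument: because every projection claims at least one prize, the recursion stops at positions with at most one remaining prize.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

Scenario = Tuple[int, int]  # (prize targeted by A, prize targeted by B)
Project = Callable[[FrozenSet[int], int, int], FrozenSet[int]]


@dataclass
class TacticalNode:
    remaining: FrozenSet[int]  # prizes still unclaimed at this game position
    children: List[Tuple[Scenario, "TacticalNode"]] = field(default_factory=list)


def build_tactical_tree(remaining: FrozenSet[int], project: Project) -> TacticalNode:
    """Recursively expand one child per (target of A, target of B) scenario."""
    node = TacticalNode(remaining)
    if len(remaining) <= 1:  # terminal node: at most one prize remains
        return node
    for x in sorted(remaining):
        for y in sorted(remaining):
            child = project(remaining, x, y)
            # Each projection must claim at least one prize, so the tree is finite.
            if len(child) >= len(remaining):
                raise ValueError("projection must claim at least one prize")
            node.children.append(((x, y), build_tactical_tree(child, project)))
    return node


def count_nodes(node: TacticalNode) -> int:
    """Number of game positions in the (sub)tree rooted at node."""
    return 1 + sum(count_nodes(child) for _, child in node.children)


def simultaneous_claims(remaining: FrozenSet[int], x: int, y: int) -> FrozenSet[int]:
    """Hypothetical toy projection: both players claim their targets at once."""
    return remaining - {x, y}


tree = build_tactical_tree(frozenset({1, 2, 3}), simultaneous_claims)
print(count_nodes(tree))  # 22
```

In the tactical engines of this chapter the projection would instead be a probe or a commitment, and the scenarios would range over the targets reachable by each player rather than over all pairs of remaining prizes.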

6.0.2 Tactical Engines

Recall from Section 4.2.3 that a DMS implementation requires both a scenario engine as its planner module and a monitor as its selector module. Section 6.4 will consider the design of the Small Dynamic Monitoring System, in particular the necessary global monitor and step monitor. Preparatory to this, we consider tactical game tree based scenario engines which we term tactical engines.

Representative Projected Game Positions

Let $Q_A = \{ j \in V : t_0 + d_{Aj} \le \lambda \}$ and let $Q_B = \{ j \in V : t_0 + d_{Bj} \le \lambda \}$. Consider the scenario in which player A targets prize $x \in Q_A$ and player B targets prize $y \in Q_B$. We need to select a projected game position to represent the implementation of this scenario. We also need to specify how committed the players are to their respective target-prizes. Since a finite game tree is desirable, we assume that the players move towards their respective target-prizes at least until one player reaches its target-prize, so that at least one prize is claimed at each depth of the game tree. The possibilities include the following two:

(a) As in THREE-ORIGINAL-DT,¹ move each player a distance $\min\{d_{Ax}, d_{By}\}$ towards its target-prize, so that one player reaches its target-prize. We call this type of projection a probe.

(b) As in THREE-PRIZE-DT, move until one player reaches its target-prize and, if the other player does not simultaneously reach its target-prize, then that player must select the same prize as its next target-prize. We call this type of projection a commitment.

The analysis of Section 5.6 shows that steps alone are too detailed for three or more prizes, but that prizes are reasonable targets of manageable complexity. This does not rule out methods which rely on bigger "steps" than Δ; these would need to be compared against targeting prizes in future work. Targeting prizes also offers a computational advantage, in that there are only finitely many prizes to target.

Single Stage Tactical Engine

Section 6.1 considers the design of the PRIZE-PARANOID tactical engine whose tactical game tree consists of only a single stage. Each scenario is a probe and is evaluated by determining the maximal PRIZE-GUARANTEE subpath for each player from the projected game position.

Multiple Stage Tactical Engines

There are three requirements for specification of a multiple stage tactical engine.

Game tree generation. How do we construct the representative game positions?

Evaluating a game tree node. How do we determine an evaluation for player A of each game tree node given the evaluations of each child node?

Searching of the game tree. How can we use the problem structure and information already gained to most efficiently determine the information required to make a decision at the root node?

The resulting root node game table summarises the tactical response information from the corresponding tactical game tree.

Two multiple stage tactical engines are designed in this chapter. Section 6.2 considers the design of the ORIGINAL-DT tactical engine, in which each scenario at a given stage of the tactical game tree is a probe and the projected position of each player at each stage is at the same instant in future time. Section 6.3 considers the design of the PRIZE-DT tactical engine, in which each scenario at a given stage of the tactical game tree is a commitment, and either a projected game position or a projected two-prize window feasible scenario is maintained for each player, possibly at different, though comparable, future instants of time.

¹The suffix '-DT' stands for double tree since each possible path through the game-tree corresponds

6.1 Tactical Engine: PRIZE-PARANOID

The guarantee subpath subproblem of Section 3.3.1 was to determine a subpath, guaranteed to a player, of maximal guaranteed value. The PRIZE-PARANOID tactical engine constructs a game table in which each scenario is a single probe that is evaluated by solving a corresponding guarantee path subproblem.

6.1.1 Game Table Generation

A projected game position consists of a pair of projected locations, A and B; a projected time-stamp, t; and a projected set of remaining prizes, $Q \subseteq V$. Let Q be the set of remaining prizes and let $Q_A = \{ j \in Q : t + d_{Aj} \le \lambda \}$ and $Q_B = \{ j \in Q : t + d_{Bj} \le \lambda \}$. Choose $x \in Q_A$ and $y \in Q_B$ and suppose that player A targets prize x and player B targets prize y.

Let $t' = t + \min\{d_{Ax}, d_{By}\}$.

• If $d_{Ax} < d_{By}$, then let A' be the location of prize x, let B' be the location which is a distance $d_{Ax}$ from B towards prize y and let $Q' = Q \setminus \{x\}$.

• If $d_{Ax} = d_{By}$, then let A' be the location of prize x, let B' be the location of prize y and let $Q' = Q \setminus \{x, y\}$.

• If $d_{Ax} > d_{By}$, then let A' be the location which is a distance $d_{By}$ from A towards prize x, let B' be the location of prize y and let $Q' = Q \setminus \{y\}$.
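The three cases above can be sketched as a single projection function. This is a minimal sketch assuming that prize and player locations are points in the plane, that players travel in straight lines at unit speed (so a distance equals a travel time) and that distances are Euclidean; the names `Position`, `move_towards` and `probe` are illustrative and not taken from the original implementation. Ties are detected with `math.isclose` to avoid spurious floating-point inequalities.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Position:
    a: Point                   # projected location of player A
    b: Point                   # projected location of player B
    t: float                   # projected time-stamp
    remaining: FrozenSet[int]  # projected set of remaining prizes Q


def move_towards(p: Point, q: Point, d: float) -> Point:
    """Point a distance d from p along the straight line to q (assumes d <= |pq|)."""
    total = math.dist(p, q)
    if total == 0.0:
        return p
    f = d / total
    return (p[0] + f * (q[0] - p[0]), p[1] + f * (q[1] - p[1]))


def probe(pos: Position, prizes: Dict[int, Point], x: int, y: int) -> Position:
    """Project the scenario 'A targets x, B targets y' until one player arrives."""
    d_ax = math.dist(pos.a, prizes[x])
    d_by = math.dist(pos.b, prizes[y])
    t_new = pos.t + min(d_ax, d_by)
    if math.isclose(d_ax, d_by):   # both players arrive together
        return Position(prizes[x], prizes[y], t_new, pos.remaining - {x, y})
    if d_ax < d_by:                # A claims x; B is part-way to y
        return Position(prizes[x], move_towards(pos.b, prizes[y], d_ax),
                        t_new, pos.remaining - {x})
    # Otherwise B claims y; A is part-way to x
    return Position(move_towards(pos.a, prizes[x], d_by), prizes[y],
                    t_new, pos.remaining - {y})


start = Position(a=(0.0, 0.0), b=(10.0, 0.0), t=0.0, remaining=frozenset({1, 2}))
prizes = {1: (3.0, 0.0), 2: (10.0, 5.0)}
print(probe(start, prizes, 1, 2))
```

In this usage example player A reaches prize 1 at time 3, while player B has covered the same distance towards prize 2 and is left at (10, 3); only prize 2 remains.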

The scenario in which $A \triangleright A'$ and $B \triangleright B'$ is called a probe. The notation "probe $A \to P_A \odot x$ and $B \to P_B \odot y$" defines a new game position in which prize x is appended to $P_A$, prize y is appended to $P_B$, $A \leftarrow A'$, $B \leftarrow B'$, $t \leftarrow t'$ and $Q \leftarrow Q'$.

We can now define how to generate the PRIZE-PARANOID game table:

$\forall x \in Q_A$ and $\forall y \in Q_B$: probe $A \to P_A \odot x$ and $B \to P_B \odot y$.

Each probe is evaluated by determining the maximum guaranteed value attainable from the corresponding projected game position. Assemble the probe evaluations in a game table with the targets, $Q_A$, for player A along the left-hand side and the targets, $Q_B$, for player B along the top.
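Generating the game table amounts to a double loop over the reachable targets of the two players. The sketch below assumes a caller-supplied `evaluate_probe(x, y)` (hypothetical) that projects the probe and returns the maximum guaranteed value for player A from the projected position; in PRIZE-PARANOID this would be the PRIZE-GUARANTEE solver, which is not reproduced here. The reachable target sets follow the definitions of the sets of targets reachable before the deadline.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

GameTable = Dict[Tuple[int, int], float]


def reachable_targets(remaining: FrozenSet[int], t: float,
                      dist_to: Callable[[int], float], deadline: float) -> List[int]:
    """Q_P = {j in Q : t + d_Pj <= deadline} for a single player P."""
    return sorted(j for j in remaining if t + dist_to(j) <= deadline)


def game_table(q_a: List[int], q_b: List[int],
               evaluate_probe: Callable[[int, int], float]) -> GameTable:
    """Entry (x, y) holds player A's evaluation of the probe 'A targets x, B targets y'."""
    return {(x, y): evaluate_probe(x, y) for x in q_a for y in q_b}


def format_table(table: GameTable, q_a: List[int], q_b: List[int]) -> str:
    """Player A's targets down the left-hand side, player B's targets along the top."""
    header = "      " + "".join(f"B:{y:<6}" for y in q_b)
    rows = [f"A:{x:<4}" + "".join(f"{table[(x, y)]:<8g}" for y in q_b) for x in q_a]
    return "\n".join([header] + rows)


remaining = frozenset({1, 2, 3})
q_a = reachable_targets(remaining, 0.0, lambda j: float(j), deadline=2.5)
q_b = reachable_targets(remaining, 0.0, lambda j: 4.0 - j, deadline=math.inf)
table = game_table(q_a, q_b, lambda x, y: 10 * x + y)  # toy evaluator, not PRIZE-GUARANTEE
print(format_table(table, q_a, q_b))
```

Passing `math.inf` as the deadline, as in the example of Section 6.1.3 where λ = ∞, makes every remaining prize a feasible target.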

6.1.2 Evaluating the Root Game Table

For each target $y \in Q_B$ of player B, the best response target of player A is defined by

$$x^*(B \odot y) = \arg\max_{x \in Q_A} V_A(A \odot x, B \odot y) \qquad (6.1)$$

where $V_A(A \odot x, B \odot y)$ is the maximum guaranteed value attainable from the projected game position corresponding to the probe $A \to P_A \odot x$ and $B \to P_B \odot y$.

Applying the minimax assumption (Assumption 5.6.2) gives rise to the MINIMAX evaluation of the PRIZE-PARANOID game table. Let

(6.2)

Let T be a prize-ts as in Section 3.2.3.2 and suppose that we cannot determine which target in T player B will target (this is denoted $B \triangleright T$). Then the best response target of player A from the PRIZE-PARANOID game table is defined by

$$x^*(B \triangleright T) = \arg\max_{x \in Q_A} \min_{y \in T} V_A(A \odot x, B \odot y)$$

The corresponding MINIMAX evaluation is

(6.3)

(6.4)

Consider also the natural extension of PRIZE-GUARANTEE to a prize-ts, T, in which player B is restricted to targeting a prize from T before considering any other prize. The corresponding guarantee path subproblem has the earliest arrival times of player B modified to

$$\tau_{Bi} = \min_{j \in T} \{ d_{Bj} + d_{ji} \}$$
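The modified earliest arrival time is a minimum over the prizes of the target set. A minimal sketch, assuming Euclidean distances between points in the plane and unit travel speed; the function name and the toy coordinates are hypothetical. Besides the arrival time, it returns the prize of T through which the minimum is attained.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def earliest_arrival_via_target_set(b: Point, i: Point, target_set: Iterable[int],
                                    prizes: Dict[int, Point]) -> Tuple[float, int]:
    """tau_Bi = min over j in T of (d_Bj + d_ji); returns (tau_Bi, minimising j)."""
    best = None
    for j in target_set:
        via = math.dist(b, prizes[j]) + math.dist(prizes[j], i)
        if best is None or via < best[0]:
            best = (via, j)
    if best is None:
        raise ValueError("target set T must be non-empty")
    return best


prizes = {1: (0.0, 4.0), 2: (3.0, 0.0)}
print(earliest_arrival_via_target_set((0.0, 0.0), (6.0, 4.0), {1, 2}, prizes))  # (8.0, 2)
```

With Euclidean distances the triangle inequality means that a prize inside T keeps its unrestricted arrival time $d_{Bi}$; only prizes outside T are delayed by the detour through T.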

Note that these two target-set variations provide different evaluations since they include different assumptions about what can occur past the first pair of targets. The TARGET-SET PRIZE-PARANOID evaluation assumes that, past the first target, player A knows which prize player B initially targeted, whereas TARGET-SET PRIZE-GUARANTEE assumes that player A does not.
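The evaluations of Section 6.1.2 reduce to row and column extremes of the game table. The sketch below stores the table as a dictionary keyed by (x, y) and implements the best response of (6.1), a MINIMAX evaluation that minimizes the column maximums (the reading used in the example of Section 6.1.3.2), a MAXIMIN evaluation over the rows, and the target-set best response that maximizes, over player A's targets, the minimum over the columns in T. The function names are illustrative and the 2×2 table is made-up toy data, not a table from this chapter. Ties are returned as lists, since several targets can share the same evaluation.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

GameTable = Dict[Tuple[Hashable, Hashable], float]


def best_responses(table: GameTable, q_a: Sequence, y: Hashable) -> List:
    """All targets x of A maximising V_A(A ⊙ x, B ⊙ y) for a fixed target y of B."""
    best = max(table[(x, y)] for x in q_a)
    return [x for x in q_a if table[(x, y)] == best]


def minimax(table: GameTable, q_a: Sequence, q_b: Sequence) -> Tuple[float, List]:
    """B minimises A's column maximum; returns (value, B's minimising targets)."""
    col_max = {y: max(table[(x, y)] for x in q_a) for y in q_b}
    value = min(col_max.values())
    return value, [y for y in q_b if col_max[y] == value]


def maximin(table: GameTable, q_a: Sequence, q_b: Sequence) -> Tuple[float, List]:
    """A maximises its row minimum; returns (value, A's maximising targets)."""
    row_min = {x: min(table[(x, y)] for y in q_b) for x in q_a}
    value = max(row_min.values())
    return value, [x for x in q_a if row_min[x] == value]


def target_set_best_response(table: GameTable, q_a: Sequence,
                             target_set: Sequence) -> Tuple[float, List]:
    """argmax over x in Q_A of min over y in T of V_A: a MAXIMIN restricted to T."""
    return maximin(table, q_a, target_set)


# Toy 2x2 table (hypothetical values): rows are A's targets, columns are B's targets.
table = {("a1", "b1"): 3, ("a1", "b2"): 5, ("a2", "b1"): 4, ("a2", "b2"): 1}
q_a, q_b = ["a1", "a2"], ["b1", "b2"]
print(minimax(table, q_a, q_b))                       # (4, ['b1'])
print(best_responses(table, q_a, "b1"))               # ['a2']
print(maximin(table, q_a, q_b))                       # (3, ['a1'])
print(target_set_best_response(table, q_a, ["b2"]))   # (5, ['a1'])
```

Restricting a MINIMAX evaluation to the columns of a prize-ts, as the example does when it evaluates only the columns corresponding to the targets in T, can be mimicked by passing T as `q_b` to `minimax`.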

6.1.3 Example Tactical Problem

Consider the example problem of Figure 6.1. Here, the prizes are indexed by {1, 2, 3, 4, 5, 6}, with each prize location at the centre of each prize label. The players are labelled {A, B}, with each initial player location at the centre of each player label. The solid line partitions the prize set into those prizes initially closer to player A and those prizes initially closer to player B. The prize values, v, and locations, (x, y), the initial player locations and the overall deadline λ are defined in the accompanying tables. Note that $\sum_{i \in V} v_i = 1000$, i.e., the prize values sum to 1000 in this example.

6.1.3.1 Analysis of PRIZE-GUARANTEE

Table 6.1 illustrates the tactics determined by PRIZE-GUARANTEE and TARGET-SET PRIZE-GUARANTEE for both players.

• The maximal PRIZE-GUARANTEE subpath for player A is A→4→6→3 (value 471), as illustrated in Table 6.1(a). In particular, note that A→1→4 (value 355) is not guaranteed, but A→4→1 is guaranteed; A→1→3 (value 429) is guaranteed, but A→1→6→3 is not guaranteed.

[Figure 6.1: prize locations 1 to 6 and player locations A and B in the unit square]

Prize   v      x      y
1       202    0.31   0.16
2       137    0.28   0.79
3       227    0.81   0.20
4       153    0.48   0.40
5       190    0.68   0.82
6        91    0.68   0.33
Total   1000

Player  x      y
A       0.15   0.28
B       0.36   0.84

Deadline λ = ∞

• The maximal PRIZE-GUARANTEE subpath for player B is B→2→5 (value 327), as illustrated in Table 6.1(b). Also, B→5→2 is guaranteed.

• With TARGET-SET PRIZE-GUARANTEE for player A, suppose that B ▷ {2, 4}, i.e., that player B will claim either prize 2 or prize 4 next. The corresponding best guaranteed path is then A→1→6→3 (value 520), as illustrated in Table 6.1(c). This is because the earliest that player B could then arrive at prize 3 is via prize 4, but player A can traverse A→1→6→3 and arrive at prize 3 first.

• With TARGET-SET PRIZE-GUARANTEE for player B, suppose that A ▷ {1, 4}, i.e., that player A will claim either prize 1 or prize 4 next. The corresponding best guaranteed path is unchanged: B→2→5 (value 327), as illustrated in Table 6.1(d).
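The arithmetic of this example can be checked directly from the data of Figure 6.1. The sketch below assumes Euclidean distances (the figure suggests, but the text does not state, straight-line travel in the unit square). It confirms that the prize values sum to 1000, computes which player is initially nearer each prize (under this assumption, prizes {1, 3, 4, 6} for player A and {2, 5} for player B) and recomputes the subpath values quoted above. It does not reproduce the PRIZE-GUARANTEE solver, so it says nothing about which subpaths are guaranteed.

```python
import math

# Prize values and locations, and initial player locations, from Figure 6.1.
PRIZES = {1: (202, (0.31, 0.16)), 2: (137, (0.28, 0.79)), 3: (227, (0.81, 0.20)),
          4: (153, (0.48, 0.40)), 5: (190, (0.68, 0.82)), 6: (91, (0.68, 0.33))}
PLAYERS = {"A": (0.15, 0.28), "B": (0.36, 0.84)}


def total_value() -> int:
    """Sum of all prize values."""
    return sum(value for value, _ in PRIZES.values())


def nearer_player(prize: int) -> str:
    """Player initially closer to the prize, assuming Euclidean distance."""
    loc = PRIZES[prize][1]
    d_a = math.dist(PLAYERS["A"], loc)
    d_b = math.dist(PLAYERS["B"], loc)
    return "A" if d_a < d_b else "B"


def path_value(path) -> int:
    """Sum of the prize values along a subpath such as (4, 6, 3)."""
    return sum(PRIZES[p][0] for p in path)


print(total_value())
print({p: nearer_player(p) for p in PRIZES})
for path in [(4, 6, 3), (1, 4), (1, 3), (1, 6, 3), (2, 5)]:
    print(path, path_value(path))
```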

6.1.3.2 Analysis of PRIZE-PARANOID

Table 6.2 illustrates the tactics determined by PRIZE-PARANOID and TARGET-SET PRIZE-PARANOID for both players.

• Table 6.2(a) gives the PRIZE-PARANOID game table for player A. This is interpreted by applying a MINIMAX evaluation, i.e., minimizing the column maximums, which implies that player B will target B⊙3, B⊙4 or B⊙6 with evaluation 471. These scenarios are highlighted with boxes in the game table. In each case the best response tactic for player A is A⊙4. The value 471 equates to claiming prizes {3, 4, 6}. Note that a MAXIMIN evaluation of the player A game table also implies A⊙4. Table 6.2(c) illustrates the resulting PRIZE-GUARANTEE subpaths for each player under the probe scenario of A⊙4 and B⊙3. Note that prizes {1, 2} remain unclaimed by this analysis, which shows that PRIZE-PARANOID does not account for all of the prizes in determining its evaluation of a tactical probe scenario. This is simply because it relies on PRIZE-GUARANTEE for scenario evaluation.

• Table 6.2(b) gives the PRIZE-PARANOID game table for player B. A MINIMAX analysis implies that player A will target A⊙4, and hence B⊙2 or B⊙5 with evaluation 327 (equating to prizes {2, 5}). Table 6.2(d) illustrates the resulting PRIZE-GUARANTEE subpaths for each player under the probe scenario A⊙4 and B⊙5. This time, only prize 1 is not accounted for.

• With TARGET-SET PRIZE-PARANOID for player A, suppose that B ▷ {2, 5}. Evaluation of the columns of the player A PRIZE-PARANOID game table corresponding to B⊙2 and B⊙5 implies that either option for player B is tactically likely, with evaluation 520 (equating to prizes {1, 6, 3}). In the case of B⊙5, the best response tactic for player A is A⊙1. Table 6.2(e) illustrates the resulting PRIZE-GUARANTEE subpaths for each player under this probe scenario. However, in the case of B⊙2, any of A⊙1, A⊙3, A⊙4 or A⊙6 is a suitable response tactic for player A since, in the time $d_{B2}$ it takes for player B to reach prize 2, any of these probes takes player A only a short distance from its current location, an inconsequential deviation from A→1. Note that if we suppose that B ▷ {2, 4}, then B⊙4 is the implied tactic of player B with evaluation 471, and hence A⊙4 as before.


Table 6.1: Tactical Example of PRIZE-GUARANTEE

[Figure panels: (a) Player A Maximal PRIZE-GUARANTEE Subpath; (b) Player B Maximal PRIZE-GUARANTEE Subpath; (c) Player A Maximal PRIZE-GUARANTEE Subpath given prize-ts B ▷ {2, 4}; (d) Player B Maximal PRIZE-GUARANTEE Subpath given prize-ts A ▷ {1, 4}]

• With TARGET-SET PRIZE-PARANOID for player B, suppose that A ▷ {1}. Then the corresponding tactic for player B is any of B⊙1, B⊙3, B⊙4 or B⊙6 with evaluation 343 (equating to prizes {4, 5}). In each of these probe scenarios, the first prize claimed on the subsequent player B PRIZE-GUARANTEE subpath is prize 4, and hence the appropriate tactic for player B is B⊙4. Table 6.2(f) illustrates the resulting PRIZE-GUARANTEE subpaths for each player under the probe scenario A⊙1 and B⊙4. Player A's resulting PRIZE-GUARANTEE subpath is much worse than in other scenarios since, in being constrained to target prize 1, player A misses out on prize 4 and then cannot subsequently guarantee both prize 6 and prize 3.

6.1.3.3 Discussion

In comparing the tactics of PRIZE-GUARANTEE and PRIZE-PARANOID, with and without a prize-ts, we can see from this example that both players hold to a maximum guaranteed path unless a better path becomes available due to the prize-ts of the opponent. The tactics implied from each player's perspective are often different in PRIZE-PARANOID.

Suppose B ▷ {2, 4}. In TARGET-SET PRIZE-GUARANTEE this implied that player A targets prize 1 (A→1), but in TARGET-SET PRIZE-PARANOID it implied A→4. The difference is that in the former case player B must commit to prize 4 in order to get to prize 3 as early as possible, whereas in the latter case, once player A arrives at prize 4, the probe B⊙4 is completed and player B may then immediately target prize 3.

This tactical example is considered further in Sections 6.2.4 and 6.3.5, where the tactics are determined by the tactical engines ORIGINAL-DT and PRIZE-DT.

6.1.4 Summary

We have designed the PRIZE-PARANOID tactical engine, proposed a MINIMAX evaluator for the corresponding game table with or without a prize-ts and presented an illustrative example. The PRIZE-PARANOID tactical engine constitutes the scenario engine component at the global-frame of the SPA/DMS. It is coupled with a PRIZE-MONITOR and STEP-MONITOR in Section 6.4 to construct a full CPCP strategy.

6.2 Tactical Engine: ORIGINAL-DT

The tactical engine ORIGINAL-DT is structured around a game tree approach in which, at each game tree node, the projection of each player towards its proposed target-prize is a probe.

6.2.1 Game Tree Generation

A planning path consists of a sequence of target-prizes. A projected game position consists
