
To illustrate how DTGOLOG calculates an optimal policy we give an example in Chapter 4.2. Boutilier et al. (2000) give an example from a service robotics domain. The robot's objective is to deliver mail in an office environment. The domain axiomatization comprises fluents like hasMail(person, s), mailPresent(person, m, s), robotLoc(loc, s).

The DTGOLOG program which calculates the optimal policy for delivering the mail is

proc Mail
    while ∃p.(¬attempted(p) ∧ ∃n.mailPresent(p, n)) do
        π(p, people, (¬attempted(p) ∧ ∃n.mailPresent(p, n))? ; deliverTo(p))
    endwhile
endproc

deliverTo is a complex procedure which includes picking up letters from the post box, moving to a person's office, handing over the letter, and returning to the post box. Each person in the domain has a different priority for mail being delivered to them. The reward the robot receives for its actions is discounted depending on the priority of the person and on the approximate time it takes to deliver the mail. The condition ¬attempted(p) serves as a guard condition here.

The state space of this domain with P people, L locations and N letters has about $2^N \cdot (6N + 6)^P \cdot L^3$ states. The authors state that a problem instance with 5 people and 7 locations would demand 2.7 billion states for ordinary value iteration. In DTGOLOG this problem size can still be handled.

In his Ph.D. thesis Soutchanski (2003) compares DTGOLOG with the state-of-the-art MDP solver SPUDD (Hoey et al. 1999). He shows that DTGOLOG solves problem instances faster than SPUDD. The reason lies in the forward search value iteration algorithm which is used by DTGOLOG (by means of BestDo). Only the relevant successor states are expanded, whereas SPUDD has to iterate over all states of the MDP. The optimal policies on a given problem instance are the same w.r.t. the states expanded by DTGOLOG (clearly, DTGOLOG cannot provide an optimal action for states which were not visited; SPUDD, on the other hand, provides optimal actions for all states), and the resulting values are qualitatively the same with both approaches. This gives further evidence for the usefulness of DTGOLOG as a solver for MDPs.
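The difference between forward search and iterating over the whole state space can be made concrete with a small sketch. The following Python code is not BestDo itself; it is a minimal finite-horizon forward search over a hypothetical toy MDP (the states, actions, probabilities and rewards in TOY_MDP are invented for illustration, and rewards are left undiscounted). It records which states are visited, so one can see that states unreachable from the start state are never expanded.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy MDP: state -> action -> list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TOY_MDP: Transitions = {
    "post_box": {
        "deliver": [(0.9, "office", 10.0), (0.1, "post_box", 0.0)],
        "wait": [(1.0, "post_box", 0.0)],
    },
    "office": {
        "return": [(1.0, "post_box", 1.0)],
    },
    # Not reachable from "post_box"; forward search never expands it.
    "basement": {
        "climb": [(1.0, "post_box", 0.0)],
    },
}


def forward_search(mdp: Transitions, state: str, horizon: int,
                   visited: Set[str]) -> Tuple[float, Optional[str]]:
    """Return (expected value, best action) for `state` with `horizon` steps to go."""
    visited.add(state)
    if horizon == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action, outcomes in mdp[state].items():
        # Expected reward of the action plus the value of its successor states.
        value = 0.0
        for prob, next_state, reward in outcomes:
            future, _ = forward_search(mdp, next_state, horizon - 1, visited)
            value += prob * (reward + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action
```

Calling forward_search(TOY_MDP, "post_box", 2, visited) with an empty set fills visited with "post_box" and "office" only. A solver that sweeps over all states would also have to compute a value for "basement", which is the kind of work the forward search avoids.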

3.4 Summary

In this section we illuminated the mathematical background for the methods we need throughout this thesis. First, these are solution methods for Markov Decision Processes (MDPs). We introduced the mathematical model for MDPs and showed several solution methods. Our definition left out cost models, which can be integrated easily. A whole body of different models derived from the basic model presented in Section 3.1 exists, such as POMDPs or semi-MDPs. There are also optimization criteria other than the expected cumulated reward shown here, such as the average reward. We presented some core solution techniques for finding optimal policies; these are called decision-theoretic planning methods. Finally, we showed how reinforcement learning relates to these methods as a technique for finding optimal policies if the transition model is not known. Decision-theoretic planning will be a topic in Chapter 4. A thorough mathematical treatment of MDPs and the different model variants can be found in (Puterman 1994).

Bayes filters (Section 3.2) are probabilistic methods for solving hidden Markov models. In the context of this thesis we regard Bayes filters for the localization problem of a mobile robot. The state of the system cannot be observed directly. It has to be estimated from observations, i.e. sensor values. We showed the mathematical background of the Kalman filter and a Monte Carlo localization approach, which uses sampling techniques to estimate the pose of the robot. While Kalman filters are Bayes filters with a unimodal Gaussian representation of the belief distribution, Monte Carlo techniques can represent multi-modal distributions. Hence, Kalman filters basically cannot represent multiple hypotheses for the pose of the robot. We will use these models throughout Chapter 6.
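To make the unimodality concrete, the following sketch shows a scalar Kalman measurement update in Python. It is a simplification under stated assumptions: the state is one-dimensional, the sensor observes the state directly, and there is no motion update. The function name kalman_update and its parameters are chosen for illustration only.

```python
from typing import Tuple


def kalman_update(mean: float, variance: float,
                  measurement: float, measurement_variance: float) -> Tuple[float, float]:
    """One scalar measurement update; the belief remains a single Gaussian."""
    # The Kalman gain weighs the prior uncertainty against the sensor noise.
    gain = variance / (variance + measurement_variance)
    new_mean = mean + gain * (measurement - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance
```

The belief before and after the update is fully described by one mean and one variance. There is no place to store a second, competing pose hypothesis, which is what a set of samples in Monte Carlo localization can do.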

Finally, in Section 3.3, we showed the important mathematical background of the situation calculus and GOLOG. Our approach with READYLOG is founded on these action formalisms. We introduced in detail the GOLOG derivatives which we integrated into READYLOG. We refer to Reiter's textbook on the situation calculus and GOLOG (Reiter 2001) for further reading.

Chapter 4

A Golog Dialect for Real-time Dynamic Domains

The aim of designing the language READYLOG is to create a GOLOG dialect which supports the programming of the high-level control of agents or robots in dynamic real-time domains. The primary application has been robotic soccer. The robotic soccer domain has some specific characteristics which made the development of READYLOG necessary and influenced several design decisions: it is an unpredictable, adversarial, dynamic real-time domain. This means that decisions have to be taken quickly and that plans for future courses of action have a mid-term horizon. Planning ahead for the next minute does not make sense, as the world changes unpredictably due to the uncertain outcomes of the agent's own actions and the behaviors of the opposing players. The unpredictability of the agent's actions calls for some notion of uncertainty. Actions succeed with a certain probability p or fail with probability 1 − p. Further, the soccer domain is a continuous domain, whereas the world in the situation calculus evolves from situation to situation. The agent programming language has to support a continuously changing world. The complexity of the domain also affects how the agents are programmed in practice. The idea of GOLOG is to combine agent programming with planning. Usually this means that a certain goal should be reached, and it is proved that the goal can be reached with the programmed actions. As it is generally not obvious to the programmer which series of actions will end in scoring a goal, it seems helpful to use an optimization theory to evaluate different action choices and execute the most promising one w.r.t. the success probability and the optimization theory. Several other aspects influenced the language READYLOG. The knowledge the robot or agent has about its environment is incomplete. This means that the robot has to use its sensors permanently to gather knowledge about the environment. When the robot has to query its sensors frequently, it becomes an issue how the new knowledge can be integrated fast enough.
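The idea of evaluating action choices by their success probability p and an optimization theory can be illustrated with a minimal expected-utility sketch in Python. The actions "shoot" and "pass" and all numbers below are hypothetical and serve only as an example; they are not taken from the soccer domain model of this thesis.

```python
from typing import Dict, Tuple


def expected_utility(p_success: float, reward_success: float,
                     reward_failure: float) -> float:
    """The action succeeds with probability p and fails with probability 1 - p."""
    return p_success * reward_success + (1.0 - p_success) * reward_failure


def most_promising(choices: Dict[str, Tuple[float, float, float]]) -> str:
    """Return the action name with the highest expected utility."""
    return max(choices, key=lambda name: expected_utility(*choices[name]))


# Hypothetical choices: (success probability, reward on success, reward on failure).
CHOICES = {
    "shoot": (0.2, 100.0, -10.0),
    "pass": (0.8, 30.0, -5.0),
}
```

With these invented numbers, shooting has the higher reward on success but the lower expected utility (about 12 against about 23 for passing), so most_promising(CHOICES) selects "pass".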

Several extensions of the original GOLOG dialect exist which cover specific areas. De Giacomo and Levesque (1999) and De Giacomo et al. (2002) proposed an incremental on-line GOLOG interpreter (INDIGOLOG), where actions are directly executed in the real world. Grosskreutz and Lakemeyer (2000b) and Grosskreutz (2000) proposed PGOLOG, which extends GOLOG with probabilistic programs. With a certain specified probability p a program σ succeeds, or fails with probability 1 − p. A semantics to deal with continuous change was proposed with CCGOLOG in (Grosskreutz and Lakemeyer 2000a), and sensing was presented in (Grosskreutz and Lakemeyer 2001). Boutilier et al. (2000) proposed DTGOLOG, a decision-theoretic GOLOG dialect which uses an MDP-based optimization theory to find the optimal course of actions. Table 4.1 gives an overview of the extensions of GOLOG.

| Language  | On-line | Sensing | Exog. Actions | Conc. Exec. | Projection | Prob. Actions | Cont. Change | Decision Theory |
|-----------|---------|---------|---------------|-------------|------------|---------------|--------------|-----------------|
| Golog     | −       | −       | −             | −           | −          | −             | −            | −               |
| ConGolog  | −       | −       | −             | +           | −          | −             | −            | −               |
| IndiGolog | +       | +       | +             | +           | +          | −             | −            | −               |
| ccGolog   | +       | +       | +             | +           | +          | −             | +            | −               |
| pGolog    | +       | +       | +             | −           | +          | +             | −            | −               |
| DTGolog   | −       | −       | −             | −           | +          | +             | −            | +               |
| Readylog  | +       | +       | +             | +           | +          | +             | +            | +               |

Table 4.1: Features of Golog Languages

The achievement of READYLOG is to integrate those aspects into one language and framework. In particular, these are the continuous fluents and the model of concurrency from CCGOLOG, the projection mechanism from PGOLOG, and the integration of decision theory from DTGOLOG. Several modifications and extensions have been made:

• a novel on-line version of the decision-theoretic planning method proposed by Boutilier et al. (2000), which allows for execution monitoring of policies;

• the introduction of macro actions, so-called options, for decision-theoretic planning after an idea of Precup et al. (1998);

• an enhanced version of passive sensing which allows updating a whole world model in one sweep;

• several speed-ups for policy generation, such as caching previously computed results in the forward decision-theoretic search for an optimal policy;

• a useful any-time approach for decision-theoretic planning to overcome fixed horizons when searching for a policy and by this to better exploit the computational resources of the agent or robot, and

• a progression method after Lin and Reiter (1997).
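Two of the items above, caching previously computed results and the any-time approach, can be sketched together in Python. This is an illustration under assumptions and not the READYLOG implementation: the toy model MODEL is hypothetical, caching is done with functools.lru_cache on (state, horizon) pairs, and the any-time behavior is approximated by deepening the horizon step by step and checking the time budget only between iterations.

```python
import time
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical toy model: state -> action -> ((probability, next_state, reward), ...).
MODEL = {
    "far": {
        "dribble": ((0.7, "near", 0.0), (0.3, "far", 0.0)),
        "long_shot": ((0.1, "done", 10.0), (0.9, "done", 0.0)),
    },
    "near": {"shoot": ((0.6, "done", 10.0), (0.4, "done", 0.0))},
    "done": {},
}


@lru_cache(maxsize=None)
def value(state: str, horizon: int) -> Tuple[float, Optional[str]]:
    """Cached (value, best action) for `state` with `horizon` steps to go."""
    if horizon == 0 or not MODEL[state]:
        return 0.0, None
    best: Tuple[float, Optional[str]] = (float("-inf"), None)
    for action, outcomes in MODEL[state].items():
        q = sum(p * (r + value(nxt, horizon - 1)[0]) for p, nxt, r in outcomes)
        if q > best[0]:
            best = (q, action)
    return best


def anytime_plan(state: str, budget_seconds: float, max_horizon: int = 10):
    """Deepen the horizon while time remains; return (horizon reached, (value, action))."""
    deadline = time.monotonic() + budget_seconds
    horizon = 1
    result = value(state, horizon)  # always complete at least horizon 1
    while horizon < max_horizon and time.monotonic() < deadline:
        horizon += 1
        result = value(state, horizon)  # shallower results come from the cache
    return horizon, result
```

In this invented model a horizon of 1 prefers "long_shot" from state "far", while any deeper search prefers "dribble". This shows why a fixed, too short horizon can yield a poor policy and why spending leftover computation time on a deeper search can pay off. Because of the cache, each deepening step reuses the values computed for the shallower horizons.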

The core language is presented in Section 4.1. Here, we give a brief overview of the semantics of the language. We will not go into the details, as most of the material is presented in the literature or in Chapter 3. We give pointers to the original resources. Section 4.2 presents our approach to on-line decision-theoretic planning in detail. We relate our approach to another approach to on-line DT planning by Soutchanski (2001) and show why his approach is not applicable to the domains we are aiming at. In Section 4.3 we show our approach to using macro actions for speeding up decision-theoretic planning. First, we illustrate the theoretical background of macro actions in the context of Markov Decision Processes, before we present our solution algorithm. Further, we show how some simple but efficient techniques further speed up the process of policy generation. Section 4.4 addresses implementation details of the interpreter and shows the progression method. We conclude with a discussion in Section 4.5.
