These case studies were performed with the goal of contrasting state-of-the-art approaches to dialog modeling and of identifying pitfalls and potential remedies for dialog modeling on robots. Table 4.5 lists the distinctive features and summarizes the results of the case studies. Four state-of-the-art approaches were investigated. Ravenclaw and PaMini can be referred to as descriptive approaches, whereas Collagen/Disco and Dipper fall into the category of mental-state approaches. Consequently, only the latter two feature planning or plan recognition capabilities. For the descriptive approaches, a visualization of the dialog flow is feasible and would facilitate dialog design, but this feature is supported only by PaMini. The descriptive approaches keep dialog and task structure well separated. To do so, Ravenclaw employs a domain-independent dialog engine together with a domain-specific dialog description. PaMini relies on a fine-grained Task State Protocol as the interface between dialog and domain level. Collagen/Disco automatically generates the system utterances based on a task model and the current discourse state. In this respect it is similar to PaMini, which combines task states with robot dialog acts. However, PaMini operates at a more abstract level than Collagen/Disco. Also, Collagen/Disco does not allow the dialog structure to be configured (except, with limitations, for the exact wording), whereas PaMini does so by providing a large selection of interaction patterns. Thus, in the Collagen/Disco framework, the dialog structure emerges directly from the task structure. In general, plan-based approaches inherently tend to keep dialog and domain less separated, as they often rely on the same mechanism for both task and dialog planning (though the ones investigated in the case studies do not).
The dialog configuration is written either in a programming language (such as C++ or Java) or in a domain-specific language (often XML-based) developed for that specific purpose. Here, a trade-off between flexibility and complexity has to be found. The same is true for the back-end specification, which plays a larger role in robotics than it does in traditional domains.
Also, aspects of system integration are of particular importance in robotics, where the dialog manager coordinates with the complex robotic system. The question of how to model the interaction with the back-end tends to be solved individually by each framework. Binary success variables, as used in the reasoning-based Collagen/Disco approach, seem somewhat underspecified for the robot to keep the user adequately informed. Ravenclaw’s user-defined result frame allows for more freedom but also demands considerable knowledge and work from the developer. From this perspective, PaMini’s Task State Protocol appears to be a good compromise between the two, allowing the developer an easy and standardized yet flexible interaction with the back-end.

| | Ravenclaw | Collagen/Disco | Dipper | PaMini |
|---|---|---|---|---|
| Type of approach | Descriptive | Plan-based | Plan-based | Descriptive |
| Plan recognition, planning | No | Yes | Yes | No |
| Visualization | No | No | No | Yes |
| Relationship between dialog and task structure | Separated | Dialog structure emerges from task structure | Separated | Separated |
| Domain-specific configuration | C++ macros | XML | Prolog-like Dipper update language | Java and XML |
| Back-end specification | Arbitrary components | JavaScript | Arbitrary components | Arbitrary components |
| Discourse planning | Task tree | Recipes | Information state update rules | Locally: interaction patterns; globally: back-end |
| Communication with back-end | User-defined result frame | Optional binary success variable | Interagent Communication Language | Task State Protocol |
| Asynchronous coordination | No; latest version: Yes | No | Yes (polling-based) | Yes (event-based) |
| Pre-modeled conversational skills | Grounding and repair | Collaborative plan execution | No | Patterns for various situations |
| Grounding model | Explicit | None | None | Implicit |
| Focus shifts | Yes | Yes | With limitations | Yes |
| Multimodality | Yes | No | Yes | Yes |
| Multi-party interaction | Yes, n systems : 1 user | No | No | Yes, 1 system : n users |

Table 4.5: Distinctive features of dialog modeling approaches.
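The contrast between a single binary success variable and a finer-grained task state protocol can be made concrete with a small sketch. It is an illustration only and does not reproduce PaMini's actual protocol or API: the state names, the transition table, the `Task` class and the feedback templates are all assumptions introduced for this example.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical state names, chosen for illustration only.
    INITIATED = "initiated"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions: the back-end may only move a task along these edges.
TRANSITIONS = {
    TaskState.INITIATED: {TaskState.ACCEPTED, TaskState.REJECTED},
    TaskState.ACCEPTED: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.REJECTED: set(),
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}

# Each state change can trigger a robot dialog act (wording is made up).
FEEDBACK = {
    TaskState.ACCEPTED: "OK, I will {task}.",
    TaskState.REJECTED: "Sorry, I cannot {task}.",
    TaskState.COMPLETED: "I managed to {task}.",
    TaskState.FAILED: "I failed to {task}.",
}


class Task:
    """A task whose life cycle is reported by the back-end in small steps."""

    def __init__(self, name):
        self.name = name
        self.state = TaskState.INITIATED

    def update(self, new_state):
        """Apply a state update from the back-end and return the robot's feedback."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        return FEEDBACK[new_state].format(task=self.name)
```

With a binary success variable, the robot could only report the final outcome. In this sketch it can additionally acknowledge that a task was accepted or explain that it was rejected before execution starts, while the developer only has to emit state updates from a fixed vocabulary.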
In robotics, discourse planning is affected not only by the user’s utterances but also by the perceptual context. In contrast to the other approaches discussed, PaMini outsources global discourse planning to the back-end, allowing for a less restricted dialog structure. Discourse planning can be executed either by a centralized back-end process (i.e. a planner) or in a distributed way, resulting in a more reactive architecture.
Asynchronous coordination has been identified as crucial for integrating action execution and interaction. Whether a dialog system is able to handle asynchronous action execution depends largely on the middleware used and whether it supports event notifications or not.
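The difference between polling-based and event-based coordination can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of any of the frameworks discussed: the `Action` class, its `on_done` callback registration and the helper functions are hypothetical, and a Python thread stands in for the robot's back-end.

```python
import threading
import time


class Action:
    """A hypothetical long-running back-end action (e.g. a grasp)."""

    def __init__(self, duration):
        self.duration = duration
        self.done = False
        self._listeners = []
        self._lock = threading.Lock()
        self._thread = None

    def on_done(self, callback):
        # Event-based: register once (before start) and get notified later.
        with self._lock:
            self._listeners.append(callback)

    def start(self):
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        time.sleep(self.duration)  # stands in for the actual execution
        with self._lock:
            self.done = True
            listeners = list(self._listeners)
        for callback in listeners:
            callback()

    def join(self):
        self._thread.join()


def wait_by_polling(action, interval=0.01):
    """Polling-based: repeatedly ask for the status until the action is done."""
    polls = 0
    while not action.done:
        polls += 1
        time.sleep(interval)
    return polls


def demo_event_based():
    """Event-based: the dialog keeps talking while the action is running."""
    log = []
    action = Action(0.05)
    action.on_done(lambda: log.append("robot: I am done."))
    action.start()
    log.append("robot: I am working on it.")  # dialog stays responsive
    action.join()
    return log
```

In the polling variant the dialog manager has to spend its own time checking the status, whereas in the event-based variant it is free to continue the interaction and is notified when the action finishes. Whether the second variant is available depends, as noted above, on whether the middleware delivers such notifications.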
Most dialog frameworks provide some kind of pre-modeled conversational skills, in the form of generic strategies for grounding and repair, or for collaborative plan execution, or else in the form of PaMini’s interaction patterns. Dipper does not provide any pre-modeled dialog strategies. It could thus be regarded as a toolkit for building dialog frameworks rather than as a complete dialog framework. Also, focus shifts are supported by most frameworks. Internally, the discourse is typically represented as a stack, with the focused dialog element on top. With Dipper, which does not maintain such a built-in structure, focus shifts could be implemented only with limitations, at the price of introducing several control flags. Further, the only framework that provides a generic grounding strategy is Ravenclaw. In PaMini, grounding is incorporated implicitly in the structure of the interaction patterns (cf. also section 3.8.2 for further discussion), whereas neither Dipper nor Collagen/Disco supports grounding.
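The stack-based discourse representation mentioned above can be sketched in a few lines. The `Discourse` class and its method names are hypothetical and are not taken from any of the frameworks discussed; the sketch only illustrates why a built-in stack makes focus shifts straightforward.

```python
class Discourse:
    """Discourse state as a stack; the focused dialog element is on top."""

    def __init__(self):
        self._stack = []

    @property
    def focus(self):
        # The focused element is the top of the stack, or None if it is empty.
        return self._stack[-1] if self._stack else None

    def push(self, element):
        # Opening a new element (e.g. a sub-dialog) shifts the focus to it.
        self._stack.append(element)

    def shift_focus(self, element):
        # Return to an element that is already open by moving it to the top;
        # the elements that were above it stay open underneath.
        self._stack.remove(element)  # raises ValueError if it is not open
        self._stack.append(element)

    def close_focus(self):
        # Closing the focused element hands the focus to the one beneath it.
        return self._stack.pop()
```

A short usage example: after `push("follow me")` and `push("what is your name?")`, the question is in focus; `shift_focus("follow me")` brings the earlier task back to the top, and `close_focus()` then returns the focus to the pending question. Without such a structure, each of these cases would have to be tracked by hand, for instance with control flags.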
In order to keep our case studies simple, we have limited the target scenario to verbal interaction. Nevertheless, nonverbal behaviors and multimodality are crucial aspects of situated dialog. Except for Collagen/Disco, which relies on text input and output, multimodality could have been realized with all of the discussed dialog managers, as they operate at the semantic level, below which the input and output sources may be exchanged. The new version of Ravenclaw supports multimodal input and output by providing agents for modality integration and for the production of multimodal output [RE07]. Both Dipper and PaMini rely on a distributed architecture with arbitrary sources for input and output. PaMini, for instance, provides a collection of available output modalities, such as pointing gestures or facial expressions (depending on the robot platform), that can be combined. However, neither Dipper nor PaMini handles the issues of input fusion and output synchronization.

Moreover, human-robot interaction demands more than classical 1:1 interactions. Often, the robot will be situated in environments where multiple possible interaction partners are present, or a robot might even have to collaborate with other robots. Thus, the capability of multi-party interaction is another crucial requirement. PaMini has recently been extended to manage multiple interactions (with multiple participants each), and a multi-party engagement model [BH09] has been integrated in a Multi-Party Quiz game (see section 8.3). Ravenclaw has provisions for the opposite case, in which multiple robots collaborate, forming a team [DHB+06].