In rule-based methods, researchers first analyze recordings of human interactions and try to semi-automatically find lawful patterns in multimodal streams. Computational frameworks are then proposed to operationalize those findings. Such systems usually incorporate a set of rules that map perceptual cues to multimodal actions via an intermediate estimation of communicative intentions.
We present here several examples of rule-based interactive systems.
The BEAT system [CVB01] is quite emblematic of what was developed in the late 90s. It augments textual dialog with nonverbal behaviors by enriching the linguistic structure with language tags such as rheme/theme contrasts, objects and actions. BEAT extracts linguistic and contextual information from raw input text to control the movements of the arms, hands and face of an avatar, as well as the intonation of its voice. It relies on a set of rules derived from research on nonverbal conversational behavior. For example, the rules used to control gaze are: "For each THEME: If at beginning of utterance or 70 percent of the time, suggest gazing away from user"; and "For each RHEME: If at end of utterance or 73 percent of the time, suggest gazing towards the user".
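The two gaze rules quoted above can be written down as a minimal Python sketch. This is not BEAT's actual implementation: the `Chunk` type, the `suggest_gaze` function and the action labels are hypothetical names, and "beginning/end of utterance" is assumed to mean the first/last tagged chunk. The random source is injectable so that the stochastic branch can be tested.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    """A tagged span of the utterance (hypothetical type, not BEAT's own)."""
    kind: str  # "THEME" or "RHEME"
    text: str


def suggest_gaze(chunks: List[Chunk],
                 rand: Callable[[], float] = random.random) -> List[Optional[str]]:
    """Return one gaze suggestion per chunk, or None when no rule fires."""
    suggestions: List[Optional[str]] = []
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        # THEME: at the beginning of the utterance, or 70 percent of the time.
        if chunk.kind == "THEME" and (i == 0 or rand() < 0.70):
            suggestions.append("gaze_away")
        # RHEME: at the end of the utterance, or 73 percent of the time.
        elif chunk.kind == "RHEME" and (i == last or rand() < 0.73):
            suggestions.append("gaze_towards")
        else:
            suggestions.append(None)
    return suggestions


utterance = [Chunk("THEME", "This system"),
             Chunk("RHEME", "adds gestures to text")]
print(suggest_gaze(utterance))  # prints ['gaze_away', 'gaze_towards']
```

In this two-chunk utterance both positional conditions hold, so the result does not depend on the random draw; for chunks in the middle of an utterance the suggestion is probabilistic.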
Figure 3.3 The three stages of SAIBA and the two mediating languages: FML (function markup language) and BML (behaviour markup language). The figure is reproduced from [Kop+06]
Figure 3.4 An example of a BML block [Kop+06]
SAIBA (Situation, Agent, Intention, Behavior, Animation) [Kop+06] is a framework for generating multimodal behaviors in real time. It comprises three successive stages: (1) planning the communicative intent, (2) planning the multimodal realization of that intent, and (3) realizing the planned behaviors, as shown in Figure 3.3. Two XML-based languages mediate between the stages: the Function Markup Language (FML), which describes intentions, and the Behavior Markup Language (BML), which describes the verbal and nonverbal behaviors to be realized by an animated agent. Multimodal behaviors such as speech, gesture, gaze, body movement and head motion are coordinated in a BML block, which consists of rules. Each behavior is split into six phases, each bounded by two of seven sync-points: start, ready, stroke-start, stroke, stroke-end, relax and end. Behaviors are coordinated by assigning a sync-point of one behavior to a sync-point of another. Figure 3.4 illustrates an example of a BML block. In this example, a speech tag defines the sentence "This is an example", which is spoken by a text-to-speech system. Head nodding is aligned with the speech's start sync-point, and an arm gesture is triggered by a wb3 event, a new sync-point defined in the speech tag.
Based on SAIBA, Lee and Marsella [LM06] built a Nonverbal Behavior Generator that generates behaviors according to communicative functions. The system produces nonverbal behaviors such as head movements, facial expressions and body gestures by analyzing the syntactic and semantic structure of the input text. In particular, nonverbal behaviors are assigned to specific words, phrases or speech acts by rules derived from the analysis of a number of video clips.
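The sync-point alignment used by BML can be illustrated with a small scheduling sketch in Python. It is not a BML realizer: the `schedule` function, the behavior names and all timings (in seconds) are assumptions made for illustration, and the gesture is assumed to be aligned by its start sync-point. Each behavior lists its sync-points as offsets from its own start, and each constraint states that a sync-point of one behavior coincides with a sync-point of another.

```python
from typing import Dict, List, Tuple

# Local offsets (seconds) of each sync-point from the behavior's own start.
Behavior = Dict[str, float]
# (behavior_a, sync_a, behavior_b, sync_b): a's sync_a coincides with b's sync_b.
Constraint = Tuple[str, str, str, str]


def schedule(behaviors: Dict[str, Behavior],
             constraints: List[Constraint],
             anchor: str) -> Dict[str, float]:
    """Compute the absolute start time of each behavior; `anchor` starts at 0."""
    starts: Dict[str, float] = {anchor: 0.0}
    pending = list(constraints)
    while pending:
        progressed = False
        for c in list(pending):
            a, sa, b, sb = c
            if b in starts and a not in starts:
                starts[a] = starts[b] + behaviors[b][sb] - behaviors[a][sa]
            elif a in starts and b not in starts:
                starts[b] = starts[a] + behaviors[a][sa] - behaviors[b][sb]
            elif a not in starts and b not in starts:
                continue  # neither side is placed yet; retry on a later pass
            # If both sides are already placed, consistency is not checked here.
            pending.remove(c)
            progressed = True
        if not progressed:
            raise ValueError("some constraints are not connected to the anchor")
    return starts


# Hypothetical timings for the block described in the text.
behaviors = {
    "speech": {"start": 0.0, "wb3": 0.9, "end": 1.5},  # wb3: custom sync-point
    "nod": {"start": 0.0, "end": 0.5},
    "gesture": {"start": 0.0, "stroke": 0.4, "end": 1.0},
}
constraints = [
    ("nod", "start", "speech", "start"),    # nod begins with the speech
    ("gesture", "start", "speech", "wb3"),  # gesture is triggered at wb3
]
print(schedule(behaviors, constraints, anchor="speech"))
```

Aligning the gesture's stroke rather than its start with wb3 would move the gesture's start earlier by the stroke's local offset, which is how a realizer can make the most meaningful part of a gesture coincide with a given word.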
Thórisson [Thó02] proposed an event-based language in which a finite state machine (FSM) describes an interaction scenario as a series of states with pre-conditions and post-actions, structured in three hierarchical layers (reactive, process control and content). The resulting dialogue model, Ymir, was used to drive a virtual agent named Gandalf in task-oriented dialogues. The architecture comprises several kinds of modules: perception, decision, knowledge base and action scheduler. The perception modules are of two types: (1) Unimodal Perceptors, which detect important events in single modalities (prosodic, speech, positional, directional), and (2) Multimodal Integrators, which collect the information from the unimodal perceptors to build a more comprehensive description of the user's behavior. The perception modules receive and prepare the input data used as the basis for decisions to act. The knowledge base contains any knowledge that has to do with the dialogue, such as the participants, their body parts, etc. The decision modules read mental and world states from the perception modules and other information from the knowledge base, and decide which action to take. Most perception and decision modules produce Boolean outputs (on/off), with the intent of making larger systems easier to build. The decision modules are rule-based: they send behavior requests to the action modules when their preconditions are satisfied, following top-down priority levels: reactive layer (highest), process control layer and content layer. The action modules manage the behavior requests and execute the behaviors following an anytime algorithm [Dea87] (managing life span, activation and deactivation of actions, etc.).
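The layered, rule-based decision step described above can be sketched as follows. This is an illustration only and not Ymir's code: the `DecisionModule` type, the `decide` function, and the percept and behavior names are all hypothetical. Perception is reduced to a dictionary of Boolean flags, and behavior requests are ordered by layer priority, with the reactive layer first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Layers in descending priority, as listed in the text.
LAYER_PRIORITY = {"reactive": 0, "process": 1, "content": 2}

Percepts = Dict[str, bool]  # Boolean (on/off) outputs of perception modules


@dataclass
class DecisionModule:
    name: str
    layer: str                                # key of LAYER_PRIORITY
    precondition: Callable[[Percepts], bool]  # rule over the Boolean percepts
    behavior: str                             # request sent to the action modules


def decide(modules: List[DecisionModule], percepts: Percepts) -> List[str]:
    """Return the behavior requests whose preconditions hold, highest layer first."""
    fired = [m for m in modules if m.precondition(percepts)]
    fired.sort(key=lambda m: LAYER_PRIORITY[m.layer])  # stable sort
    return [m.behavior for m in fired]


# Hypothetical modules, deliberately listed out of priority order.
modules = [
    DecisionModule("answer", "content",
                   lambda p: p["question_pending"] and not p["user_speaking"],
                   "answer_question"),
    DecisionModule("look", "reactive",
                   lambda p: p["user_looking_at_agent"],
                   "look_at_user"),
    DecisionModule("turn", "process",
                   lambda p: p["question_pending"] and not p["user_speaking"],
                   "take_turn"),
]
percepts = {"user_speaking": False,
            "user_looking_at_agent": True,
            "question_pending": True}
print(decide(modules, percepts))
# prints ['look_at_user', 'take_turn', 'answer_question']
```

The sketch leaves out the action scheduler: in the architecture described above, the action modules would still decide when and for how long each requested behavior is actually executed.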
(a) Situated module
(b) Sequence of situated modules controlled by episode rules
Figure 3.5 Situated modules controlled by episode rules [Kan+02]
As another example of a rule-based model, Kanda et al. [Kan+02] built a tool named Episode Editor. The tool is used to drive the behaviors of a humanoid robot (a Robovie robot) by building situated modules that are orchestrated by episode rules. A situated module realizes an action-reaction pair as an interactive and reactive behavior between human and robot in a particular situation. Each situated module (shown in Figure 3.5 (a)) includes three parts (pre-condition, indication and recognition) and is used to perform a particular interactive behavior such as shaking hands, greeting people or guiding visitors. The pre-condition part verifies whether the situated module can be executed. If the pre-condition is satisfied, the indication part generates the robot's action (utterance/gestures), for example, "let's wave right hand to greet people". The recognition part then checks the expected human reaction to the robot's action generated by the indication part, so that a human pilot can trigger the most suitable action of the robot. Situated modules can be executed consecutively, under the control of episode rules, to establish a sequence of situated modules (robot behaviors), as shown in Figure 3.5 (b). One disadvantage of situated modules is their limited ability to perform multiple tasks at the same time and to monitor complex sequences.
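The interplay of situated modules and episode rules can be sketched in Python as follows. This is a simplified illustration, not the Episode Editor itself: the `SituatedModule` type, the `run_episode` function, the rule table and all labels are hypothetical, and the recognition part is reduced to a function that maps an observed human reaction to a result label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SituatedModule:
    name: str
    precondition: Callable[[Dict[str, bool]], bool]  # can the module run now?
    indication: str                                  # robot utterance/gesture
    recognition: Callable[[str], str]                # human reaction -> result label


# Episode rules: (executed module, recognition result) -> next module name.
EpisodeRules = Dict[Tuple[str, str], str]


def run_episode(modules: Dict[str, SituatedModule],
                rules: EpisodeRules,
                first: str,
                world: Dict[str, bool],
                reactions: List[str],
                max_steps: int = 10) -> List[str]:
    """Run situated modules in sequence; return the indications performed."""
    performed: List[str] = []
    current: Optional[str] = first
    observed = iter(reactions)
    for _ in range(max_steps):
        if current is None:
            break
        module = modules[current]
        if not module.precondition(world):
            break
        performed.append(module.indication)
        reaction = next(observed, None)
        if reaction is None:
            break  # no further human reaction was observed
        result = module.recognition(reaction)
        current = rules.get((module.name, result))  # None ends the episode
    return performed


modules = {
    "greet": SituatedModule("greet", lambda w: w["person_in_front"],
                            "wave right hand and say hello",
                            lambda r: "greeted" if r == "waves" else "ignored"),
    "shake": SituatedModule("shake", lambda w: w["person_in_front"],
                            "offer hand for a handshake",
                            lambda r: "done"),
}
rules = {("greet", "greeted"): "shake"}
print(run_episode(modules, rules, "greet",
                  {"person_in_front": True}, ["waves", "shakes"]))
```

Because exactly one module is active at a time, the sketch also makes the limitation noted above visible: running several tasks in parallel would require a different control structure.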
While quite efficient and easy to deploy for specific interactive tasks, hand-crafted rules have difficulty taking into account the many factors that condition multimodal behaviors (task, personality, social context, emotion, gender, etc.) while maintaining a fine-grained, life-like variability.
Another popular approach is based on machine learning techniques, which try to find behavior regularities, and possibly some of their variability, directly from data.
3.2. Recurrent Neural Network - Long-Short Term Memory