Protocolos de señalización y control - TRABAJO ESPECIAL DE GRADO

3.5 General discussion

We found that in the CA interface the users predominantly used speech for providing and correcting values, even though using speech may not be as effective and efficient as using the graphical interaction facilities. The preference for speech may be explained by the fact that people tend to react to the system in the same way as they are addressed (Bilici et al., 2000). The spoken dialogue that is initiated by the system thus biases the users towards using speech, and as switching to another modality would increase the cognitive load for the user (Boves & Den Os, 1999), they will only do so in case of an obvious advantage in terms of effectiveness or efficiency. A recent study showed that this behavior may change as users get more experienced in using the interface. It was found that many subjects learned to speed up the interaction by providing more information in one utterance, by using the graphical facilities more often and indeed by using the graphical interaction facilities simultaneously with other actions (Sturm et al., 2002). In both systems speech was also preferred for error correction, which is in line with other studies that showed that users tend to stay in the speech mode even if it is not the most effective method (Karat et al., 1999; Suhm et al., 2001). Obviously, more reliable GUI-based error correction methods, such as a soft keyboard or a menu, are indispensable to ensure effective error handling, if only to make sure that users never have to give up after fruitless attempts to enter data. This is all the more important because re-speaking is a very ineffective way to correct recognition errors.

It has been suggested that extending a spoken dialogue system with a GUI that displays all available options can solve the problem that users find it difficult to form and maintain a good conceptual model of the functionality of a system and the status of an ongoing dialogue (Terken & Te Riele, 2001). Our evaluation showed that in combination with pointing input the graphical support that was added to a speech interface in the CA system proved to be confusing rather than helpful. Users did not know how to use all the options offered by the multimodal combination. Moreover, graphical support may not be necessary for users who understand what it means to travel. They do understand what the system is trying to accomplish, and why it needs to do this. The effectiveness figures show that most people indeed managed to get the right information. The data from experiments with a spoken dialogue system for train timetable information suggest that users do not have problems in understanding the form-filling part of the dialogue (Sturm et al., 1999a; Sturm et al., 1999b). Rather, problems with conceptualizing the system’s capabilities and intentions occurred when users tried to negotiate with the system to find a better alternative than the advice that was presented first (which, for practical reasons, was limited to the single connection that was optimal given the selection criteria used in the search). This type of interaction adopts the navigation metaphor that was mentioned in the introduction. Since, for this complex task, there is no obvious way for a system to inform users about the available options by means of spoken messages, graphical support may turn out to be very helpful (Ibrahim & Johansson, 2002). For example, negotiating about which alternative connection is optimal is difficult if the alternatives remain implicit (which is usually the case in a spoken dialogue system); displaying alternative connections on a GUI would facilitate the negotiation process.

The dialogues with the CA interface were significantly longer than the interactions with the DM interface. For a large part, this can be attributed to the time taken up by the spoken prompts. In contrast with the DM system, in which all information items must be provided one by one, the CA interface allows users to provide more information than was actually prompted for by the system, which makes this interface potentially more efficient. However, several large studies with mixed-initiative dialogue systems have shown that few users understand and use the possibility to provide more information than the system prompts for (Sturm et al., 1999a). This is perhaps the best demonstration of the fact that CA systems have no means to inform users about their unadvertised capabilities. As we just noted, this user behavior may change when people become familiar with the system. However, we found that even the occasional dialogues in which all information was provided in one utterance and the Search button was pressed as soon as all information appeared correct on the screen were longer than the shortest dialogue with the DM interface. It seems that in terms of efficiency a CA system will always lag behind a DM system, unless the spoken prompts can be made extremely short (but then the question arises whether short prompts add anything to the quality of the interface). Supporting barge-in during spoken prompts may also lead to more efficient dialogues and improve user control, but is only suitable for users who are familiar with the system and do not need the spoken information. Finally, we found that the efficiency and the user satisfaction of both systems were hampered by latencies that were caused by the speech recognition engine. Thus, by adding speech input to the DM system, one of the most important merits of a DM interface was sacrificed: immediate feedback on actions (Shneiderman, 1983).

In contrast with the DM interface, where the initiative is always with the user, in our CA interface the first initiative lies with the system. As a result, the system determined the pace of the dialogue. Although the users could take the initiative (by providing more information than what was asked for, or by pressing buttons), they may not have understood that they could do so or how they could do this. Handling mixed initiative is not trivial in a conversational system, since a user may take the initiative for different reasons. If the user takes the initiative to flag and correct an error, the system may take back the initiative as soon as the error has been solved. However, the user may just as well take control of the dialogue to speed up the interaction; in this case, a multimodal system should remain silent. An interface that combines a spoken dialogue system (in which the system must always keep saying things to inform the user that it is still connected) with a GUI (in which the user always has control of the interaction) should be able to recognize the user's intentions in order to remain predictable and transparent. Unfortunately, most current interactive systems lack this type of intelligence.
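
The distinction drawn above can be illustrated with a minimal sketch in Python. It is not the policy of the CA system described here; it only encodes the two cases named in the text, assuming that the user's reason for taking the initiative has already been recognized (which, as noted, is the hard part). All names (`UserIntent`, `SystemAction`, `next_system_action`) are hypothetical, and the intermediate `WAIT_FOR_CORRECTION` state is an added assumption for the period before the error has been solved.

```python
from enum import Enum, auto


class UserIntent(Enum):
    """Hypothetical labels for why a user takes the initiative."""
    CORRECT_ERROR = auto()  # user flags and corrects an error
    SPEED_UP = auto()       # user takes control to speed up the interaction


class SystemAction(Enum):
    """Hypothetical system responses to a user-initiated turn."""
    RESUME_PROMPTING = auto()     # system takes back the initiative
    WAIT_FOR_CORRECTION = auto()  # assumption: system waits until the error is solved
    STAY_SILENT = auto()          # system leaves control with the user


def next_system_action(intent: UserIntent, error_resolved: bool = False) -> SystemAction:
    """Choose the system's next move after the user has taken the initiative."""
    if intent is UserIntent.CORRECT_ERROR:
        # The system may take back the initiative only once the error is solved.
        if error_resolved:
            return SystemAction.RESUME_PROMPTING
        return SystemAction.WAIT_FOR_CORRECTION
    # The user wants to speed up the interaction: the system should remain silent.
    return SystemAction.STAY_SILENT
```

For example, `next_system_action(UserIntent.CORRECT_ERROR, error_resolved=True)` yields `SystemAction.RESUME_PROMPTING`, whereas `next_system_action(UserIntent.SPEED_UP)` yields `SystemAction.STAY_SILENT`. The sketch leaves open how the intent is inferred from speech and pointing input.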

In document TRABAJO ESPECIAL DE GRADO (pages 72-81)