The goal of this task was to test whether we were able to implement a variety of different WOZ set-ups covering the main LTC configurations highlighted in Table 5.4. Instances integrating experiments with several component combinations were created, achieving one of the goals identified earlier as being important for comprehensive WOZ tool support. While it was not our goal to actually run all of these experiments (i.e. we were mainly interested in whether we could build them), this task was crucial for evaluating the range of possible use cases our prototyping platform would be able to support. To do so, previous literature was used as a source of realistic product ideas. Elaborating on these ideas, we built a total of eight different WOZ experiments, all of which used a combination of at least two different language technologies (i.e. ASR, MT, TTS). Both the experiment structures and the relevant text utterances were created. In the following, we describe in more detail the products/services for which these WOZ experiments were implemented. It needs to be highlighted, however, that while all of them were fully implemented, only MYSPEECH and INTERNETADVISOR were eventually subject to experimentation, employing a setting where a wizard interacts with a number of different test participants.
DialogueTranslator
DIALOGUETRANSLATOR is a multi-modal translation system whose goal is to translate a person’s spoken input on the fly from one language into another, allowing for a fluent dialogue. The technologies employed in the WOZ experiment set-up comprise Automatic Speech Recognition, Machine Translation and Text-to-Speech Synthesis. The wizard’s tasks in this scenario range from fully interpreting the input and producing the output to correcting the input, either by directly changing a result before it is sent on or by choosing from an N-best list of possible options. Corrections can be applied either to the recognition results before they are sent to the MT module or to the translation result before it is sent to the receiver.
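The two correction points described above can be pictured as hooks in a linear pipeline. The following is a minimal sketch under stated assumptions, not the platform's actual implementation: all names (`run_turn`, `choose`, `overwrite`, `pass_through`) are hypothetical, and the ASR, MT and TTS components are stubs that merely label their input.

```python
from typing import Callable, List

# Hypothetical type: a wizard hook receives an N-best list and returns the
# text to pass on (one of the candidates or a manual correction).
WizardHook = Callable[[List[str]], str]


def pass_through(n_best: List[str]) -> str:
    """No intervention: forward the top-ranked hypothesis unchanged."""
    return n_best[0]


def choose(index: int) -> WizardHook:
    """Wizard picks one entry from the N-best list."""
    return lambda n_best: n_best[index]


def overwrite(text: str) -> WizardHook:
    """Wizard replaces the component output with their own text."""
    return lambda n_best: text


# Stub components standing in for real ASR, MT and TTS services (assumptions).
def fake_asr(audio: str) -> List[str]:
    return [f"{audio} (asr-1)", f"{audio} (asr-2)"]


def fake_mt(text: str) -> List[str]:
    return [f"[translated] {text}"]


def fake_tts(text: str) -> str:
    return f"<speech>{text}</speech>"


def run_turn(audio: str,
             after_asr: WizardHook = pass_through,
             after_mt: WizardHook = pass_through) -> str:
    """One dialogue turn: ASR -> wizard -> MT -> wizard -> TTS."""
    recognised = after_asr(fake_asr(audio))
    translated = after_mt(fake_mt(recognised))
    return fake_tts(translated)
```

For instance, `run_turn("hello", after_asr=choose(1))` models a wizard selecting the second recognition hypothesis, while passing `overwrite(...)` for both hooks corresponds to the wizard fully interpreting the input and producing the output.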
MySpeech
MYSPEECH is a system that permits people to improve their pronunciation of a foreign language. The system records a language learner’s input, analyses the pronunciation of the words and then gives appropriate feedback based on the analysis results. Whereas this feedback is currently presented mainly as a graph, a future version of the system will support a more textual notation. The WOZ method is used to test the user experience for this kind of descriptive feedback and to collect a preliminary corpus of possible feedback utterances. To do so, a Speech-to-Text WOZ setting is applied in which the wizard interprets the analysis results and gives accompanying textual feedback. An external ASR system and the chat feature of the prototyping platform are used. The setting is comparable to a chat scenario in which one party speaks and the other answers via text chat.
PhotoPal
PHOTOPAL is a system that incorporates an Embodied Conversational Agent (ECA) integrated with your photo library. It aims to add a social aspect to the process of sorting and organising your photos by offering a multi-lingual ‘Companion’ who is interested in the memories and experiences you connect with those pictures. A future system will incorporate data from previous conversations as well as data from your online activities that can be linked back to your photos. Based on this information, it tries to generate an active and entertaining dialogue. During the development stage, WOZ is used to simulate the ‘intelligence’ of the system. The wizard uses a free-text field coupled with canned utterances to interact with a test participant. An animation and Text-to-Speech Synthesis are used to present the wizard’s statements. The MT module is activated in order to support interactions in different languages.
WorldNews
WORLDNEWS is an RSS reader application for smart-phones that translates news items into different languages and reads them aloud using Text-to-Speech technology. The system is especially useful in mobile settings, where it works like a digital audio player. By simulating this functionality, WOZ can be used to obtain early customer feedback. At the beginning, the wizard acts as an interpreter, translating news items from one language into another. In a second stage, the wizard’s task shifts from translating to post-editing, initially correcting translations and later on choosing the most appropriate ones from an N-best list of options.
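The gradual shift in the wizard's responsibility can be illustrated with a small dispatch on the current stage. This is only a sketch under stated assumptions: the stage names, the `translate_item` function and its parameters are hypothetical and do not reflect the platform's real interface.

```python
from enum import Enum
from typing import List, Optional


class WizardStage(Enum):
    INTERPRET = "interpret"  # wizard translates the news item from scratch
    POST_EDIT = "post_edit"  # wizard corrects the top MT hypothesis
    SELECT = "select"        # wizard chooses from the MT N-best list


def translate_item(stage: WizardStage,
                   mt_n_best: List[str],
                   wizard_text: Optional[str] = None,
                   wizard_choice: int = 0) -> str:
    """Return the translation sent on to TTS for the given wizard stage."""
    if stage is WizardStage.INTERPRET:
        # No MT output is used; the wizard must supply the translation.
        if wizard_text is None:
            raise ValueError("the interpreting wizard must supply a translation")
        return wizard_text
    if stage is WizardStage.POST_EDIT:
        # Use the wizard's correction if given, else the top MT hypothesis.
        return wizard_text if wizard_text is not None else mt_n_best[0]
    # SELECT: the wizard only picks an index into the N-best list.
    return mt_n_best[wizard_choice]
```

Each stage hands more of the work to the MT component, so the same experiment structure could in principle be reused while only the stage setting changes.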
MultiTrans
MULTITRANS is an application that automatically transcribes spoken words into written text. In addition, it supports cross-language transcription, which offers direct transcription into foreign languages. That is, a user can speak in one language and the software produces the equivalent text in another language. WOZ is used in different stages throughout the development of this application. First the wizard simulates speech recognition, and later on machine translation. For both tasks, the wizard either produces the entire output or augments the results coming from the relevant technology components, utilising the correction or N-best list feature.
MultiChat
MULTICHAT is a multi-lingual chat program that supports automatic translation of text into a variety of different languages. In addition, it can be used to train language understanding by offering a sub-title feature that includes the relevant synthesised speech output. WOZ helps to obtain early user feedback. A wizard is used to manually correct MT output.
VoiceSearch
VOICESEARCH is a plug-in for web browsers that offers voice-based internet surfing. Modern speech recognition technology is used to convert speech into a search request. Moreover, the request is automatically translated into different languages in order to increase the number of possible answers. During development, a human wizard is used to override ASR output where needed, testing the level of recognition quality that is required in order to offer a satisfying user experience.
InternetAdvisor
INTERNETADVISOR is an interactive multi-lingual information terminal that recommends appropriate Internet connection bundles to customers in their native language. As such, it understands spoken input in different languages and gives back recommendations via text as well as voice output. To test the scenario, a bi-lingual wizard is used to choose appropriate responses from a set of prepared dialogue utterances.
Implementing these scenarios demonstrated that our current WOZ prototyping platform and its wizard interface allow for experimentation with different combinations of technology components. Furthermore, we have shown that different platforms, ranging from computer terminals to smart phones and tablet devices, are supported. Even though only two of the described settings were subject to real experimentation, we believe that the others are similarly realistic and could be explored without demanding additional integration work. Next we continue our analysis of the construction process by describing the results of an evaluation in which other researchers were asked to build WOZ experiments with our prototyping platform.