The Extensible MultiModal Annotation markup language (EMMA) is an XML-based W3C standard for describing the interpretation of multimodal user input (Johnston, 2009). It has the status of a W3C Recommendation; the latest version was published in February 2009 (Johnston et al., 2009).
The general idea of EMMA is to represent information that interpreters have automatically extracted from user input. In particular, the message history across the stages of input processing, from the raw signal through input interpretation to multimodal integration, can be logged.
EMMA defines no fixed specification for the representation of an interpretation. Instead, the standard allows application-specific markup to be embedded in containers designated for that purpose, which gives the content the required degree of freedom. Further concepts allow inputs that are mutually exclusive, sequential, or otherwise related to be grouped. Interpretation instances that are derived from other interpretation instances, as the representation is progressively refined from raw data to interpretation, can be linked.
A set of annotation attributes and elements provides additional metadata about the input event, including media type, signal format, recognition and interpretation confidence, input source, timestamps, medium, and mode, among others. For multimodal integration, hooks can be defined that serve as placeholders for content from other input instances. Furthermore, the language can be extended with non-standardised input annotations that carry custom vendor-specific or application-specific information.
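To make the container-plus-annotation idea concrete, the following minimal sketch builds a single annotated emma:interpretation around an application-specific payload using Python's standard xml.etree.ElementTree module. The helper names (q, build_interpretation) and the flt payload are assumptions chosen for illustration; they are not part of the EMMA standard.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)  # serialise with the usual emma: prefix


def q(name):
    """Return the Clark-notation qualified name for an EMMA element or attribute."""
    return f"{{{EMMA_NS}}}{name}"


def build_interpretation(interp_id, tokens, confidence, payload):
    """Wrap an application-specific payload in an annotated emma:interpretation.

    Hypothetical helper: the medium and mode values are fixed to a speech
    input purely for illustration.
    """
    root = ET.Element(q("emma"), {"version": "1.0"})
    interp = ET.SubElement(root, q("interpretation"), {
        "id": interp_id,
        q("medium"): "acoustic",
        q("mode"): "voice",
        q("confidence"): f"{confidence:.2f}",
        q("tokens"): tokens,
    })
    interp.append(payload)  # the content itself is not prescribed by EMMA
    return root


# Hypothetical application payload for a flight query.
flt = ET.Element("flt")
ET.SubElement(flt, "orig").text = "Boston"
ET.SubElement(flt, "dest").text = "Denver"

doc = build_interpretation("int1", "flights from boston to denver", 0.75, flt)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

The payload element carries no EMMA namespace, which mirrors the separation between standardised annotations and free-form application content.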
Examples
The EMMA document depicted in Figure 3.14 shows markup that could have been produced by a speech recognition and natural language understanding component in a flight booking system.

    <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" ...>
      <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
          emma:function="dialog" emma:verbal="true"
          emma:start="1241035886246" emma:end="1241035889306"
          emma:source="smm:platform=iPhone-2.2.1-5H11"
          emma:signal="smm:file=audio-416120.amr"
          emma:signal-size="4902"
          emma:process="smm:type=asr&amp;version=asreng2.4"
          emma:media-type="audio/amr; rate=8000"
          emma:lang="en-US" emma:grammar-ref="gram1" emma:model-ref="model1">
        <emma:interpretation id="int1" emma:confidence="0.75"
            emma:tokens="flights from boston to denver">
          <flt><orig>Boston</orig><dest>Denver</dest></flt>
        </emma:interpretation>
        <emma:interpretation id="int2" emma:confidence="0.68"
            emma:tokens="flights from austin to denver">
          <flt><orig>Austin</orig><dest>Denver</dest></flt>
        </emma:interpretation>
      </emma:one-of>
      <emma:info>
        <session>E50DAE19-79B5-44BA-892D</session>
      </emma:info>
      <emma:grammar id="gram1" ref="smm:grammar=flights"/>
      <emma:model id="model1" ref="smm:file=flights.xsd"/>
    </emma:emma>

On the first level below the root element (emma:emma), the document contains a container element (emma:one-of) that holds two alternative interpretations (emma:interpretation) of a speech input. One attribute of an interpretation is emma:confidence, which reflects the interpreter's confidence in the recognition result. A second attribute is emma:tokens, a list of tokens; here the term token is used in the computational and syntactic sense of units of input. The syntax of these tokens is not predefined. In the example it is the list of words and phrases that the speech recogniser has recognised. The element emma:interpretation itself is a container for a single interpretation represented in arbitrary application-specific markup, in the example the flight information with origin and destination. The data model behind this semantic representation is referenced by the attribute emma:model-ref, which points to the model defined in the emma:model element at the end of the listing.
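A consumer of such a document typically reads the alternatives inside emma:one-of and selects one of them. The sketch below parses a trimmed version of the Figure 3.14 listing with Python's standard xml.etree.ElementTree module, ranks the interpretations by emma:confidence, and resolves emma:model-ref to the referenced emma:model. The function names and the choice to rank purely by confidence are assumptions made for illustration; a real application may apply its own selection policy.

```python
import xml.etree.ElementTree as ET

NS = {"emma": "http://www.w3.org/2003/04/emma"}
EMMA = "{http://www.w3.org/2003/04/emma}"

# Trimmed version of the Figure 3.14 listing (most annotations omitted).
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:model-ref="model1">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <flt><orig>Boston</orig><dest>Denver</dest></flt>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <flt><orig>Austin</orig><dest>Denver</dest></flt>
    </emma:interpretation>
  </emma:one-of>
  <emma:model id="model1" ref="smm:file=flights.xsd"/>
</emma:emma>
"""


def ranked_interpretations(root):
    """Return the alternatives in emma:one-of, highest confidence first."""
    results = []
    for interp in root.iterfind("emma:one-of/emma:interpretation", NS):
        flt = interp.find("flt")  # application-specific payload, no namespace
        results.append({
            "id": interp.get("id"),
            "confidence": float(interp.get(EMMA + "confidence")),
            "tokens": interp.get(EMMA + "tokens").split(),
            "orig": flt.findtext("orig"),
            "dest": flt.findtext("dest"),
        })
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


def resolve_model(root):
    """Follow emma:model-ref on the container to the matching emma:model."""
    ref_id = root.find("emma:one-of", NS).get(EMMA + "model-ref")
    for model in root.iterfind("emma:model", NS):
        if model.get("id") == ref_id:
            return model.get("ref")
    return None


root = ET.fromstring(DOC)
best = ranked_interpretations(root)[0]
print(best["id"], best["orig"], "->", best["dest"])
print(resolve_model(root))
```

Because the payload markup is application-specific, the lookup of flt, orig, and dest is the only part of the sketch that would change for a different domain.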
Further annotations are given as attributes of the element emma:one-of and are valid for all emma:interpretation elements it contains. They comprise annotations for modality classification (emma:medium, emma:mode), information about the raw input source (emma:source, emma:signal, emma:signal-size, emma:process, emma:media-type), the language and grammar consulted (emma:lang, emma:grammar-ref), and timestamps for the start and end of the input (emma:start, emma:end). Furthermore, inputs can be classified with respect to their communicative function (emma:function). EMMA also provides an extensibility mechanism for annotating user inputs with vendor-specific or application-specific metadata that the standard does not cover. The container emma:info serves this purpose. In the example, it carries vendor-specific session information.
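The scoping rule described above, where annotations on emma:one-of apply to every interpretation it contains, can be sketched as a simple merge of attribute dictionaries. The example below uses Python's standard xml.etree.ElementTree module on a reduced document. The function names are hypothetical, and letting an interpretation's own attribute take precedence over the container's is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

EMMA_URI = "http://www.w3.org/2003/04/emma"
NS = {"emma": EMMA_URI}
PREFIX = "{" + EMMA_URI + "}"

# Reduced version of the example document (only some annotations kept).
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
      emma:function="dialog" emma:lang="en-US"
      emma:start="1241035886246" emma:end="1241035889306">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <flt><orig>Boston</orig><dest>Denver</dest></flt>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68">
      <flt><orig>Austin</orig><dest>Denver</dest></flt>
    </emma:interpretation>
  </emma:one-of>
  <emma:info>
    <session>E50DAE19-79B5-44BA-892D</session>
  </emma:info>
</emma:emma>
"""


def emma_attrs(element):
    """Return only the EMMA-namespaced attributes, keyed by local name."""
    return {k[len(PREFIX):]: v
            for k, v in element.attrib.items() if k.startswith(PREFIX)}


def effective_annotations(root):
    """Merge container-level annotations into each contained interpretation."""
    merged = {}
    for container in root.iterfind("emma:one-of", NS):
        shared = emma_attrs(container)
        for interp in container.iterfind("emma:interpretation", NS):
            # Assumption: an interpretation's own attribute overrides the container's.
            merged[interp.get("id")] = {**shared, **emma_attrs(interp)}
    return merged


root = ET.fromstring(DOC)
annotations = effective_annotations(root)

# emma:start and emma:end are millisecond timestamps, so their difference
# is the duration of the input in milliseconds.
int1 = annotations["int1"]
duration_ms = int(int1["end"]) - int(int1["start"])

# Vendor-specific metadata lives in emma:info and is read like any other XML.
session = root.findtext("emma:info/session", namespaces=NS)

print(int1["mode"], int1["confidence"], duration_ms, session)
```

Both interpretations end up with the same medium, mode, language, and timestamps, while each keeps its own confidence value.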
If an interpretation instance is derived from another instance, the derivation annotation can be used to log the derivation process. Figure 3.15 shows the result of an interpretation process that goes from raw data to an increasingly refined input representation. Here, interpretations from earlier stages of input processing are collected in the emma:derivation element. The first interpretation, with the ID raw, contains the result from the speech recogniser, which is just the utterance of the speech input ("from Boston to Denver tomorrow"). The next interpretation, with the ID better, is the result of the first interpretation step and describes flight information with the origin and destination of the flight and the unresolved specification "tomorrow" for the date. Its emma:derived-from element indicates that this interpretation is derived from the first interpretation with the ID raw. In the final interpretation, with the ID best, the date is also resolved and the correct date for the flight is added. Here, emma:derived-from indicates the direct derivation from the previous interpretation with the ID better.
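The derivation links can be followed programmatically from the final interpretation back to the raw recogniser result. The sketch below does this with Python's standard xml.etree.ElementTree module. The embedded document is a reconstruction written for this example and is not the Figure 3.15 listing: the payload element names (utterance, origin, destination, date) and the resolved date value are hypothetical, while emma:derivation and emma:derived-from with its resource attribute follow EMMA 1.0.

```python
import xml.etree.ElementTree as ET

NS = {"emma": "http://www.w3.org/2003/04/emma"}
EMMA = "{http://www.w3.org/2003/04/emma}"

# Illustrative document: payload names and the resolved date are hypothetical.
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:derivation>
    <emma:interpretation id="raw">
      <utterance>from Boston to Denver tomorrow</utterance>
    </emma:interpretation>
    <emma:interpretation id="better">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>2009-05-01</date>
  </emma:interpretation>
</emma:emma>
"""


def derivation_chain(xml_text, start_id):
    """Follow emma:derived-from links from start_id back to the earliest stage."""
    root = ET.fromstring(xml_text)
    # Index every interpretation in the document by its id, wherever it sits.
    by_id = {el.get("id"): el for el in root.iter(EMMA + "interpretation")}
    chain, seen = [], set()
    current = start_id
    while current in by_id and current not in seen:  # stop on dangling refs or cycles
        seen.add(current)
        chain.append(current)
        link = by_id[current].find("emma:derived-from", NS)
        # The resource attribute is a fragment reference such as "#raw".
        current = link.get("resource").lstrip("#") if link is not None else None
    return chain


print(derivation_chain(DOC, "best"))
```

The sketch follows a single emma:derived-from link per interpretation. Composite derivations, in which one interpretation is derived from several sources, would need the walk to branch.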