5. RESEARCH REPORT
5.1. Research Programs
Results returned by the SCR system are generally displayed as a results list. As a guiding principle, selection interfaces of SCR systems should be designed to let users make maximum use of their innate human ability to quickly ascertain the interest and relevance of particular objects [202, 203]. The importance of informative result display is reflected in, for example, [265], which presents the results of a user study suggesting that multimedia retrieval may be affected if the user has a low perception of the relevance of the retrieved results. The results display should be optimized in order to allow quick and easy assessment of relevance by users.
Fig. 6.3 Excerpt of results list generated by the Dutch Broadcast News Retrieval system of the University of Twente in response to the query “Amsterdam.”
We use an excerpt from a results list generated by the Broadcast News Retrieval system of the University of Twente [211] in Figure 6.3 to illustrate some key issues of results display. The results list has been returned by the system in response to the query “Amsterdam.” The results are displayed as a ranked list of jump-in points, each associated with a speech media fragment that the system has matched with the query. Each item is represented by a surrogate comprising a short excerpt of the ASR transcript, a keyframe, the name and date of the program, and a timecode locating the result within the program. The function of the surrogate is to give the user the information necessary to evaluate the relevance of the result to the original information need and to decide whether to review the result in more depth. Surrogates are also used in text retrieval, but are particularly important for SCR systems. Reviewing a spoken media result requires listening to an audio file or watching a video file. This process is considerably more time consuming than skimming a page of text. The benefit of snippets is related to the quality of the underlying speech transcripts: in [108] it is observed that the accuracy of surrogates determines their usefulness. If subword indexing is used, subword units need to be reconstituted into words before they are appropriate for use in a snippet.
The surrogate is usually “biased” towards the user query, meaning that its form is especially chosen to highlight the match between the query and the result. Note that in Figure 6.3, the query word has been highlighted in each snippet. The presence of the query word is strong evidence for the user that the result is relevant to the information request. Interface design should take into consideration the user’s expectation of seeing the query word in the snippet and of hearing the query word very quickly after the playback of the result is initiated. For video applications, presenting a keyframe may provide the user with an additional hint as to the content of the result. A further dimension to the selection of the appropriate form of surrogates is the background knowledge of the user, which has been observed to have an impact on the types of surrogates that are preferred [158, 268].
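The idea of a query-biased snippet can be illustrated with a minimal sketch. The function name `query_biased_snippet`, the `window` parameter, and the bracket markers are assumptions made for illustration; they are not taken from the Twente system or any other system discussed here. The sketch selects a window of transcript words around the first query match and marks every matching word.

```python
import re


def _normalize(word):
    """Lowercase a word and strip punctuation so matching is forgiving."""
    return re.sub(r"\W+", "", word).lower()


def query_biased_snippet(transcript, query, window=5, mark=("[", "]")):
    """Return a short excerpt around the first query match, with matches marked.

    Hypothetical helper: `window` is the number of context words kept on each
    side of the match, and `mark` is the pair of strings wrapped around a match.
    """
    words = transcript.split()
    terms = {_normalize(t) for t in query.split()}
    hits = [i for i, w in enumerate(words) if _normalize(w) in terms]

    if not hits:
        # No query word in the transcript: fall back to the leading words.
        return " ".join(words[: 2 * window + 1])

    center = hits[0]
    start = max(0, center - window)
    end = min(len(words), center + window + 1)

    marked = [
        f"{mark[0]}{w}{mark[1]}" if _normalize(w) in terms else w
        for w in words[start:end]
    ]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(words) else ""
    return prefix + " ".join(marked) + suffix


if __name__ == "__main__":
    text = "the mayor of Amsterdam opened the new metro line today"
    print(query_biased_snippet(text, "amsterdam", window=2))
```

The fallback branch shows the difficulty taken up in the next paragraph: when no query word occurs in the transcript, the snippet offers the user no direct evidence of the match.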
Additionally, it is important to mention how results selection display may effectively limit the application of advanced IR techniques in practice. In “Spoken Content Retrieval beyond ASR Transcripts,” IR techniques that are capable of compensating for words missing from the transcripts due to speech recognition errors were discussed, in particular, query expansion. However, such techniques may return result items that are relevant to the user’s original information need, but in which none of the original query words are actually uttered in the spoken content. Unless there is a mechanism to convince the user of the relevance of a result without showing evidence of the query word being directly associated with the content, users will pass over such a result and the system may fail to meet its goal of satisfying the user information need.
One of the challenges of displaying a ranked list of results is to effectively communicate to the user the relationship between the results and the structure of the speech media in the underlying collection. Recall, from Section 5, that multiple levels of units may be used by the SCR system and that the retrieval unit is not necessarily the only important or useful unit of structure within the collection. For example, in Figure 6.3 there is a tension between the retrieval unit (a fragment) and a larger, natural unit in the collection (a news item). Two results are returned from the same news program on Monday, 12 September 2011. Depending on the application, two results from the same program might confuse a user, who may consider them actually to constitute a single, duplicated result. Even if they contain different spoken content, it is difficult to indicate the difference clearly in the results list because, as illustrated by this example, results display often depends on program-level metadata, in this case the date, which is the same for each.
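The ambiguity described above can be made concrete with a small sketch that groups fragment-level results by the program-level metadata shown in the surrogate and reports the groups containing more than one result. The `FragmentResult` fields, the function names, and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FragmentResult:
    """One fragment-level result; field names are illustrative assumptions."""
    program: str          # program-level metadata shown in the surrogate
    date: str             # program-level metadata shown in the surrogate
    start_seconds: float  # fragment-level jump-in point
    score: float          # retrieval score assigned by the system


def group_by_program(results):
    """Group fragment results by the (program, date) pair they display."""
    groups = defaultdict(list)
    for result in results:
        groups[(result.program, result.date)].append(result)
    return dict(groups)


def ambiguous_groups(results):
    """Return only the groups whose results share identical display metadata."""
    return {
        key: group
        for key, group in group_by_program(results).items()
        if len(group) > 1
    }


if __name__ == "__main__":
    # Made-up results: two fragments from one program, one from another.
    results = [
        FragmentResult("News A", "2011-09-12", 95.0, 0.81),
        FragmentResult("News A", "2011-09-12", 412.0, 0.64),
        FragmentResult("News B", "2011-09-10", 30.0, 0.58),
    ]
    for key, group in ambiguous_groups(results).items():
        print(key, [r.start_seconds for r in group])
```

In this sketch only the jump-in point distinguishes the two results from the same program, which is why surrogates built mainly from program-level metadata can look like duplicates to the user.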
A typical approach is to choose the larger unit with which metadata is associated as the retrieval unit that is ranked and displayed in the results list. This approach is taken by PodScope (http://www.podscope.com), a spoken-content-based podcast search engine. A results list returned by PodScope in response to the query “search engine” is displayed in Figure 6.4.
Each result is an individual podcast episode displayed together with its metadata. Note that information about the relevance of individual fragments within the podcast episode is contained in the surrogate. On the left is a scrolling list of jump-in points, displayed as time codes, which allow the user to initiate playback of a particular fragment-level result directly from the episode-level results list. Displaying both episodes and fragments together in the results list gives greater flexibility for result review. However, this benefit is offset by the relatively large amount of space required by the surrogate and the fact that time codes provide little information to the user about which fragment would be most interesting to select.
Another approach to results display is represented by the search application developed by the University of Twente for the collection at http://www.buchenwald.nl containing interviews with Dutch survivors of the Buchenwald concentration camp [211].
Fig. 6.5 Results of the search application for the Dutch language oral history interview collection at http://www.buchenwald.nl in response to the query “bezetting.”
A results list returned in response to the query “bezetting” (Eng. “occupation”) is displayed in Figure 6.5. Each result is an individual interview. The user is not presented with the relevant fragments directly, but is merely told how many there are (e.g., three fragments are indicated by “3 fragmenten”). The user enters an interview in order to explore the fragments that it contains. Here again, as already mentioned in subsection 5.3.1, segmentation interacts with IR models. In order to rank interviews, it is necessary to combine fragment-level relevance scores into an overall interview-level score. The exact balance used in this computation is important. For example, in some cases a unit containing a single highly relevant fragment should out-rank a unit containing multiple less relevant fragments. In other cases, the opposite could be expected to hold. The choice of combination should depend on how users use the system.
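The trade-off between a single strong fragment and several weaker ones can be sketched with one tunable parameter. The decay-weighted sum below is an assumption chosen for illustration, not the scoring used by the Buchenwald application; the names `interview_score`, `rank_interviews`, and `decay` are hypothetical. With `decay=0.0` only the best fragment counts, and with `decay=1.0` all fragment scores are summed.

```python
def interview_score(fragment_scores, decay=0.5):
    """Combine fragment-level scores into one interview-level score.

    Scores are sorted in descending order and the i-th best fragment is
    weighted by decay**i, so decay=0.0 reduces to the maximum and decay=1.0
    reduces to the plain sum.
    """
    ordered = sorted(fragment_scores, reverse=True)
    return sum(score * decay ** rank for rank, score in enumerate(ordered))


def rank_interviews(scores_by_interview, decay=0.5):
    """Return (interview_id, score) pairs sorted from best to worst."""
    scored = [
        (interview_id, interview_score(scores, decay))
        for interview_id, scores in scores_by_interview.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores: interview A has one strong fragment,
    # interview B has three weaker fragments.
    collection = {"A": [0.9], "B": [0.4, 0.4, 0.4]}
    for decay in (0.0, 0.5, 1.0):
        print(decay, rank_interviews(collection, decay))
```

With these made-up scores, interview A ranks first when `decay` is 0.0 and interview B ranks first when `decay` is 1.0. In practice the setting would have to be chosen from evidence about how users review results, as the text notes.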