This subsection examines expansion techniques that enable a better match between user queries and speech transcripts. Expansion techniques are drawn from text IR, where their benefit derives from lexical enrichment that compensates for semantic underspecification. In SCR, applying expansion techniques has the additional benefit of compensating for ASR errors. This subsection reviews techniques for the expansion of both queries and documents.
Query expansion. Expansion of queries can be accomplished with a variety of approaches. In subsection 4.4.3, we discussed the query expansion method introduced by [173, 174], which uses the language model and pronunciation dictionary to determine possible misrecognitions of the query word and uses these to expand the query. The method aims at compensating for speech recognition error.
Other query expansion methods have the effect of compensating for both ASR error and semantic underspecification. Here, we return to discuss relevance feedback, an IR technique initially introduced in subsection 2.2. In a standard relevance feedback scenario, users perform an initial search after which they provide feedback to the system indicating which retrieved items they deem relevant to their information need. In a pseudo-relevance feedback (PRF) scenario, top-ranked items returned by an initial retrieval round are used as feedback, under the assumption that since they matched the query well they must be relevant. This feedback information is then used to modify the system to bias subsequent searches towards the information need.
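The PRF loop described above can be illustrated with a minimal sketch. It is not the method of any cited work: the term-overlap scoring, the function names (`tokenize`, `rank`, `prf_expand`) and the parameters `top_k` and `n_terms` are assumptions chosen for illustration, and a real system would use a weighted retrieval model and a term-selection criterion rather than raw counts.

```python
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a stand-in for real text normalization.
    return text.lower().split()

def score(query_terms, doc_terms):
    # Toy relevance score: total occurrences of the query terms in the item.
    counts = Counter(doc_terms)
    return sum(counts[t] for t in set(query_terms))

def rank(query_terms, docs):
    # Indices of docs sorted by descending score; ties keep collection order.
    scores = [score(query_terms, tokenize(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def prf_expand(query, docs, top_k=2, n_terms=2, stopwords=frozenset()):
    # Initial retrieval round: the top_k items are assumed to be relevant.
    q_terms = tokenize(query)
    pseudo_relevant = rank(q_terms, docs)[:top_k]
    # Pool the terms that co-occur with the query in those items.
    pool = Counter()
    for i in pseudo_relevant:
        pool.update(
            t for t in tokenize(docs[i])
            if t not in q_terms and t not in stopwords
        )
    # Add the most frequent pooled terms to the original query.
    return q_terms + [t for t, _ in pool.most_common(n_terms)]

if __name__ == "__main__":
    transcripts = [
        "election vote results candidate",
        "election candidate campaign vote",
        "weather rain forecast",
    ]
    print(prf_expand("election", transcripts))
```

The expanded query would then be submitted for a second retrieval round over the same collection.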
PRF exploits co-occurrence of words within items to expand queries. Two mechanisms explain why relevance feedback has an overall tendency to reduce rather than amplify the effects of ASR error. First, recall that the match with a user’s query is dependent on the presence of terms in the ASR transcript of an item. For spoken content items transcribed with low word error rates (WERs), recognition errors will have little impact on matching; items with higher WERs will be more significantly affected. We can then expect that items with lower WERs will appear at higher ranks in the results list, which was indeed observed by [237].
Because PRF chooses top-ranked items, it prefers well-recognized items to less well-recognized items. In short, PRF can be expected to disfavor items with higher WERs, which are the items most likely to introduce word errors into the query.
Second, recall that the fixed vocabulary of an ASR system means that there are no spelling errors in the ASR transcript and no introduction of new words. For this reason, speech recognition transcripts do not contain very rare words. Rare words can be dangerous for relevance feedback, because they are highly specific and have a large negative impact should they be inappropriately selected to expand a query. In [154], it is demonstrated that PRF can yield a greater percentage improvement in SCR tasks than in text IR tasks.
In work by [136, 137], a range of query expansion techniques making use of language resources (i.e., WordNet), a collateral corpus and blind relevance feedback are explored. A recent approach to query expansion using a parallel corpus is presented by [189]. This approach uses topics discovered by way of dimensionality reduction in order to enrich user queries.
Document expansion. The process of query expansion extends queries using terms that have a strong statistical association with already-identified relevant items. These terms represent words that the user might have included in the original search request, but, for a variety of reasons, did not. By analogy we can consider the possibility of document expansion. We could use material relevant to a particular document as a source of terms to extend the document, with the goal of improving its representation of the underlying topic. In the case of spoken content, we are particularly interested in compensating for words missing from the ASR transcripts due to recognition errors, including OOV errors. This consideration motivated [251] to introduce document expansion for SCR. A selected document is used as a search request to a collection of text documents, and PRF methods are applied to select expansion terms for addition to the document transcript. Since there is no way of knowing whether the word has actually been spoken or not, the technique strives to add words that were either actually spoken or that the user could have potentially spoken within the context of the document. Results on the TREC-7 SDR tasks by AT&T showed promise for this technique [250]. However, it has not been widely explored since this early work. In [192], phone confusion probabilities were used to expand documents, but this technique does not target the benefits of semantic enrichment. In particular, the relative benefits of document expansion for collections containing informal versus planned speech have yet to be investigated thoroughly.
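The document expansion procedure can be sketched in the same spirit. The sketch below is an illustration under stated assumptions, not the procedure of [251]: the transcript is used as the search request against a small text collection, items are ranked by vocabulary overlap, and candidate terms are selected by how many pseudo-relevant text items contain them. The function name `expand_document` and the parameters `top_k` and `n_terms` are hypothetical.

```python
from collections import Counter

def expand_document(transcript, text_collection, top_k=2, n_terms=3):
    doc_terms = transcript.lower().split()
    doc_vocab = set(doc_terms)

    def overlap(text):
        # Number of distinct transcript terms that also occur in the text item.
        return len(doc_vocab & set(text.lower().split()))

    # Use the transcript itself as the search request to the text collection.
    pseudo_relevant = sorted(text_collection, key=overlap, reverse=True)[:top_k]

    # Count, for each term absent from the transcript, how many of the
    # pseudo-relevant text items contain it.
    pool = Counter()
    for text in pseudo_relevant:
        unique_terms = dict.fromkeys(text.lower().split())  # ordered, deduplicated
        pool.update(t for t in unique_terms if t not in doc_vocab)

    # Append the best-supported terms to the transcript representation.
    return doc_terms + [t for t, _ in pool.most_common(n_terms)]

if __name__ == "__main__":
    # Hypothetical transcript in which "inflation" was missed by the recognizer.
    transcript = "the president spoke about the economy in washington"
    text_collection = [
        "president economy inflation washington speech",
        "economy inflation president policy",
        "football match score goal",
    ]
    print(expand_document(transcript, text_collection, n_terms=1))
```

The added terms are not guaranteed to have been spoken; as noted above, the aim is only that they are plausible within the context of the document.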
Using collateral information. Collateral information is supplemental information derived from sources beyond the immediate collection of speech media. Collateral information can be used at many different stages in an SCR system. For example, in subsection 3.2.1, we mentioned its usefulness for adaptation of language models. Here, we turn to its usefulness for improving IR and for organizing and enriching speech media for the purpose of presentation to the user.
In comparison to text retrieval collections, speech media collections are typically relatively small. For retrieval, this means that parameters in the IR model could be poorly estimated, particularly with respect to the specificity of terms in individual items. The errors in ASR transcripts are likely to introduce further degradation of these estimates. This observation was made quite early in the development of SCR methods. In an attempt to address these problems, supplemental text corpora (much larger document collections, free of ASR errors) were successfully applied as a source of pseudo-relevant documents for PRF [137].
It should be noted that this technique is only effective if the collateral text corpora used are properly representative of the domain of the speech media collection. The SDR track at the TREC-7 and TREC-8 workshops used collateral text collections with notable success [127]. The TREC SDR materials were taken from North American radio and television news during a period in 1998. The text data sets that were used to augment retrieval from this collection consisted of text news stories from the same period. For domains that change rapidly over time, it is important that the data is not just in the same topical space, but that it is from the same time period since important items of the vocabulary and their usage will often be significantly different, even from those in the previous or following year [134].
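Using a collateral text corpus as the source of pseudo-relevant documents can be sketched as a two-stage process: expansion terms are selected from the error-free collateral text, and the expanded query is then run against the ASR transcripts. The sketch below is a toy illustration under assumptions, not the setup of [137] or of the TREC SDR systems; the stopword list, the overlap scoring and all function names are hypothetical.

```python
from collections import Counter

# Toy stopword list, an assumption made for this illustration only.
STOPWORDS = frozenset({"and", "the", "a", "in", "as"})

def terms(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

def overlap_score(query_terms, text):
    # Total occurrences of the query terms in the text.
    counts = Counter(terms(text))
    return sum(counts[t] for t in set(query_terms))

def expand_with_collateral(query, collateral_docs, top_k=2, n_terms=1):
    # Stage 1: retrieve from the collateral text corpus, not the transcripts.
    q = terms(query)
    ranked = sorted(collateral_docs, key=lambda d: overlap_score(q, d), reverse=True)
    pool = Counter()
    for doc in ranked[:top_k]:
        pool.update(t for t in terms(doc) if t not in q)
    return q + [t for t, _ in pool.most_common(n_terms)]

def search_transcripts(query_terms, transcripts):
    # Stage 2: rank the ASR transcripts with the expanded query.
    return sorted(
        range(len(transcripts)),
        key=lambda i: overlap_score(query_terms, transcripts[i]),
        reverse=True,
    )

if __name__ == "__main__":
    # Hypothetical transcripts; "inflation" is misrecognized in the second one.
    transcripts = [
        "the football match ended in a draw",
        "in flay shun rises as prices and interest rates climb",
    ]
    collateral = [
        "inflation pushes prices and interest rates higher",
        "inflation and prices rise again",
        "football season starts",
    ]
    expanded = expand_with_collateral("inflation", collateral)
    print(expanded, search_transcripts(expanded, transcripts))
```

In this toy setting the original query term matches neither transcript, while the term contributed by the collateral text allows the misrecognized item to be retrieved. The caveat stated above still applies: the benefit depends on the collateral corpus matching the domain and time period of the speech collection.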
Another important source of collateral information is closed captions, which are generally more accurate than the output of contemporary ASR systems and can be used to support SCR [103, 206]. In such situations the only reason to perform ASR would be to obtain an exact alignment between the spoken content and the transcript. Forced alignment techniques are treated in more detail in the following discussion.