information change, the greater the need to signal this with an edit term, which also allows time for the re-computation required for the new material in the repair.

4.7.4 Limitations and future work

One limitation of this study is that, due to time constraints, the audio data was not used. This will clearly have influenced how transcribers and annotators interpreted the repairs. More can be done with syntactic context: using a parser that makes partial connected tree structures available, such as a PCFG parser (Roark et al., 2009) or a Dynamic Syntax parser (Purver et al., 2011), could be a next step. A more thorough experimental evaluation of annotation using different variables could investigate more comprehensively how annotators agree and disagree on repair form. There is clearly gradience in interpretation, and investigating precisely how and why it arises would improve our understanding of how people interpret repairs in dialogue.
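Agreement between annotators on repair form is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch, assuming two annotators who each assign one categorical label per repair; the label names (`rep`, `sub`, `del`) and the data are hypothetical illustrations, not results from this study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label counts.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical repair-form labels (illustrative only).
annotator_1 = ["rep", "rep", "sub", "sub", "del", "rep"]
annotator_2 = ["rep", "rep", "sub", "rep", "del", "sub"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.455
```

Here the annotators agree on four of six repairs (p_o ≈ 0.67), but because both favour `rep`, chance agreement is already about 0.39, so kappa is considerably lower than raw agreement.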

4.8 Conclusion: Consequences for models of self-repair

Given the results of this study, I conclude that relying only on a ‘rough copy’ dependency between the reparandum and repair phases gives a restricted view of repair detection and interpretation. An incremental information-processing view of these processes is more realistic, and may lead to better results in automatic repair detection, as the next chapter investigates. While there is regularity in surface form (short repeats, for instance, are the most common type), a model of repair that relies on alignment between reparandum and repair is generally inadequate for the long tail of substitutions.
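To make the limitation concrete, here is a minimal sketch of a purely alignment-based classifier of repair form. The category names, the word-overlap measure and the `threshold` parameter are illustrative assumptions, not the annotation scheme used in this study; the point is that a lexical substitution such as *likes* → *loves* shares no words with its reparandum, so an alignment test alone finds nothing relating the two phases.

```python
def word_overlap(reparandum, repair):
    """Fraction of reparandum words that reappear somewhere in the repair."""
    if not reparandum:
        return 0.0
    repair_words = set(repair)
    return sum(w in repair_words for w in reparandum) / len(reparandum)


def classify_repair_form(reparandum, repair, threshold=0.5):
    """Naive surface classification using word identity only (hypothetical)."""
    if not repair:
        return "delete"        # nothing replaces the abandoned material
    if repair == reparandum:
        return "repeat"        # verbatim repetition
    if word_overlap(reparandum, repair) >= threshold:
        return "rough_copy"    # partial alignment, e.g. 'the red' -> 'the blue'
    return "substitution"      # no usable alignment at all


print(classify_repair_form(["i", "think"], ["i", "think"]))   # repeat
print(classify_repair_form(["the", "red"], ["the", "blue"]))  # rough_copy
print(classify_repair_form(["likes"], ["loves"]))             # substitution
```

Everything falling into the last branch is invisible to a detector that searches only for rough copies, which is why richer evidence such as fluency and information content is needed.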

Edit terms prove to be less predictive of a repair onset than hoped, as they more often occur as stand-alone, forward-looking disfluencies. This is surprising given their prominence in the literature on repair. Edit terms can be seen as conventionalised signals of trouble in communication, but their default meaning appears not to be to prime a repair; rather, they simply signal that the speaker is uncertain how to proceed. Hearers and machines must therefore rely on other features, such as the fluency level and information content of the utterance so far, to detect repair onsets.

Furthermore, the annotation work suggests that self-repair classification may have been avoided for understandable reasons: inter-annotator disagreement tends to occur, and it is not clear that a purely categorical way of classifying repairs (other than verbatim repeats) is a good way forward. Repairs perform a number of different functions, and any NLU or NLG system capable of understanding and generating repairs realistically must account for them. Repairs cause extra meaning, both semantic and pragmatic, to be computed on-line, and this meaning is not available from simply removing the reparandum from the input string and re-parsing or re-generating the phrase. A context-sensitive approach to building these systems is clearly better than one that removes parts of the utterance and then processes the cleaned utterances.
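The contrast between discarding the reparandum and retaining it can be sketched as a small data structure. This is a hypothetical illustration (the class and method names are not from this thesis): the `cleaned` view is all that a clean-up pipeline passes on, whereas the retained reparandum still records what the speaker replaced, which a context-sensitive NLU component could draw on.

```python
from dataclasses import dataclass, field


@dataclass
class RepairedUtterance:
    """Hypothetical container that keeps every phase of a self-repair."""
    prefix: list[str]
    reparandum: list[str]
    interregnum: list[str]
    repair: list[str]
    continuation: list[str] = field(default_factory=list)

    def full(self) -> list[str]:
        """The utterance as spoken, in order."""
        return (self.prefix + self.reparandum + self.interregnum
                + self.repair + self.continuation)

    def cleaned(self) -> list[str]:
        """The 'clean-up' view: reparandum and interregnum discarded."""
        return self.prefix + self.repair + self.continuation

    def superseded(self) -> list[str]:
        """Reparandum words not carried over into the repair. A clean-up
        pipeline loses these, although they may license inferences such
        as 'loves, not merely likes'."""
        repair_words = set(self.repair)
        return [w for w in self.reparandum if w not in repair_words]


utt = RepairedUtterance(prefix=["john"], reparandum=["likes"], interregnum=["uh"],
                        repair=["loves"], continuation=["mary"])
print(utt.cleaned())     # ['john', 'loves', 'mary']
print(utt.superseded())  # ['likes']
```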

Chapter 5 Strongly Incremental Self-Repair Detection

This chapter¹ presents STIR (STrongly Incremental Repair detection), a system that detects self-repairs and edit terms on transcripts incrementally with minimal latency. It addresses problems with the previous approaches outlined in Chapter 3 and draws on cognitive processing insights from the empirical evidence in Chapters 2 and 4. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers that detect the different stages of repairs. These measures allow self-repair detection to be modelled in terms of time-linear incremental information processing. Detection results on the Switchboard disfluency-tagged corpus show utterance-final accuracy that improves on state-of-the-art n-gram-model-based incremental repair detection, together with considerably better incremental accuracy, faster time-to-detection and lower computational overhead. STIR’s performance is evaluated using incremental metrics, and novel evaluation standards for repair processing are proposed.
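As an illustration of the kind of information-theoretic measure meant here, the sketch below computes word-by-word surprisal, −log₂ P(wᵢ | wᵢ₋₁), and the entropy of the predicted next-word distribution from a toy bigram model. It assumes add-one smoothing, a `<s>` start symbol and an `<unk>` token; these choices and the three-sentence corpus are placeholders for illustration only, not STIR's actual language models or feature set.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing (assumed)."""

    def __init__(self, sentences):
        self.context_counts = Counter()      # how often each word acts as a context
        self.bigrams = defaultdict(Counter)  # bigrams[prev][word] -> count
        for sentence in sentences:
            tokens = ["<s>"] + sentence
            for prev, word in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigrams[prev][word] += 1
        self.vocab = {w for s in sentences for w in s} | {"<unk>"}

    def prob(self, prev, word):
        """Smoothed P(word | prev); out-of-vocabulary words map to <unk>."""
        if word not in self.vocab:
            word = "<unk>"
        count = self.bigrams[prev][word] if prev in self.bigrams else 0
        return (count + 1) / (self.context_counts[prev] + len(self.vocab))

    def surprisal(self, prev, word):
        """Surprisal in bits: high values flag unexpected words."""
        return -math.log2(self.prob(prev, word))

    def entropy(self, prev):
        """Entropy (bits) of the predicted next-word distribution after prev."""
        probs = (self.prob(prev, w) for w in self.vocab)
        return -sum(p * math.log2(p) for p in probs)


corpus = [["john", "likes", "mary"], ["john", "loves", "mary"], ["john", "likes", "tea"]]
lm = BigramLM(corpus)

# Score the repaired utterance one word at a time, as an incremental detector would.
prev = "<s>"
for word in ["john", "likes", "uh", "loves", "mary"]:
    print(word, round(lm.surprisal(prev, word), 2), round(lm.entropy(prev), 2))
    prev = word
```

In this toy example the out-of-vocabulary *uh* receives the highest surprisal (3.0 bits), showing how such local measures can serve as cheap, strictly left-to-right cues to disfluency.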

5.1 Introduction

To re-introduce the task at hand for automatic systems, I reprise (4.1) below as the structure a repair detector should be capable of recognizing:

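$$\text{John}\ \big[\ \underbrace{\text{likes}}_{\text{reparandum}}\ +\ \underbrace{\{\text{uh}\}}_{\text{interregnum}}\ \underbrace{\text{loves}}_{\text{repair}}\ \big]\ \text{Mary} \qquad (5.1)$$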
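For a word-by-word detector, structure (5.1) is usually recast as one label per word. The following is a minimal sketch, assuming a simplified bracket notation (`John [ likes + {uh} loves ] Mary`) with at most one non-nested repair, single-word `{...}` interregna, and hypothetical tag names `f` (fluent), `rm` (reparandum), `e` (interregnum) and `rp` (repair).

```python
def tag_repair(annotated):
    """Map a single, non-nested bracketed repair annotation to per-word
    phase tags: 'f' (fluent), 'rm' (reparandum), 'e' (interregnum), 'rp' (repair)."""
    tags = []
    phase = "f"
    for token in annotated.split():
        if token == "[":
            phase = "rm"   # reparandum starts
        elif token == "+":
            phase = "rp"   # interruption point: later non-{} words are the repair
        elif token == "]":
            phase = "f"    # back to fluent material
        elif token.startswith("{") and token.endswith("}"):
            tags.append((token[1:-1], "e"))  # interregnum / edit term
        else:
            tags.append((token, phase))
    return tags


print(tag_repair("John [ likes + {uh} loves ] Mary"))
# [('John', 'f'), ('likes', 'rm'), ('uh', 'e'), ('loves', 'rp'), ('Mary', 'f')]
```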

As discussed in previous chapters, from a dialogue systems perspective it is vital for robust natural language understanding (NLU) not only to detect the presence of a repair but also to assign the entire structure appropriately. Downgrading the commitment of reparandum phases and assigning appropriate interregnum and repair phases permits computation of the user’s intended meaning. As discussed above, for implementation in incremental dialogue systems (see e.g. Rieser and Schlangen, 2011, Section 3.3), left-to-right operability on its own is not sufficient: repair detection should operate without unnecessary processing overhead and function efficiently within an incremental framework, meeting as many of the incremental criteria set out in Section 3.5.3 as possible.

In line with the principle of strong incremental interpretation (Milward, 1991), a repair detector should give the best results possible as early as possible. As discussed in Section 3.5, with one exception (Zwarts et al., 2010), there has been no focus on evaluating or improving the incremental performance of repair detection.
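One way to make "as early as possible" measurable is to record the detector's full output after each word and count how many words pass before a gold repair onset is first labelled correctly. The sketch below assumes that definition of time-to-detection, a hypothetical `rp_start` tag, and ignores any later revision of the decision; it is an illustration, not the metric definitions used in this chapter.

```python
def time_to_detection(incremental_outputs, onset_index, onset_tag="rp_start"):
    """Delay, in words, before a gold repair onset is first detected.

    incremental_outputs[j] is the tag sequence output after consuming words
    0..j (so it has j + 1 tags). Returns 0 if the onset is tagged as soon as
    the onset word is consumed, or None if it is never tagged.
    """
    for j in range(onset_index, len(incremental_outputs)):
        hypothesis = incremental_outputs[j]
        if len(hypothesis) > onset_index and hypothesis[onset_index] == onset_tag:
            return j - onset_index
    return None


# Hypothetical detector outputs for "john likes uh loves mary";
# the gold repair onset is "loves" (index 3).
outputs = [
    ["f"],
    ["f", "f"],
    ["f", "f", "e"],
    ["f", "f", "e", "f"],               # onset word seen but not yet recognised
    ["f", "rm", "e", "rp_start", "f"],  # recognised one word later
]
print(time_to_detection(outputs, onset_index=3))  # 1
```

A strongly incremental detector aims to drive this delay towards zero while avoiding the unnecessary revisions that an over-eager detector would make.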

In this chapter I present STIR, a system which addresses the challenges of incremental accuracy, structure assignment, computational complexity and latency in self-repair detection by making local decisions based on relatively simple information-theoretic measures of fluency and similarity. Section 5.2 summarizes the challenges posed and explains the general approach; Section 5.3 explains STIR in detail; Section 5.4 explains the experimental set-up and introduces evaluation metrics, some of which are novel; Section 5.5 presents and discusses STIR’s results on Switchboard; Section 5.6 investigates STIR’s domain-generality and practical use in psychiatry applications; and Section 5.7 concludes.