The Open Video Project also conducted a number of studies to investigate the individual roles of textual and non-textual video surrogates in making relevance judgments or identifying the contents of a video. Wildemuth et al. (2002) evaluated five video surrogates (storyboards with text keywords, storyboards with audio keywords, slide shows with text keywords, slide shows with audio keywords, and fast forwards) in relation to their usefulness and usability in accomplishing specific tasks, i.e., gist determination, object recognition, action recognition, and visual gist determination. These performance tasks were closely related to the real-world tasks that users expect to perform with video collections. Participants were also asked to provide comments about the strengths and weaknesses of each surrogate after viewing it. Specifically, the key frames in the storyboard were displayed for a limited amount of time, allowing 500 milliseconds per frame, with the text keywords either displayed under the storyboard or played as an audio recording, repeated as necessary for the duration of the visual display. The slide shows incorporated the same set of key frames as the storyboards, and each frame was displayed for 250 milliseconds. To make the slide shows take the same amount of time as the storyboards, the entire set of key frames was played twice, with no pause between the two repetitions. Finally, the fast forwards played every Nth frame of the original video at the normal frame rate, so that they were N times as fast as the original video, where N was chosen so that the fast forward ran for about the same amount of time as the other four surrogates. No audio or text augmented the fast forwards.
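The timing arithmetic behind these surrogates can be illustrated with a short sketch. It is not code from Wildemuth et al. (2002); the function names and the example figures (24 key frames, a 96-second source video at 30 frames per second) are assumptions chosen only to make the numbers easy to follow. Only the 500 ms and 250 ms display times, the two slide show repetitions, and the every-Nth-frame rule come from the description above.

```python
def storyboard_seconds(num_keyframes, ms_per_frame=500):
    """Viewing time of a storyboard shown at 500 ms per key frame."""
    return num_keyframes * ms_per_frame / 1000


def slide_show_seconds(num_keyframes, ms_per_frame=250, repetitions=2):
    """Viewing time of a slide show: 250 ms per frame, whole set played twice."""
    return num_keyframes * ms_per_frame * repetitions / 1000


def fast_forward_step(video_seconds, target_seconds):
    """Choose N so that playing every Nth frame lasts about target_seconds."""
    return max(1, round(video_seconds / target_seconds))


def fast_forward_frames(total_frames, step):
    """Indices of the frames kept when sampling every Nth frame."""
    return list(range(0, total_frames, step))


# Hypothetical example values (not taken from the study).
num_keyframes = 24
video_seconds = 96
fps = 30

target = storyboard_seconds(num_keyframes)
step = fast_forward_step(video_seconds, target)
kept = fast_forward_frames(video_seconds * fps, step)

print(target)                             # storyboard duration in seconds
print(slide_show_seconds(num_keyframes))  # slide show duration in seconds
print(step)                               # speed-up factor N
print(len(kept) / fps)                    # fast forward duration in seconds
```

Under these assumed inputs, all three surrogate types run for the same length of time, which is the property the study design relied on when comparing them.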
According to the results, no surrogate was universally judged “best” by the participants, but the slide show with text keywords was not preferred by anyone, and the fast forward surrogate garnered the most support, particularly from experienced video users: when participants were asked to choose surrogates with which to perform the tasks, the fast forward was chosen in 14 out of 30 trials. Note also that in this study the storyboards and slide shows with audio keywords were multi-modal surrogates that employed both visual and audio modalities, and the fast forwards were played at about 8x the original speed, which was actually “slow” compared to the speeds used in later studies (Wildemuth et al., 2003). User preference also suggested that the fast forward surrogate should be further developed with the addition of audio keywords. In a more recent study, Marchionini et al. (2009) developed fast forwards with audio keywords, but these fast forwards ran at more than 100x, much faster than the ones used in Wildemuth et al. (2002), and did not turn out to be very effective compared to the other surrogates used in that study.
Though the viewing compaction rates used in these surrogates supported adequate performance, participants commented that they wanted more control over surrogate speed and sequencing, and that they would like to be able to move from surrogate to surrogate. In response to this need for flexibility, Marchionini et al. (2000) developed the AgileViews user interface framework, with several different views of a collection as well as control mechanisms that facilitate low-effort actions and strategies for coordinating the views. Amir et al. (2003) developed an efficient video browser with multiple synchronized views (storyboards, salient animations, slide shows with audio, and full videos), allowing users to switch between views while preserving the corresponding point within the video across all views. The participants in the study used the keywords to understand the content of the video, as advance organizers for viewing the visual portion of the surrogate, and as a source of ideas for terms to use in future searches. They commented that textual video surrogates can facilitate the process of determining relevance, and that non-textual video surrogates can effectively complement textual surrogates. The study also found that both user perceptions and performance could be affected by characteristics of the test video itself. To control for the effects of test video characteristics, recent Open Video usability studies have adopted a set of comparable videos selected from the NASA Connect and NASA Destination Tomorrow collections (Song and Marchionini, 2007; Marchionini et al., 2009).
Hughes et al. (2003) reported an eye-tracking study of digital video surrogates composed of text and three thumbnail images representing each document. Twelve undergraduate students selected relevant video records from results lists containing titles, descriptions, and three key frames for ten different search tasks. As they browsed the results page for each search, their eye movements were tracked to determine where, when, and how long they looked at text and image surrogates. The study found that participants looked at and fixated on text more than on images, and that the difference was statistically reliable. The text surrogates served as an anchor point from which participants made judgments about the search results, while the images communicated the “feel of the film” and what the video was like, and were consistently used to confirm the judgments participants had made. In short, although text dominates how people make sense of retrieval sets, images add confirmatory value and people like to have them.
Wildemuth et al. (2003) reported on a study of the use of fast forwards for digital video, and recommended a default fast forward speed of 1:64 of the original video, at which user performance and satisfaction remained adequate. Although this approach can achieve a much higher compaction (compression) rate than fast forwards with audio, it still leads to severe coherence degradation and discomfort for the viewer.
Yang et al. (2003) addressed the question of what measures could or should be used to test how people perceive and understand video surrogates, and gave an overview of six user performance measures that were used in two usability studies (Wildemuth et al., 2002, 2003). The six performance measures fall into two categories: recognition tasks (including object recognition with text stimuli, object recognition with graphical stimuli, and action recognition) and inference tasks (including free-text gist determination, multiple-choice gist determination, and visual gist determination). These measures may be useful in evaluating different surrogates in relation to their effectiveness in aiding video retrieval.
The tasks were motivated by the two-level categorization of video comprehension (sensory seeing and cognitive seeing) discussed in Section 2.1.1. The recognition measures depend on pre-iconographical analysis of the objects and examine whether users remember seeing or hearing particular words, frames, or video clips in the surrogates. The inference measures depend on iconographical analysis and iconographical interpretation of the video surrogates, and test how much thematic information users could obtain from the video surrogates and what “story” about the original video users could construct based on the surrogates. The initial field testing of these six measures indicates that they are practical and can differentiate multiple levels of performance with video surrogates (Wildemuth et al., 2002, 2003).
Marchionini (2006) presented a theoretical discussion of several measures of human performance that have been used in developing visual surrogates for the Open Video Digital Library. Two sets of cognitive performance measures (i.e., the recognition measures and inference measures discussed in Yang et al. (2003)) and one set of attitudinal measures were described. The cognitive performance measures aim to assess object and action recognition as well as inferences made from gists. The attitudinal measures include a set of twelve Likert-scaled statements (Davis, 1989) to assess usability, usefulness (e.g., “This system makes it easier to find information”), and learnability (e.g., “Learning to operate this system was easy for me”), and seven-point semantic differential scales adopted from Ghani et al. (1991) to assess engagement (e.g., “I felt: absorbed intensely / not absorbed intensely”) and enjoyment (e.g., “Using the system was: interesting / uninteresting”). These measures have been adopted by a number of later studies (Song and Marchionini, 2007; Marchionini et al., 2009).
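As a rough illustration of how attitudinal responses of this kind are commonly turned into scores, the sketch below averages the items belonging to one construct, optionally reverse-scoring items whose scale runs in the opposite direction. This is a generic scoring convention and not a procedure reported by Marchionini (2006): the function name, the averaging rule, the reverse-scoring step, and the example responses are all assumptions. The source also does not state how many points the Likert statements used, so the seven-point default is an assumption taken from the semantic differential scales only.

```python
from statistics import mean


def subscale_score(responses, reverse_items=(), scale_max=7):
    """Average the item responses for one construct (e.g., enjoyment).

    responses     -- list of integer ratings from 1 to scale_max
    reverse_items -- positions (0-based) of items to reverse-score, so that
                     a higher score always means "more" of the construct
    scale_max     -- number of scale points (assumed to be 7 here)
    """
    scored = [
        (scale_max + 1 - rating) if index in reverse_items else rating
        for index, rating in enumerate(responses)
    ]
    return mean(scored)


# Hypothetical responses from one participant, not data from any study.
engagement = subscale_score([6, 5, 7])
# Suppose the second item is anchored so that 1 means "interesting".
enjoyment = subscale_score([6, 2], reverse_items={1})

print(engagement)
print(enjoyment)
```

Keeping the scale length as a parameter makes it straightforward to reuse the same function for instruments with a different number of scale points.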
Biometric measures can also be investigated as adjuncts to the cognitive measures, so that sets of measures are available for all three classes of human measures: physical, cognitive, and affective (Marchionini, 2006). These measures address different aspects of the search process and of human interaction with retrieval systems, and none of them can be dispensed with in understanding the overall effects of video retrieval and sense-making episodes.