The present experiment tested the effects of the corpus on the perceived correspondence of six audio-visual feature associations (i.e. a 6 × 3 design: six associations by three corpora). The primary purpose of the six A/V associations tested in the present experiments is to enable visual interaction with corpus-based concatenative synthesis for creative applications. Overall, the data gathered in this experiment show consistencies in the subjects' responses across the groups and conditions. The consistency between the two subject groups (i.e. sound practitioners and non-sound practitioners) suggests that musical/sound training was not a significant factor; this is discussed in more detail later in this section. The consistency in the subjects' responses between the two experimental conditions (controlled environment and online) suggests that both approaches to gathering data are equally well suited to this experiment. Moreover, consistencies could also be observed in the participant responses between the first and the second question of the experiment (i.e. pairwise similarity judgment and rating). This suggests that both data-gathering methods are appropriate for measuring the perceived similarity of cross-modal stimuli and could be used interchangeably.
In agreement with the first hypothesis, the experimental results from the first study revealed differences in the degree of perceived correspondence reported by the subjects between the individual A/V feature associations that were tested. The present study confirms the results of previous studies (Eitan & Timmers, 2010; Evans & Treisman, 2010; Kostas Giannakis & Smith, 1993; Kohn & Eitan, 2009; Kussner & Leech-Wilkinson, 2013; Küssner, 2014; Lipscomb & Kim, 2004; Marks, 1989; Rusconi et al., 2006; R. Walker, 1987), which found strong relationships between the audio-visual feature associations of size – loudness, vertical position – pitch, and color brightness – spectral brightness. The relationships between texture granularity – sound dissonance and color complexity – sound dissonance were weaker, similar to the findings of Giannakis (2006). The weak correspondence reported by the subjects between these features of the auditory and the visual stimuli could be interpreted in two ways. Either the features that were tested (i.e. texture granularity – sound dissonance and color variance – sound dissonance) are not a good match, or the synthesis parameters that were used to map the distance between visual texture granularity and sound dissonance (i.e. transposition randomness, selection randomness) are not the most appropriate parameters. Moreover, it is worth noting that auditory and visual textures are more difficult to define in computational and statistical terms, because both are higher-dimensional features that consist of multiple lower-level auditory and visual parameters. This is particularly true if we compare auditory and visual textures to other features such as auditory pitch, loudness and brightness, and visual size, position and brightness. Further research will be required to investigate which set of parameters is the most appropriate for mapping the distances between texture granularity – sound dissonance and color complexity – sound dissonance.
Contrary to the second hypothesis, the results showed that subjects' responses did not vary significantly as a result of the harmonicity of the source audio which was used to synthesise the stimuli. These findings suggest that the strength of the perceived correspondence between the A/V associations prevails over the timbre characteristics of the sounds used to render the complementary polar features. Hence, the empirical evidence gathered by previous research appears to be generalizable to different contexts; furthermore, the overall dimensionality of the sound used for rendering should not have a very significant effect on the comprehensibility and usability of an A/V mapping. An interesting trend can be observed in the interaction between the factors A/V association and Corpus. The data show that the interactions between these two factors were greater for the A/V associations where perceived correspondence was weak (i.e. texture granularity – sound dissonance and color complexity – sound dissonance) than for the A/V associations where perceived correspondence was strong (i.e. size – loudness, vertical position – pitch, color brightness – spectral brightness). One interpretation of this difference in the strength of the interactions between A/V associations and the harmonicity of the corpus is that, for the strongly correlated A/V associations, the strength of the correspondence prevails over the less important features related to the harmonicity of the audio corpus: the strength of the A/V association dominates the subjects' judgement. For the weakly correlated A/V associations, by contrast, the less important features of harmonicity become more influential in the subjects' similarity judgement. This would lead to the conclusion that the influence of the harmonicity of the audio corpus on a similarity judgement is relative to the strength/dominance of the A/V association being tested.
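One way to make the comparison of interaction strength across A/V associations concrete is sketched below. It is only an illustration and not the analysis used in this study: it assumes a table of mean ratings per association and corpus, and it measures, for each association, how far its cell means depart from a purely additive model (association effect plus corpus effect). The function name `interaction_strength` and all numbers in the example are hypothetical placeholders, not experimental data.

```python
from statistics import mean


def interaction_strength(cell_means):
    """Per-association sum of squared interaction residuals.

    cell_means maps association -> {corpus -> mean rating}.
    A residual is: cell - association mean - corpus mean + grand mean.
    A value of 0 means the association follows a purely additive pattern,
    i.e. it shows no interaction with the corpus factor.
    """
    associations = list(cell_means)
    corpora = list(next(iter(cell_means.values())))

    grand = mean(cell_means[a][c] for a in associations for c in corpora)
    assoc_mean = {a: mean(cell_means[a][c] for c in corpora) for a in associations}
    corpus_mean = {c: mean(cell_means[a][c] for a in associations) for c in corpora}

    strength = {}
    for a in associations:
        strength[a] = sum(
            (cell_means[a][c] - assoc_mean[a] - corpus_mean[c] + grand) ** 2
            for c in corpora
        )
    return strength


if __name__ == "__main__":
    # Made-up placeholder ratings, NOT the values measured in the experiment.
    placeholder = {
        "association A": {"string": 4.0, "impacts": 4.1, "wind": 3.9},
        "association B": {"string": 2.5, "impacts": 3.2, "wind": 1.8},
    }
    for name, value in interaction_strength(placeholder).items():
        print(f"{name}: {value:.3f}")
```

Under this measure, an association whose ratings shift with the corpus in the same way as the overall corpus effect scores close to zero, while an association whose ratings depend on the corpus in its own way scores higher. A formal test of the interaction would still require the full per-subject data and an appropriate statistical model.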
Another interpretation of the variation in the interaction between the A/V associations and Corpus could be based on the observation that, for the weakly correlated feature dimensions, the most highly rated corpus was the impacts corpus, followed by the string corpus, while the lowest rated was the wind corpus. This could be attributed to the fact that the texture granularity – sound dissonance and color complexity – sound dissonance A/V associations are both related to a transition from consonance to dissonance that relies on textural/timbral features of the sound. Hence, the wind corpus, being the least harmonic corpus in comparison with the string and impacts corpora, could render the transition from consonance to dissonance less well (Figure 54). The string corpus, although it was the most harmonic, was rated consistently lower than the impacts corpus. This observation could be attributed to the fact that the string corpus was more homogeneous than the impacts corpus in terms of periodicity, spectral flatness and brightness. If this is true, then it could be argued that both the heterogeneity of the corpus and the harmonicity of the audio are important for rendering the transition from audiovisual consonance to dissonance. However, further research will be necessary to support this claim.
Figure 54. The wind corpus, being the least harmonic corpus in comparison with the string and impacts corpora, could render transitions between consonance and dissonance less well. Panels: String, Impacts, Wind.
The fact that there were no significant differences between the expert and the non-expert groups suggests that the cross-modal correspondences tested in this study are not dependent on the level of music/sound training of the subjects, which is in agreement with the third hypothesis. This finding agrees with that of Lipscomb & Kim (2004) and contrasts with the findings of (Kussner & Leech-Wilkinson, 2013; Küssner, 2014; R. Walker, 1987). The fact that musical/sound training was not a very significant factor suggests that the correspondences might be underpinned by psychophysical or structural similarity between audio-visual feature dimensions and/or by other cultural conventions, but not specifically by conventions related to the acquisition of musical/auditory skills. An interesting trend in the data is that the expert subjects appear to perceive a stronger correspondence for vertical position – pitch than for color brightness – spectral brightness, while non-expert subjects appear to perceive a stronger correspondence for color brightness – spectral brightness than for vertical position – pitch. In my interpretation, the expert subjects are more accustomed to the vertical position – pitch association, as it is very widely used in music software and digital audio workstations (e.g. sequencers), as discussed in Chapter 2.