This thesis proposes a novel framework for an integrated, prediction-based streaming model, fusing top-down and bottom-up information as well as combining vertical and horizontal analysis to imitate human auditory scene analysis of polyphonic music. This information includes: auditory features, musical features, musical training, attention and expectation. This framework makes use of predictive coding (Section 2.6), where predictions are generated from frequencies present in training data. These predictions will be used to separate symbolic musical input into perceptual streams. Predictions for different types of streaming information will be calculated in modules, which can be combined to produce a single streaming decision, with an identified melody stream and accompaniment stream(s).
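To make the idea of frequency-based prediction concrete, the following is a minimal sketch, not IDyOM's actual variable-order modelling. It assumes a first-order (one-symbol) context, add-one smoothing, and a hypothetical training set of pitch intervals in semitones; all function names are illustrative. It shows how a continuation that is frequent in the training data receives a lower information content (IC) than a rare one.

```python
import math
from collections import Counter, defaultdict

def train_bigram_counts(sequences):
    """Count how often each symbol follows each one-symbol context."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def probability(counts, context, symbol, alphabet):
    """Add-one smoothed estimate of P(symbol | context) (an assumed smoothing scheme)."""
    seen = counts.get(context, Counter())
    return (seen[symbol] + 1) / (sum(seen.values()) + len(alphabet))

def information_content(p):
    """Information content in bits: IC = -log2(p)."""
    return -math.log2(p)

# Hypothetical training data: melodic pitch intervals in semitones.
training = [[2, 2, 1, 2], [2, 1, 2, 2], [2, 2, -4, 2]]
alphabet = {-4, 1, 2}
counts = train_bigram_counts(training)

# After an interval of +2, another +2 is common and a -4 is rare in this data.
p_common = probability(counts, 2, 2, alphabet)
p_rare = probability(counts, 2, -4, alphabet)
ic_common = information_content(p_common)
ic_rare = information_content(p_rare)
```

In this toy data the rare continuation is assigned a higher IC than the common one, which is the property the framework relies on when ranking candidate streaming structures.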
This framework will extend the prediction-based approach of the IDyOM model, for three reasons: 1) IDyOM has been behaviourally validated as a cognitive model of musical expectation; 2) IDyOM already employs multiple viewpoints, thus lending itself to a module-based approach; and 3) IDyOM functions in real time, using only past and current information. The basic functioning of the model will be presented in the context of one viewpoint before being discussed in the context of the full range of information required for musical auditory streaming.
4.3.1 Basic framework function
This framework, to be implemented as a predictive model, will perform its analysis in real time, following the example of Cambouropoulos’ VISA model: music is divided into vertical slices in time, and each new event onset begins a new slice. Events that span multiple slices are tracked so that they are linked together throughout the analysis. For each slice, the model enumerates all possible stream structure combinations, with the restriction that non-adjacent voices cannot be part of the same stream. For example, in an SATB context, one might perceive all four voices as one perceptual stream (SATB), each voice as its own stream (S-A-T-B), the top voice accompanied by the bottom three (S-ATB), or vice versa (SAT-B), and so on, but an ST-AB or STB-A combination is not possible. The most probable continuation, given the preceding context (n-1 slices), is added to the context, and the analysis continues throughout the piece until all slices are organized into perceptual streams with corresponding average predictability in the form of information content. Figure 4.1 illustrates this process.

Figure 4.1. Illustration of a predictive module’s process, where predictions are generated for each potential streaming structure continuation. These relationships are complex; therefore, predictions will be focused on a single viewpoint per module, for example the interval structure of a slice.
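The slicing and enumeration steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: events are represented as hypothetical `(onset, duration, voice)` tuples, voices are listed from top to bottom, and the adjacency restriction is read as "every stream is a contiguous run of voices". Under that reading, four voices yield 2^3 = 8 structures; the Figure 4.2 caption counts seven for a four-voice work, so the framework may apply a further restriction that this sketch does not model.

```python
from itertools import product

def slice_onsets(events):
    """Sorted unique onset times; each new onset opens a new vertical slice."""
    return sorted({onset for onset, _duration, _voice in events})

def sounding_at(events, time):
    """Voices sounding at `time`, including events held over from earlier slices."""
    return {voice for onset, duration, voice in events
            if onset <= time < onset + duration}

def stream_structures(voices):
    """All ways to split an ordered voice list into contiguous streams."""
    if not voices:
        return []
    structures = []
    # Each gap between adjacent voices is either a stream boundary or not.
    for cuts in product([False, True], repeat=len(voices) - 1):
        groups, current = [], [voices[0]]
        for voice, cut in zip(voices[1:], cuts):
            if cut:
                groups.append(tuple(current))
                current = [voice]
            else:
                current.append(voice)
        groups.append(tuple(current))
        structures.append(tuple(groups))
    return structures

def label(structure):
    """Render a structure in the text's notation, e.g. 'S-ATB'."""
    return "-".join("".join(group) for group in structure)

# Hypothetical events: the alto re-articulates at time 1.0 while S and B hold.
events = [(0.0, 2.0, "S"), (0.0, 1.0, "A"), (1.0, 1.0, "A"), (0.0, 2.0, "B")]
onsets = slice_onsets(events)
structures = stream_structures(["S", "A", "T", "B"])
labels = [label(s) for s in structures]
```

Because only the gaps between adjacent voices can become boundaries, combinations such as ST-AB or STB-A can never be generated.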
The first slice is treated slightly differently due to a lack of contextual information. As is presumably the case for a human listener waiting to hear a piece of music for the first time, expectations are biased towards the most common type of streaming structure heard previously. Thus, the first slice will be divided into this most common streaming structure, as determined by the model through training. This initial bias, as well as the evaluation of any output by this model, will require data annotated with perceptual streaming information, which can be collected from listeners of various musical backgrounds. Finally, to identify the melody and accompaniment streams, the stream with the highest information content, in other words the most interesting line, is labelled as melody (Chapter 8).
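The two special cases described above, the default structure for the first slice and the melody label, might be sketched as follows. The function names and data are hypothetical, and comparing streams by their mean IC is an assumption; the text specifies only "the highest information content".

```python
from collections import Counter
from statistics import mean

def most_common_structure(annotated_structures):
    """Default structure for the first slice: the most frequent one in training annotations."""
    return Counter(annotated_structures).most_common(1)[0][0]

def label_melody(stream_ics):
    """Return the stream whose events have the highest mean IC (assumed criterion)."""
    return max(stream_ics, key=lambda name: mean(stream_ics[name]))

# Hypothetical listener annotations of streaming structures in a training corpus.
annotations = ["S-ATB", "S-ATB", "SATB", "S-A-T-B", "S-ATB"]
first_slice_structure = most_common_structure(annotations)

# Hypothetical per-event IC values (in bits) for two finished streams.
stream_ics = {"S": [3.0, 4.0, 3.5], "ATB": [1.0, 2.0, 1.5]}
melody = label_melody(stream_ics)
```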
Outlined above is the overall process of the framework. This iterative analysis process occurs simultaneously in many modules, where each module takes symbolically encoded music as input (here MIDI, kern or text, as per IDyOM’s current implementation) and outputs information content based on the viewpoint information it models. At each streaming assignment decision, the predictions are linearly combined across modules based on the relative perceptual salience (described in Section 4.4.2) of the viewpoints involved to produce a final stream assignment decision (Figure 4.2). Therefore, it is important that each module is independent and does not rely on the output of another module to create its own.

Figure 4.2. An illustration of the proposed streaming model’s workflow. At any point in the analysis, the model will consider the possible streaming structures of the next slice, for example the seven combinations of a four-voice work. Each module will process a feature as described in Section 4.4.1, for example pitch interval. The outputs of all feature-based modules are simply added together, with each module given a weight designated by the salience module, described in Section 4.4.2. The streaming structure with the lowest combined IC, in other words the most likely continuation, is added to the existing context (see Figure 4.1).
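The combination step can be illustrated with a short sketch. The module names, IC values and weights below are invented for illustration; in the framework the weights would come from the salience module (Section 4.4.2), whose computation is not modelled here.

```python
def combine_ic(module_ics, weights):
    """Weighted sum of per-module IC for each candidate structure.

    module_ics: {module_name: {structure_label: ic_in_bits}}
    weights:    {module_name: salience_weight}  (assumed to be given)
    """
    structures = next(iter(module_ics.values())).keys()
    return {
        s: sum(weights[m] * module_ics[m][s] for m in module_ics)
        for s in structures
    }

def choose_structure(combined):
    """Lowest combined IC = most likely continuation."""
    return min(combined, key=combined.get)

# Hypothetical output of two independent feature modules for two candidates.
module_ics = {
    "pitch_interval": {"SATB": 4.0, "S-ATB": 2.0},
    "onset": {"SATB": 1.0, "S-ATB": 3.0},
}
weights = {"pitch_interval": 0.75, "onset": 0.25}

combined = combine_ic(module_ics, weights)
decision = choose_structure(combined)
```

Each module contributes only its own IC values, so a module can be added, removed or reweighted without changing the others, which is why module independence matters for this design.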
The remainder of this chapter presents the proposed implementation of all sources of streaming information (Section 4.4), followed by a description of the type of data needed (Section 4.5) and a discussion of potential future research based on the framework (Section 4.6) and of its limitations. In Section 4.4.1, acoustic and musical sources of streaming information will be discussed in the context of feature-based modules. In Sections 4.4.2-3, the particular challenges of attention and timbre, respectively, will be discussed, as well as their proposed implementations. Musical training and individual listeners’ musical knowledge can be modelled using IDyOM’s long-term memory implementation; this will be discussed in Section 4.4.4. Finally, in Chapter 2, expectation was also identified as a source of streaming information. As the framework is prediction-based, expectation is inherently included in it. Sections 4.7 and 4.8 will, respectively, compare this framework to a selection of models reviewed in Sections 4.1-2 and propose ways in which the framework might be used by researchers from differing backgrounds.