To compare the model to other approaches for tune family classification, we performed an experiment with the Dutch folk song dataset provided by Van Kranenburg et al. [174]. The collection consists of several annotated subsets [175], including the MTC-ANN dataset, which contains 360 Dutch folk songs classified into 26 tune families. All songs are monophonic.

On this dataset, Van Kranenburg et al. reported excellent results for a retrieval task based on nearest neighbours. Each tune is first characterized by a feature vector of 88 global features gathered from several sources. Of these, 51 features were collected by Steinbeck and Jesser [176] by exploring the Essen Folk Song Collection, which contains 10,000 folk songs from Germanic regions and China. The remaining 37 features were collected by McKay [177] for general genre and song analysis. The features incorporate hand-crafted measures based on statistical observations of note occurrences, combinations with fine-tuned thresholds, and specific combinations of spectral features. Nearest neighbour classification is then used: each song is assigned to the tune family of the song closest to it in the feature vector space. Vectors are compared using the cosine distance.

We used the same retrieval setup in this experiment as in the previous Slovene songs dataset experiment; however, the feature vectors were obtained as follows. We trained one SymCHM model on all 360 songs of the MTC-ANN dataset and obtained a model with 3750 parts across layers 3–7. Inference was then run on each song, and the model's output was encoded into a feature vector in which each part was mapped onto one vector element whose value was the sum of the part's activations. The vector values were then adjusted as described in Van Kranenburg et al. [178]: for each element, the values were standardized across the dataset to have zero mean and a standard deviation of 1. As in Van Kranenburg et al. [178], the cosine distance was used to compare vectors.
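The retrieval setup described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the experiment: the function names are hypothetical, the population standard deviation is an assumption (the text only states zero mean and unit standard deviation), and excluding the query song from its own neighbour search is also an assumption.

```python
import math


def standardize(vectors):
    """Z-score every feature (column) across the dataset."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        # Assumption: a constant feature keeps std 1.0 to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]


def cosine_distance(a, b):
    """1 - cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # Assumption: a zero vector is treated as unrelated.
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbour_labels(vectors, labels):
    """Assign each song the label of its closest *other* song."""
    z = standardize(vectors)
    predicted = []
    for i, query in enumerate(z):
        best = min(
            (j for j in range(len(z)) if j != i),
            key=lambda j: cosine_distance(query, z[j]),
        )
        predicted.append(labels[best])
    return predicted


def accuracy(predicted, labels):
    """Fraction of songs whose predicted label matches the annotation."""
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)
```

In this sketch, each row of `vectors` would hold the summed activations of the model's parts for one song, and `labels` would hold the annotated tune families.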

Our model reached 74.4 % classification accuracy on the dataset. The confusion matrix is depicted in Figure 5.10. The results are about 20 percent lower when compared to [178]. However, we believe the results are interesting, considering that a pattern discovery model relying only on onset-pitch notation was used for this task. The model was not specifically trained or parameter-tuned for this task and was applied to the dataset without any dataset-specific adjustment. The model provided compositions of relatively encoded melodic patterns learned in an unsupervised manner. Other approaches applied to the MTC-ANN dataset used additional spectral features, e.g. Van Kranenburg [178], and symbolic features, e.g. Walshaw [136], who used bar indicators. In contrast, no know-how about the dataset or about folk and Western music in general was used in the procedure. Of course, inclusion of such knowledge could also be beneficial and will be explored in our future work.

Figure 5.10

The confusion matrix of tune family classification with SymCHM. The reference annotations are represented in rows (left) and the predicted classes in columns (bottom).

6 Compositional Hierarchical Model for Rhythm Modeling

In this chapter, we present how the compositional hierarchical model can be used for modeling rhythm. We thus focus on the temporal aspects of music and ignore the harmonic and melodic aspects that were discussed in the previous two chapters. Our motivation stems from the fact that some of the model's features are intuitively applicable to rhythm. For example, the relative encoding of time in rhythmic structures is commonly used in rhythmic representations. In live music, such relative encoding comes naturally: a rhythmic pattern may vary in duration due to tempo changes, yet it retains its inner structure. When studying rhythm in music corpora, rhythmic patterns occur in different tempi across pieces, so their relative encoding is necessary if they are to be studied. In addition, the model's biologically inspired mechanisms aid in handling the variability of rhythmic patterns, which commonly occurs in the transitions between segments (e.g. drum transitions) and in segment repetitions (e.g. half-feel and double-feel).

This chapter summarizes our latest work. The results represent work in progress, and several aspects of the model's development are left for future work. Nevertheless, we show how the model can be used for modeling rhythm and demonstrate its abilities through several examples connected to rhythm-related tasks such as tempo estimation, rhythmic classification and beat tracking.

6.1 Model Description

The input of the rhythmic model consists of the onset times and magnitudes of music events. These may be extracted either from audio recordings (with an onset detector) or from symbolic representations. In contrast to the SymCHM, pitch information is ignored:

$$\mathcal{I} : \{X : X = [N_o, 0, N_m]\}. \tag{6.1}$$

As in the previous implementations, the first layer $\mathcal{L}_0$ consists of a single atomic part $P^0_1$. Since any rhythm is composed of at least two events (i.e. a single event cannot by itself represent rhythm), $P^0_1$ activates for all pairs of input events $i_1 = [N^1_o, 0, N^1_m]$ and $i_2 = [N^2_o, 0, N^2_m]$, where $i_1$ occurs before $i_2$, as:

The onset time $A_T$ is defined by the onset time of the first event, and the magnitude $A_M$ is the average magnitude of both events. The role of the activation location $A_L$ is different in this model, as it represents the scale of the activation on the time axis. On the first layer, $A_L$ is defined as the difference between the onset times of the two events, i.e. the length of the interval between them. Because each rhythmic pattern in our model is relatively encoded, the activation scale represents the timing (speed) at which the pattern has been located in the model's input. The scale thus distinguishes between two pattern occurrences found at the same onset, one faster (small scale) and one slower (large scale).
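The first-layer activations described above can be illustrated with a short Python sketch. It follows the prose only, reading it as $A_T = N^1_o$, $A_L = N^2_o - N^1_o$ and $A_M = (N^1_m + N^2_m)/2$. The function name, the dictionary keys and the decision to skip simultaneous events are assumptions made for illustration, not part of the model's implementation.

```python
from itertools import combinations


def layer0_activations(events):
    """Activate the atomic part for every pair of events (first before second).

    events: iterable of (onset_time, magnitude) tuples.
    Returns one dict per pair with the activation time, scale and magnitude.
    """
    ordered = sorted(events)  # sort by onset time so that pairs come in temporal order
    activations = []
    for (t1, m1), (t2, m2) in combinations(ordered, 2):
        if t2 <= t1:
            continue  # Assumption: simultaneous events do not form a rhythmic pair.
        activations.append(
            {
                "time": t1,                  # A_T: onset of the first event
                "scale": t2 - t1,            # A_L: difference of the two onset times
                "magnitude": (m1 + m2) / 2,  # A_M: average magnitude of both events
            }
        )
    return activations


if __name__ == "__main__":
    # Three events half a second apart; the middle one is weaker.
    for activation in layer0_activations([(0.0, 1.0), (0.5, 0.5), (1.0, 1.0)]):
        print(activation)
```

With the three events in the usage example, two activations share the onset time 0.0 but differ in scale (0.5 and 1.0), which is the situation the scale is meant to disambiguate.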