• No se han encontrado resultados

3. TEMAS Y SUBTEMAS

3.5 Requisitos para progresar en el Sistema Nacional de Rehabilitación

3.5.3 Requisitos administrativos

The fundamental task in this experiment was to represent the timbral qualities of the four guitars in a quantitative form in order to classify given test tones from a set of reference tones. Using MDS trajectory paths as a representation of the timbre of each tone we achieved results which exceeded our expectations. The correct classification rate was 97.5% with FFT and 95.1% with CQT. It appears from these results that the trajectory paths in frequency space offer a very good representation of the timbre or characteristic ‘signature’ of each instrument across its pitch range. From examination of the PC loadings, it appears that features enabling classification are the shape of the power envelope, spectral features related to the spectral centroid and the body resonances present in the attack portion of guitar tones. To further improve the process, a number of variations were tried. Secondary goals were to attempt to uncover some secrets of guitar timbre.

To compare the performance of FFT and CQT as techniques for time-frequency transform, some variations in the number of principle components were considered. We found that, when the number of PC’s was reduced to three, the performance of the FFT data reduced from 97.5% to 93.1% whereas the performance of the CQT data improved from 95.1% to 96.8%. One possible explanation for this could be that the CQT data is stored in a more compact way due to the logarithmic nature of the frequency scaling. This means there is more detailed information about the lower harmonics and less about the higher harmonics which are bunched together with CQT. Overall, we can say that the results with FFT were comparable to those with CQT.

In a variation to the standard use of principal components (weighted according to variance), the Mahalanobis distance was used where all PC’s are given equal weighting. In this case, as we add more PC’s, we add more information but of an increasingly unreliable nature and all of equal weighting. Our test involved beginning with one PC as a basis for classification and one by one adding more PC’s. We found that, initially, the classification performance increased, peaking at 95.0% with five PC’s for FFT and 92.5% with three PC’s for CQT and then gradually decreasing until at 100 PC’s the performance was little better than random selection. So overall, the variation produced slightly inferior results but with an increased level of complexity in the process. It is interesting to note that, with FFT data, there was useful information that enhanced classification in PC’s four and five, whereas

CHAPTER 4. CLASSIFICATION EXPERIMENTS 93

the classification with CQT declined when PC’s four and five were included. This suggests that, for CQT, there is less salient information contained in the higher numbered PC’s. This may be related to the logarithmic nature of the CQT frequency scale which has a clustering effect on the higher harmonics in the spectrum, meaning that the data can be effectively represented by fewer PC’s.

A second rotation of the axes was attempted using LDA after PCA - the data was repre- sented as three linear discriminants. For guitar tones, this resulted in a significant reduction in the classification rate compared to that obtained with PCA only. A possible explanation for this is related to the percussive/impulsive nature of guitar tones. As a consequence of this impulsive nature, the power and hence the position in frequency space for each window of data is constantly changing with time. This means that there is no cluster of points in frequency space and hence the mean spectrum will be a poor representation of a guitar timbre. Since the LDA works on the basis of choosing linear discriminant axes that give maximum separation between the interclass means, it is not surprising that it is less effective than the PCA as a means of classifying guitar tones.

Further variations were attempted to test the robustness of the classification process. In a natural situation it may not always be possible to compare complete and synchronised tones. When partial (but still synchronised) guitar tones were offered for classification, some interesting results were obtained. With a portion of the attack, we achieved a correct classification rate of 97.6%, an almost identical result to that with a whole tone, and with a large portion from the decay, we obtained an 87% correct classification rate. These results indicate that valuable information for classification is contained in both the attack and the decay but also, that the attack is a more efficient classifier than the decay. These results support the findings of Clark et al. (1963) who found that, for a full set of orchestra instruments, both the attack and steady-state were important aspects of timbre. These conclusions differ from much earlier findings of Helmholtz (1863) and his followers, who concluded that timbre was dependent on the steady state spectrum. A significant difference between our work and these earlier works on timbre is that their timbral studies depended on a human listening panel whereas our work depends on machine recognition. Using a human listening panel means the limitations or characteristics of the human listening process must be taken into consideration.

Interestingly, we found that good results could be obtained with only a small portion of a tone. Good results (around 80% correct) could be obtained from a single window taken

CHAPTER 4. CLASSIFICATION EXPERIMENTS 94

from the decay (approximate steady-state for guitar). Even better results (around 90% correct) were obtained from a single window from the attack. This indicates that guitar tones are highly reproducible and, given that tones can be synchronised, classification is possible with a small portion of a tone.

Due to the impulsive nature of guitar tones and the fast rate of change of the spectrum over time, it was expected that classification rates would be severely affected when off- set (unsynchronised) tones were compared. This prediction is supported by the plots of spectral centroid over time, shown earlier (figure 4.9), which illustrate the variable nature of the spectral centroid and consequently the spectral envelope. The graph suggests that the attack would be the portion of a tone that would be most sensitive to imperfect syn- chronisation. With an offset of just 5 data points or about 0.1 sec., the classification for a section of data including the attack, dropped to about 59% and for a section from the decay (approximately steady state), to about 74% - confirming our expectation. We found that further increase in the offset produced an even larger drop in the classification rate. The most significant factors that cause this sensitivity to offset are the variation over time in both the power envelope and the spectral centroid. These are problems that would have to be addressed in a real word recognition process. One possible solution could be to nor- malise the power in the extended decay section of a guitar tone. While this may produce a better classification result with incomplete tones, it is tampering with the timbre of the guitar tones in a major way - the power envelope is a significant attribute of the timbre of a guitar tone. Another possible solution to the problem is , given a pair of guitar tones for comparison, to match the pair of tones by essentially sliding one power envelope across the other until we have achieved the best match or, in other words, we have obtained a minimum value for the ‘distance’ measure.

Another important issue in comparing guitar tones is the importance of the tones being of the same pitch or fundamental frequency. When tones just one step different in pitch were compared, the classification rate dropped to about 56%. This suggested that the timbre of guitar tones is highly dependent on fundamental frequency. To further examine this question, the spectral centroid was plotted against frequency for several instruments. These graphs indicated that the average spectrum, and hence the timbre, is highly dependent on fundamental frequency and, in fact, the timbre for a guitar is a set of different but frequency related timbres (see figure 4.11). This finding suggests a possible area for improvement in other instrument recognition works (eg. Martin (1999), Kaminskyj (1999)) where pairs of

CHAPTER 4. CLASSIFICATION EXPERIMENTS 95

tones of similar but not equal fundamental frequency were compared.

In summary, we have shown that it is possible to obtain extremely high classification re- sults for a set of guitars in a controlled experiment but that the classification performance depends on a few factors which would be very difficult to control under natural circum- stances (eg. fundamental frequency and synchronisation of tones). We have also learnt that, for the guitar, there are important cues for classification in both the attack and the steady-state/decay and that the information in the steady-state/decay is more robust. The experiment has highlighted aspects of guitar timbre not previously taken into account in instrument classification. In particular, that the timbre of a particular tone varies signifi- cantly with time; that overall, guitar timbre is highly dependent on fundamental frequency; and that the timbre of a given guitar at a particular frequency is highly reproducible.

CHAPTER 4. CLASSIFICATION EXPERIMENTS 96