During preparation of the test samples and pilot testing, it was noted that although absolute differences could be heard, these were considered small in terms of perceived quality. One explanation for this is the constant decay time, which, as will be discussed further in Chapter 6, is an important perceptual factor. Furthermore, particularly at the cut-off frequency of 100 Hz, differences are harder to perceive because hearing is less sensitive at the lowest frequencies.
A task requiring subjects to rate all samples using a direct scaling method is difficult under these circumstances. Such challenges are noted by Bech and Zacharov (2006). For these reasons, a pair-wise comparison method was chosen, with subjects asked whether the quality of one sample was worse/same/better than another. The ‘same’ option was included because the sample set was small (five samples). In subsequent testing of perceived quality, another method of paired comparison has been used, simply asking the subject to choose between two samples (see Chapters 8 and 9). Here, however, the method allowing the three options was chosen. This has been used successfully by Huang et al. (2008). The method also allows for a comparison of identical samples, which can help determine subject accuracy. Each sample was rated against every other, including reversals: for example, Sample A = 50 m³ with Sample B = 100 m³, and also Sample A = 100 m³ with Sample B = 50 m³. Listeners were instructed to audition Samples A and B, and then make a decision based upon the ‘overall quality of low frequency reproduction’. They were encouraged not to spend a great deal of time on each comparison: if they could not detect a noticeable difference, it was to be assumed that the quality was the same.
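The set of presentations implied by this design (each sample against every other in both orders, plus each sample against itself) can be enumerated as in the following sketch. It is illustrative only: the names `samples` and `trials` are assumptions, the samples are labelled 1 to 5 rather than by room volume, and nothing here describes how the MATLAB interface itself ordered or presented the pairs.

```python
from itertools import product

# Hypothetical labels for the five test samples (one per room volume).
samples = [1, 2, 3, 4, 5]

# Every ordered (A, B) pair: identical pairs plus both orders of each distinct pair.
trials = list(product(samples, repeat=2))

# Split into self comparisons (A == B) and difference comparisons (A != B).
identical = [(a, b) for a, b in trials if a == b]
different = [(a, b) for a, b in trials if a != b]

print(len(trials), len(identical), len(different))  # 25 5 20
```

The 20 difference comparisons form 10 reversed pairs, which is what allows consistency between the two orders to be checked later.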
The interface was created in the MATLAB environment, as shown in Figure 5.10. Before undertaking the test, subjects were given a short training phase in which they were played the music sample that would be used in the test, with a variety of differing modal artefacts, including some which were exaggerated beyond the levels present in the test samples. This stage was designed to allow the subjects to become accustomed to the sample and the likely degradation effects. Furthermore, many of the subjects had taken previous tests and were becoming increasingly familiar with the music sample used.
Figure 5.10: Graphical user interface for testing the perceived quality of modal density

Seven subjects were tested, with all but one having had prior listening test experience. All had experience mixing music in a number of listening environments. Furthermore, five subjects had been through a listening panel screening test including audiometry and an introduction to critical listening comparisons. One subject reported tiredness before taking the test, although analysis shows their results to be similar to those of the others, and so they are included herein.
5.6.3 Results
The results for each subject’s 25 comparisons were placed in a matrix, using -1, 0 and 1 for the ratings ‘worse’, ‘same’ and ‘better’ respectively. The resulting matrix was analysed not only to determine the quality rating of each sample, but also for ‘judgement errors’ which allow the validity of the scaling to be considered. As mentioned, this technique has been used successfully in the work of Huang et al. (2008) in rating the annoyance of noise samples and the analysis here follows a similar process.
An example matrix for Subject 1 is shown in Table 5.5. As can be seen from the table, the subject correctly identified each identical pair as the same (deep shaded cells scored 0). Each row gives the perceived quality of Sample B relative to the Sample A in each column. For example, the scores show that Sample 4 was rated as better than Sample 2 (score of 1). When this pair was rated the other way around, we can see that Sample 2 was rated worse than Sample 4 (score of -1). If this had not been the case, there would be an inconsistency in judgement.
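The encoding of responses into such a matrix can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function `build_matrix`, the `SCORES` mapping and the `(A, B, rating)` response format are assumed names, and only the two responses quoted above for Subject 1 are entered.

```python
# Numeric codes for the three response options.
SCORES = {"worse": -1, "same": 0, "better": 1}

def build_matrix(responses, n=5):
    """Fill an n x n matrix: rows index Sample B, columns index Sample A.

    Each response is (A, B, rating), with 1-based sample numbers and the
    rating describing the quality of B relative to A.
    """
    R = [[None] * n for _ in range(n)]
    for a, b, rating in responses:
        R[b - 1][a - 1] = SCORES[rating]
    return R

# Two of the 25 responses, taken from the Subject 1 example in the text.
responses = [(2, 4, "better"), (4, 2, "worse")]
R = build_matrix(responses)

# Row 4 / column 2 and its reversal, row 2 / column 4.
print(R[3][1], R[1][3])  # 1 -1
```

A consistent pair of judgements has opposite signs in the two mirrored cells, as in this example.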
In order to analyse the results, the first step is to determine the level of judgement errors for each subject. Two types of error are evaluated here: self-comparison errors (sc) and difference-comparison errors (comp). Self comparisons refer to the case where Samples A and B are identical, while difference comparisons are those where the reversal of the rooms for Samples A and B is considered (see Table 5.6).
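| B \ A | 1 | 2 | 3 | 4 | 5 | Rrow |
|-------|----|----|----|----|----|------|
| 1     | 0  | 0  | 0  | -1 | -1 | -2   |
| 2     | -1 | 0  | 0  | -1 | 0  | -2   |
| 3     | 0  | 1  | 0  | -1 | -1 | -2   |
| 4     | 1  | 1  | 1  | 0  | 0  | 3    |
| 5     | 0  | 1  | 1  | -1 | 0  | -1   |
| Rcol  | 0  | 3  | 2  | -4 | -2 |      |

Table 5.5: Example matrix of a subject’s responses to the five room volumes

| Error Type | Error Recorded When:       |
|------------|----------------------------|
| sc         | $R_{ij} \neq 0$ when $i = j$ |
| comp       | $R_{ij} \neq -R_{ji}$        |

Table 5.6: The calculation of judgement errors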
For each matrix there are a total of five possible self comparisons and ten possible sources of error between comparisons of two samples. A total of 15 errors were therefore possible. The error rate was calculated for each subject as a percentage of this total (Table 5.7).
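As a worked illustration, the two rules of Table 5.6 can be applied to the Subject 1 matrix as printed in Table 5.5. The sketch below is not the original analysis code: the function name `judgement_errors` is an assumption, and the resulting figure is derived only from the transcribed matrix, not taken from Table 5.7.

```python
# Subject 1 responses as transcribed from Table 5.5
# (rows = Sample B, columns = Sample A).
R = [
    [ 0, 0, 0, -1, -1],
    [-1, 0, 0, -1,  0],
    [ 0, 1, 0, -1, -1],
    [ 1, 1, 1,  0,  0],
    [ 0, 1, 1, -1,  0],
]

def judgement_errors(R):
    """Return (sc errors, comp errors, error rate in percent)."""
    n = len(R)
    # sc: a sample compared with itself should score 0.
    sc = sum(1 for i in range(n) if R[i][i] != 0)
    # comp: a reversed pair should give the opposite score, R[i][j] == -R[j][i].
    comp = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if R[i][j] != -R[j][i]
    )
    # 5 self comparisons + 10 reversed pairs = 15 possible errors for n = 5.
    possible = n + n * (n - 1) // 2
    return sc, comp, 100.0 * (sc + comp) / possible

sc, comp, rate = judgement_errors(R)
print(sc, comp, round(rate, 1))  # 0 5 33.3
```

For this matrix the inconsistent reversed pairs are (1, 2), (1, 5), (2, 3), (2, 5) and (4, 5), giving 5 errors out of a possible 15.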
It is clear that the level of judgement errors was very high. The average error across subjects and the two test crossover frequencies was 49.9%. Therefore the quality ratings revealed in the following section are considered indicative rather than conclusive.
A number of comments from listeners suggested that the definition of quality was confounded by the differing effects of the bass guitar and kick drum parts.