As introduced above, a speech quality estimator that does not require a proper reference signal is attractive. Figure 3-2 shows the block diagram of one such estimator, the SRMR-HA, developed for DHA applications. The SRMR-HA is a modified and extended version of the Speech-to-Reverberation Modulation energy Ratio (SRMR) [70], which was originally developed to assess the performance of dereverberation algorithms and was validated with subjective data collected from NH listeners.
[Figure 3-2 block diagram. Inputs: Signal Under Test, Subject Hearing Loss. Blocks: Gammatone Filterbank & Envelope Computation; Modulation Filterbank; Speech Energy Computation; Noise Energy Computation; ÷. Output: SRMR-HA.]
Figure 3-2: A reference-free speech quality estimator for hearing aid applications.
Being a reference-free technique, the SRMR-HA method does not require any prior temporal alignment. As in the HASQI computational procedure, the processed signal is first passed through a gammatone filterbank, which is implemented following the work of Cooke described in [61]. The gammatone function is derived from experimental studies of frequency selectivity in the human auditory system and is given by:
(3.1)
where g(t) is the gammatone impulse response, n is the filter order, b is related to the filter bandwidth, ω is the radian frequency and u(t) is the unit-step function. For analysis and evaluation purposes, it was necessary to develop a digital-domain filter approximation that fits this model as closely as possible. Cooke investigated various methods to achieve this and found that the Impulse Invariant Transform (IIT) yielded the most accurate results. The IIT approximates a continuous-time filter by a digital-domain transfer function whose impulse response is a sampled version of the continuous-time impulse response. This can be expressed as follows:
(3.2)
where h(t) is the continuous-time impulse response of the filter to be approximated, H(z) is the transfer function of the discrete-time filter and T is the sampling period. The gammatone filter of order n can then be defined as follows:
(3.3)
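For orientation, the quantities named in the surrounding text can be written out under stated assumptions. The expressions below are the standard real-valued gammatone impulse response and the unscaled statement of impulse invariance. The symbols g(t), n, b, ω, u(t), h(t), H(z) and T, the omission of amplitude and phase terms, and the absence of any gain scaling by T are assumptions of this sketch and may differ from the numbered equations.

```latex
% Assumed standard form of the gammatone impulse response
g(t) = t^{\,n-1}\, e^{-bt} \cos(\omega t)\, u(t)

% Impulse invariance: the digital filter's impulse response equals
% the continuous-time impulse response sampled every T seconds
H(z) = \sum_{k=0}^{\infty} h(kT)\, z^{-k}
```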
Based on the well-known properties of the Z transform, transfer function representations of the digital approximation to the gammatone filter for orders 1 through 4 were found to be:
(3.5) (3.6) (3.7)
where For this study, was used to implement the gammatone filter bank. In order to account for the effects of SNHL, the Q factor of each filter is adjusted based on the OHCLoss parameter derived from the HL data, in line with the description provided in Section 2.2.2 and equation (2.2).
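The impulse-invariant construction described above can be checked numerically. The sketch below is illustrative only. It assumes the baseband envelope t^(n-1) e^(-bt) u(t) with the carrier term left out, unscaled sampling, and the shorthand a = exp(-bT). The function names are hypothetical, and the closed forms are the textbook Z transforms of k^m a^k, not necessarily the forms of the numbered equations.

```python
import math


def baseband_gammatone_coeffs(order, b, T):
    """Coefficients (in powers of z^-1) of a digital filter whose impulse
    response is (k*T)**(order-1) * exp(-b*k*T), k = 0, 1, 2, ...
    Assumption: closed forms from the Z transform of k**m * a**k, a = exp(-b*T)."""
    a = math.exp(-b * T)
    # Numerator polynomials for orders 1 to 4 (before the T**(order-1) factor).
    numerators = {
        1: [1.0],
        2: [0.0, a],
        3: [0.0, a, a * a],
        4: [0.0, a, 4.0 * a * a, a ** 3],
    }
    num = [c * T ** (order - 1) for c in numerators[order]]
    # Denominator (1 - a z^-1)**order, expanded with binomial coefficients.
    den = [math.comb(order, i) * (-a) ** i for i in range(order + 1)]
    return num, den


def impulse_response(num, den, length):
    """First `length` samples of the impulse response of num/den (den[0] == 1)."""
    y = []
    for k in range(length):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, len(den)):
            if k - i >= 0:
                acc -= den[i] * y[k - i]
        y.append(acc)
    return y


def sampled_envelope(order, b, T, length):
    """Samples of the continuous-time envelope t**(order-1) * exp(-b*t) at t = k*T."""
    return [(k * T) ** (order - 1) * math.exp(-b * k * T) for k in range(length)]
```

For each order from 1 to 4, `impulse_response(*baseband_gammatone_coeffs(order, b, T), N)` should agree with `sampled_envelope(order, b, T, N)` up to rounding error, which is the defining property of the impulse invariant transform.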
After the gammatone filterbank stage, the envelope extracted in each channel is passed through an 8-channel modulation filterbank with centre frequencies of 4.00 Hz, 6.60 Hz, 10.8 Hz, 17.7 Hz, 29.0 Hz, 47.6 Hz, 78.0 Hz and 128 Hz. Each filter within the filterbank was implemented as a second-order bandpass filter with a Q value of 2. The lower four channels of the modulation filterbank are assumed to contain mostly speech-related components, while the upper four channels are occupied predominantly by noise- or distortion-related components [70], [74]. As such, the SRMR-HA is calculated as the ratio of the modulation energy in the lower four channels to that in the upper four channels. The rationale for quantifying the modulation energies in this fashion can be explained from the modulation-domain spectrograms.
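The ratio described above can be sketched in a few lines. This is a minimal illustration, not the SRMR-HA implementation. The centre frequencies and Q value are taken from the text, while the bilinear-transform biquad design, the summation of energy over acoustic channels, and all function names are assumptions. Hearing-loss adjustment and envelope extraction are not modelled.

```python
import math

# Centre frequencies (Hz) and Q of the modulation filterbank, as listed in the text.
MOD_CENTRES_HZ = [4.00, 6.60, 10.8, 17.7, 29.0, 47.6, 78.0, 128.0]
MOD_Q = 2.0


def bandpass_coeffs(f0, q, fs):
    """Second-order bandpass with unity peak gain (bilinear-transform biquad).
    Assumption: the source does not state which design method was used."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """Filter the sequence x with one second-order section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def modulation_energies(envelope, fs):
    """Energy of one temporal envelope in each of the 8 modulation bands."""
    energies = []
    for f0 in MOD_CENTRES_HZ:
        b, a = bandpass_coeffs(f0, MOD_Q, fs)
        band = biquad(envelope, b, a)
        energies.append(sum(v * v for v in band))
    return energies


def srmr_like_ratio(envelopes, fs):
    """Energy in the lower four modulation bands divided by energy in the
    upper four, summed over all acoustic-channel envelopes (assumed pooling)."""
    low = high = 0.0
    for env in envelopes:
        e = modulation_energies(env, fs)
        low += sum(e[:4])
        high += sum(e[4:])
    return low / high
```

With an envelope sampled at `fs` Hz, a slowly modulated envelope (for example around 4 Hz) should give a ratio above 1, while an envelope dominated by fast modulation (for example around 78 Hz) should give a ratio below 1.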
Figure 3-3 displays modulation spectrograms computed from a set of speech stimuli from the bilateral DHA database described in Chapter 2. In these plots, the abscissa represents the centre frequency of the modulation filterbank, the ordinate represents the centre frequency of the gammatone filterbank, and the colors represent the relative modulation energy. The top-left panel displays the modulation spectrogram of a clean speech sample. Note that much of the modulation energy in this panel occupies the 4 Hz to 10.8 Hz range.
Figure 3-3: Modulation spectrograms derived from a set of speech stimuli from the bilateral DHA database created in Chapter 2.
The top-right panel shows the modulation spectrogram of the HA2 output when it is programmed in omnidirectional mode and the clean input speech sample is played back along with speech-shaped noise at 0 dB SNR in the low reverberant environment (the asymmetric noise condition described in Chapter 2). Two phenomena can be noticed in this plot: (a) there is a shift in modulation energy towards high frequencies along the y-axis, which is due to the high-frequency gain imparted by the DHA to compensate for the high-frequency hearing loss; and (b) the modulation energy is no longer concentrated in the lower modulation frequencies, as the presence of background noise spreads the modulation energy across the 4 Hz to 128 Hz region. Activation of adaptive directionality counteracts this spread by reducing the background noise. The two modulation spectrograms in the bottom row of Figure 3-3 attest to this, as the spread of energy towards higher modulation frequencies is mitigated. It is also useful to highlight the differences between “HA2 adaptive” and “HA1 adaptive”. A greater proportion of the lower-frequency modulation energy is preserved by HA2 adaptive, so it will have a greater SRMR-HA value than HA1 adaptive. This agrees with the subjective data, as results from the previous chapter showed that HI listeners preferred the quality of HA2 in directional mode and in the presence of background noise.