PHASED PROGRESSION ACTIVITIES - Progression activities for working on orientation


1. Solving speech recognition problems requires adequate techniques for transforming the raw speech signal into a set of features. This transformation reduces the amount of data, which makes recognition easier. In addition to the Fourier transform, the following transformations can be used:

• Cepstral coefficients: The cepstrum is defined as an aggregated coefficient calculated over the logarithm transformation of filtered signals. Some computer speech recognition systems use mel-scaled cepstrum coefficients (MSCC) (Davis and Mermelstein 1980). Cepstral parameters are often preferred for speech recognition applications in noisy environments because they are derived from high-resolution spectral estimators (Picone 1993). One segment of a speech waveform is represented by a vector of n MSCC.

• Linear prediction: The linear prediction model of speech states that each speech sample can be predicted from the sum of weighted past samples of speech. The values of the coefficients are calculated by minimizing the error between the actual speech and the predicted speech.
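The error minimization described for linear prediction can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is one common way of finding the predictor weights, not necessarily the one the text has in mind. The function names and the sign convention (s(n) predicted as a plain weighted sum of past samples) are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i - k] for i in range(k, n))
        for k in range(max_lag + 1)
    ]


def lp_coefficients(frame, order):
    """Return (weights, error_energy) minimizing the squared prediction error.

    Assumed convention: s(n) is predicted as sum_i weights[i - 1] * s(n - i).
    """
    r = autocorrelation(frame, order)
    weights = []
    error = r[0]  # error energy of the order-0 predictor (always predicting 0)
    for m in range(1, order + 1):
        if error <= 0.0:
            # Silent or perfectly predicted frame: pad with zeros and stop.
            weights = weights + [0.0] * (order - len(weights))
            break
        # Reflection coefficient for order m.
        k = (r[m] - sum(weights[j] * r[m - 1 - j] for j in range(m - 1))) / error
        # Update the lower-order weights, then append the new one.
        weights = [weights[j] - k * weights[m - 2 - j] for j in range(m - 1)] + [k]
        error *= 1.0 - k * k
    return weights, error


# Example: a decaying signal in which each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
weights, error = lp_coefficients(frame, order=2)
```

For this synthetic frame the first weight comes out close to 0.9 and the second close to zero, which matches how the signal was generated.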

Figure 1.39 shows the formulas that can be used to achieve the above transformations.
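As a rough illustration of these transformations, the sketch below computes a discrete Fourier transform, a real cepstrum, and a cosine-transform step for mel-scaled cepstrum coefficients using only the Python standard library. It assumes the standard textbook definitions; the function names, the `floor` guard against log(0), and the idea that the mel filter-bank log-energies are already available as a list are assumptions, not details from the text.

```python
import cmath
import math


def dft(frame):
    """Discrete Fourier transform of one analysis frame (O(N^2), for clarity)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of the frame."""
    n = len(frame)
    # `floor` is an assumed safeguard that avoids log(0) for empty bins.
    log_mag = [math.log(max(abs(s_k), floor)) for s_k in dft(frame)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def mel_cepstrum(log_energies, num_coeffs):
    """Cosine transform of the log-energy outputs X_1..X_Nf of a mel filter bank."""
    nf = len(log_energies)
    return [
        sum(
            x_k * math.cos(n * (k - 0.5) * math.pi / nf)
            for k, x_k in enumerate(log_energies, start=1)
        )
        for n in range(1, num_coeffs + 1)
    ]


# Example frame: a cosine with exactly two cycles in 16 samples.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
magnitudes = [abs(s_k) for s_k in dft(frame)]
```

The direct O(N^2) sum is used here only to keep the definition visible; a practical system would use a fast Fourier transform.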

Part B: Practical Tasks

2. Give a definition of heuristics for problem-solving. What is "goodness" of a heuristic? What does it mean for a heuristic to be "informed?" Give examples of ill- and well-informed heuristics.

3. What is the difference between past historical data and heuristic rules? What is the difference between a rule and a formula? Give three examples of each for a different generic problem and a different application domain area.

4. Give a general form of a heuristic rule and explain it. Give general forms of heuristic rules for four generic problems and explain each of them.

(1) Discrete Fourier Transform (DFT):

where S(k) is the kth frequency component and N is the size of the analysis frame.

(2) Cepstrum transformation:

where c(n) is the cepstrum coefficient and N is the number of samples in a frame.

(3) Mel-scaled cepstrum coefficients (c_m(n)):

where N_f is the number of mel-scaled filters, X_k, k = 1, 2, ..., N_f, represents the log-energy output of the kth filter, and M is the number of cepstrum coefficients. Cepstral parameters are often preferred for speech recognition applications in noisy environments because they are derived from high-resolution spectral estimators (Picone 1993).

(4) Linear prediction:

where s(n) is the current speech sample, N_LP is the number of linear predictor coefficients, a_LP(i) are the weights of the past speech samples, also known as predictor coefficients, and e(n) is the error of the model. The values of the coefficients are calculated by minimizing the error between the actual speech and the predicted speech.
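Standard textbook forms that are consistent with the symbol definitions in (1) to (4) are sketched below. They are a reconstruction under assumptions rather than the figure's exact notation: s(n) is taken to be the nth sample of the frame, the logarithm base and the index ranges are assumed, and the prediction sum in (4) is written with a plus sign to match the description of a weighted sum of past samples (some references use a minus sign with negated coefficients).

```latex
\begin{align}
S(k)   &= \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi k n / N}, & k &= 0, 1, \ldots, N-1 \tag{1} \\
c(n)   &= \frac{1}{N} \sum_{k=0}^{N-1} \log\lvert S(k)\rvert \, e^{j 2\pi k n / N}, & n &= 0, 1, \ldots, N-1 \tag{2} \\
c_m(n) &= \sum_{k=1}^{N_f} X_k \cos\!\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{N_f} \right], & n &= 1, 2, \ldots, M \tag{3} \\
s(n)   &= \sum_{i=1}^{N_{LP}} a_{LP}(i)\, s(n-i) + e(n) \tag{4}
\end{align}
```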

Figure 1.39

Different transformations applicable to speech signals.

5. Give one more example of a specific problem for every generic problem type.

6. Suggest a minimum set of features which can be used to distinguish the two handwritten digits 3 and 5.

7. What is the meaning of data rate reduction in speech recognition systems? Give an example.

8. Why are speech signal transformations needed?

9. What are the difficulties in building ASR systems?

10. Is the stock market predictable according to figure 1.23? Explain your arguments.

11. What reasons should one use when choosing the feature space for prediction?

12. Give another example of a specific problem of prediction. Explain all the general issues given in section 1.6.1 for this particular problem.

13. Imagine that two more fuzzy values are defined for the angle and the angular velocity in the Inverted Pendulum control example, which are named positive large and negative large. Add some more fuzzy rules to the set given in figure 1.33 to describe the reaction of the control system if these values happen on the input.

14. Imagine a problem called the Ball and Beam Problem. The beam is made to rotate in a vertical plane around the center of rotation. The ball is free to roll along the beam. The task is to articulate an initial set of fuzzy rules for keeping the ball in a balanced position by applying a force to the beam, if the object is represented by four state variables—the distance between the center of the ball and the center of rotation; the change in the distance; the angle between the beam and the horizontal axis; and the change in the angle.

15. An example of an optimization problem is the Resource Scheduling Problem. A project consists of a set of activities on a time scale. Every activity has been assigned five parameters: the earliest possible starting day, the earliest possible completion day, the latest possible starting day, the latest possible completion day, and the number of workers involved. The problem is to find the most "leveled" (even) distribution of number of workers until completion of the whole project. Give heuristic rules for solving the problem after introducing reasonable restrictions.

16. The Resource Assignment Problem consists of assigning n workers to n jobs in the best (most profitable) way, when given the profit cij of assigning every worker i to every job j. Give heuristic rules for solving that problem. How can you evaluate the "goodness" of these heuristics?

17. What characteristic of neural networks makes them suitable for solving the specific problems given in this chapter?

18. Explain the difference between the different pathways in figure 1.37.

19. Looking at the data set of water flow into a sewage plant graphed in appendix C and in figure 7.1, try to elaborate rules for predicting the flow depending on the time (the hour) of day.

Part C: A Sample Project on Data Analysis

Topic: Speech Data Analysis

TASKS (see Appendixes G and J)

1. Speech data collection: Record each of the digit words from 0 to 9, spoken by yourself, three times. Save the recorded raw speech files on a disk. Explain in a few sentences what "sampling frequency" is and how it should be chosen for particular recordings. Report the values of the following parameters for one recorded digit from each of the groups {0, 1, 2, 3}; {4, 5, 6}; {7, 8, 9}:

a. Recording time.

b. Sampling frequency.

c. Number of samples.

d. Size of the raw data (in kilobytes).

Explain the relationship between the recording time, sampling frequency, number of samples, and size of the raw signal.
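The four parameters can be read directly from a WAV file, as in the minimal sketch below. It assumes the recordings are stored as uncompressed PCM WAV files; a synthetic tone written to an in-memory buffer stands in for a real recording, and `write_tone`, `describe_recording`, and the dictionary keys are hypothetical names chosen for illustration.

```python
import io
import math
import struct
import wave


def write_tone(buffer, seconds, sample_rate, freq_hz=440.0):
    """Write a mono 16-bit sine tone as a stand-in for a recorded digit."""
    n_samples = int(round(seconds * sample_rate))
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * t / sample_rate)))
        for t in range(n_samples)
    )
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_recording(buffer):
    """Report recording time, sampling frequency, sample count, and raw size."""
    with wave.open(buffer, "rb") as wav:
        n_samples = wav.getnframes()
        sample_rate = wav.getframerate()
        bytes_per_sample = wav.getsampwidth() * wav.getnchannels()
    return {
        "recording_time_s": n_samples / sample_rate,
        "sampling_frequency_hz": sample_rate,
        "number_of_samples": n_samples,
        "raw_size_kb": n_samples * bytes_per_sample / 1024,
    }


buf = io.BytesIO()
write_tone(buf, seconds=0.5, sample_rate=22050)
buf.seek(0)
info = describe_recording(buf)
```

The number of samples is the recording time multiplied by the sampling frequency, and the raw size is the number of samples multiplied by the bytes per sample (here 2 bytes, one channel), excluding the file header.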

2. Speech data display: Explain in a paragraph the principles of at least three ways of displaying speech data, for example, waveform, spectrum, frequency display. Display the spectra of the digits chosen for analysis.

3. Speech data grouping (phoneme analysis): Define by observation the boundaries between the following phonemes in the pronounced words: /z/ and /e/ in zero; /t/ and /u/ in two; /f/ and /o/ in four; /f/ and /ai/ in five; /s/ and /e/ in seven; /ei/ and /t/ in eight; /n/ and /ai/ in nine. Separate the areas of the different phonemes. Explain briefly some general differences between the spectra of the fricatives you have in your examples (/z/, /f/, /s/, /Θ/) and the vowels (/e/, /u/, /o/, /ai/, etc.). Such differences can be, for example, amplitude in the time domain or energy in the frequency domain.

4. Variations of speech

a. Compare the spectra of one digit in its three pronunciations. Explain the difference.

b. Compare the spectra of different appearances of a phoneme in different words, for example, /f/ appears in "five" and "four." Explain the difference (the so-called coarticulation effect).

5. Speech data transformations

a. Explain why speech data transformation would be needed.

b. Explain the rationale of the Fourier and the mel-scale transformations.

c. Give an example of two small consecutive segments of a spoken digit where the waveforms look the same (or very similar), but the spectra are very different. Explain why this is happening.

d. Select a vowel segment from the spectrum of speech data and plot the "frequency vs. energy" for this segment. Find and report the frequency with the highest energy.
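One possible way to obtain the "frequency vs. energy" values for a segment and locate the peak is sketched below. It assumes the segment is available as a list of samples with a known sampling frequency; the synthetic two-tone segment and the function names are illustrative assumptions, not real speech data.

```python
import cmath
import math


def energy_spectrum(segment, sample_rate):
    """Return (frequency_hz, energy) pairs for DFT bins 0..N/2 of the segment."""
    n = len(segment)
    pairs = []
    for k in range(n // 2 + 1):
        s_k = sum(
            segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        # Bin k corresponds to the frequency k * sample_rate / n.
        pairs.append((k * sample_rate / n, abs(s_k) ** 2))
    return pairs


def peak_frequency(segment, sample_rate):
    """Frequency (Hz) of the bin with the highest energy."""
    return max(energy_spectrum(segment, sample_rate), key=lambda pair: pair[1])[0]


# Synthetic stand-in for a vowel segment: a strong 500 Hz component
# plus a weaker 1500 Hz component, sampled at 8 kHz.
sample_rate = 8000
segment = [
    math.sin(2 * math.pi * 500 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 1500 * t / sample_rate)
    for t in range(256)
]
peak = peak_frequency(segment, sample_rate)
```

The pairs returned by `energy_spectrum` can be passed to any plotting tool; the frequency resolution is the sampling frequency divided by the segment length, so longer segments give finer frequency steps.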
