This section addresses the problem of obtaining distributions of perturbation levels. Levels will be drawn from these distributions when perturbing training utterances to create a multi-style training corpus. Section 3.4.2.1 describes the approach used for finding a distribution of these levels for a single perturbation type to model a set of utterances taken from a target domain. Section 3.4.2.2 describes how this approach can be extended to identifying distributions of perturbation levels for multiple perturbation types.

3.4.2.1 Empirical distributions for a single perturbation type

The procedure for estimating a distribution, $\hat{p}^t(\cdot)$, over perturbation levels, $A^t$, for a single perturbation type, $t$, is summarised by Algorithm 1. The goal is for this distribution to weight a given perturbation level by how often training data perturbed at that level is the closest match, under the distance measure, to a set of utterances selected from the target domain. Given an uncorrupted training set, $X^{tr}$, and $N$ sets of utterances, $X^{ta}_1, \ldots, X^{ta}_N$, sampled from the target domain, the procedure in Algorithm 1 determines a distribution of perturbation levels, $\hat{p}^t(\cdot)$, that best matches all $N$ data sets from the target domain. The multi-style training set, $X^{MTR}$, can then be generated by perturbing utterances with levels sampled from $A^t = \{\alpha^t_1, \ldots, \alpha^t_M\}$ according to the perturbation distribution, $\hat{p}^t$.

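Algorithm 1 Perturbation distribution estimation procedure

Given: training data $X^{tr}$; data sets $X^{ta}_1, \ldots, X^{ta}_N$ sampled from the target domain; perturbation levels $A^t = \{\alpha^t_1, \ldots, \alpha^t_M\}$ for perturbation type $t$

Initialise counts: $f^t(\alpha) \leftarrow 0 \;\; \forall \alpha \in A^t$

for all $X^{ta}_i \in \{X^{ta}_1, \ldots, X^{ta}_N\}$ do
    Compute target posteriors and statistics (Fig. 3.5): $C^T_i$
end for

for all $\alpha \in A^t$ do
    Perturb training utterances: $X^{tr}(\alpha) = F^t(X^{tr}, \alpha)$
    Compute training posteriors and statistics (Eq. 3.13): $C^P_\alpha$
    for all $X^{ta}_i \in \{X^{ta}_1, \ldots, X^{ta}_N\}$ do
        Compute similarity measure (Eq. 3.14): $\Phi(C^P_\alpha, C^T_i)$
    end for
end for

for all $X^{ta}_i \in \{X^{ta}_1, \ldots, X^{ta}_N\}$ do
    Select perturbation level (Eq. 3.15): $\hat{\alpha}_i = \arg\min_{\alpha \in A^t} \Phi(C^P_\alpha, C^T_i)$
    Update count: $f^t(\hat{\alpha}_i) \leftarrow f^t(\hat{\alpha}_i) + 1$
end for

for all $\alpha \in A^t$ do
    $\hat{p}^t(\alpha) = f^t(\alpha)/N$
end for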
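The selection, counting and normalisation steps of Algorithm 1 can also be written compactly as an empirical distribution. The expression below only restates the algorithm; the indicator notation $\mathbb{1}[\cdot]$ is introduced here for convenience and is not taken from the original text.

```latex
\[
\hat{\alpha}_i = \arg\min_{\alpha \in A^t} \Phi\left(C^P_\alpha, C^T_i\right),
\qquad
\hat{p}^t(\alpha) = \frac{f^t(\alpha)}{N}
                  = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{\alpha}_i = \alpha\right],
\qquad
\sum_{\alpha \in A^t} \hat{p}^t(\alpha) = 1 .
\]
```

The last identity holds because each of the $N$ target sets contributes exactly one count.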

Estimation of $\hat{p}^t$ can be described as follows. First, as illustrated in Figure 3.5, DNN posteriors are derived from the data sets, $X^{ta}_i$, and sequence statistics, $C^T_i$, are estimated from the posteriors. Second, $X^{tr}$ is perturbed with each $\alpha \in A^t$ to produce $M$ perturbed versions of the training set, $X^{tr}(\alpha)$, $\forall \alpha \in A^t$. The notation $F^t(X^{tr}, \alpha)$ in Algorithm 1 signifies the process of perturbing the training data set with perturbation type $t$ at level $\alpha$. Third, an optimum $\hat{\alpha}^t_i$ is identified for each data set sampled from the target domain. This corresponds to the perturbation level that, when applied to the training data, best matches the $i$th sample of utterances from the target data set according to the distance measure defined in equation 3.15. The frequency count, $f^t(\hat{\alpha}^t_i)$, associated with $\hat{\alpha}^t_i$ is incremented, and the perturbation distribution is obtained from the normalised counts, $\hat{p}^t(\alpha) = f^t(\alpha)/N$.
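The third step and the normalisation can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the experiments: it assumes the distances $\Phi(C^P_\alpha, C^T_i)$ have already been computed and stored in a nested list, and the function name, argument layout and example values are hypothetical.

```python
def estimate_perturbation_distribution(levels, distances):
    """Estimate p_hat(alpha) from precomputed distances.

    levels    : list of the M candidate perturbation levels (assumed unique).
    distances : M x N nested list, distances[m][i] = Phi between training data
                perturbed with levels[m] and the i-th target data set
                (assumed to be computed elsewhere).
    Returns a dict mapping each level to its normalised count.
    """
    n_sets = len(distances[0])
    counts = {alpha: 0 for alpha in levels}  # f(alpha) <- 0

    for i in range(n_sets):
        # Best-matching level for target set i; ties go to the first level.
        best_m = min(range(len(levels)), key=lambda m: distances[m][i])
        counts[levels[best_m]] += 1

    # Normalise the counts by the number of target sets, N.
    return {alpha: counts[alpha] / n_sets for alpha in levels}


# Hypothetical example: three levels (e.g. SNRs in dB) and four target sets.
example_levels = [0, 10, 20]
example_distances = [
    [0.9, 0.2, 0.8, 0.7],  # distances for level 0
    [0.3, 0.5, 0.1, 0.6],  # distances for level 10
    [0.5, 0.4, 0.6, 0.2],  # distances for level 20
]
example_p_hat = estimate_perturbation_distribution(example_levels, example_distances)
print(example_p_hat)  # {0: 0.25, 10: 0.5, 20: 0.25}
```

In the example, level 10 is the closest match for two of the four target sets, so it receives half of the probability mass.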

Once the perturbation distribution has been estimated from multiple subsets of the target domain, it is used to perturb the training utterances, creating a final multi-style training set that has the minimum mismatch to the target test set according to the defined measure. For each training utterance, a perturbation level is randomly selected from the set $A^t$ according to the distribution $\hat{p}^t$. Section 3.4.3 describes how this multi-style set is used to train a DNN acoustic model, which is then evaluated on utterances sampled from the same target domain.
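Drawing one level per training utterance can be illustrated as follows. The sketch assumes the estimated distribution is available as a dictionary from level to probability; the function name, the utterance identifiers and the fixed seed are hypothetical and chosen only to make the example reproducible.

```python
import random


def assign_perturbation_levels(utterance_ids, distribution, seed=0):
    """Draw one perturbation level per training utterance.

    utterance_ids : list of training utterance identifiers.
    distribution  : dict mapping each level to its estimated probability.
    seed          : seed for the random generator (assumption, for repeatability).
    Returns a dict mapping utterance id to the sampled level.
    """
    rng = random.Random(seed)
    levels = list(distribution)
    weights = [distribution[alpha] for alpha in levels]
    # Sample with replacement, each level weighted by its estimated probability.
    drawn = rng.choices(levels, weights=weights, k=len(utterance_ids))
    return dict(zip(utterance_ids, drawn))


# Hypothetical usage: level 20 has zero probability and is never selected.
example_ids = [f"utt{n:03d}" for n in range(1000)]
example_distribution = {0: 0.25, 10: 0.75, 20: 0.0}
example_assignment = assign_perturbation_levels(example_ids, example_distribution)
```

Each utterance would then be perturbed with its assigned level to form the multi-style training set.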

3.4.2.2 Extension to multiple perturbation types

The procedures outlined in sections 3.4.1.1 and 3.4.2.1 address the problem of identifying a distribution of perturbation levels associated with a single perturbation type. The more general case would be to estimate a multi-variate distribution of perturbation levels across a set of $P$ perturbation types. It is possible to combine the perturbation levels from all perturbation types and estimate a single multi-variate distribution. However, in these experiments, multiple univariate perturbation distributions are estimated, one for each perturbation type.

A sequential procedure is used for estimating distributions of perturbation levels for multiple perturbation types. The general outline of this procedure is summarised in Figure 3.7. The process begins with sets of perturbation levels for $P$ perturbation types, $A^1, A^2, \ldots, A^P$. At each step of the process, an optimum level $\hat{\alpha}^t$ is selected using the procedure described in Section 3.4.1.1. This $\hat{\alpha}^t$ is then used to perturb the training utterances in all succeeding steps of the process, when perturbation levels for the other perturbation types are selected. For example, if perturbation set $A^1$ corresponds to the set of possible noise levels and set $A^2$ corresponds to room configurations, the first step of the sequential process would be to estimate the optimum noise level $\hat{\alpha}^1$. Then the training utterances would be corrupted using this noise level.

Figure 3.7: Sequential estimation of perturbation levels for multiple perturbation types

This process is repeated until perturbation levels for all $P$ perturbation types have been identified. It was observed that estimating levels for the extrinsic variabilities first and then for the intrinsic variabilities yields better performance, so the same order is followed in the experiments below.
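The sequential procedure can be sketched as a greedy loop over perturbation types. This is an illustration under stated assumptions, not the implementation used in the experiments. The `distance` callable stands in for perturbing the training set and evaluating the distance measure, and is hypothetical. The text does not specify how the single level carried forward to later steps is chosen when the target sets disagree, so the sketch assumes it is the most frequently selected level.

```python
def sequential_estimation(level_sets, n_targets, distance):
    """Estimate one univariate distribution per perturbation type, in sequence.

    level_sets : dict mapping perturbation type to its list of levels, in the
                 order the types are to be processed (e.g. extrinsic first).
    n_targets  : number of target-domain data sets, N.
    distance   : hypothetical callable distance(applied, i) returning the
                 distance between training data perturbed with the levels in
                 `applied` (dict type -> level) and target set i.
    Returns (distributions, applied): the per-type distributions and the level
    carried forward for each type.
    """
    applied = {}
    distributions = {}
    for ptype, levels in level_sets.items():
        counts = {alpha: 0 for alpha in levels}
        for i in range(n_targets):
            # Levels fixed in earlier steps stay applied while this type varies.
            best = min(levels, key=lambda a: distance({**applied, ptype: a}, i))
            counts[best] += 1
        distributions[ptype] = {a: c / n_targets for a, c in counts.items()}
        # Assumption: carry forward the most frequently selected level.
        applied[ptype] = max(levels, key=lambda a: counts[a])
    return distributions, applied


# Toy example with made-up target conditions, used only to exercise the loop.
toy_truth = [
    {"noise": 10, "room": "small"},
    {"noise": 10, "room": "large"},
    {"noise": 20, "room": "large"},
]


def toy_distance(applied, i):
    """Toy mismatch: absolute noise-level difference plus 1 for a wrong room."""
    total = 0.0
    if "noise" in applied:
        total += abs(applied["noise"] - toy_truth[i]["noise"])
    if "room" in applied:
        total += 0.0 if applied["room"] == toy_truth[i]["room"] else 1.0
    return total


toy_levels = {"noise": [0, 10, 20], "room": ["small", "large"]}
toy_distributions, toy_applied = sequential_estimation(toy_levels, 3, toy_distance)
```

In the toy example, the noise level is estimated first and then held fixed while the room configuration is estimated, mirroring the noise-then-room example in the text.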