
If the analysed signal consists of a superposition of features at arbitrary locations, then the model used to learn these features has to have enough free parameters to represent these features. In general this means that at least one feature has to be learned for each feature present. However, in the standard sparse coding model, features have to be learned at all possible shifts, so that the number of features to be learned is much larger than the number of features in the signal. If the standard sparse coding model does not have enough free parameters to represent the features in the signal, not all features are learned. Instead, some features have to be used to model more than one feature in the observation.
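The parameter count behind this argument can be made concrete with a minimal sketch. It assumes circular shifts and features as long as the signal; the helper `shifted_dictionary` and the sizes `K` and `N` are hypothetical names chosen for illustration and do not come from the text.

```python
import numpy as np

def shifted_dictionary(features):
    """Stack every circular shift of every feature as one dictionary column.

    features: array of shape (K, N), one feature per row.
    Returns an (N, K * N) array; column k * N + d is feature k shifted by d.
    """
    K, N = features.shape
    cols = [np.roll(f, d) for f in features for d in range(N)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
K, N = 3, 32                      # illustrative sizes (assumption)
features = rng.standard_normal((K, N))
D = shifted_dictionary(features)

# Parameters that actually generate the signal: K features of N samples each.
n_generating = K * N
# Parameters a standard model has to learn if every shifted column
# is treated as an independent feature.
n_standard = D.size

print(D.shape, n_generating, n_standard)
```

In this toy setting the standard model has N times as many free parameters as the process that generates the signal, which is the gap the paragraph above describes.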

In this section we study the influence of the number of features used in the traditional sparse coding model, when this number is smaller than the number of features in the signal. We assume here that the observed signal follows the model $x = \sum_k a_k s_k + \epsilon$. Here $a_k$ and $s_k$ denote the features and the associated shifts of the underlying process, indexed by $k$, while $\hat{a}_{\hat{k}}$ and $\hat{s}_{\hat{k}}$ denote the learned features and their coefficients, indexed by $\hat{k}$.

The expected ML estimate of a feature $\hat{a}_{\check{k}}$ w.r.t. the distribution of the data, i.e. w.r.t. the distribution of $\epsilon$ and $s$, is the value for which the expected gradient is zero. We can write this expected gradient as:

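$$\langle \Delta \hat{a}_{\check{k}} \rangle_{p(\epsilon,s)} = \left\langle \mu \sigma_\epsilon^{-2} \int \Big( \sum_k a_k s_k - \sum_{\hat{k}} \hat{a}_{\hat{k}} \hat{s}_{\hat{k}} + \epsilon \Big) \hat{s}_{\check{k}} \, p(\hat{s} \mid x, \hat{A}) \, d\hat{s} \right\rangle_{p(\epsilon,s)} .$$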

Note the use of $\check{k}$ to index the particular feature for which we evaluate the gradient and the corresponding coefficient $\hat{s}_{\check{k}}$, while $k$ indexes the true features in the generative model. $\hat{k}$ indexes all of the estimated features and coefficients. Using the abbreviation

$$T = \mu \sigma_\epsilon^{-2} \Big( \sum_k a_k s_k - \sum_{\hat{k}} \hat{a}_{\hat{k}} \hat{s}_{\hat{k}} + \epsilon \Big) \hat{s}_{\check{k}}$$

we can write this as:

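$$\iiint T \, p(\hat{s} \mid x, \hat{A}) \, p(\epsilon, s) \, d\hat{s} \, ds \, d\epsilon = \iiint T \, p(\hat{s} \mid \epsilon, \hat{A}, A, s) \, p(\epsilon) \, p(s) \, d\hat{s} \, ds \, d\epsilon ,$$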

where the last step is possible as $s$, $A$ and $\epsilon$ define $x$ and as $\epsilon$ is assumed to be independent of $s$. Setting the gradient to zero and rearranging gives:

where we have introduced the index $\bar{k}$ to label the true feature and coefficient associated with the feature and coefficient to be learned, i.e. we assume that feature $\hat{a}_{\check{k}}$ converges to feature $a_{\bar{k}}$. If we assume that $\hat{s}_{\check{k}}$ is

CHAPTER 3. SHIFT-INVARIANT SPARSE CODING ₅₉

to the assumed independence of the individual $\hat{s}_{\hat{k}}$. So we are left with:

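$$\hat{a}_{\check{k}} \langle \hat{s}_{\check{k}} \hat{s}_{\check{k}} \rangle_{p(\hat{s} \mid A, \hat{A})} = a_{\bar{k}} \langle s_{\bar{k}} \hat{s}_{\check{k}} \rangle_{p(\hat{s}, s \mid A, \hat{A})} + \sum_{k \neq \bar{k}} a_k \langle s_k \hat{s}_{\check{k}} \rangle_{p(\hat{s}, s \mid A, \hat{A})} .$$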

In order for a feature $\hat{a}_{\check{k}}$ to converge to a true feature $a_{\bar{k}}$ we require the correlation between $\hat{s}_{\check{k}}$ and $s_k$ to be zero for all $k \neq \bar{k}$.
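The step from the zero-gradient condition to this requirement can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation of the intermediate equations: the noise term is taken to be zero-mean and uncorrelated with $\hat{s}_{\check{k}}$, the $\hat{s}_{\hat{k}}$ are taken to be zero-mean and mutually independent, and $\bar{k}$ denotes the true feature to which $\hat{a}_{\check{k}}$ is assumed to converge.

```latex
% Zero expected gradient (the constant factor \mu \sigma_\epsilon^{-2} drops out):
0 = \Big\langle \Big( \sum_k a_k s_k - \sum_{\hat{k}} \hat{a}_{\hat{k}} \hat{s}_{\hat{k}}
      + \epsilon \Big) \hat{s}_{\check{k}} \Big\rangle
\;\Longrightarrow\;
\sum_{\hat{k}} \hat{a}_{\hat{k}} \langle \hat{s}_{\hat{k}} \hat{s}_{\check{k}} \rangle
  = \sum_k a_k \langle s_k \hat{s}_{\check{k}} \rangle
  + \langle \epsilon \, \hat{s}_{\check{k}} \rangle .

% Assumptions: \langle \epsilon \hat{s}_{\check{k}} \rangle = 0 and
% \langle \hat{s}_{\hat{k}} \hat{s}_{\check{k}} \rangle = 0 for \hat{k} \neq \check{k}:
\hat{a}_{\check{k}} \langle \hat{s}_{\check{k}}^2 \rangle
  = a_{\bar{k}} \langle s_{\bar{k}} \hat{s}_{\check{k}} \rangle
  + \sum_{k \neq \bar{k}} a_k \langle s_k \hat{s}_{\check{k}} \rangle .

% If \langle s_k \hat{s}_{\check{k}} \rangle = 0 for all k \neq \bar{k}, the sum vanishes and
\hat{a}_{\check{k}}
  = a_{\bar{k}} \,
    \frac{\langle s_{\bar{k}} \hat{s}_{\check{k}} \rangle}{\langle \hat{s}_{\check{k}}^2 \rangle} ,
% i.e. the learned feature is a scaled copy of a single true feature.
```

Any non-zero cross-correlation in the remaining sum adds a weighted copy of another true feature to the learned one.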

If the number of features used to model a signal is less than the number of features in the signal at all locations, then dependencies between $\hat{s}_{\check{k}}$ and several $s_k$ have to occur. Dependencies can also occur as a result of the inference process or the approximations to the learning rule used.
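A small numerical sketch can illustrate how such a dependency changes the fixed point of the learning rule. It assumes the simplified condition in which the learned feature equals the sum of the true features weighted by the cross-correlations between their coefficients and the inferred coefficient, divided by the power of the inferred coefficient. The function `expected_fixed_point`, the toy features and the mixing weight 0.5 are hypothetical choices for illustration only.

```python
import numpy as np

def expected_fixed_point(true_features, s_true, s_hat):
    """Feature value at which the (simplified) expected gradient vanishes.

    true_features: (K, N) array of true features a_k.
    s_true:        (K, M) true coefficients over M observations.
    s_hat:         (M,) inferred coefficient of the feature being learned.
    Sample averages stand in for the expectations.
    """
    cross = s_true @ s_hat / s_hat.size   # <s_k s_hat> for every k
    power = np.mean(s_hat ** 2)           # <s_hat^2>
    return cross @ true_features / power

rng = np.random.default_rng(0)
M = 100_000
a = np.array([[1.0, 0.0, 0.0, 0.0],      # toy true feature a_1 (assumption)
              [0.0, 1.0, 0.0, 0.0]])     # toy true feature a_2 (assumption)
# Sparse, zero-mean, independent true coefficients.
s = rng.laplace(size=(2, M)) * (rng.random((2, M)) < 0.1)

# Inferred coefficient depends on one true coefficient only.
a_hat_clean = expected_fixed_point(a, s, s[0])
# Inferred coefficient depends on both true coefficients.
a_hat_mixed = expected_fixed_point(a, s, s[0] + 0.5 * s[1])

print(np.round(a_hat_clean, 2), np.round(a_hat_mixed, 2))
```

With a single dependency the fixed point is close to the first true feature; with two, it becomes a weighted mixture of both, which is the averaging effect discussed here.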

To analyse the possible dependencies which can occur due to the incorrect model size, we assume that all learned features have converged to some of the true features. The dependency between $\hat{s}_{\hat{k}}$ and $s_k$ (and therefore the exact form of the averaging process described above) then depends on which of the features $a_k$ are modelled by each feature $\hat{a}_{\hat{k}}$. The feature chosen to model a feature which has not been learned depends on the decrease in reconstruction error when using this feature. The largest decrease in this error is achieved by modelling a feature in the signal with the same feature at the exact location. If this feature is not available at this location, a feature at a different location or a different feature has to be used.

The following list gives three forms of dependency which can occur, together with the influence they have on the learned features:

• A feature can be modelled with a slightly shifted version of itself. If several slightly shifted features are modelled by a single feature, then the average update of this feature is a low-pass filtered version of the true feature.

• A windowed periodic feature can be modelled with a version of itself which is shifted by multiples of the period. A weighted averaging over such feature shifts leads to a windowing of the learned feature.

• A missing feature can also be modelled with a different feature. The chosen feature is likely to share a strong frequency component and is at a location at which both features have the same phase for this component. Averaging then increases this frequency component but might decrease other frequency components, as the phase for those other components might not match.
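The first two effects in the list can be checked numerically with a minimal sketch. It assumes circular shifts and uniform or hand-picked averaging weights; the helper `average_of_shifts`, the test signals and all numerical values are hypothetical choices for illustration only.

```python
import numpy as np

def average_of_shifts(feature, shifts, weights=None):
    """Weighted average of circularly shifted copies of `feature`."""
    if weights is None:
        weights = np.ones(len(shifts))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    out = np.zeros(len(feature))
    for w, d in zip(weights, shifts):
        out += w * np.roll(feature, d)
    return out

n = 256
t = np.arange(n)

# Effect 1: averaging over slightly shifted copies acts as a low-pass filter.
envelope = np.exp(-0.5 * ((t - n / 2) / 20.0) ** 2)
feature = envelope * (np.sin(2 * np.pi * 4 * t / n) + np.sin(2 * np.pi * 60 * t / n))
blurred = average_of_shifts(feature, shifts=[-2, -1, 0, 1, 2])
spec_true = np.abs(np.fft.rfft(feature))
spec_blur = np.abs(np.fft.rfft(blurred))
low_gain = spec_blur[4] / spec_true[4]      # low-frequency component
high_gain = spec_blur[60] / spec_true[60]   # high-frequency component

# Effect 2: averaging a windowed periodic feature over shifts by multiples
# of its period keeps the oscillation but tapers the envelope.
rect = ((t >= 64) & (t < 192)).astype(float)
periodic = rect * np.sin(2 * np.pi * 16 * t / n)   # period of 16 samples
windowed = average_of_shifts(periodic, shifts=[-32, -16, 0, 16, 32],
                             weights=[1, 2, 3, 2, 1])

print(low_gain, high_gain)
print(windowed[132], windowed[68], windowed[36])
```

In this toy setting the low-frequency component is almost unchanged while the high-frequency component is strongly attenuated, and the rectangular window of the periodic feature turns into a stepped taper that extends beyond the original support.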

This seems to suggest that if the number of features to be learned is less than the number of features in the signal, windowed and filtered features emerge. However, the above derivation uses the traditional sparse coding formulation. If shift-invariance is explicitly enforced and if the inference process is working correctly (i.e. the $s_k$ are uncorrelated with $\hat{s}_{\hat{k}}$ for all but one pair of coefficients) then the first two effects (i.e. the filtering and the windowing) cannot occur.
