• No se han encontrado resultados

4 Servidor personalizado de rutas

4.1. Estimador de Rutas

4.1.4. Filtros de modificación

A.1 The relation of variability σ

z2

to chi-Square testing

The variance of the z-scores is proportional to the Chi-Square statistic as σz2 = 1/N ·

P

i((si − ni)2/ni) = 1/N · χ2. Therefore by comparing the value of N · σ2z to a Chi-Square distribution a p-value can be computed. For Poisson the number of degrees of freedom is equal to the number of estimated spike count expectations minus the number of parameters of the firing rate map estimation minus one. The reduction by one is due to the constraint on the overall firing rate. For example, a z-score variance higher than 1.15 indicates statistically significant deviation from Poisson in the case of a 20 minute recording (divided into 5s time windows, gives 300 categories) and a significance level α = 5%. This is because the 5 percent quantile of a chi-square distribution with 300-3 degrees of freedom is at about 1.15 · 300. Dependencies between the expectation values in each time bin further decrease the number of degrees of freedom. In that case the percentile is reduced as df = n− nP − nD − 1, where n denotes the number of possible outcomes, nP is the number of parameters to be estimated and nD is the number of dependencies between the parameters. In other word, given any firing rate model leading to the expected counts ni

in time windows i with a known number of degrees of freedom, we can define a significance threshold on σz2 given a spiking distribution like Poisson or some other count distribution.

A.2 The upper bound on variability and its relation to the zero inflated Poisson parameter

The minimal MSE estimator as well as the MLE estimator of the expectation value of the underlying yet unknown distribution of data s is given by its mean. The average squared residual to the mean is minimal and is equal to the variance of s. Therefore the most naive

var(z) = var( q

hsi ) =

hsi =

s (A.1)

which is equal to the fano factor if s are measured across trials.

Given a second series ob observables {φ} (i.e. the signal/stimulus like positions, speed etc.) from a unique and finite set Φ of all possible stimulus combinations we can compute the conditioned expectation to one specific stimulus combination Φi which is given by the mean of all data st where φt= Φi:

For each of these stimulus combinations the minimal MSE estimator again is the mean but this time only of the respective subset of the data. The variance var(s) of s can be divided into a sum of sums over members of the subsets:

var(s) = 1 Note that the right part is not a sum over conditional variances but conditional mean squared errors. Knowing M SE(x|y) ≥ var(x) we can write:

1 Now we assume s being Gamma distributed within a subset (meaning variance = a·mean) and replace the MSE (the inner sum on the left, for which we know that it is larger than the variance of the subset) by a linear function of the subsets mean:

M SEi := (ai+ ǫi)hsi ≥ aihs|Φii (A.6) Then for the variance-to-mean ratio we get:

var(z) = var(s)

var(z|φ) = N i Nφti

hs|Φi

hs|Φii = hai (A.10)

Therefore

var(z) ≥ var(z|φ) (A.11)

holds for data where a zero mean implies a zero variance like inh. Poisson count data or more generally Gamma distributed data. In other words for any data that can be grouped such that within the groups the variances are smaller than the mean squared errors to the overall mean a model can be found for the expectation values such that the variance of z-scores with respect to that model drops (i.e. variability is explained).

The set of all hs|Φii∀i ∈ {1, ..., |Φ|} estimated via MLE or MMSE may be denoted as informative model. Such an informative model allows us to reduce the variance of the z-scores. When assuming there exists an informative model than the inequality (A.11) allows to specify an upper bound on the zero-inflation parameter α without knowing the informative model or the respective mappings φt→ Φi or the stimulus at all.

The left side of eqn. (A.11) describes the variability of data under homogeneity assump-tion whereas the right side refers to situaassump-tions in which a model about the inhomogeneity is available. For example, the measured variability in spike counts of a grid cells will always be lowered when not being ignorant to the presence of spatial modulations of the firing.

This is not surprising but comes with a useful implication for estimating the parameter α of a zero inflated model as described in the following paragraph.

Given a informative sequence of expectation values {n} ∈ RN+, which we will refer to as informative model, and a series of measured counts {s} we can now ask what is the value of α. By informative model we are referring to models where the variability is smaller or equal than the variance-to-mean ratio:

σ2z ≤ var(s)/s (A.12)

and thereby all models that hold eqn. (A.11). To predict values of σz2({n}, α) we first derive an analytical expression for the variance σz,i2 (ni, α) of z-scores in one bin i (in which we can assume the process to be homogeneous) and demonstrate that the variance σ2z taking all N bins i into account satisfies hσ2zi(ni, α)ii∈{0,...,N } = σ2z(n, α). This allows us to define an upper bound for α.

The theoretical variance of the z-score distribution for a ZIP process is σz,i2 (ni, α) =X

σz2 = 1 N

N

X

i

σz,i2 (ni, α) = hσ2zi(ni, α)ii∈{0,...,N }= 1 + n · α

1 − α = σ2z(n, α) (A.14) which is equal for the homogeneous case (ni = n) and the inhomogeneous case (where ni can vary) if the underlying process is a ZIP process. Here we used that the average of z-transformed variables is zero by definition. Note that this is not contradicting our definition of informative models as here we computed the mean of the bin-wise variances and did not compute the variance of observed z-scores across the bins where the bin-wise variance is unknown.

We can thus estimate α by solving

ˆ

α = n

σ2z − 1+ 1

!−1

(A.15) with σ2z = σz,empirical2 . Without any knowledge about ni eqn. (A.15) provides an upper limit on α({n}) via n = s for an informative model {n}. This is true because for the empirical variabilities it holds σ2z,emp.(S, {n}) < σz,emp.2 (S, s) = σs2/s as we postulated above.

With σ2z = σz2(S, s) eqn. (A.15) is equal to

ˆ

α(S) = 1

CV2− 1/s + 1

!−1

(A.16) with CV = σS/¯s = σS/n= σz2(S, s)/σS.

It is worth to mention that the requirement for informative models can be met even for models with high likelihood that are not strictly informative via restricting the analysis to regions where the ratio between s and n is not too large (usually when n ≥ 1). In the perspective of studying zero inflation this is natural to do because low expectations are superimposing excess zeros. The inflation probability α could be arbitrarily overestimated when taking data into account where there are no or only rarely spikes expected. In that scenario there is no chance to find out whether zero counts are thrown due to low expectation values and high or low zero inflation.

We conclude that by measuring the variance and the mean of spike counts across time windows or trials an upper limit for α can be estimated without model fitting.