PART TWO: EXPERIMENTAL STUDY
Table 5.7. Baseline scores of the experimental groups.
5.3.2. Intervention procedure.
Fisher information [67] is a central quantity in statistics, particularly in estimation theory [115], where it is sometimes referred to simply as “information” [115]. It should not be confused with the mutual information defined above. Fisher information characterizes the amount of information that an observation (or measurement) m carries about an unknown parameter θ, given the statistical relationship between m and θ. If both θ and m are scalar, Fisher information can be defined as
$$J(\theta) = \int \left( \frac{\partial \ln p(m|\theta)}{\partial \theta} \right)^{2} p(m|\theta)\, dm. \tag{2.2}$$
In this expression, p(m|θ) should be treated as the likelihood function of θ. ln p(m|θ) is the log-likelihood function, which is often called the “support curve” [58]. The slope of the support curve is the “score”, which represents how sensitively the likelihood function depends on the parameter θ. It is important to emphasize that while the likelihood function depends on a particular observation m, Fisher information does not, because the dependence on m is integrated out by definition. Thus, it is appropriate to interpret Fisher information as a measure of the expected sensitivity at each value of the parameter θ. It is fully determined by the encoding model, which specifies the relationship between the random variables m and θ. In some sense, Fisher information defines a metric on the space of θ.
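To make the role of the integration over m concrete, the following is a minimal numerical sketch of definition (2.2). It assumes a Gaussian encoding model, m ~ Normal(θ, σ²), which is chosen here purely for illustration; the function names and the grid settings are likewise hypothetical. For this model the score is (m − θ)/σ², and the integral should come out as 1/σ² regardless of θ.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def gaussian_fisher_information(theta, sigma, num_points=200001, width=12.0):
    """Evaluate J(theta) = integral of score^2 * p(m|theta) dm numerically.

    Assumed (illustrative) encoding model: m ~ Normal(theta, sigma^2).
    """
    # Grid over measurements m, wide enough that the Gaussian tails are negligible.
    m = np.linspace(theta - width * sigma, theta + width * sigma, num_points)
    p = np.exp(-0.5 * ((m - theta) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Score: slope of the log-likelihood ln p(m|theta) with respect to theta.
    score = (m - theta) / sigma ** 2
    # Integrating over m removes the dependence on any particular observation.
    return trapezoid(score ** 2 * p, m)

if __name__ == "__main__":
    for theta in (0.0, 5.0):
        print(theta, gaussian_fisher_information(theta, sigma=2.0))
```

In this particular model the result is the same for every θ, which reflects the fact that θ only shifts the measurement distribution; for other encoding models J(θ) will generally vary with θ.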
As a remark, there is another way to define Fisher information
$$J(\theta) = -\int \frac{\partial^{2} \ln p(m|\theta)}{\partial \theta^{2}}\, p(m|\theta)\, dm. \tag{2.3}$$
It is straightforward to check that these two definitions are equivalent. These definitions only apply when θ is a scalar. If θ is a vector, one can define a corresponding Fisher information matrix [115, 1].
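For completeness, the equivalence of (2.2) and (2.3) can be sketched as follows. The sketch assumes the usual regularity conditions, in particular that differentiation with respect to θ and integration over m can be exchanged; these conditions are an assumption of this note rather than something stated above.

```latex
\begin{align*}
\frac{\partial^{2} \ln p(m|\theta)}{\partial \theta^{2}}
  &= \frac{1}{p(m|\theta)} \frac{\partial^{2} p(m|\theta)}{\partial \theta^{2}}
     - \left( \frac{\partial \ln p(m|\theta)}{\partial \theta} \right)^{2}, \\
\int \frac{\partial^{2} p(m|\theta)}{\partial \theta^{2}}\, dm
  &= \frac{\partial^{2}}{\partial \theta^{2}} \int p(m|\theta)\, dm
   = \frac{\partial^{2}}{\partial \theta^{2}}\, 1 = 0, \\
-\int \frac{\partial^{2} \ln p(m|\theta)}{\partial \theta^{2}}\, p(m|\theta)\, dm
  &= \int \left( \frac{\partial \ln p(m|\theta)}{\partial \theta} \right)^{2} p(m|\theta)\, dm .
\end{align*}
```

The first line is the chain rule applied to ln p, the second uses the normalization of p(m|θ), and the third follows by multiplying the first line by p(m|θ) and integrating over m.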
Fisher information has many interesting properties. For the purpose of this thesis, I shall only introduce a few of them.
Cramer-Rao bound
Perhaps the best-known result related to Fisher information is the Cramer-Rao bound [41, 139]. The Cramer-Rao bound states that, under certain regularity conditions, Fisher information sets a lower bound on the variance of any unbiased estimator $\hat{\theta}$. Formally, it can be expressed as
$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{J(\theta)}. \tag{2.4}$$
In general, the Cramer-Rao bound cannot be attained. It is tight only for special kinds of statistical models. I shall come back to this point in Chapter 5, where the conditions under which the Cramer-Rao bound is tight are discussed in more detail. Intuitively, the Cramer-Rao bound means that the quality of encoding, quantified by the Fisher information, sets a limit on how precise any unbiased estimator can be.
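One of the special cases in which the bound is attained can be checked by simulation. The sketch below assumes n independent Gaussian observations with known σ (an illustrative model, not one taken from the text) and uses the standard fact that Fisher information adds across independent observations, so that J(θ) = n/σ². The sample mean is an unbiased estimator whose variance should match 1/J(θ). All function names and default values are hypothetical.

```python
import numpy as np

def cramer_rao_bound(sigma, n):
    """Lower bound 1/J for n independent observations m_i ~ Normal(theta, sigma^2).

    Uses J = n / sigma^2 (Fisher information adds over independent observations).
    """
    return sigma ** 2 / n

def sample_mean_variance(theta=1.5, sigma=2.0, n=10, trials=200_000, seed=0):
    """Empirical variance of the sample-mean estimator over many simulated trials."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment with n noisy measurements of theta.
    measurements = rng.normal(theta, sigma, size=(trials, n))
    estimates = measurements.mean(axis=1)
    return float(estimates.var())

if __name__ == "__main__":
    print("bound    :", cramer_rao_bound(2.0, 10))
    print("empirical:", sample_mean_variance())
```

Up to sampling error, the empirical variance should agree with the bound; for most other models and estimators the variance would be expected to lie strictly above it.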
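For a biased estimator, the corresponding Cramer-Rao bound turns out to be
$$\mathrm{Var}(\hat{\theta}) \geq \frac{[1 + b'(\theta)]^{2}}{J(\theta)}, \tag{2.5}$$
where $b(\theta) = E[\hat{\theta}] - \theta$ is the bias of the estimator and $b'(\theta)$ is its derivative with respect to θ. Since the mean square error is the sum of the variance and the squared bias, the MSE (mean square error) of estimator $\hat{\theta}$ must satisfy
$$\mathrm{MSE}(\hat{\theta}) \geq \frac{[1 + b'(\theta)]^{2}}{J(\theta)} + b(\theta)^{2}. \tag{2.6}$$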
It is useful to point out that the MSE of a biased estimator can be smaller than 1/J(θ), which is the lower bound for any unbiased estimator. Although many might have the intuition that an unbiased estimator is advantageous compared to a biased estimator, this result suggests that, counter to that intuition, a bias in the estimate can actually be desirable in certain situations.
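A small worked example may help here. Assume, purely for illustration, a single Gaussian measurement m ~ Normal(θ, σ²) and the shrinkage estimator a·m with 0 < a < 1 (both are assumptions of this sketch, as are the function names). Its bias is b(θ) = (a − 1)θ, so b'(θ) = a − 1, its variance is a²σ², and J(θ) = 1/σ². The code evaluates the MSE exactly and compares it with the unbiased bound 1/J(θ) and with the right-hand side of (2.6).

```python
def shrinkage_mse(theta, sigma, a):
    """Exact MSE of the estimator a*m when m ~ Normal(theta, sigma^2)."""
    bias = (a - 1.0) * theta          # b(theta) = E[a*m] - theta
    variance = a ** 2 * sigma ** 2    # Var(a*m)
    return variance + bias ** 2

def biased_mse_bound(theta, sigma, a):
    """Right-hand side of the MSE bound for the same estimator."""
    fisher = 1.0 / sigma ** 2         # J(theta) for the assumed Gaussian model
    bias = (a - 1.0) * theta
    bias_slope = a - 1.0              # b'(theta)
    return (1.0 + bias_slope) ** 2 / fisher + bias ** 2

if __name__ == "__main__":
    sigma, a = 1.0, 0.8
    for theta in (0.5, 4.0):
        print(theta, shrinkage_mse(theta, sigma, a), "unbiased bound:", sigma ** 2)
```

With σ = 1 and a = 0.8, the MSE falls below the unbiased bound when θ is close to zero and rises above it when θ is far from zero, so whether the bias helps depends on where θ lies. In this example the estimator also meets the bound (2.6) with equality.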
Invariance of Fisher information
The square root of Fisher information J(θ) has a property of invariance, i.e.
$$\sqrt{J(\theta)}\, d\theta = \sqrt{J(\tilde{\theta})}\, d\tilde{\theta}, \tag{2.7}$$
where $\tilde{\theta}$ is a re-parameterization of θ. As a corollary, the integral $S = \int \sqrt{J(\theta)}\, d\theta$ is invariant with respect to any re-parameterization of θ. With this notation, $f_J(\theta) = \sqrt{J(\theta)}/S$ behaves like a probability density. $f_J(\theta)$ is the well-known Jeffreys prior [97], a widely used non-informative prior in Bayesian statistics [101]. As a remark, the integral $\int J(\theta)^{p}\, d\theta$ with an exponent p other than 1/2 is not invariant.
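The invariance can be checked numerically on a concrete case. The sketch below assumes a Bernoulli model with success probability θ, for which J(θ) = 1/(θ(1 − θ)), and re-parameterizes it by the log-odds η = ln(θ/(1 − θ)), for which the Fisher information becomes θ(1 − θ). The model, the choice of re-parameterization, and all names are illustrative assumptions. The integral of J raised to a given power is computed in both parameterizations over matching intervals.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def power_integrals(power, lo=0.1, hi=0.9, num_points=200001):
    """Integrate J**power in theta and in the log-odds parameter eta.

    Assumed model: Bernoulli(theta), with J(theta) = 1 / (theta * (1 - theta))
    and, after re-parameterization, J(eta) = theta(eta) * (1 - theta(eta)).
    """
    theta = np.linspace(lo, hi, num_points)
    fisher_theta = 1.0 / (theta * (1.0 - theta))
    in_theta = trapezoid(fisher_theta ** power, theta)

    # Matching interval in the log-odds parameterization.
    eta = np.linspace(np.log(lo / (1.0 - lo)), np.log(hi / (1.0 - hi)), num_points)
    theta_of_eta = 1.0 / (1.0 + np.exp(-eta))
    fisher_eta = theta_of_eta * (1.0 - theta_of_eta)
    in_eta = trapezoid(fisher_eta ** power, eta)
    return in_theta, in_eta

if __name__ == "__main__":
    print("power 1/2:", power_integrals(0.5))   # the two values should agree
    print("power 1  :", power_integrals(1.0))   # the two values should differ
```

Only the exponent 1/2 gives the same value in both parameterizations, because the Jacobian of the change of variables is cancelled exactly by the square root of the transformed Fisher information.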
Relationship to psychophysical and neural measurements
Fisher information has wide applications in many scientific fields. In the extreme, it has even been argued that Fisher information can provide a unification of many areas of science [71]. In this thesis, I shall focus on its possible applications to understanding information processing in the brain. Let me start by noting that Fisher information is directly related to the most commonly taken psychophysical measurement, i.e. the discrimination threshold. It is well established [152, 151] that Fisher information sets a lower bound on the discrimination threshold d(θ) in fine discrimination tasks:
$$d(\theta) \geq C_{\alpha} \frac{1}{\sqrt{J(\theta)}},$$
where $C_{\alpha}$ is a constant determined by the specifics of the psychophysical procedure.
Fisher information can also be used to assess how much information a certain neuron (or neurons) carries about a particular stimulus dimension. Consider a Poisson neuron with a smooth tuning curve f(θ). In this case, the Fisher information has a simple closed-form expression
$$J(\theta) = T\, \frac{f'(\theta)^{2}}{f(\theta)},$$
where T represents the length of the integration time. There are several basic insights from this expression. First, both the firing rate and the slope of the tuning
curve of a Poisson neuron are important in terms of the Fisher information the neuron’s response carries. Second, the neuron carries most Fisher information at the flank of its tuning curve rather than the peak. Third, the Fisher information scales linearly with the integration time and the gain of the neuron.
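These three observations can be illustrated with a short numerical sketch. It assumes a Gaussian-shaped tuning curve with zero baseline firing rate, f(θ) = g·exp(−(θ − θ₀)²/(2w²)); the shape, the parameter values, and the function names are assumptions made for illustration only. For this shape the Fisher information vanishes at the preferred stimulus θ₀ and peaks on the flanks at θ₀ ± √2·w.

```python
import numpy as np

def tuning_curve(theta, gain=20.0, preferred=0.0, width=1.0):
    """Assumed Gaussian tuning curve (mean firing rate) with zero baseline."""
    return gain * np.exp(-0.5 * ((theta - preferred) / width) ** 2)

def tuning_slope(theta, gain=20.0, preferred=0.0, width=1.0):
    """Analytic derivative of the tuning curve with respect to theta."""
    return -(theta - preferred) / width ** 2 * tuning_curve(theta, gain, preferred, width)

def poisson_fisher_information(theta, T=1.0, gain=20.0, preferred=0.0, width=1.0):
    """J(theta) = T * f'(theta)^2 / f(theta) for a Poisson neuron."""
    rate = tuning_curve(theta, gain, preferred, width)
    slope = tuning_slope(theta, gain, preferred, width)
    return T * slope ** 2 / rate

if __name__ == "__main__":
    theta = np.linspace(-3.0, 3.0, 601)
    fisher = poisson_fisher_information(theta)
    print("J at the peak of the tuning curve:", poisson_fisher_information(0.0))
    print("theta with maximal J             :", theta[np.argmax(fisher)])
```

Doubling T or the gain doubles J(θ) in this sketch. The linear dependence on gain relies on the zero-baseline assumption; with a nonzero baseline rate the dependence on gain would no longer be exactly linear.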
Fisher information and mutual information are related in an intriguing way. On the one hand, the two are quite different by definition. Conceptually, Fisher information quantifies local information, while mutual information is a global measure. On the other hand, these two measures are also closely related, as we will discuss in detail in Chapter 5.
Chapter 3
Bayesian observer model constrained by Efficient coding explains “anti-Bayesian percept”
3.1 Introduction
Perception involves two important stages of processing: 1) the representation of incoming sensory information, and 2) the interpretation of that representation to form a percept. Two prominent hypotheses have separately guided our understanding of these two processing stages, but each has limitations when considered alone.
The Efficient Coding Hypothesis argues that neural resource limitations lead to efficient representations of sensory information with respect to the statistics of the natural environment [4, 10]. This hypothesis can explain several key features of neural coding in early sensory areas (e.g., [134, 46, 120]), but it does not specify how these coding characteristics can give rise to important aspects of perceptual behavior such as perceptual biases. In contrast, the Bayesian Hypothesis posits that perception is an act of unconscious inference that interprets the noisy sensory representation in the context of prior knowledge about the world [90, 44, 103]. This hypothesis provides a normative explanation for many aspects of perceptual and sensorimotor behavior (e.g., [104, 169, 178, 98]), but it has been criticized for using arbitrary model specifications in order to explain psychophysical data [99, 21]. Here we unify ideas of Efficient coding and Bayesian inference into a new model of perceptual behavior. Specifically, we propose a Bayesian observer model that is constrained by assuming an efficient representation of the sensory input.
Two key components define a Bayesian observer: the prior belief that reflects the observer’s expectation about how frequently a certain stimulus value occurs, and the likelihood function that captures the encoding accuracy in the sensory representation of the observer. Previous studies have proposed independent constraints on either the prior belief based on natural (e.g., [175, 83]) or learned (e.g., [96, 104]) stimulus statistics, or the likelihood function based on natural stimulus uncertainties (e.g., [78, 29]) or neural physiological tuning characteristics (e.g., [169]), but not both. In contrast, our new model formulation jointly constrains both the prior belief and the likelihood function by assuming that the sensory representation as
well as the interpretation of the sensory evidence is optimized with regard to the stimulus statistics of its sensory environment. Thus, we can specify a Bayesian observer model for any stimulus variable with known natural statistics.
We validated our framework by formulating observer models for two perceptual variables for which the natural statistics are known, visual orientation and spatial frequency. The models make a number of distinct and rather surprising predictions; e.g., that percepts are frequently biased away from the peaks of the prior, a prediction that seems at odds with the standard Bayesian view. We demonstrate that the predictions are well matched by data from several studies reporting measured biases in perceived visual orientation and spatial frequency under different levels and sources of uncertainty. These include biases that are seemingly “anti-Bayesian” [23]. Our results demonstrate that by combining the ideas of Efficient coding and Bayesian decoding, we can formulate well-constrained observer models that can account for perceptual behavior that has not been explained before. An earlier version of this work has been presented previously [184].