3.8 Asymptotic Behaviour

The evidence and predictive distributions are the goal in our data analysis. Assuming that we have accumulated lots of logically independent data and that our models are well-conditioned so that the maximum likelihood approximation is valid, our likelihood function tends asymptotically towards a product of delta functions:

$$\lim_{J \to \infty} p(D \mid \theta, K) \overset{\mathrm{LI}}{=} \prod_k \delta(\theta_k - \theta^*_k), \qquad (3.8.1)$$

where θ is the set of parameters and θ* is the set of ‘true’ parameters that best describes the observed data set, provided our model is appropriate. In this large data limit the evidence is then just the prior evaluated at θ = θ*,

$$\lim_{J \to \infty} p(D \mid K) \overset{\mathrm{LI}}{=} p(\theta^* \mid K). \qquad (3.8.2)$$
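The step from (3.8.1) to (3.8.2) can be made explicit. As a sketch, assuming the evidence is the usual marginal likelihood obtained by integrating over θ (this integral form is not written out in the present section), the delta functions simply pick out the prior at θ*:

$$p(D \mid K) = \int p(D \mid \theta, K)\, p(\theta \mid K)\, d\theta \;\longrightarrow\; \int \prod_k \delta(\theta_k - \theta^*_k)\, p(\theta \mid K)\, d\theta = p(\theta^* \mid K) \qquad (J \to \infty).$$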

The posterior also tends towards a product of normalised delta functions, and the predictive distribution becomes equal to the likelihood function evaluated at these “best” parameters,

$$\lim_{J \to \infty} p(R \mid D, K) \overset{\mathrm{LI}}{=} p(R \mid \theta^*, K). \qquad (3.8.3)$$
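The limit (3.8.3) can be checked numerically in a toy case. The sketch below assumes a Bernoulli model with a Beta(a, b) prior, neither of which appears in the text; the function name, the prior parameters and the idealised data (exactly θ*·J successes in J trials) are likewise illustrative choices. It shows the predictive probability of a success approaching θ* as J grows, even under a deliberately skewed prior.

```python
from fractions import Fraction


def predictive_success(a, b, successes, trials):
    """Posterior predictive probability of a success on the next trial,
    assuming a Beta(a, b) prior and a Bernoulli likelihood."""
    return Fraction(a + successes, a + b + trials)


# Assumed "true" parameter and a deliberately skewed prior (illustrative values).
theta_star = Fraction(3, 10)
a, b = 5, 1

gaps = []
for J in (10, 1000, 100000):
    # Idealised data: exactly theta_star * J successes in J trials.
    successes = int(theta_star * J)
    p = predictive_success(a, b, successes, J)
    gaps.append(abs(p - theta_star))
    print(J, float(p), float(gaps[-1]))
```

The gap between the predictive probability and θ* shrinks roughly like 1/J here, so the influence of the prior on the prediction disappears in the large data limit, as (3.8.3) states.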

The point of this discussion is that the prior and the likelihood functions are asymptotic forms of the evidence and predictive distributions respectively, which is why we focus on the finite forms instead of the asymptotic forms. Next we discuss how we assign probabilities in the first place. As we shall see, our guiding principle is to reason consistently.

Chapter 4

Assigning probabilities using logical independence

The Probability for an event is the ratio of the number of cases favourable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.

Pierre-Simon, marquis de Laplace

How should a logical proposition be translated into a prediction for future data? In other words, how do we assign probabilities to events? We will take the classical view that the “randomness” in a physical system is caused by our lack of knowledge or control of the system. Thus, following Jaynes (2003) and d’Agostini (2003), we prefer to replace “random” with “uncertain” or “uncontrolled”. Consequently we will exclude quantum mechanics from the discussion in this dissertation, as it might require a physical “randomness”. In this picture, classical physics is composed of systematic or controllable effects, which are to be modelled, and uncontrollable effects that are not reproducible by the experimental technique and apparatus in use. Often the controllable effects are macroscopic while the uncontrollable ones are microscopic. What is uncontrolled depends on our current scientific expertise and the quality of the apparatus. If one or both improve, previously uncontrollable variables may become controllable and therefore “nonrandom”.

In a slightly more mathematical formulation, we may say that the systematic effects may be adequately represented by a small number of macroscopic variables while the rest of the variables, usually the large majority, do not matter individually but only through their collective contribution to these few macroscopic variables. The classic example for this situation is, of course, that of statistical physics, where the 6N variables of N particles are relevant only insofar as they determine or modify the macroscopic variables of average energy (temperature), heat capacity, susceptibilities etc.

In this chapter, we shall construct a mathematical framework which strongly resembles that of traditional statistical mechanics and indeed encompasses it. Classical statistical mechanics is the easiest example where all the randomness in a system is caused by our ignorance of the initial conditions of the system. We shall be more careful than most books in doing so, because the building blocks and results that appear in this chapter will form the baseline from which later developments in this dissertation proceed.

There is essentially only one problem in statistical thermodynamics: the distribution of energy E over N particles. In this chapter the number of particles N will be replaced by the number of trials R, and the energy of a system E will be replaced by a generalised constraint G. The volume of the system V will play no role, and the thermodynamic limit (N → ∞, while E/N is kept constant) will be called the large data limit or large prediction limit, as in Section 3.8, depending on the context. The remarkable thing about this translation is that it changes nothing of the formalism of classical statistical mechanics, which we can then view as a procedure for constructing hypotheses or assigning probabilities.

There is one exception where we encounter conceptual difficulties and that is the idea of distinguishable and indistinguishable particles. We will derive Maxwell-Boltzmann statistics by using Logical Independence, which is usually associated with classical particles that have definite trajectories and are thus called distinguishable particles. Interestingly, Maxwell-Boltzmann and Bose-Einstein statistics can be derived for both distinguishable and indistinguishable particles, see Constantini (1987). Thus instead of using the physics definition we will define indistinguishable as exchangeable, which implies that there is no ordering of the trials and thus no trajectories. Obviously as we are discussing logical propositions and not particles it would be difficult to introduce trajectories in the first place. We hence assume our system is in “equilibrium” and that we are dealing with indistinguishable classical particles. We will use Chapter 9 of Jaynes (2003) as a basis for this chapter, but we hope to improve on that discussion. The real purpose of this chapter, however, is to show that all of the formalism flows from the logical independence assumption, which is philosophically problematic and which we wish to replace with a more solid foundation.

An important point that we will repeat many times: the framework below is a continuous refinement of assigned probabilities; nowhere does data play a role here. All the assignments are based on logical propositions. The remainder of this chapter is structured as follows:

• In Section 4.1, the very high-dimensional “primordial” outcome space SR is introduced, where each vector x ∈ SR represents R distinguishable outcomes and the probability of any x is the same due to the Principle of Indifference.

• Section 4.2 explains the Principle of Indifference as a group invariance argument. We also point out that the concept of repetition plays no role when we apply the principle directly to the sample space.

• Section 4.3 traces a first projection from the R-dimensional SR hypercube space to a one-dimensional partition {A1, A2, . . . , AB} and corresponding probabilities ρ = {ρ1, ρ2, . . . , ρB}.

• Section 4.4 in turn projects this onto the set of occupation numbers r = {r1, . . . , rB} with outcome space the universal set U(r) and the multinomial probability of (4.4.5).

• Section 4.5 will discuss how these concepts relate to physics.

• In Section 4.6 we develop the concept of waiting-time distributions which indicate that we could have used a different but equivalent normalisation of the sample space. The waiting-time identity itself will be used again later in the development.

• Section 4.7 develops the formalism of updating the probability from ρ to ρ′ when we introduce a constraint like a fixed “energy” G.

• Section 4.8 computes the exact predictions from the state of knowledge constructed in the previous section.

• Section 4.9 uses the saddlepoint approximation to compute an approximation to the exact formulas of the previous section.

• Section 4.11 introduces the principle of Maximum Entropy which is justified by our saddlepoint approximation.

• Section 4.12 solves three examples using the methods developed.

• Section 4.13 connects the Grand Canonical Ensemble with the Principle of Minimum Relative Entropy.

• Section 4.14 discusses the problems with this formalism and what we will do next.
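The chain of projections outlined for Sections 4.1 to 4.4 can be previewed with a small brute-force sketch. It rests on assumptions that are not taken from the text: six equally probable elementary outcomes per trial, a hypothetical partition of them into B = 3 cells, R = 4 trials, and the standard multinomial form for the probability referred to as (4.4.5). Every sequence in the primordial space receives the same weight (Principle of Indifference), and the sequences are then collapsed onto occupation numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

# Hypothetical partition: elementary outcome i belongs to cell CELL_OF[i].
CELL_OF = (0, 0, 0, 1, 1, 2)
B = 3   # number of cells A_1..A_B (assumed)
R = 4   # number of trials (assumed)


def projected_distribution(cell_of, n_cells, n_trials):
    """Give every sequence of elementary outcomes equal weight and
    sum the weights per occupation-number vector r."""
    n_elem = len(cell_of)
    weight = Fraction(1, n_elem ** n_trials)
    counts = Counter()
    for x in product(range(n_elem), repeat=n_trials):
        r = [0] * n_cells
        for outcome in x:
            r[cell_of[outcome]] += 1
        counts[tuple(r)] += 1
    return {r: c * weight for r, c in counts.items()}


def multinomial(rho, r):
    """Standard multinomial probability R!/(r_1!...r_B!) * prod rho_b^r_b."""
    coeff = factorial(sum(r))
    for rb in r:
        coeff //= factorial(rb)
    p = Fraction(coeff)
    for rho_b, rb in zip(rho, r):
        p *= rho_b ** rb
    return p


# Cell probabilities follow from counting elementary outcomes per cell.
rho = [Fraction(CELL_OF.count(b), len(CELL_OF)) for b in range(B)]
projected = projected_distribution(CELL_OF, B, R)
matches = all(p == multinomial(rho, r) for r, p in projected.items())
print(len(projected), sum(projected.values()), matches)
```

Exact rational arithmetic is used so that the comparison with the multinomial formula is an equality rather than a floating-point approximation. The enumeration grows as 6^R, so this is only feasible for very small R.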
