CHAPTER 2: DESIGN AND EVALUATION OF THE COURSE'S VIRTUAL CLASSROOM

2.7 Final remarks of the chapter

In astrophysics, modelling observational data usually requires free parameters, and the more complex the model, the more expensive it is to compute the best fit. Probabilistic data analysis within the Markov Chain Monte Carlo (MCMC) framework has become one of the preferred methods for data analysis in astrophysics. Bayesian inference has an advantage over frequentist methods because it allows prior information on the model parameters to be taken into account along with the data to be fitted. Furthermore, it provides practical information on the model parameters, e.g. how they are correlated (uncorrelated, non-linearly correlated, etc.) and the distribution from which formal errors can be calculated (e.g. the one-sigma error of a normal distribution). Here I explain the MCMC method, which relies on Bayesian inference, i.e. on describing the probability of an event based on prior knowledge of the conditions related to that event.

If there is a set of observed data, D, that can be described by a model, Θ, then the distribution of the model parameters that is consistent with the dataset is given by the posterior probability function. Using Bayes' theorem, the MCMC method aims to draw samples Θ_j from the posterior probability density,

$$ P(\Theta \mid D) = \frac{P(\Theta)\, P(D \mid \Theta)}{P(D)}. \tag{3.1} $$

P(Θ) is known as the prior distribution, which encodes every piece of information about the parameters, such as results from previous experiments or physically acceptable values (e.g. a mass can never be negative). P(D) is the normalization constant known as the evidence, which is usually very expensive to compute because the integration is carried out over all possible parameter values,

$$ P(D) = \int_{\theta} P(D \mid \theta)\, P(\theta)\, d\theta. $$

However, as I explain later, it is usually not necessary to calculate the evidence, meaning that the posterior, P(Θ|D), can be sampled without computing P(D). Finally, P(D|Θ) is the likelihood function, which is the probability distribution over the dataset, i.e. how the data are expected to be distributed.
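As a minimal illustration of equation 3.1, the sketch below evaluates the posterior on a grid for a hypothetical one-parameter toy problem: the mean of Gaussian data with known σ and a flat prior restricted to positive values (all numbers are assumptions for illustration, not data from this thesis). The evidence is obtained by trapezoidal integration; dividing by it rescales the posterior but leaves the ratio between any two parameter values unchanged, which is the property the sampler relies on.

```python
import numpy as np

# Hypothetical toy problem: infer the mean mu of Gaussian data with known sigma.
rng = np.random.default_rng(42)
sigma = 1.0
data = rng.normal(loc=3.0, scale=sigma, size=20)

mu_grid = np.linspace(0.01, 6.0, 2001)
log_prior = np.zeros_like(mu_grid)          # flat prior; the grid only covers mu > 0

# ln P(D|mu) at every grid value: sum of Gaussian log densities over the data
resid = data[None, :] - mu_grid[:, None]
log_like = np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * resid**2 / sigma**2, axis=1)

# Unnormalised posterior P(mu) P(D|mu); subtracting the maximum avoids underflow
log_unnorm = log_prior + log_like
unnorm = np.exp(log_unnorm - log_unnorm.max())

# Evidence P(D) (up to the same rescaling) by the trapezoidal rule
evidence = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(mu_grid))
posterior = unnorm / evidence

print("posterior integrates to", np.sum(0.5 * (posterior[1:] + posterior[:-1]) * np.diff(mu_grid)))
print("posterior peak at mu =", mu_grid[np.argmax(posterior)])
```

For a flat prior and a Gaussian likelihood, the peak lands at the sample mean of the data, and the normalization only matters if absolute probabilities are needed.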

Figure 3.1: Y is a proposed position for walker X_k using a stretch move along the line connecting the walker with a complementary helper X_j from the ensemble. The light grey dots represent the positions of other walkers in the ensemble that do not participate in the move.

The MCMC technique consists of sampling the parameter space with the aim of finding the region that maximizes the posterior. The samplers, called walkers, generate random walks in this parameter space. The method used in this thesis to move the walkers is the stretch-move ensemble sampler (Goodman & Weare, 2010). This technique proposes new positions for the walkers based on the current positions of the walkers in the ensemble and then performs an accept/reject step.

In an ensemble of M walkers, the state $\vec{X}(t) = [X_1(t), X_2(t), \ldots, X_M(t)]$ evolves with time, t. The scheme used to propose new walker positions is invariant under affine transformations of the parameter space. An affine transformation maps one affine space onto another while preserving the ratios of distances between points lying on a straight line; a rotation of an image is one example. The affine-invariant move of walker X_k(t) in the ensemble is a stretch move along the line connecting it with an auxiliary walker X_j(t), as shown in Figure 3.1. The move is given by:

$$ X_k(t) \rightarrow Y = X_j + Z\,[X_k(t) - X_j], \tag{3.2} $$

where Z is the stretching variable (Z = 1 means no change), drawn from a distribution g(Z) that satisfies the symmetry condition g(1/Z) = Z g(Z). Consequently, the move is symmetric: the probability of proposing X_k(t) → Y is the same as that of proposing Y → X_k(t). The particular distribution defined by Goodman & Weare (2010) depends on an adjustable scale parameter a (set to 2, following Goodman & Weare 2010) and has the form:

$$ g(Z) \propto \begin{cases} \dfrac{1}{\sqrt{Z}} & \text{if } Z \in \left[\dfrac{1}{a},\, a\right], \\ 0 & \text{otherwise.} \end{cases} \tag{3.3} $$
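Drawing Z from equation 3.3 is straightforward by inverting its cumulative distribution, F(Z) = (√Z − 1/√a)/(√a − 1/√a): for u uniform in [0, 1), Z = [1 + (a − 1)u]²/a. The sketch below is a minimal NumPy version (the function names are hypothetical) that draws Z this way and checks numerically the symmetry condition g(1/Z) = Z g(Z).

```python
import numpy as np

def g_unnormalised(z, a=2.0):
    """g(Z) proportional to 1/sqrt(Z) on [1/a, a] and 0 elsewhere (equation 3.3)."""
    z = np.asarray(z, dtype=float)
    inside = (z >= 1.0 / a) & (z <= a)
    # Evaluate 1/sqrt only where Z is inside the support to avoid invalid values
    return np.where(inside, 1.0 / np.sqrt(np.where(inside, z, 1.0)), 0.0)

def sample_stretch_factor(rng, a=2.0, size=None):
    """Draw Z from g(Z) by inverse-CDF sampling: Z = (1 + (a - 1) u)^2 / a."""
    u = rng.uniform(size=size)
    return (1.0 + (a - 1.0) * u) ** 2 / a

rng = np.random.default_rng(1)
z = sample_stretch_factor(rng, a=2.0, size=200_000)
print(z.min(), z.max())   # all samples lie in [1/a, a] = [0.5, 2]
print(z.mean())           # the analytic mean for a = 2 is 7/6

# Symmetry condition g(1/Z) = Z g(Z), checked on a grid inside [1/a, a]
zz = np.linspace(0.5, 2.0, 101)
print(np.allclose(g_unnormalised(1.0 / zz), zz * g_unnormalised(zz)))
```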

Finally, the proposed position in the parameter space is accepted or rejected according to an acceptance probability. This probability is based on the ratio between the posterior probabilities of the proposed and current positions, and is the lower of the two values in

$$ \min\left(1,\; Z^{n-1}\,\frac{P(Y)}{P(X_k(t))}\right), \tag{3.4} $$

where P(·) denotes the posterior evaluated at that position. The factor Z^{n−1} accounts for the fact that the proposed position is chosen from a one-dimensional subset of the n-dimensional parameter space. Earlier I mentioned that the evidence, P(D), does not need to be computed; the reason follows directly from equation 3.4. The evidence depends only on the data, so it is the same in both posterior probabilities and cancels out of the acceptance ratio.
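The proposal of equation 3.2 and the acceptance rule of equation 3.4 can be combined into a single update of one walker. The following is a minimal serial sketch, not the implementation used in this thesis: `stretch_move` and `log_post` are hypothetical names, and the comparison is done in logarithmic space, where equation 3.4 becomes min(0, (n − 1) ln Z + ln P(Y) − ln P(X_k)). Because only the difference of log posteriors enters, any additive constant, including ln P(D), drops out.

```python
import numpy as np

def stretch_move(walkers, k, log_post, rng, a=2.0):
    """Propose a stretch move for walker k and accept or reject it.

    walkers  : (M, n) array of current positions, updated in place on acceptance
    log_post : callable returning ln P(X|D) up to an additive constant
    Returns True if the proposal was accepted.
    """
    M, n = walkers.shape
    # Pick a helper walker j uniformly among the other M - 1 walkers
    j = rng.integers(M - 1)
    if j >= k:
        j += 1
    # Stretch factor Z ~ g(Z) of equation 3.3, via inverse-CDF sampling
    z = (1.0 + (a - 1.0) * rng.uniform()) ** 2 / a
    x_k = walkers[k].copy()
    y = walkers[j] + z * (x_k - walkers[j])            # equation 3.2
    # ln of Z^(n-1) P(Y) / P(X_k); constants such as ln P(D) cancel here
    log_ratio = (n - 1) * np.log(z) + log_post(y) - log_post(x_k)
    # Accept with probability min(1, ratio), i.e. equation 3.4
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        walkers[k] = y
        return True
    return False

def log_gauss(x):
    """Toy target: a 2-D standard normal, up to a constant."""
    return -0.5 * np.sum(x**2)

# Usage: 50 walkers started from a broad cloud, updated one at a time
rng = np.random.default_rng(3)
walkers = rng.normal(scale=3.0, size=(50, 2))
kept = []
for sweep in range(400):
    for k in range(len(walkers)):
        stretch_move(walkers, k, log_gauss, rng)
    if sweep >= 100:                 # discard the first 100 sweeps as burn-in
        kept.append(walkers.copy())
samples = np.concatenate(kept)
print(samples.mean(axis=0), samples.std(axis=0))   # should be near [0, 0] and [1, 1]
```

Updating walkers one at a time is the simplest serial form of the ensemble sampler; parallel variants update half of the ensemble at once using the other half as helpers.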

The procedure is repeated and, if the chosen model describes the data appropriately, the chain of walkers converges to a stationary state concentrated around the region of largest posterior probability in that parameter space. However, a few points are not straightforward to deal with: it is unclear how long the chain should run to achieve convergence, and the posterior distributions can depend strongly on the assumed priors, in which case an inadequate prior can bias the inferred parameters.

3.2.1 Example: white dwarf radial velocity of QS Vir

To illustrate the effectiveness of the MCMC fitting technique, I use the radial velocity of the white dwarf in QS Vir as an example. The radial velocity of the white dwarf can be modelled as

$$ V_{\rm tan} = \gamma_{\rm sys} + K_{\rm WD}\,\sin(2\pi\phi), \tag{3.5} $$

where γ_sys, K_WD and φ are the systemic velocity, the radial velocity amplitude and the orbital phase, respectively. In Section 2.3.3 I measured a systemic velocity of γ_sys = 39 ± 11 km s⁻¹ and a velocity amplitude of K_WD = 147 ± 16 km s⁻¹ from the COS spectroscopy. I assume that these mean values are the true parameters and generate artificial data, which I then fit using MCMC. If the probabilistic concept behind the MCMC method is correct, the parameter values obtained from the MCMC should be very similar to those measured from the HST data.

Figure 3.2: Synthetic data for the radial velocity of the white dwarf in QS Vir, including artificial noise (dots). The blue line corresponds to the true values (see text for details) of the radial velocity, γ_sys = 39 km s⁻¹ and K_WD = 147 km s⁻¹. The red line represents the best MCMC fit to the synthetic data, γ_sys = 39.01 ± 0.02 km s⁻¹ and K_WD = 147.01 ± 0.03 km s⁻¹.

To generate the data, D_i, I selected N = 1000 random values from the Gaussian distributions of γ_sys and K_WD in equation 3.5 (black dots in Figure 3.2). This example, like the spectroscopic data in general in this thesis, consists of N data points with uncertainties, σ_i, that follow a Gaussian distribution. Because the likelihood of N independent Gaussian points is a product of N exponentials, it is common to work in logarithmic space, where the product becomes a numerically stable sum. The Gaussian log-likelihood is then given by

$$ \ln P(D \mid \Theta) = \sum_{i}^{N} \left[ -\frac{1}{2}\ln\left(2\pi\sigma_i^2\right) - \frac{1}{2}\,\frac{(D_i - \Theta_i)^2}{\sigma_i^2} \right]. \tag{3.6} $$

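Therefore, dropping the terms that do not depend on the parameters (the normalization of each Gaussian and the evidence), the posterior distribution to be maximized is

$$ \ln P(\Theta \mid D) = \ln P(\Theta) - \frac{1}{2}\sum_{i}^{N}\frac{(D_i - \Theta_i)^2}{\sigma_i^2} + \text{const}. \tag{3.7} $$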
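The practical reason for working with logarithms in equations 3.6 and 3.7 can be seen numerically: for N = 1000 points the product of Gaussian densities underflows to zero in double precision, while the sum of log densities stays finite. The residuals and uncertainties below are arbitrary and assumed purely for illustration; `gaussian_log_likelihood` is a hypothetical helper implementing equation 3.6.

```python
import numpy as np

def gaussian_log_likelihood(data, model, sigma):
    """Equation 3.6: sum of -0.5 ln(2 pi sigma_i^2) - 0.5 (D_i - Theta_i)^2 / sigma_i^2."""
    data, model, sigma = (np.asarray(v, dtype=float) for v in (data, model, sigma))
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (data - model) ** 2 / sigma**2)

def chi_square_term(data, model, sigma):
    """The parameter-dependent part kept in equation 3.7: -0.5 * chi^2."""
    return -0.5 * np.sum((np.asarray(data) - np.asarray(model)) ** 2 / np.asarray(sigma) ** 2)

rng = np.random.default_rng(7)
N = 1000
sigma = np.full(N, 5.0)                      # illustrative uncertainties (assumed)
model = np.zeros(N)
data = model + rng.normal(scale=sigma)

# Direct product of Gaussian densities: underflows for N = 1000
densities = np.exp(-0.5 * (data - model) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)
print(np.prod(densities))                    # 0.0

logL = gaussian_log_likelihood(data, model, sigma)
print(logL)                                  # finite

# The normalization term does not depend on the parameters, so equation 3.6 and
# the chi-square part of equation 3.7 differ only by a constant
const = np.sum(-0.5 * np.log(2 * np.pi * sigma**2))
print(np.isclose(logL, const + chi_square_term(data, model, sigma)))
```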

For the case of QS Vir, I set a flat prior (written in logarithmic form) that constrains the parameters to always be positive, i.e.

$$ \ln P(\Theta) = \begin{cases} 0 & \text{if parameter} > 0, \\ -10^{30} & \text{otherwise.} \end{cases} \tag{3.8} $$

Figure 3.3: This plot, commonly called a "burn-in" diagram, illustrates the convergence of the parameters γ_sys and K_WD for the radial velocity of QS Vir. Also shown is the χ² in comparison with the posterior probability.

The model, Θ, is given by equation 3.5. The errors, σ_i, are taken as the absolute difference between the synthetic data and the true values (blue line in Figure 3.2). Consequently, data points that lie farther from the blue line carry less weight in the fit than those closer to it, which increases the chance of recovering the true values.
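A minimal sketch of the ingredients described above for QS Vir: the model of equation 3.5, synthetic data, uncertainties defined as the absolute difference from the true curve, the prior of equation 3.8 and the log posterior of equation 3.7. Two details are assumptions not stated in the text: the orbital phases are drawn uniformly in [0, 1), and γ_sys and K_WD are drawn independently for each point from Gaussians with the means and standard deviations quoted from Section 2.3.3. All function names are hypothetical.

```python
import numpy as np

GAMMA_TRUE, K_TRUE = 39.0, 147.0        # km/s, adopted "true" values
GAMMA_SD, K_SD = 11.0, 16.0             # km/s, uncertainties from Section 2.3.3

def rv_model(theta, phase):
    """Equation 3.5: gamma_sys + K_WD * sin(2 pi phi)."""
    gamma_sys, k_wd = theta
    return gamma_sys + k_wd * np.sin(2 * np.pi * phase)

def log_prior(theta):
    """Equation 3.8: 0 if every parameter is positive, -1e30 otherwise."""
    return 0.0 if np.all(np.asarray(theta) > 0) else -1e30

def log_posterior(theta, phase, data, sigma):
    """Equation 3.7, dropping parameter-independent constants."""
    resid = data - rv_model(theta, phase)
    return log_prior(theta) - 0.5 * np.sum(resid**2 / sigma**2)

rng = np.random.default_rng(2024)
N = 1000
phase = rng.uniform(0.0, 1.0, N)                        # assumption: uniform phases
gamma_draw = rng.normal(GAMMA_TRUE, GAMMA_SD, N)
k_draw = rng.normal(K_TRUE, K_SD, N)
data = gamma_draw + k_draw * np.sin(2 * np.pi * phase)  # synthetic D_i

truth = rv_model((GAMMA_TRUE, K_TRUE), phase)
sigma = np.abs(data - truth)                            # errors = |synthetic - true curve|

print(log_posterior((GAMMA_TRUE, K_TRUE), phase, data, sigma))  # -N/2 by construction
```

With this definition of σ_i every residual at the true parameters equals its own uncertainty, so the χ² there is exactly N; points lying close to the true curve receive very small σ_i and therefore dominate the fit.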

Figure 3.4: Corner plot displaying the Gaussian distributions of the best-fit parameters, with values of γ_sys = 39.01 ± 0.02 km s⁻¹ and K_WD = 147.01 ± 0.03 km s⁻¹. The lines represent the true values (see text for details), γ_sys = 39 km s⁻¹ and K_WD = 147 km s⁻¹. The two parameters are strongly correlated.

I used 200 walkers to sample the two-parameter space (γ_sys, K_WD), iterating 500 times. The walkers start from a wide range of initial values but converge quickly, within fewer than 100 iterations (Figure 3.3). To obtain the best-fit parameters, the part of the chain in which the walkers have not yet converged is cut off, and the remaining burnt-in sequence is projected into histograms. The mean and standard deviation of these histograms give the best-fit values and their uncertainties. Figure 3.4 shows the distributions of the parameters from the fit to the synthetic radial velocity data for QS Vir. They are well described by Gaussians, with γ_sys = 39.01 ± 0.02 km s⁻¹ and K_WD = 147.01 ± 0.03 km s⁻¹, which are extremely close to the true values.
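Discarding the unconverged part of the chain and summarizing the remainder by its mean and standard deviation amounts to a few array operations. In this sketch the chain is assumed to be stored as an array of shape (iterations, walkers, parameters), e.g. 500 × 200 × 2 for the run above, and `summarize_chain` is a hypothetical helper. The toy chain used to exercise it is artificial and is not output from the QS Vir fit.

```python
import numpy as np

def summarize_chain(chain, burn_in):
    """Drop the first `burn_in` iterations, pool all walkers, and return the
    per-parameter mean and standard deviation of the remaining samples."""
    chain = np.asarray(chain, dtype=float)
    kept = chain[burn_in:]                        # (iterations - burn_in, walkers, n_params)
    flat = kept.reshape(-1, chain.shape[-1])      # one row per retained sample
    return flat.mean(axis=0), flat.std(axis=0)

# Toy chain (artificial): 100 unconverged iterations spread over a wide box,
# followed by 400 iterations scattered around (1.0, 2.0) with widths (0.1, 0.2)
rng = np.random.default_rng(5)
n_iter, n_walkers = 500, 200
chain = np.empty((n_iter, n_walkers, 2))
chain[:100] = rng.uniform([0.0, 0.0], [200.0, 400.0], size=(100, n_walkers, 2))
chain[100:] = rng.normal([1.0, 2.0], [0.1, 0.2], size=(400, n_walkers, 2))

mean, std = summarize_chain(chain, burn_in=100)
print(mean, std)   # close to [1.0, 2.0] and [0.1, 0.2]
```

Keeping the burn-in samples would inflate both the mean and the spread, which is why the cut is made where the burn-in diagram shows the walkers have settled.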


3.3 White dwarf T and log g from fits to ultraviolet spec-