5.1 INTRODUCTION
In previous chapters, we worked with the linear regression model:
$$y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$$
where data was available on $i = 1, \ldots, N$ individuals. This model is useful not only when the relationship between the dependent and explanatory variables is a linear one, but also in cases where it can be transformed to linearity. For instance, the Cobb–Douglas production function relating an output, $y$, to inputs $x_2, \ldots, x_k$ is of the form
$$y = \alpha_1 x_2^{\beta_2} \cdots x_k^{\beta_k}$$
If we take logs of both sides of this equation and add an error term, we obtain a regression model:
$$\ln(y_i) = \beta_1 + \beta_2 \ln(x_{i2}) + \cdots + \beta_k \ln(x_{ik}) + \varepsilon_i$$
where $\beta_1 = \ln(\alpha_1)$. This specification is now linear in logs of the dependent and explanatory variables and, with this small difference, all the techniques of the previous chapters apply. The translog production function is another example of a nonlinear relationship which can be transformed to linearity.
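The log transformation can be illustrated with a short simulation. The following is a minimal sketch, not part of the original text: the parameter values, sample size and error standard deviation are made up for illustration, and ordinary least squares stands in for the methods of the earlier chapters simply to show that the logged model is linear in its coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" values, chosen only for illustration.
alpha1, beta2, beta3 = 2.0, 0.3, 0.6
N = 500

# Simulated inputs x2 and x3 (strictly positive so that logs are defined).
x = rng.uniform(1.0, 10.0, size=(N, 2))
eps = rng.normal(0.0, 0.05, size=N)

# Cobb-Douglas output with a multiplicative error exp(eps), so that the
# error becomes additive once logs are taken.
y = alpha1 * x[:, 0] ** beta2 * x[:, 1] ** beta3 * np.exp(eps)

# Regress ln(y) on an intercept and the logged inputs.
Z = np.column_stack([np.ones(N), np.log(x)])
beta_hat, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)

print("estimate of beta1 = ln(alpha1):", beta_hat[0])
print("estimates of beta2 and beta3:  ", beta_hat[1:])
```

The intercept estimates $\ln(\alpha_1)$ rather than $\alpha_1$ itself, which is the "small difference" referred to above.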
There are, however, some functional forms which cannot be transformed to linearity. An example of an intrinsically nonlinear functional form is the constant elasticity of substitution (CES) production function, which is of the form
$$y_i = \left( \sum_{j=1}^{k} \gamma_j x_{ij}^{\gamma_{k+1}} \right)^{\frac{1}{\gamma_{k+1}}}$$
In this chapter, we consider Bayesian inference in regression models where the explanatory variables enter in an intrinsically nonlinear way. The empirical illustration will focus on the CES production function and, for this case, our nonlinear regression model will have the form:
$$y_i = \left( \sum_{j=1}^{k} \gamma_j x_{ij}^{\gamma_{k+1}} \right)^{\frac{1}{\gamma_{k+1}}} + \varepsilon_i \tag{5.1}$$
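To make the functional form in (5.1) concrete, the sketch below evaluates its deterministic part for every row of a data matrix. The function name `ces` and the layout of the parameter vector (the $k$ weights first, the exponent $\gamma_{k+1}$ last) are assumptions made here for illustration.

```python
import numpy as np

def ces(X, gamma):
    """Evaluate (sum_j gamma_j * x_ij**gamma_{k+1}) ** (1 / gamma_{k+1}) row by row.

    X     : (N, k) array of strictly positive inputs
    gamma : length k + 1 vector; the last element is the exponent gamma_{k+1}
    """
    X = np.asarray(X, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    k = X.shape[1]
    if gamma.shape != (k + 1,):
        raise ValueError("gamma must have k + 1 elements")
    weights, rho = gamma[:k], gamma[k]
    if rho == 0.0:
        raise ValueError("gamma_{k+1} must be nonzero")
    # ** binds more tightly than @, so this is (X ** rho) @ weights.
    return (X ** rho @ weights) ** (1.0 / rho)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ces(X, [0.5, 0.5, 1.0]))  # exponent 1 reduces to a linear function: [1.5 3.5]
print(ces(X, [0.5, 0.5, 2.0]))
```

With $\gamma_{k+1} = 1$ the function collapses to a linear combination of the inputs, so the linear regression model is a special case of (5.1).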
We use the same notation as before (e.g., see the discussion at the beginning of Chapter 3), and let $\varepsilon$ and $y$ be $N$-vectors stacking the errors and observations of the dependent variable, respectively, and let $X$ be an $N \times k$ matrix stacking the observations of the $k$ explanatory variables. We make the standard assumptions that:
1. $\varepsilon$ is $N(0_N, h^{-1} I_N)$.
2. All elements of $X$ are either fixed (i.e. not random variables) or, if they are random variables, they are independent of all elements of $\varepsilon$ with a probability density function $p(X \mid \lambda)$, where $\lambda$ is a vector of parameters that does not include any of the other parameters in the model.
The basic ideas discussed in this chapter will hold for the general nonlinear regression model:
$$y_i = f(X_i, \gamma) + \varepsilon_i$$
where $X_i$ is the $i$th row of $X$, and $f(\cdot)$ is a function which depends upon $X_i$ and a vector of parameters, $\gamma$. With some abuse of notation, we write this model in matrix form as:
$$y = f(X, \gamma) + \varepsilon \tag{5.2}$$
where $f(X, \gamma)$ is now an $N$-vector of functions with $i$th element given by $f(X_i, \gamma)$. The exact implementation of the posterior simulation algorithm will depend upon the form of $f(\cdot)$ and, hence, we discuss basic concepts using (5.2) before discussing (5.1).
The nonlinear regression model is an important one in its own right. However, we also discuss it here, since it will give us a chance to introduce a number of techniques which are applicable in virtually any model. The linear regression model was a very special one in the sense that it was possible, in some cases, to obtain analytical posterior results (see Chapters 2 and 3). Even with priors which preclude the availability of analytical results, some special techniques are available for the Normal linear regression model (e.g. Gibbs sampling and the Savage–Dickey density ratio discussed in Chapter 4). However, many models do not allow for such specialized methods to be used and it is important to develop generic methods which can be used in any model. The nonlinear regression model allows us to introduce such generic methods in a context which is only a slight extension of our familiar linear regression model. With regard to posterior simulation, we introduce a very important class of posterior simulators called the Metropolis–Hastings algorithms. These algorithms will be used in later chapters.
We will also introduce a generic method for calculating the marginal likelihood developed in Gelfand and Dey (1994), and a metric for evaluating the fit of a model called the posterior predictive p-value.
5.2 THE LIKELIHOOD FUNCTION
Using the definition of the multivariate Normal density, we can write the likelihood function of the nonlinear regression model as
$$p(y \mid \gamma, h) = \frac{h^{\frac{N}{2}}}{(2\pi)^{\frac{N}{2}}} \left\{ \exp\left[ -\frac{h}{2} \{y - f(X, \gamma)\}' \{y - f(X, \gamma)\} \right] \right\} \tag{5.3}$$
With the linear regression model, we were able to write this expression in terms of OLS quantities which suggested a form for a natural conjugate prior (see (3.7)). Here, no such simplification exists unless $f(\cdot)$ takes very specific forms.
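Although (5.3) does not simplify, it is straightforward to evaluate numerically for any given $f(\cdot)$. The sketch below computes its logarithm, which is less prone to underflow than the likelihood itself. The function names, the argument order and the tiny data set are illustrative assumptions; `f` is any user-supplied function returning the $N$-vector $f(X, \gamma)$.

```python
import numpy as np

def log_likelihood(y, X, gamma, h, f):
    """Log of (5.3): (N/2) ln(h) - (N/2) ln(2 pi) - (h/2) * SSE(gamma)."""
    y = np.asarray(y, dtype=float)
    N = y.shape[0]
    resid = y - f(X, gamma)   # y - f(X, gamma)
    sse = resid @ resid       # {y - f(X, gamma)}'{y - f(X, gamma)}
    return 0.5 * N * np.log(h) - 0.5 * N * np.log(2.0 * np.pi) - 0.5 * h * sse

def linear_f(X, gamma):
    """Linear special case f(X, gamma) = X gamma, used here only as a check."""
    return np.asarray(X, dtype=float) @ np.asarray(gamma, dtype=float)

# Made-up data for illustration.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.1, 0.9, 2.2])
print(log_likelihood(y, X, gamma=[0.0, 1.0], h=4.0, f=linear_f))
```

Passing `f` as an argument mirrors the general notation in (5.2): the same function works for the CES form in (5.1) or any other choice of $f(\cdot)$.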
5.3 THE PRIOR
Prior choice will depend upon what $f(\cdot)$ is and how $\gamma$ is interpreted. For instance, with the CES production function in (5.1), $\gamma_{k+1}$ is related to the elasticity of substitution between inputs. The researcher would likely have prior information about what plausible values for this parameter might be. Hence, prior elicitation is likely to be very dependent on the particular empirical context. In this section, some of the discussion proceeds at a completely general level, with the prior simply denoted by $p(\gamma, h)$, and some of the discussion uses a prior which was noninformative for the linear regression model:
$$p(\gamma, h) \propto \frac{1}{h} \tag{5.4}$$
This prior is Uniform for $\gamma$ and $\ln(h)$. In many cases, this might be a sensible noninformative prior for the parameters of the nonlinear regression model.
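The statement that a prior proportional to $1/h$ is Uniform in $\ln(h)$ follows from a change of variables. A brief sketch, where $\theta = \ln(h)$ is notation introduced here for convenience and is not used elsewhere in the chapter:

```latex
\theta = \ln(h), \qquad h = e^{\theta}, \qquad
\left| \frac{dh}{d\theta} \right| = e^{\theta} = h
\quad \Longrightarrow \quad
p(\gamma, \theta) = p(\gamma, h) \left| \frac{dh}{d\theta} \right|
\propto \frac{1}{h} \cdot h = 1 .
```

The Jacobian term exactly cancels the $1/h$ factor, so the implied density is flat in both $\gamma$ and $\theta$.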
5.4 THE POSTERIOR
The posterior density is proportional to the likelihood times the prior and can be written as
$$p(\gamma, h \mid y) \propto p(\gamma, h) \frac{h^{\frac{N}{2}}}{(2\pi)^{\frac{N}{2}}} \left\{ \exp\left[ -\frac{h}{2} \{y - f(X, \gamma)\}' \{y - f(X, \gamma)\} \right] \right\} \tag{5.5}$$
In general, there is no way to simplify this expression, which will depend upon the precise forms for $p(\gamma, h)$ and $f(\cdot)$ and does not take the form of any well-known density. When the noninformative prior in (5.4) is used, the error precision, $h$, can be integrated out analytically in a step analogous to that required to derive (3.14). The resulting marginal posterior for $\gamma$ is
$$p(\gamma \mid y) \propto \left[ \{y - f(X, \gamma)\}' \{y - f(X, \gamma)\} \right]^{-\frac{N}{2}} \tag{5.6}$$
In the case where $f(\cdot)$ was linear, this expression could be rearranged to be put in the form of the kernel of a t distribution, but here it does not take any convenient form.
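Even without a convenient form, the kernel in (5.6) can be evaluated pointwise, which is the only thing a generic posterior simulator needs from it. The sketch below works with its logarithm, $-\frac{N}{2}\ln[\{y - f(X,\gamma)\}'\{y - f(X,\gamma)\}]$. All names, the simulated data and the parameter values are assumptions for illustration, with the CES form of (5.1) used as $f(\cdot)$.

```python
import numpy as np

def log_marginal_posterior_kernel(gamma, y, X, f):
    """Log of (5.6) up to an additive constant: -(N/2) * ln(SSE(gamma))."""
    y = np.asarray(y, dtype=float)
    resid = y - f(X, gamma)
    return -0.5 * y.shape[0] * np.log(resid @ resid)

def ces(X, gamma):
    """Deterministic part of (5.1); the last element of gamma is the exponent."""
    X = np.asarray(X, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return (X ** gamma[-1] @ gamma[:-1]) ** (1.0 / gamma[-1])

# Simulated data from hypothetical parameter values (not from the text).
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(100, 2))
gamma_true = np.array([0.4, 0.6, 1.5])
y = ces(X, gamma_true) + rng.normal(0.0, 0.05, size=100)

# Compare the log kernel at a few values of the exponent gamma_{k+1}.
for g3 in (0.5, 1.5, 2.5):
    g = np.array([0.4, 0.6, g3])
    print(g3, log_marginal_posterior_kernel(g, y, X, ces))
```

Because (5.6) is known only up to a constant of proportionality, only differences between these log values are meaningful.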