Consider now the following simulation setting. For each i = 1, ..., 8, covariates x_i1 and x_i2 are independently drawn from the N(0, 1) and Bern(0.6) distributions, respectively. Responses y_i are then generated as realizations of Poisson random variables with mean µ_i = exp(β_01 + β_02 x_i1 + β_03 x_i2), where β_01 = 1, β_02 = 1 and β_03 = 2. Datasets of larger size n = 16, 32, 64 are also created using the same original set of covariates. Rejection probabilities of the usual tests for H0 : β_j = β_0j (j = 1, 2, 3) at several significance levels are then estimated by means of 5000 iterations for each sample size n. Table 2.4 presents the results for nominal levels α = 0.01, 0.05, whereas Table 2.5 deals with the larger nominal levels α = 0.1, 0.2. Under this scenario, the number of simulation trials is increased because less variation in testing performance may be observed among the various statistics. For instance, unlike what was seen in the studies concerning the Gamma regression, here Z_Pj,∗ does not outclass its competitors. Moreover, even the standard Wald test proves to be quite reliable, so the room for refinement through the location adjustment is not as large as before. Nevertheless, the experiment suggests that some beneficial effects are still appreciable, especially as α grows and also for moderate values of n.
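As a rough illustration of the simulation design just described, the following Python sketch estimates the rejection probabilities of the standard two-sided Wald test only. The function names, the seed, the IRLS fitting routine and the reading of "the same original set of covariates" as stacked copies of the first eight rows are assumptions of this sketch, not details taken from the study. The location adjusted and likelihood-based statistics are not reproduced, so the output is not expected to match Tables 2.4 and 2.5 digit by digit.

```python
import numpy as np
from statistics import NormalDist


def fit_poisson(X, y, max_iter=100, tol=1e-8):
    """ML fit of a Poisson log-linear model by IRLS (hypothetical helper).

    Returns the estimates and their expected-information standard errors.
    """
    eta = np.log(y + 0.5)                      # usual GLM starting values
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X'W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = np.clip(X @ beta_new, -30, 30)   # guard against overflow
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(eta)
    with np.errstate(invalid="ignore"):
        se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    return beta, se


def rejection_rates(n=16, n_sim=1000, alphas=(0.01, 0.05, 0.1, 0.2), seed=1):
    """Monte Carlo rejection rates of the Wald test of H0: beta_j = beta_0j."""
    if n % 8 != 0:
        raise ValueError("n must be a multiple of 8 in this sketch")
    rng = np.random.default_rng(seed)
    beta0 = np.array([1.0, 1.0, 2.0])
    # Draw the 8 base covariate rows once; redraw if x2 is constant.
    while True:
        x1 = rng.standard_normal(8)
        x2 = rng.binomial(1, 0.6, size=8)
        if 0 < x2.sum() < 8:
            break
    base = np.column_stack([np.ones(8), x1, x2])
    X = np.tile(base, (n // 8, 1))             # assumption: reuse base rows
    mu = np.exp(X @ beta0)
    crit = np.array([NormalDist().inv_cdf(1 - a / 2) for a in alphas])
    rejections = np.zeros((len(alphas), 3))
    used = 0
    for _ in range(n_sim):
        y = rng.poisson(mu)
        try:
            beta_hat, se = fit_poisson(X, y)
        except np.linalg.LinAlgError:
            continue                           # skip degenerate samples
        if not np.all(np.isfinite(se)):
            continue
        z = np.abs((beta_hat - beta0) / se)
        rejections += z[None, :] > crit[:, None]
        used += 1
    return rejections / used, used


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        rates, used = rejection_rates(n=n)
        print(f"n = {n} ({used} usable samples)")
        print(np.round(rates, 3))              # rows: alpha, columns: j
```

Each row of the returned array corresponds to one nominal level and each column to one coefficient, which mirrors the layout of the tables. Samples for which the fit fails numerically are skipped and counted out, a choice made here only for simplicity.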

2.6 Discussion and further work

The fundamental idea behind this chapter, introduced in Section 2.1, has been to improve first-order Wald inference on small-to-moderate samples in regression settings by adjusting the null moments of the z-statistic. Because such a method is not guaranteed to succeed in increasing the overall agreement between the null distribution of the pivot and the standard normal distribution, several scenarios were taken into consideration to verify the actual usefulness of this approach.

Section 2.2 dealt with some motivating examples of our research. In simple frameworks with a scalar global parameter, obtaining explicit asymptotic expansions for the mean and variance of the z-statistic was shown to be not so demanding. The location-scale adjustment seems particularly effective in the exponential case: the normal approximation to the distribution of the adjusted z-statistic is markedly improved with respect to that of the ordinary version and is even more accurate than that of the score statistic, for all the sample sizes considered. In the Poisson setting the location-scale adjusted z-statistic performs in a dubious way, while under the logistic model the corresponding test proved to be typically more reliable than the ordinary one, although limitations of its performance connected with the correction in variance cannot be denied.

Table 2.4: Empirical rejection probabilities at nominal levels α = 0.01, 0.05 of the two-sided tests related to T̂_j, its location adjusted version T̂_j,∗, the profile score statistic Z_uPj, the profile likelihood ratio statistic Z_Pj and its modification Z_Pj,∗ (j = 1, 2, 3) in the Poisson log-linear model, estimated by a study based on 5000 simulated datasets of size n = 8, 16, 32, 64.

                        α = 0.01                                 α = 0.05
 n    j    T̂_j    T̂_j,∗  Z_uPj  Z_Pj   Z_Pj,∗     T̂_j    T̂_j,∗  Z_uPj  Z_Pj   Z_Pj,∗
 8    1    0.011  0.011  0.011  0.011  0.012      0.048  0.050  0.049  0.053  0.053
 8    2    0.009  0.010  0.010  0.010  0.012      0.048  0.049  0.049  0.051  0.053
 8    3    0.009  0.009  0.009  0.010  0.014      0.047  0.048  0.049  0.048  0.053
 16   1    0.009  0.009  0.010  0.010  0.014      0.043  0.044  0.044  0.046  0.048
 16   2    0.008  0.008  0.008  0.007  0.011      0.045  0.046  0.046  0.045  0.049
 16   3    0.010  0.010  0.010  0.011  0.014      0.047  0.047  0.047  0.046  0.049
 32   1    0.008  0.008  0.008  0.008  0.012      0.052  0.053  0.053  0.051  0.057
 32   2    0.008  0.008  0.008  0.007  0.013      0.044  0.044  0.044  0.046  0.050
 32   3    0.010  0.011  0.011  0.010  0.016      0.048  0.048  0.048  0.047  0.054
 64   1    0.010  0.009  0.010  0.010  0.016      0.044  0.044  0.045  0.045  0.051
 64   2    0.013  0.013  0.013  0.012  0.018      0.052  0.052  0.052  0.052  0.058
 64   3    0.012  0.012  0.012  0.012  0.016      0.047  0.047  0.047  0.048  0.052

In Section 2.3 a convenient way to implement the location adjustment of the z-statistic under general regression scenarios was presented. The core intuition of viewing the combinant as an estimator of a reparametrization permits the proposed approach to enjoy the simplicity of original Wald-type inference. Indeed, the necessary ingredients to compute the location adjusted z-statistic are easily obtainable from the standard output of routines for fitting regression models. As a result, the computational effort implied by the procedure is equal to that implied by classical z-testing. We remark also that the same basic technique may be adopted to adjust z-statistics which use the observed information for the standard errors of the estimates.

Table 2.5: Empirical rejection probabilities at nominal levels α = 0.1, 0.2 of the two-sided tests related to T̂_j, its location adjusted version T̂_j,∗, the score statistic Z_uPj, the likelihood ratio statistic Z_Pj and its modification Z_Pj,∗ (j = 1, 2, 3) in the Poisson log-linear model, estimated by a study based on 5000 simulated datasets of size n = 8, 16, 32, 64.

                        α = 0.1                                  α = 0.2
 n    j    T̂_j    T̂_j,∗  Z_uPj  Z_Pj   Z_Pj,∗     T̂_j    T̂_j,∗  Z_uPj  Z_Pj   Z_Pj,∗
 8    1    0.100  0.101  0.102  0.105  0.106      0.206  0.208  0.207  0.212  0.210
 8    2    0.099  0.101  0.101  0.101  0.103      0.193  0.195  0.194  0.197  0.198
 8    3    0.095  0.098  0.097  0.102  0.104      0.193  0.196  0.194  0.198  0.201
 16   1    0.092  0.094  0.093  0.092  0.097      0.193  0.194  0.194  0.193  0.197
 16   2    0.090  0.091  0.090  0.092  0.096      0.185  0.187  0.186  0.185  0.191
 16   3    0.096  0.097  0.097  0.099  0.101      0.198  0.200  0.199  0.200  0.202
 32   1    0.100  0.100  0.100  0.101  0.106      0.197  0.198  0.197  0.198  0.204
 32   2    0.096  0.097  0.097  0.096  0.101      0.197  0.198  0.197  0.198  0.202
 32   3    0.099  0.100  0.099  0.102  0.108      0.202  0.204  0.203  0.204  0.210
 64   1    0.092  0.092  0.092  0.093  0.100      0.194  0.194  0.194  0.196  0.202
 64   2    0.099  0.099  0.099  0.099  0.105      0.199  0.199  0.199  0.200  0.206
 64   3    0.093  0.094  0.093  0.093  0.097      0.192  0.192  0.192  0.191  0.196

In Section 2.4 advantage was again taken of the single-parameter setting in order to study some theoretical properties of the location adjusted z-statistic and to evaluate its testing performance in a realistic situation. The asymptotic comparison between the two versions of the z-statistic did not result in a comprehensive pattern of difference in variability. This analysis certainly deserves to be developed further, both analytically and empirically. Within the problem of inference on a binomial proportion, the behaviour of the location adjusted z-statistic was not found to be as satisfactory as in the one-parameter models examined in Section 2.2.4. Determining whether the presence of a bounded parameter space may reduce the efficacy of the suggested approach therefore appears worthwhile.

Section 2.5 was devoted instead to the location adjustment of z-statistics in GLMs. Such a prominent modeling framework is in fact especially suited to the application of our method. Among the practical aspects which further ease the steps of calculation, the existence of closed-form expressions for the bias of ML estimators is probably the most notable (Cordeiro and McCullagh, 1991).
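To make the closed-form bias expression concrete, the sketch below evaluates the first-order bias of the ML estimator in the special case of a Poisson log-linear model, where the Cordeiro and McCullagh (1991) formula reduces to (X'WX)^{-1} X'W ξ with W = diag(µ_i) and ξ_i = −q_ii/2, q_ii being the i-th diagonal element of X(X'WX)^{-1}X'. The function name is hypothetical, and the sketch covers only this canonical-link, unit-dispersion case; it does not show how the bias enters the location adjustment of the z-statistic.

```python
import numpy as np


def poisson_first_order_bias(X, beta):
    """First-order bias of the ML estimator in a Poisson log-linear model.

    Special case of the Cordeiro-McCullagh expression for a canonical
    log link with unit dispersion (hypothetical helper for illustration).
    """
    mu = np.exp(X @ beta)                       # fitted means, W = diag(mu)
    XtW = X.T * mu                              # X'W
    info_inv = np.linalg.inv(XtW @ X)           # (X'WX)^{-1}
    # q_ii: diagonal of X (X'WX)^{-1} X', computed without the n x n matrix
    q = np.einsum("ij,jk,ik->i", X, info_inv, X)
    xi = -0.5 * q
    return info_inv @ (XtW @ xi)
```

In practice the function would be evaluated at the ML estimate. As a sanity check, for an intercept-only model with n observations and mean λ the expression reduces to −1/(2nλ), which agrees with the delta-method bias of log ȳ.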

The performance of the adjusted z-test in this context was illustrated through some simulation studies. Results relating to the Gamma regression are remarkable: the location adjusted z-statistic always exhibits more adequate rejection probabilities than its direct competitor. For smaller samples, the adjusted z-test seems even more reliable than the profile likelihood ratio test and, in some cases, than the profile score test. Notice that, contrary to our proposal, both of the latter require the constrained ML fit under the null hypothesis. The testing accuracy of the location adjusted z-statistic was also shown to be comparable to that of higher-order tests when a bootstrap is employed for correcting its scale. Admittedly, the bootstrap implementation makes the method much more computationally intensive. It would certainly be preferable to find a simpler way to perform the scale adjustment of the z-statistic, similar to that used for centering its location.

Under the Poisson log-linear model, simulation evidence in support of the better performance of the location adjusted z-statistic was not as strong as for the Gamma regression case. However, the minor discrepancies in the empirical rejection probabilities of the two variants of the z-test allow us to conclude that the adjustment in location is rather effective in this setting as well.

Of course, both the findings and the limitations of our study call for further work on this subject. Some open problems have already been mentioned above, but more questions remain unanswered. Below, we list the main directions for future research:

i) Elaborate on the analysis in Section 2.4.1 by comparing the variances of the standard and location adjusted z-statistics in special simple model settings, like those of Section 2.2.4.

ii) Extend the variance analysis in Section 2.4.1 to the case of a multidimensional parameter.

iii) Derive asymptotic (e.g. Edgeworth, Cornish-Fisher) expansions for the distributions of the standard and location adjusted z-statistics to formally establish whether the normal approximation is improved by the adjustment in location.

iv) Develop a power analysis to compare the distributions of the standard and location adjusted z-statistics under the alternative hypothesis.

v) Perform other Monte Carlo experiments, involving both real and simulated datasets, to empirically test the relative performance in the GLMs framework of the standard and location adjusted z-statistics, even with regard to the other likelihood-based pivots considered in Section 2.5.2. In particular, consider Poisson and binomial distributions of the response variable.

vi) Derive the location adjustment and empirically test the relative performance of the standard and location adjusted z-statistics under general regression scenarios, like the Cox proportional hazards and Beta regression models.

vii) Explore the possibility of implementing a fairly simple scale adjustment of the z-statistic along with the proposed correction in location.

viii) Investigate ways to adopt the same general approach with other test statistics, e.g. log-likelihood ratio or score statistics.

ix) Consider the potential application of the methodology suggested to p-values and/or rejection probabilities of the z-statistic, rather than to the pivot itself. In fact, at a given significance level of the test, such quantities may be viewed in their turn as model reparametrizations.

Chapter 3 Monte Carlo modified profile likelihood for clustered data

3.1 Introduction

The modified profile likelihood (MPL) (Barndorff-Nielsen, 1983) was introduced as a prime example among adjusted profile likelihoods in Section 1.3.4. Unfortunately, the great beneficial impact of its employment can be directly observed only within the families of full exponential and composite group models, where the explicit derivation of an ancillary statistic is either unnecessary or practically possible.

In Chapter 1, we saw that the approximation to this pseudo-likelihood function due to Severini (1998b) helps to overcome most of those computational difficulties, relying on expected values asymptotically equivalent to the sample space derivatives involved in the original version of the MPL. This expedient has thus considerably extended the scope of this inferential instrument. Nevertheless, it is not complicated to check that covariances between score components like those present in Severini’s modification may still not be readily available for a number of statistical problems.

The increasing complexity of the phenomena dealt with nowadays is probably the main reason for the current widespread use, in all applied areas, of clustered data, also known as grouped data, longitudinal data, stratified data or panel data (Hsiao, 2007). In Section 1.4 emphasis was placed on the fact that, due to their singular structure, datasets under those denominations are typically analyzed through statistical models intrinsically characterized by the incidental parameters problem. This feature, more specifically, has to do with the usual choice of capturing the unobserved heterogeneity across groups via cluster-specific nuisance parameters, commonly named individual effects. Specifications of such type, especially popular in econometrics, are referred to as
