Galaxy bias - Statistics of galaxy clustering

1.10 Statistics of galaxy clustering

1.10.5 Galaxy bias

variable changes its interpretation if the outcome variable is transformed and more critically may depend drastically on which other explanatory variables are used. Further, if different reasonably well-fitting models give rather different conclusions it is important to know and report this.

5.14 Consistency of data and prior

Especially when a prior distribution represents potentially new important information, it is in principle desirable to examine the mutual consistency of information provided by the data and the prior. This will not be possible for aspects of the model for which there is little or no information in the data, but in general some comparison will be possible.

A serious discrepancy may mean that the prior is wrong, i.e., does not correspond to subject-matter reality, that the data are seriously biased or that the play of chance has been extreme.

Often comparison of the two sources of information can be informal but for a more formal approach it is necessary to find a distribution of observable quantities that is exactly or largely free of unknown parameters. Such a distribution is the marginal distribution of a statistic implied by likelihood and prior.

For example, in the discussion of Example 1.1 about the normal mean, the marginal distribution implied by the likelihood and the normal prior is that

$$\frac{\bar{Y} - m}{\sqrt{\sigma_0^2/n + v}} \tag{5.18}$$

has a standard normal distribution. That is, discrepancy is indicated whenever $\bar{Y}$ and the prior mean $m$ are sufficiently far apart. Note that a very flat prior, i.e., one with extremely large $v$, will not be found discrepant.
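As a rough illustration of the check based on (5.18), the following sketch computes the standardized difference and its two-sided tail area under the standard normal distribution. The function name, argument names and numerical values are hypothetical and are not taken from the text; the two-sided tail area is one reasonable way, assumed here, of quantifying "sufficiently far apart".

```python
import math

def prior_data_discrepancy(ybar, prior_mean, sigma0_sq, n, prior_var):
    """Return (z, two-sided tail area) for the standardized difference (5.18)."""
    # Marginal variance of the sample mean: sigma0^2 / n from the likelihood
    # plus v from the normal prior.
    z = (ybar - prior_mean) / math.sqrt(sigma0_sq / n + prior_var)
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    tail_area = math.erfc(abs(z) / math.sqrt(2.0))
    return z, tail_area

# Hypothetical numbers: a fairly sharp prior centred away from the data.
z_sharp, p_sharp = prior_data_discrepancy(
    ybar=2.0, prior_mean=0.0, sigma0_sq=1.0, n=25, prior_var=0.21)

# The same data with a very flat prior (huge v): z shrinks towards zero.
z_flat, p_flat = prior_data_discrepancy(
    ybar=2.0, prior_mean=0.0, sigma0_sq=1.0, n=25, prior_var=1e6)

print(z_sharp, p_sharp)
print(z_flat, p_flat)
```

With the sharp prior the standardized difference is about 4 and the tail area is very small, whereas with the flat prior the same data give no indication of discrepancy, in line with the remark about extremely large v.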

A similar argument applies to other exponential family situations. Thus a binomial model with a beta conjugate prior implies a beta-binomial distribution for the number of successes, this distribution having known parameters and thus in principle allowing exact evaluation of the statistical significance of any departure.
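The beta-binomial check can be sketched in the same spirit. The code below evaluates the beta-binomial probabilities implied by a binomial model with n trials and a beta prior with parameters a and b, and then sums the probabilities of all outcomes no more probable than the one observed. That ordering of outcomes is an assumption made for this sketch; the text does not prescribe a particular definition of departure, and the function names and numbers are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(r, n, a, b):
    """P(R = r) for n trials when the success probability has a Beta(a, b) prior."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(r + 1)
                  - math.lgamma(n - r + 1))
    return math.exp(log_choose + log_beta(r + a, n - r + b) - log_beta(a, b))

def prior_predictive_p_value(r_obs, n, a, b):
    """Total probability of outcomes no more probable than r_obs (assumed ordering)."""
    probs = [beta_binomial_pmf(r, n, a, b) for r in range(n + 1)]
    cutoff = probs[r_obs] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    return sum(q for q in probs if q <= cutoff)

# Hypothetical example: the prior strongly favours a high success probability,
# but only 2 successes are seen in 20 trials.
p_conflict = prior_predictive_p_value(r_obs=2, n=20, a=20.0, b=2.0)
# 19 successes is the most probable outcome under this prior, so no conflict.
p_agree = prior_predictive_p_value(r_obs=19, n=20, a=20.0, b=2.0)
print(p_conflict, p_agree)
```

Because the beta-binomial distribution has known parameters once a, b and n are fixed, the calculation is exact apart from floating-point rounding.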

5.15 Relevance of frequentist assessment

A key issue concerns the circumstances under which the frequentist interpretation of a p-value or confidence interval is relevant for a particular situation under study. In some rather general sense, following procedures that are in error relatively infrequently in the long run is some assurance for the particular case but one would like to go beyond that and be more specific.

Appropriate conditioning is one aspect already discussed. Another is the following. As with other measuring devices a p-value is calibrated in terms of the consequences of using it; also there is an implicit protocol for application that hinges on ensuring the relevance of the calibration procedure.

This protocol is essentially as follows. There is a question. A model is formulated. To help answer the question it may be that the hypothesis ψ = ψ₀ is considered. A test statistic is chosen. Data become available. The test statistic is calculated. In fact it will be relatively rare that this protocol is followed precisely in the form just set out.

It would be unusual and indeed unwise to start such an analysis without some preliminary checks of data completeness and quality. Corrections to the data would typically not affect the relevance of the protocol, but the preliminary study might suggest some modification of the proposed analysis. For example:

• some subsidiary aspects of the model might need amendment, for example it might be desirable to allow systematic changes in variance in a regression model;

• it might be desirable to change the precise formulation of the research question, for example by changing a specification of how E(Y) depends on explanatory variables to one in which E(log Y) is considered instead;

• a large number of tests of distinct hypotheses might be done, with all those showing insignificant departures discarded and only those showing significant departure from the relevant null hypotheses reported;

• occasionally the whole focus of the investigation might change to the study of some unexpected aspect which nullified the original intention.

The third of these represents poor reporting practice but does correspond roughly to what sometimes happens in less blatant form.

It is difficult to specify criteria under which the departure from the protocol is so severe that the corresponding procedure is useless or misleading. Of the above instances, in the first two a standard analysis of the new model is likely to be reasonably satisfactory. In a qualitative way they correspond to fitting a broader model than the one originally contemplated and provided the fitting criterion is not, for example, chosen to maximize statistical significance, the results will be reasonably appropriate. That is not the case, however, for the last two possibilities.

Example 5.10. Selective reporting. Suppose that m independent sets of data are available each with its appropriate null hypothesis. Each is tested and p is the smallest p-value achieved and H the corresponding null hypothesis. Suppose that only H and p are reported. If say m = 100 it would be no surprise to find a p as small as 0.01.
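The claim for m = 100 can be checked empirically. The following simulation is a minimal sketch, assuming that under the overall null hypothesis the m p-values behave as independent uniform random variables on (0, 1); the function name, the number of replications and the seed are arbitrary choices made for illustration.

```python
import random

def prob_min_p_below(m, threshold, n_rep=20000, seed=0):
    """Estimate P(smallest of m independent uniform p-values <= threshold)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # One replication: m null p-values, of which only the smallest is kept.
        smallest = min(rng.random() for _ in range(m))
        if smallest <= threshold:
            hits += 1
    return hits / n_rep

estimate = prob_min_p_below(m=100, threshold=0.01)
print(estimate)
```

The estimated proportion comes out well above one half, so a smallest p-value of 0.01 among 100 tests is indeed unremarkable.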

In this particular case the procedure followed is sufficiently clearly specified that a new and totally relevant protocol can be formulated. The test is based on the smallest of m independently distributed random variables, all with a uniform distribution under the overall null hypothesis of no effects. If the corresponding random variable is P, then

$$P(P > x) = (1 - x)^m, \tag{5.19}$$

because in order that P > x it is necessary and sufficient that all individual p-values exceed x. Thus the significance level to be attached to p under this scheme of investigation is

$$1 - (1 - p)^m \tag{5.20}$$

and if p is small and m not too large this will be close to mp.

This procedure, named after Bonferroni, gives a quite widely useful way of adjusting simple p-values.
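The adjustment (5.20) and its approximation mp are simple to compute. The sketch below compares the two for a few hypothetical pairs of p and m; the function names are illustrative only, and capping mp at 1 is a common convention rather than something stated in the text.

```python
def min_p_significance(p, m):
    """Significance level (5.20) for the smallest of m independent p-values."""
    return 1.0 - (1.0 - p) ** m

def bonferroni_bound(p, m):
    """The approximation m * p, capped at 1 so that it remains a probability."""
    return min(1.0, m * p)

# Hypothetical (p, m) pairs: the two agree closely only when m * p is small.
for p, m in [(0.01, 100), (0.001, 10), (0.0001, 5)]:
    print(p, m, min_p_significance(p, m), bonferroni_bound(p, m))
```

The value mp is never smaller than the exact value, so using it errs on the cautious side.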

There is an extensive set of procedures, of which this is the simplest and most important, known under the name of multiple comparison methods. The name is, however, somewhat of a misnomer. Many investigations set out to answer several questions via one set of data and difficulties arise not so much from dealing with several questions, each answer with its measure of uncertainty, but rather from selecting one or a small number of questions on the basis of the apparent answer.

The corresponding Bayesian analysis requires a much more detailed specification and this point indeed illustrates one difference between the broad approaches. It would be naive to think that all problems deserve very detailed specification, even in those cases where it is possible in principle to set out such a formulation with any realism. Here for a Bayesian treatment it is necessary to specify both the prior probability that any particular set of null hypotheses is false and the prior distributions holding under the relevant alternative. Formally there may be no particular problem in doing this but for the prior to reflect genuine knowledge considerable detail would be involved. Given this formulation, the posterior distribution over the set of m hypotheses and corresponding alternatives is in principle determined. In particular the posterior probability of any particular hypothesis can be found.

Suppose now that only the set of data with largest apparent effect is considered. It would seem that if the prior distributions involve strong assumptions of independence among the sets of data, and of course that is not essential, then the information that the chosen set has the largest effect is irrelevant, the posterior distribution is unchanged, i.e., no direct allowance for selection is required.

A resolution of the apparent conflict with the frequentist discussion is, however, obtained if it is reasonable to argue that such a strategy of analysis is most likely to be used, if at all, when most of the individual null hypotheses are essentially correct. That is, with m hypotheses under examination the prior probability of any one being false may be approximately ν₀/m, where ν₀ may be treated as constant as m varies. Indeed ν₀ might be approximately 1, so that the prior expectation is that one of the null hypotheses is false. The dependence on m is thereby restored.
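How the dependence on m is restored can be seen in a small numerical sketch. It assumes, purely for illustration, that the evidence in the chosen data set is summarized by a single Bayes factor in favour of the alternative; that summary, the function name and the numbers are assumptions of the sketch and are not part of the discussion above. The prior probability that the null hypothesis is false is taken as ν₀/m and Bayes' theorem is applied on the odds scale.

```python
def posterior_prob_null_false(bayes_factor, m, nu0=1.0):
    """Posterior probability that a given null hypothesis is false.

    The prior probability of falsity is nu0 / m; bayes_factor is the assumed
    ratio of the marginal likelihood under the alternative to that under the null.
    """
    prior = nu0 / m
    if not 0.0 < prior < 1.0:
        raise ValueError("nu0 / m must lie strictly between 0 and 1")
    return prior * bayes_factor / (prior * bayes_factor + (1.0 - prior))

# Hypothetical Bayes factor of 20, held fixed while m grows.
results = {m: posterior_prob_null_false(20.0, m) for m in (2, 10, 100, 1000)}
for m, prob in results.items():
    print(m, prob)
```

For the same apparent strength of evidence, the posterior probability that the null hypothesis is false falls as m increases, which parallels in qualitative terms the frequentist adjustment for selection.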

An important issue here is that to the extent that the statistical analysis is concerned with the relation between data and a hypothesis about that data, it might seem that the relation should be unaffected by how the hypothesis came to be considered. Indeed a different investigator who had focused on the particular hypothesis H from the start would be entitled to use p. But if simple significance tests are to be used as an aid to interpretation and discovery in somewhat exploratory situations, it is clear that some such precaution as the use of (5.20) is essential to ensure relevance to the analysis as implemented and to avoid the occurrence of systematically wrong answers. In fact, more broadly, ingenious investigators often have little difficulty in producing convincing after-the-event explanations of surprising conclusions that were unanticipated beforehand but which retrospectively may even have high prior probability; see Section 5.10. Such ingenuity is certainly important but explanations produced by that route have, in the short term at least, different status from those put forward beforehand.
