We wish to test the association between a given SNP and the secondary phenotype.
Specifically, our null and alternative hypothesis can be represented as:
$H_0$: There is no association between $X_j$ and $Y$.
$H_a$: There is an association between $X_j$ and $Y$.
The motivation for the proposed method is the following: If a given SNP is associated with a secondary phenotype, we would expect that the vector of minor allele counts for the SNP, $X_j$, would be highly correlated with the secondary phenotype, $Y$. If $Y$ were permuted to form a new vector, $Y^*$, this association would be eliminated. We would expect that the (absolute) correlation between $X_j$ and $Y$ would be greater than the correlation between $X_j$ and $Y^*$. Thus, we may test the null hypothesis by comparing the correlation between $X_j$ and $Y$ to the correlations between $X_j$ and a series of $Y^*$s, where each $Y^*$ is a permutation of $Y$.
The procedure can be summarized as follows:
1. To account for the over-representation of disease cases in the data, assign a weight of $w_0 = 1$ to controls and a weight of $w_1 = \frac{p \, n_0}{(1 - p) \, n_1}$ to the cases.
2. Calculate the weighted correlation between $X_j$ and $Y$. Denote this correlation as $R_j$.
3. Permute $Y$ $B$ times to form the vectors $Y^{1*}, \ldots, Y^{B*}$.
4. For each $Y^{b*}$, $b = 1, \ldots, B$, calculate the weighted correlation between each $X_k$, $k = 1, \ldots, N$, and $Y^{b*}$. Denote this correlation as $R^*_{kb}$.
5. The p-value for the test of the null hypothesis is then given by

$$p_j = \frac{1}{NB} \sum_{k=1}^{N} \sum_{b=1}^{B} I\left(|R^*_{kb}| \ge |R_j|\right) \tag{4.38}$$
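The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the `prevalence` argument (taken here to be the quantity $p$ in the case weight), and the choice to keep each individual's weight fixed while $Y$ is permuted are all assumptions made for the example.

```python
import numpy as np

def weighted_corr_columns(X, y, w):
    """Weighted Pearson correlation of each column of X (n x N) with y (n,)."""
    w = w / w.sum()                      # normalize weights to sum to one
    Xc = X - w @ X                       # center each column at its weighted mean
    yc = y - w @ y
    cov = (w * yc) @ Xc                  # weighted covariances, shape (N,)
    return cov / np.sqrt((w @ Xc**2) * (w @ yc**2))

def pooled_permutation_pvalues(X, Y, case, prevalence, B=25, seed=0):
    """Pooled permutation p-values in the spirit of equation (4.38).

    X: (n, N) minor allele counts, Y: (n,) secondary phenotype,
    case: (n,) boolean case indicator, prevalence: assumed meaning of p.
    """
    rng = np.random.default_rng(seed)
    n, N = X.shape
    n1 = int(case.sum())
    n0 = n - n1
    # Step 1: weight 1 for controls, p*n0 / ((1-p)*n1) for cases.
    w = np.where(case, prevalence * n0 / ((1.0 - prevalence) * n1), 1.0)
    # Step 2: observed weighted correlations R_j for every SNP.
    R = weighted_corr_columns(X, Y, w)
    # Steps 3-4: null correlations R*_{kb} for B permutations of Y.
    # Assumption: weights stay attached to individuals; only Y is shuffled.
    R_null = np.column_stack(
        [weighted_corr_columns(X, rng.permutation(Y), w) for _ in range(B)]
    )
    # Step 5: fraction of all N*B null values with |R*_{kb}| >= |R_j|.
    null_abs = np.sort(np.abs(R_null).ravel())
    n_ge = null_abs.size - np.searchsorted(null_abs, np.abs(R), side="left")
    return R, R_null, n_ge / (N * B)
```

Sorting the pooled null values once and using a binary search keeps the cost of step 5 at roughly $O(NB \log NB)$ rather than $O(N^2 B)$. A SNP with no variation in the sample would produce a division by zero and would need to be filtered beforehand.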
This procedure is valid even if the secondary phenotype $Y$ is dichotomous, since the squared correlation between $X_j$ and $Y$ is proportional to the Armitage trend $\chi^2$ statistic in this case (Price et al., 2006). Note that this test assumes that the distribution of the $R^*_{kb}$'s does not depend on $k$. In practice, this assumption is unlikely to be perfectly satisfied. However, it results in enormous computational savings, because the permutation correlations from all $N$ SNPs can be pooled into a single null distribution. Under this assumption, $B = 25$ provides a sufficient number of permutations to demonstrate that a SNP is associated with a secondary phenotype at a Bonferroni-corrected threshold for genome-wide significance. Without this assumption, millions of permutations would be required for each SNP, which is computationally intractable.
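One way to see why a small $B$ can suffice is to compare the resolution of the p-value with the Bonferroni threshold. The sketch below is an illustration rather than a derivation given in the text: it relies on the fact that (4.38) averages over all $N \times B$ permutation correlations, and the significance level $\alpha = 0.05$ is an assumed value.

```latex
% Pooled null, as in (4.38): p_j is a multiple of 1/(NB).
\[
  \frac{1}{NB} \le \frac{\alpha}{N}
  \iff B \ge \frac{1}{\alpha} = 20
  \qquad (\alpha = 0.05,\ \text{assumed}).
\]
% Separate null for each SNP: p_j would be a multiple of 1/B.
\[
  \frac{1}{B} \le \frac{\alpha}{N}
  \iff B \ge \frac{N}{\alpha}.
\]
```

Under these assumptions $B = 25$ satisfies the first bound, while the second bound grows linearly with the number of SNPs. For a hypothetical $N = 500{,}000$ it would require $B \ge 10^7$ permutations per SNP.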
Now suppose one wishes to evaluate the association between an allele count $X_j$ and a secondary phenotype $Y$ after controlling for covariates $Z = Z_1, \ldots, Z_K$, such as demographic covariates or eigenvectors corresponding to race or ancestry. The above procedure can be modified as follows:
1. Perform a weighted regression of $X_j$ on $Z$ and find the vector $X'_j$ of residuals from the resulting model.
2. Similarly, perform a weighted regression of $Y$ on $Z$ and find the vector $Y'$ of residuals from the resulting model.
3. Apply the permutation test procedure above using $X'_j$ and $Y'$ in place of $X_j$ and $Y$.
Note that this procedure requires one to regress $X_j$ on the covariates for each SNP. Thus, a naïve application of this procedure would require the computation of $N$ regression models, which would be computationally expensive. The required computing time can be significantly reduced, however, by noting that each regression model has exactly the same covariates. The only difference between these regression models is the outcome variable $X_j$. Suppose we are performing a weighted regression of $X_j$ on $Z$. Let $W$ be a diagonal matrix of the weights. Then the regression coefficients are given by $(Z^T W Z)^{-1} Z^T W X_j$, and the estimated values of $X_j$ are therefore given by $Z (Z^T W Z)^{-1} Z^T W X_j$. Let $H = Z (Z^T W Z)^{-1} Z^T W$. (This $H$ is commonly known as the hat matrix.) Note that $H$ does not depend on $X_j$. Thus, by calculating and storing the hat matrix $H$, one may calculate the residuals of the regression of $X_j$ on $Z$ as $X_j - H X_j$, which requires only a single matrix multiplication rather than refitting the entire regression model for each SNP. This approach is likely to substantially reduce the computation time needed for the procedure.
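The hat-matrix shortcut can be sketched as follows. This is a minimal example under assumptions: the function names are hypothetical, `Z` is assumed to already contain an intercept column if one is wanted, and the weights are passed as a vector `w` holding the diagonal of $W$.

```python
import numpy as np

def weighted_hat_matrix(Z, w):
    """Return H = Z (Z^T W Z)^{-1} Z^T W for W = diag(w)."""
    ZtW = Z.T * w                        # Z^T W without forming the n x n matrix W
    return Z @ np.linalg.solve(ZtW @ Z, ZtW)

def residualize_all(X, Z, w):
    """Residuals X_j - H X_j for every column X_j of X at once."""
    H = weighted_hat_matrix(Z, w)        # computed once, reused for all SNPs
    return X - H @ X
```

For large samples the $n \times n$ matrix $H$ may be too big to store. In that case the same residuals can be obtained as `X - Z @ np.linalg.solve(ZtW @ Z, ZtW @ X)`, which only ever forms $K \times K$ and $K \times N$ intermediates.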
Now suppose that some minor allele counts are missing at random for some of the individuals in the study. We will show how using the Cholesky decomposition of $Z^T W Z$ can speed the computation of $H$. Note, we can determine $L$, the lower triangular matrix with real and positive diagonal entries which solves the expression $Z^T W Z = L^T L$. Then we have that $(Z^T W Z)^{-1} = L^{-1} (L^T)^{-1}$. When we have missingness in $X_j$ we will need to recompute $H$. Let $Z^*$ represent the matrix of covariates with the individuals who have missing values of $X_j$ removed, $W^*$ represent the matrix of weights with the individuals who have missing values of $X_j$ removed, $z$ represent the matrix of covariates for individuals who have missing values of $X_j$, and $w$ represent the matrix of weights for individuals who have missing values of $X_j$. Then $H = Z^* (Z^{*T} W^* Z^*)^{-1} Z^{*T} W^* = Z^* (Z^T W Z - z^T w z)^{-1} Z^{*T} W^*$. We can compute the down-dated Cholesky factor $U$, which solves $U^T U = Z^T W Z - z^T w z$, in a single step. Then we have that $H = Z^* U^{-1} (U^T)^{-1} Z^{*T} W^*$. Down-dating the Cholesky
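The down-dating idea can be illustrated with a hand-written rank-one routine. This is a sketch under assumptions, not the routine used by the author: NumPy has no built-in Cholesky down-date, so the function names here are hypothetical. The code follows NumPy's convention $A = L L^T$ with $L$ lower triangular, so the factor $U$ with $U^T U = A$ corresponds to `L.T`. It also removes one individual at a time, whereas the text describes a single-step down-date.

```python
import numpy as np

def chol_downdate_rank1(L, x):
    """Given lower-triangular L with A = L @ L.T, return the factor of A - x x^T."""
    L = L.copy()
    x = np.array(x, dtype=float)         # work on a copy of x
    n = x.size
    for k in range(n):
        r2 = L[k, k] ** 2 - x[k] ** 2
        if r2 <= 0:
            raise np.linalg.LinAlgError("down-dated matrix is not positive definite")
        r = np.sqrt(r2)
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] - s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

def downdate_for_missing(L, z, w):
    """Remove z^T diag(w) z, the contribution of individuals with missing X_j.

    z: (m, K) covariate rows of those individuals, w: (m,) their weights.
    """
    for zi, wi in zip(z, w):
        L = chol_downdate_rank1(L, np.sqrt(wi) * zi)
    return L
```

Each rank-one down-date costs $O(K^2)$ operations, so removing $m$ individuals costs $O(mK^2)$, compared with $O(nK^2)$ to rebuild $Z^{*T} W^* Z^*$ from scratch for every SNP with missing values.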