2. CAPÍTULO 2
2.19. ANÁLISIS DE CARGA
2.19.6. SÍMBOLOS Y NOTACIÓN DE CARGAS
RFLPs (Restriction Fragment Length Polymorphisms) were introduced in
1978 (Kan & Dozy, 1978), (Botstein et aL, 1980) and became the first genetic
markers to be used in a successful linkage (Huntington’s disease) (Gusella et al.,
1983). They are based on a single base pair change that creates or obliterates a
cleavage site for a specific restriction enzyme. The resulting variation between
individuals can be detected by digestion of the DNA by the appropriate restriction
enzyme. RFLPs are inherited as simple Mendelian codominant markers, which can
be readily identified in families. A disadvantage of RFLPs is that because they are
biallelic they are not very informative, having low heterozygosity (usually<0.40).
The heterozygosity of the genotyping markers was greatly increased by the
identification of VNTRs (Variable Number of Tandem Repeats) in 1985 (Nakamura
et al., 1987). This class of marker is made of a specific set of consensus sequences
that vary between 14 and 100 base pairs in length. They are remarkably polymorphic,
with a high heterozygosity rate (usually > 0.6) in the population. However, there is
only a small number of VNTRs available and their distribution is rather limited in the
genome, often tending to cluster toward human telomeres. In addition, their high
heterozygosity together with their large size make these markers difficult to genotype.
STRs (Short Tandem Repeats) or microsatellites were initially described by
Weber and May (Weber & May, 1989) and Litt and Luty (Litt & Luty, 1989). STRs
are distributed widely and evenly in the genome. They sometimes have high
heterozygosities (usually > 0.7) and are relatively easy to score. The number o f the
(trinucleotide) or four (tetranucleotide) bases. The dinucleotide (CA)n-(GT)n is the
most common repeat, with a highly polymorphic form of the repeat occurring
approximately every 0.4 cM. Their polymorphic status usually depends on two
factors: the size of the repeat and whether a perfect or imperfect repeat is present. A
marker having 15, or more, perfect (uninterrupted) repeat motifs is usually, but not
always, highly polymorphic (Weber, 1990). Following the success of the dinucleotide
repeat markers, other polymorphic microsatellites were isolated and characterized,
namely, tri- and tetra-nucleotide repeat markers, which have helped to further
saturate, the genome with additional polymorphic markers for use in genetic studies.
Finally, in recent years, attention has been focused on the use of single
nucleotide polymorphisms (SNPs) as genetic markers. They are the most common
type of human DNA variation and as the name suggests they represent a position at
which two alternative bases occur at an appreciable frequency (>1%) in the human
population (Wang et al.^ 1998). On average, they occur 1 per 300-1000 base pairs
(Collins et al., 1997). Although individual SNPs are less informative than typical
multi-allelic simple sequence length polymorphisms, they are more abundant and
their genotyping can be automated with the use of DNA chip-based microarrays
(Hacia et al., 1996), which allow the very rapid analysis of very large numbers of
SNPs. Large collections of mapped SNPs are being developed and will provide a
1.7.2 Param etric approach: Led score analysis
The term genetic linkage is used to describe the tendency of alleles from two
or more loci, which lie in close physical proximity to each other to segregate together
in a family. Therefore, the extent of linkage depends on distance, which in genetic
terms can be measured with the frequency of recombination, also known as the
recombination fraction, or theta (0) between loci. Hence, the further apart the loci, the
greater the recombination fraction and vice versa. Theta ranges from 0 for loci that
are completely linked to 0.5 for unlinked loci on the same or different chromosomes.
Lod score analysis is a likelihood-based parametric linkage approach to
estimate the recombination fraction and the significance of the evidence for linkage.
Two likelihoods (L) of observing a particular configuration of a disease locus and a
genetic marker locus in a pedigree data set are calculated: one under various degrees
of linkage over a selected range of recombination fractions (i.e., 0 < 0.5) and the other
under no linkage (i.e., 0 = 0.5). The logio of the ratio for these two likelihoods, also
known as lod score z(x) is used as a measure of support for linkage versus non
linkage (Morton, 1955):
z(x) = logio [ L(pedigree given 0=jc) / /.(pedigree given 0=0.5) ]
where jc is a particular value of theta. The value is termed the two-point lod when it
involves linkage between the disease locus and a genetic marker locus and multilocus,
or multipoint lod score when it involves linkage between a disease locus and several
linked marker loci. Lod scores across families can be summed but only at the same
recombination fraction and the overall lod score o f the data set is then examined.
Morton suggested that lods o f 3 or more are indicative of linkage because
theoretical considerations based on Bayesian probability and empirical evidence
positive. A lod score of -2 or less are indicative of non-linkage, while values between
3 and -2 are considered as inconclusive (Morton, 1955).
The lod score (or parametric) method of linkage analysis requires a genetic
model for the disease locus as well as specification of certain parameters such as:
recombination fraction, disease allele frequencies and penetrance values for each
possible disease genotype. Since a model of inheritance is unknown for
schizophrenia, researchers have tended to use a range of different models in which the
parameters are assumed. However, power to detect linkage is decreased when the
wrong assumptions are made. Several studies have shown that the problem with
applying a parametric method when the genetic model is unknown is not the false
conclusion of linkage but rather the increased probability of missing a linkage when it
is present (Clerget-Darpoux et a/., 1986).
One way to circumvent mis-specification of penetrance and/or the existence of
phenocopies is by performing an affecteds-only analysis in which one records
unaffected individuals as “phenotype unknown”. In this way maximum likelihood
estimates come from the most certain of the trait information, the phenotypes of
affected individuals. However, the inheritance pattern remains to be specified.
Another important issue when performing linkage analysis for complex traits
is locus heterogeneity. The lod score method provides statistical tests to calculate
heterogeneity (Ott, 1999). In this case, the likelihood ratio for linkage given
heterogeneity (hlod) may be written as
Z{ff) = logio [L{ 6, a)IL{6 =1/2, a = 0)]
where a the proportion of families being linked
The lod score method has several advantages when the mode of inheritance is
recombination fraction and it enables formal tests of genetic heterogeneity. However,
for complex genetic traits such as schizophrenia, with an unknown mode of
transmission it is necessary either to use arbitrary penetrances to evaluate linkage or
to use a “model free” method in which penetrance is varied automatically such that
the maximum lod is produced with gene frequency and penetrances that are
appropriate. Several investigators have argued for alternative approaches to the
detection of linkage described below.