Current advances in molecular methods, including next‐generation sequencing (NGS), will enhance the discovery of Y‐specific genetic markers. NGS refers to a group of DNA sequencing technologies that are alternatives to classical Sanger sequencing and can generate hundreds of thousands of sequence reads at one time, thus increasing sequencing capacity at an unprecedented rate (Hudson 2008; Shendure & Ji 2008). In this section, we describe unpublished and promising strategies for the development of Y‐linked genetic markers in non‐model organisms (bold pathways in Figure 2). Most of these strategies will benefit from NGS techniques, which generate large amounts of Y sequence data and thereby increase the likelihood of discovering male‐specific markers. Consequently, we focus in this section on the primary methodological step of obtaining longer DNA fragments for initial sequencing. As with the current methods, the emerging strategies can be divided into those that simultaneously screen several individuals for sequence polymorphisms and those that explore large sequence data from physically isolated material and libraries for microsatellite repeat motifs.
Beyond YCATS
Compared to the traditional YCATS approach (Hellborg & Ellegren 2003), longer stretches comprising several kb of Y‐specific sequence data can be obtained by using long‐range PCR to amplify single‐copy MSY‐specific genes (Wandeler & Camenisch in prep.). This strategy immediately fulfils two of the three criteria for successful Y‐marker development, male specificity and single‐copy amplification, whilst the longer sequences obtained increase the likelihood of finding sequence polymorphism among amplicons from different individuals. Additionally, intronic sequences may contain polymorphic microsatellites (Luo et al. 2007; Wandeler & Camenisch in prep.). In Table 2, we present a list of single‐copy MSY‐linked genes as potential targets for this strategy. Publicly available Y chromosome reference sequences of MSY‐linked genes from mouse, chimpanzee and human are used to estimate the approximate location of exonic and intronic sequences for one or several long‐range PCR assays. Initial re‐sequencing of short fragments of the selected gene regions provides the sequence information needed to design species‐ and Y‐specific long‐range PCR primers. Detailed sequence information is important for primer design because, in contrast to conventional PCR, long‐range PCR requires perfectly matching primers. Re‐sequencing can be done with newly designed exonic primers or known conserved YCATS primers. Finally, the male specificity of long‐range amplicons is verified and a few selected individuals are sequenced using Sanger sequencing. Alternatively, amplicons from a larger number of individuals can be pooled and sequenced simultaneously, or sequenced using a parallel tagged sequencing approach on an NGS platform (Meyer et al. 2008).
Y walking
We consider directional Y‐chromosome walking a promising strategy for generating large amounts of Y‐specific sequence information. In this strategy, a known Y‐linked sequence is used as a starting point for sequencing into unknown flanking regions. Any DNA sequence tested for male specificity, including microsatellite flanking regions or sequences obtained by YCATS, can serve as a potential starting point. Again, by targeting regions near or within single‐copy MSY‐specific genes (Table 2), this strategy will likely provide Y‐specific and single‐copy amplicons. Several genome‐walking methodologies, predominantly based on restriction digestion and PCR, have been described, including inverse PCR (Triglia et al. 1988), ligation‐mediated PCR (Rosenthal et al. 1990), and randomly primed PCR (Parker et al. 1991). Recently, these methods have been considerably improved (Reddy et al. 2008; Rampias et al. 2009; Tsuchiya et al. 2009). The amount of sequence information gained by genome‐walking methods is usually determined by the frequency of the cutting sites of the applied restriction enzymes. Since the number of Y‐specific starting sequences can be a limiting factor, it is advantageous to maximize product size. However, as most methods are PCR based, product length barely exceeds 2 kb, although it should be possible to achieve considerably longer amplicons with long‐range PCR assays. As with the extended YCATS method described above, amplicons can subsequently be sequenced either by conventional Sanger sequencing or on an NGS platform.
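As a rough illustration of why restriction‐site frequency bounds the length of a single walking step, the following Python sketch reports the distance from a known start position to the next downstream recognition site for a few enzymes. The function names, the toy sequence and the choice of enzymes are assumptions made for illustration and are not taken from any of the cited protocols. The 4^n spacing assumes random sequence with equal base frequencies, which Y‐chromosomal DNA will not strictly satisfy.

```python
# Illustrative sketch only: function names and the toy sequence are hypothetical.

# Recognition sequences of three commonly used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "MboI": "GATC"}


def expected_site_spacing(site_length):
    """Mean spacing (bp) between sites of an n-bp cutter, assuming random
    sequence with equal base frequencies (a simplifying assumption)."""
    return 4 ** site_length


def downstream_walk_lengths(sequence, start, sites=SITES):
    """Distance (bp) from `start` to the next downstream site, per enzyme.

    Returns None for an enzyme whose site does not occur downstream.
    """
    seq = sequence.upper()
    lengths = {}
    for enzyme, motif in sites.items():
        pos = seq.find(motif, start)
        lengths[enzyme] = None if pos == -1 else pos - start
    return lengths


if __name__ == "__main__":
    toy = "ACGT" * 3 + "GAATTC" + "AAAA" + "GATC"
    print(downstream_walk_lengths(toy, start=0))
    print(expected_site_spacing(6), expected_site_spacing(4))
```

Under these assumptions a six‐base cutter is expected to cut about every 4 kb and a four‐base cutter about every 256 bp, so enzyme choice trades step length against the chance of finding a site near the starting sequence.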
Genomic Y sequence data
In species with no or very limited Y‐sequence information, genomic Y data can be obtained by combining current methodological strategies with NGS technologies (Figure 2). The obvious advantage of these new technologies is the enormous increase in sequence output at lower cost and with less technical effort. The sequencing of Y‐chromosomal BAC or cosmid clones will be especially facilitated by high‐throughput NGS. Moreover, even the whole Y could be decoded by de novo sequencing of a pool of hundreds of flow‐sorted Y chromosomes, although it might not be possible to assemble the reads into a single contig given the highly repetitive structure of the Y chromosome (Skaletsky et al. 2003b). However, for the purpose of identifying microsatellite motifs this would be irrelevant, provided that the selected NGS platform offers sufficient read length. Despite the high potential of these strategies for generating large amounts of Y sequence data, one main challenge remains. Although the obtained sequences are Y‐chromosome derived, the male specificity and single‐copy status of every sequence has to be verified before it is useful. This is labour‐intensive because, for example, each microsatellite repeat motif has to be tested for these criteria individually. Considering the architecture of the Y chromosome, the proportion of sequences fulfilling these criteria could be rather small. Nevertheless, these strategies represent a promising way to obtain Y‐chromosomal data, especially in species that so far lack any Y‐sequence information.
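To make the read‐length requirement concrete, the following Python sketch scans a single read for microsatellite repeats and keeps only those with enough flanking sequence on both sides for primer design. It is a minimal sketch under stated assumptions: the function name and all thresholds (unit size, minimum copy number, minimum flank length) are hypothetical choices, not values from the methods reviewed here. It does not assess male specificity or copy number, which still have to be verified in the laboratory.

```python
import re


def find_microsatellites(read, min_unit=2, max_unit=6, min_repeats=5, min_flank=20):
    """Return (unit, copies, start, end) for tandem repeats found in `read`.

    A repeat is reported only if at least `min_flank` bases remain on both
    sides of it, so that primers could in principle be placed in the flanks.
    All thresholds are illustrative assumptions.
    """
    seq = read.upper()
    # Lazy unit match, so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        left_flank = match.start()
        right_flank = len(seq) - match.end()
        if left_flank < min_flank or right_flank < min_flank:
            continue  # read too short around the repeat for primer design
        copies = (match.end() - match.start()) // len(unit)
        hits.append((unit, copies, match.start(), match.end()))
    return hits


if __name__ == "__main__":
    left = "ACGTTGCATCGGATCCTAGT"   # 20 bp toy flank
    right = "GGATCTTGCAACGTAGCTAG"  # 20 bp toy flank
    print(find_microsatellites(left + "CA" * 8 + right))
```

A read consisting of the repeat alone is rejected by the flank filter, which mirrors the point that short reads can contain a motif without providing usable primer sites.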
2.5 Conclusions
Genetically tracing paternal lineages is hindered in most non‐model species by the lack of Y chromosome markers, despite the employment of a wide range of methodological strategies. In the near future, the quest for Y‐linked markers will benefit from combining recent technical advances such as NGS with current methods. For instance, longer Y‐specific DNA sequences can be obtained by applying NGS to long‐range PCR products of MSY gene introns, to products of directional Y chromosome walking, or to BAC libraries. Moreover, the amount of exonic and intronic Y sequence information, as well as our knowledge of the Y chromosome architecture of different mammals, will increase in the near future given the growing number of genomes being sequenced.
Despite the promising potential of the current and emerging methodological strategies presented here, there is likely no straightforward route to obtaining Y‐linked genetic markers. The distinct architecture of the Y chromosome, with its highly palindromic structure, its widespread sequence homologies to the X chromosome and its generally low levels of genetic variation, will hamper the discovery of genetic markers fulfilling our criteria for Y‐linked loci. Furthermore, the evolutionary history of the study species will affect the observed level of Y chromosome variation. However, although discovering Y‐linked genetic markers is difficult, the effort is worthwhile considering their potential for exploring sex‐biased dispersal patterns and the independent demographic histories of males and females in wild animal populations.
In comparative mythology (Campbell 1949), a quest describes a hero’s journey in which “A hero ventures forth from the world of common day into a region of supernatural wonder: fabulous forces are there encountered and a decisive victory is won: the hero comes back from this mysterious adventure with the power to bestow boons on his fellow man.” We advise researchers with limited resources who embark on such a quest for the Y to form collaborations with laboratories in which the techniques presented in this review are well established. Additionally, the mating system and life history of the species in question need to be carefully considered upon embarkation. Following these two rules should enable researchers to bestow a wealth of suitable Y‐linked markers on their fellow scientists.
Acknowledgements
The authors are grateful to Glauco Camenisch (Zoological Museum, Zurich), Angelika Schwarze and Beat Steinmann (Children’s Hospital, Zurich), Patricia O’Brien and Malcolm Ferguson‐Smith (Veterinary School, Cambridge) for their support. We appreciated the valuable comments made on the manuscript by Briana Gross, Anna Lindholm, and four anonymous reviewers. This work was supported by a Basler Stiftung für Biologische Forschung grant to PW and A.H. Schultz Foundation grants to MK and MG.