The quest for the Y
Genetic data from the male‐specific region of the Y chromosome (MSY) represent an essential complement to maternally and biparentally inherited genetic markers and are critical for studying male‐specific evolutionary processes (Prugnolle & de Meeus 2002; Handley & Perrin 2007a). Because sex‐biased dispersal may strongly affect the genetic makeup of natural populations, a comprehensive understanding of a species’ evolutionary history requires the inclusion of sex‐specific genetic markers, i.e. mitochondrial and Y‐chromosomal loci in mammals. Mitochondrial markers are easily accessible and have been applied successfully in population genetics for decades. The development of useful MSY‐specific single‐copy markers, however, is technically challenging because of the highly complex architecture of the Y chromosome. Thus, male‐specific genetic data have remained elusive for most mammalian species.
In Chapter 2, published as an invited technical review in Molecular Ecology Resources
(Greminger et al. 2010), I present an overview of the methodological strategies currently used to develop MSY‐specific genetic markers in non‐model species, together with their practical feasibility and limitations. Furthermore, I describe strategies that hold promise for the future given the advent of high‐throughput sequencing.
In Chapter 5, submitted to Systematic Biology (Greminger et al., submitted), I present a novel
bioinformatics strategy to extract MSY‐specific single‐copy sequences from whole‐genome sequencing data. This approach allowed us, for the first time, to comparatively trace both the male‐ and female‐specific evolutionary histories at the genomic level in a non‐human great ape (but see Xue et al. 2015). I also identified a large number of MSY‐specific microsatellite markers and single‐nucleotide polymorphisms (SNPs), which serve as a valuable resource for future studies of non‐invasively sampled wild orangutans. To my knowledge, comparable Y chromosome sequencing in non‐human mammals has only been achieved for horses (Wallner et al. 2013; Schubert et al. 2014), mountain gorillas (Xue et al. 2015), and polar and brown bears (Bidon et al. 2014). Our finding that males and females followed different evolutionary trajectories in orangutans demonstrates the importance and power of genomic MSY‐specific data for a comprehensive understanding of a species' evolutionary history. I expect that the principle of my bioinformatics strategy will be widely applicable to other mammalian species. In fact, a highly similar strategy has very recently been applied to mountain gorillas (Xue et al. 2015).
Reduced genome complexity sequencing
Despite advances in DNA sequencing technology, (re‐)sequencing the whole genomes of many samples still requires substantial financial and computational effort, although it has become more accessible very recently. Reduced genome complexity sequencing strategies (commonly known as RAD and RRL sequencing) offer great prospects for generating population genomic sequence data by sampling only a fraction of the genome. In Chapter 3, published in BMC Genomics (Greminger et al. 2014), I developed a novel
protocol (named iRRL) for improved reduced genome complexity sequencing. Using this protocol, I generated iRRL data from the two populations at the extremes of the west–east gradient of variation of phenotypic traits in orangutans. The main strengths of my iRRL method are the very high genotyping‐by‐sequencing efficiency and reproducibility of genome complexity reduction among samples. My iRRL protocol is part of a growing suite of reduced complexity sequencing strategies that have transformed our ability to generate genomic data from natural populations.
Evaluation of SNP and genotype calling
From a bioinformatics point of view, translating raw high‐throughput sequencing data into high‐quality SNP and genotype calls is challenging and requires many computational steps (Li
et al. 2009; DePristo et al. 2011; Nielsen et al. 2011; Pabinger et al. 2013). Based on the iRRL
data generated from two orangutan populations, I directly compared three commonly used SNP and genotype callers (Chapter 3) and obtained substantially different SNP datasets
depending on the caller algorithm, sequencing depth, and filtering criteria. These inconsistencies affected scans to detect selective sweeps (low overlap of identified putative sweeps) and will likely also exert undue influence on demographic inferences, as implied by shifts in the allele‐frequency spectra. Since the beginning of my Ph.D. candidacy, major advances have been made in the development of sophisticated probabilistic algorithms for SNP and genotype calling (Van der Auwera et al. 2013; Li 2014). Nevertheless, accurate and unbiased SNP and genotype calling remains a challenge, in particular for low‐ or medium‐coverage reduced genome complexity sequencing data of non‐model organisms. For this type of data, it is usually not yet possible to apply machine learning algorithms for variant quality score recalibration (McKenna et al. 2010; DePristo et al. 2011), as is commonly done for whole‐genome sequencing data.
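To make concrete what differing SNP datasets mean in practice, the following minimal Python sketch quantifies pairwise concordance between call sets as the Jaccard index (sites called by both callers divided by sites called by either). It is an illustration only and not the comparison pipeline used in Chapter 3; the caller labels, the site coordinates, and the function names `jaccard` and `pairwise_overlap` are hypothetical.

```python
from itertools import combinations


def jaccard(sites_a, sites_b):
    """Fraction of sites called by either caller that were called by both."""
    union = sites_a | sites_b
    if not union:
        return 1.0  # two empty call sets are trivially identical
    return len(sites_a & sites_b) / len(union)


def pairwise_overlap(call_sets):
    """Jaccard index for every pair of callers, keyed by (name_1, name_2)."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


# Hypothetical toy data: each SNP is identified by (chromosome, position).
call_sets = {
    "caller_A": {("chr1", 101), ("chr1", 250), ("chr2", 77), ("chr2", 300)},
    "caller_B": {("chr1", 101), ("chr1", 250), ("chr2", 90)},
    "caller_C": {("chr1", 101), ("chr2", 77), ("chr2", 300), ("chr3", 5)},
}

overlap = pairwise_overlap(call_sets)
for pair, value in overlap.items():
    print(pair, round(value, 3))
```

A low index for a pair of callers indicates that downstream analyses, such as sweep scans or allele‐frequency spectra, would be based on largely different sets of sites.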
Chapter 1
30
Whole-genome sequencing
Despite the proven usefulness of reduced genome complexity sequencing (e.g. Hohenlohe et
al. 2010; Stölting et al. 2013), data obtained in this manner face several limitations with
respect to certain biological questions, which necessitate the use of whole‐genome data. For instance, many modeling approaches to infer demographic history (e.g. Li & Durbin 2011; Harris & Nielsen 2013) require whole‐genome data. Moreover, scans to detect signals of natural selection gain considerable power, specificity, and resolution when based on whole‐genome data. Only with complete genome information can we make use of the full spectrum of statistical tests and actually pinpoint the genes and functional SNPs involved in local adaptation. Thus, to pursue the main goals of this dissertation, we decided to put our emphasis on a large collaborative effort to sequence the whole genomes of 17 wild‐born orangutans with good population provenance to medium–high coverage (Chapter 4).
Samples subjected to whole‐genome sequencing were carefully selected in order to complement previous sequencing efforts (Locke et al. 2011; Prado‐Martinez et al. 2013), thereby achieving a complete representation of the entire extant geographic range of the genus Pongo. The inclusion of the 20 previously sequenced individuals without reported provenance (Locke et al. 2011; Prado‐Martinez et al. 2013) was made possible by our detailed knowledge of orangutan phylogeography and population structure based on classical genetic markers (Chapter 3; Arora et al. 2010; Nater et al. 2011; Nietlisbach et al. 2012; Nater
et al. 2013; Greminger et al. 2014; Nater et al. 2015), providing an unprecedented
opportunity to identify the natal population of individuals retrospectively. This unique dataset of orangutan whole‐genome sequencing data constituted the foundation of the analytical work carried out in Chapters 4–6. Chapters 4 and 6 will be published together with additional analyses, for which we have extended our collaborative network, in a main integrative paper (Greminger, Nater et al., in prep). Chapter 5 has been submitted to Systematic Biology (Greminger et al., submitted).
Demographic history and population structure
In Chapter 4, I investigated the demographic history of the genus Pongo and the geographic
structure of autosomal genetic diversity. I found that the speciation of Bornean and Sumatran orangutans has been a gradual process over several hundred thousand years, heavily influenced by recurrent climate changes in Sundaland. My findings also revealed that Bornean and Sumatran orangutans were affected differently by the Pleistocene climate oscillations. While climate changes had a major impact on the evolutionary history of Bornean orangutans, likely causing repeated bottlenecks and a long‐term population decline, Sumatran orangutans were much less affected and experienced a remarkably stable population history and structure throughout the Pleistocene. Only recently did they also face a drastic population decline, likely caused by the Toba supereruption ~73 ka and by prehistoric hunting by early hunter‐gatherers. The former adds to the highly controversial discussion about the consequences of the Toba supereruption by providing, to my knowledge, the first
direct evidence of a strong regional impact of the supereruption on a large mammal. The findings presented in this chapter also have important ramifications for orangutan conservation and taxonomy, in particular with respect to the Batang Toru population, the only extant Sumatran orangutans south of Lake Toba.
Sex-specific phylogeography
In Chapter 5, I focused on the sex‐specific evolutionary histories of orangutans. Analyzing
large‐scale MSY sequence data (outlined above) and complete mitochondrial genomes, I found that orangutan evolutionary history is not only a tale of two islands, but also one of two sexes. Males and females exhibited strikingly distinct population histories and phylogeographic patterns, owing to high levels of male‐biased dispersal and strict female philopatry in orangutans. The results from the mitochondrial genomes further confirmed previous findings of a common late Pleistocene rainforest refugium of Bornean orangutans (Arora et al. 2010; Nater et al. 2011) as well as an extremely deep split between Sumatran orangutans to the north and to the south of Lake Toba (Arora et al. 2010; Nater et al. 2011). The genomic MSY data also shed light on the long‐standing debate about when male‐mediated gene flow ceased between Borneo and Sumatra (Harrison et al. 2006; Kanthaswamy et al. 2006; Steiper 2006; Locke et al. 2011; Nater et al. 2011; Nater et al. 2015), revealing that the two species have likely been reproductively isolated for considerably longer than previously proposed. The results presented in this chapter further suggest that different evolutionary forces might act on the MSY in the two orangutan species, probably linked to extensive reproductive skew among Sumatran males.
Genomic signatures of local adaptation
In Chapter 6, I present the first whole‐genome scans for positive selection within the genus Pongo to study the genetic basis of local adaptations. Using a combination of approaches to
detect signatures of positive selection, including window‐based genome scans to identify putative hard sweeps, I identified strong candidate genes and functional SNPs potentially associated with the observed variation in phenotypic traits in orangutans (van Schaik et al. 2009b). In Bornean orangutans, I found, for instance, signals of potential adaptation pertaining to energy storage (i.e. adipose tissue) metabolism, consistent with their greater ability to deposit large fat stores. I also identified several candidate genes and biological processes related to neurogenesis, which is in line with the smaller brain size of Bornean orangutans. In contrast, in Sumatran orangutans, I found, for example, signatures of potential adaptive evolution in genes related to learning, adult brain plasticity, and the oxytocin pathway. I hypothesize that selective changes in these genes may provide Sumatran orangutans with a framework allowing for extended behavioral plasticity, as mirrored in their larger and more complex cultural repertoire and their higher sociability. Overall, the results of this chapter suggest that the two orangutan species experienced very different adaptive evolutionary histories and that at least some of the striking geographic variation in orangutan phenotypic
traits (van Schaik et al. 2009b; Wich et al. 2009b) may indeed represent genetic local adaptations.
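As a generic illustration of what a window‐based genome scan involves, the minimal Python sketch below averages a per‐site statistic within fixed windows along a chromosome and flags the windows with the lowest values as candidate regions. This is a sketch under stated assumptions and not the method used in Chapter 6: the choice of per‐site heterozygosity as the statistic, the window size, the cut‐off fraction, the toy data, and the function names `window_scan` and `lowest_windows` are all hypothetical.

```python
def window_scan(positions, values, window_size, step):
    """Mean of a per-site statistic in windows [start, start + window_size)."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if not positions:
        return []
    results = []
    start = 0
    last = max(positions)
    while start <= last:
        end = start + window_size
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        if in_window:  # skip windows without any informative site
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


def lowest_windows(windows, fraction):
    """Return the given fraction of windows with the lowest mean (at least one)."""
    n = max(1, int(len(windows) * fraction))
    return sorted(windows, key=lambda w: w[2])[:n]


# Hypothetical toy data: site positions (bp) and per-site heterozygosity.
positions = [10, 40, 120, 160, 210, 260, 330, 380]
heterozygosity = [0.40, 0.50, 0.05, 0.03, 0.45, 0.35, 0.50, 0.40]

windows = window_scan(positions, heterozygosity, window_size=100, step=100)
candidates = lowest_windows(windows, fraction=0.25)
for start, end, mean_het in candidates:
    print(f"candidate window {start}-{end}: mean heterozygosity {mean_het:.3f}")
```

In this toy example, the window with strongly reduced diversity is flagged; in a real scan, such outliers would only be candidates, since demographic events can produce similar local patterns.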