IV. DEVELOPMENT OF THE SUB-TOPIC
4.5 Bank Credit Risk
4.6.11 APPLICATION
Data available from next-generation sequencing experiments are particularly suited to statistical analysis and clonal dissection given that they represent a random sample of DNA molecules, and by extension cancer cell genomes, within a given tumour cell population. As such, the advent of next-generation sequencing has seen a surge in computational tools to explore the clonal architecture of tumours.
The fraction of reads reporting a point mutation depends on the copy number at that locus, the tumour purity (equivalently, the level of normal cell contamination) and the cancer cell fraction, that is, the fraction of cancer cells that harbour the mutation (Figure 1:2). Most tools that dissect clonal architecture rely on the relationship between these variables (equation 1) to estimate whether mutations are likely clonal or subclonal.
VAF = p*CCF / (CPNnorm*(1-p) + p*CPNmut) (equation 1)
where VAF = variant allele frequency; p = tumour purity; CCF = cancer cell fraction; CPNnorm = local copy number in normal genome; CPNmut = local copy number in tumour genome.
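As a minimal sketch, equation 1 can be evaluated in both directions: forward, to obtain the expected VAF, and rearranged, to recover the CCF from an observed VAF. The function and parameter names below are illustrative and not taken from any published tool. Equation 1 as written carries no multiplicity term, so the sketch implicitly assumes the mutation sits on a single chromosome copy in each mutated cell, and it assumes a default normal copy number of 2.

```python
def expected_vaf(purity, ccf, cpn_mut, cpn_norm=2):
    """Equation 1: expected variant allele frequency.

    purity   -- tumour purity p, in (0, 1]
    ccf      -- cancer cell fraction, in [0, 1]
    cpn_mut  -- local copy number in the tumour genome
    cpn_norm -- local copy number in the normal genome (assumed 2 by default)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return purity * ccf / (cpn_norm * (1 - purity) + purity * cpn_mut)


def ccf_from_vaf(vaf, purity, cpn_mut, cpn_norm=2):
    """Equation 1 rearranged to give the cancer cell fraction."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return vaf * (cpn_norm * (1 - purity) + purity * cpn_mut) / purity


# A clonal mutation (CCF = 1) in a diploid region of an 80% pure sample
clonal = expected_vaf(purity=0.8, ccf=1.0, cpn_mut=2)

# An observed VAF of 0.2 at the same locus, converted back to a CCF
subclonal_ccf = ccf_from_vaf(vaf=0.2, purity=0.8, cpn_mut=2)
```

Under these assumptions the clonal mutation is expected at a VAF of 0.4, and an observed VAF of 0.2 corresponds to a CCF of 0.5.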
A first step in dissecting the clonal architecture of a tumour involves estimating its genomic copy number profile and its purity. Both of these variables can be obtained from methods that estimate copy number across the genome, such as ASCAT (Van Loo et al., 2010), ABSOLUTE (Carter et al., 2012), OncoSNP (Yau et al., 2010), PICNIC (Greenman et al., 2010) or Sequenza (Favero et al., 2015a). These methods use mathematical frameworks to model the observed copy number array data as an amalgamation of measurements from a population of different cell types present at different proportions: tumour cells, which contain an unknown amount of DNA, and an unknown proportion of normal cells, which have a known amount of DNA per cell. While the system of equations is underdetermined, only a few combinations of purity and ploidy result in biologically meaningful solutions. For instance, ploidy cannot be negative or infinitely large and, in the absence of subclonal events, copy numbers must be non-negative integers.
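The idea that only some purity and ploidy combinations yield integer copy numbers can be illustrated with a toy grid search. This is a simplified sketch and not the algorithm of ASCAT, ABSOLUTE or any other cited tool: it assumes each segment is summarised by a single coverage ratio relative to the genome-wide average, that normal cells carry two copies, and that all copy number events are clonal. All function names are hypothetical.

```python
def simulate_ratios(copy_numbers, purity, ploidy):
    """Coverage ratio per segment for a tumour/normal mixture (assumed model)."""
    average = purity * ploidy + 2 * (1 - purity)
    return [(purity * n + 2 * (1 - purity)) / average for n in copy_numbers]


def implied_copy_numbers(ratios, purity, ploidy):
    """Invert the mixture model: tumour copy number implied by each ratio."""
    average = purity * ploidy + 2 * (1 - purity)
    return [(r * average - 2 * (1 - purity)) / purity for r in ratios]


def integer_misfit(ratios, purity, ploidy):
    """Sum of squared distances of implied copy numbers to the nearest integer.

    Candidates implying clearly negative copy numbers are rejected outright.
    """
    total = 0.0
    for n in implied_copy_numbers(ratios, purity, ploidy):
        if n < -0.5:
            return float("inf")
        total += (n - round(n)) ** 2
    return total


def grid_search(ratios, purities, ploidies):
    """Score every (purity, ploidy) pair; the best-fitting candidates come first."""
    scored = [(integer_misfit(ratios, p, psi), p, psi)
              for p in purities for psi in ploidies]
    return sorted(scored)


# Simulated tumour: four equal-length segments, purity 0.6, mean ploidy 2.5
ratios = simulate_ratios([1, 2, 3, 4], purity=0.6, ploidy=2.5)
purities = [i / 20 for i in range(2, 21)]        # 0.10 ... 1.00
ploidies = [1.5 + 0.25 * k for k in range(19)]   # 1.50 ... 6.00
best_misfit, best_purity, best_ploidy = grid_search(ratios, purities, ploidies)[0]
```

In this toy model the same ratios are also fitted exactly by a purity of 3/7 and a ploidy of 5 with every copy number doubled, which is an instance of the ambiguity described above; real tools use additional information and constraints to choose among such solutions.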
Figure 1:2 Estimating cancer cell fraction
The cancer cell fraction of each mutation reflects both the local copy number and the variant allele frequency.
Once the local copy number and purity of a sample have been determined, the cancer cell fraction and the mutational multiplicity (the number of chromosome copies that carry the mutation) can be estimated from equation 1. A simple approach to assess whether a given mutation is subclonal is to assume that the number of variant reads follows a binomial distribution and to calculate the probability of the observed variant allele frequency given what would be expected for a clonal mutation (Carter et al., 2012, Stephens et al., 2012).
More sophisticated methods exploit the fact that multiple mutations with similar variant allele frequencies may correspond to a clonal or subclonal cluster of mutations. For example, PyClone (Shah et al., 2012, Roth et al., 2014) pioneered the integration of variant allele frequencies with allele-specific copy number and purity estimates in order to define the subclonal composition of individual biopsies. The method uses Bayesian Dirichlet process clustering to jointly group deeply sequenced (>100x) mutations and infers a posterior density over the cancer cell fraction (CCF) of each mutation. Because the number of subclones is modelled with a Dirichlet process, it does not need to be known a priori; both the assignment of mutations to clusters and the number of clusters are inferred as part of the model (Roth et al., 2014). Limitations of PyClone, however, are that it assumes all copy number events are clonal and that it requires deep sequencing (Roth et al., 2014). The method adopted by Nik-Zainal et al. (2012b) leverages data from whole genome sequencing to circumvent the need for deep sequencing and allows mutations to reside in regions of subclonal copy number.
By contrast, SciClone (Miller et al., 2014), which also applies a Bayesian clustering method, focuses exclusively on single-nucleotide variants (SNVs) in copy-number neutral, loss of heterozygosity (LOH)-free portions of the genome. Although this feature of SciClone circumvents issues associated with clonal and subclonal copy number aberrations, it means that not every mutation can be assigned to an SNV cluster. All these methods have also been extended to allow multiple samples over space or time to be included in subclonal clustering, and this may considerably improve the accuracy of subclonal reconstruction.
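To illustrate why a Dirichlet process does not require the number of clusters in advance, the sketch below draws a partition from the Chinese restaurant process, the clustering prior that a Dirichlet process induces. It samples from the prior only and performs no inference on read counts, so it is not an implementation of PyClone or of any other cited method; the function name and the concentration parameter `alpha` are illustrative.

```python
import random


def sample_crp(n_mutations, alpha, seed=0):
    """Draw cluster assignments for n_mutations from a Chinese restaurant process.

    Each mutation joins an existing cluster with probability proportional to
    that cluster's current size, or opens a new cluster with probability
    proportional to alpha (alpha > 0). The number of clusters is therefore
    an outcome of the draw, not an input.
    """
    rng = random.Random(seed)
    cluster_sizes = []
    assignments = []
    for _ in range(n_mutations):
        weights = cluster_sizes + [alpha]  # last slot stands for a new cluster
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        assignments.append(choice)
    return assignments, cluster_sizes


assignments, cluster_sizes = sample_crp(n_mutations=200, alpha=1.0, seed=42)
n_clusters = len(cluster_sizes)
```

Larger values of `alpha` favour more clusters. In a full model, this prior would be combined with a likelihood for the observed read counts so that the posterior over both cluster membership and cluster number reflects the data.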
Importantly, in the absence of multi-region data, it may be impossible to accurately deconvolve the subclonal structure of a tumour. For example, if two clonal populations have similar cancer cell fractions in one tumour region, they will appear as one clone. Analysis of another tumour region may enable their separation. Alternatively, further information regarding the clonal composition of tumours can be inferred from the mutual exclusivity or co-occurrence of mutations in cancer cells, either from single cell sequencing or from ‘phasing’. Phasing involves determining whether mutations co-occur or are mutually exclusive, allowing different subclones to be delineated; if two mutations never occur together they are likely to represent distinct subclones, whereas if two mutations can be phased in the same cancer cell they are necessarily present in the same lineage (Nik-Zainal et al., 2012b). However, such approaches are currently limited to the analysis of mutations in regions of hypermutation or high mutation burden.
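The phasing logic can be sketched for a pair of nearby mutations covered by the same sequencing reads. The example below is deliberately simplified: it assumes each read spanning both loci has already been reduced to a pair of flags indicating whether it carries mutation A and mutation B, and it ignores sequencing error, parental haplotype and copy number. The function name, the labels it returns and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def phase_pair(reads, min_support=2):
    """Classify two mutations from reads that span both loci.

    reads       -- iterable of (has_a, has_b) flags, one pair per spanning read
    min_support -- minimum number of reads needed to trust a pattern (assumed)
    """
    counts = Counter((bool(a), bool(b)) for a, b in reads)
    both = counts[(True, True)]
    only_a = counts[(True, False)]
    only_b = counts[(False, True)]

    if both >= min_support:
        # Seen on the same DNA molecule, hence in the same cell and lineage
        return "co-occurring"
    if both == 0 and only_a >= min_support and only_b >= min_support:
        # Each is well supported alone but they are never seen together
        return "mutually exclusive"
    return "uninformative"


same_lineage = phase_pair([(True, True)] * 5 + [(False, False)] * 20)
distinct = phase_pair([(True, False)] * 4 + [(False, True)] * 3 + [(False, False)] * 20)
unclear = phase_pair([(True, False)] * 4 + [(False, False)] * 20)
```

Because both mutations must fall within a single read or read pair, informative pairs are rare unless mutations are densely packed, which is consistent with the restriction to hypermutated or high-burden regions noted above.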
The majority of computational tools focus on dissecting the heterogeneity of SNVs. More recently, tools have also been developed to explore heterogeneity at the copy number level (Ha et al., 2014, Oesper et al., 2014). For example, THetA (Tumour Heterogeneity Analysis) is an algorithm that seeks to estimate tumour purity and clonal and subclonal copy numbers directly from DNA sequencing data (Oesper et al., 2014), while TITAN uses a hidden Markov model (HMM) framework to estimate clonal and subclonal clusters of copy numbers (Ha et al., 2014).