DEVELOPING DRUG PRIORITIZATION APPROACHES TO TARGET CANCER GENOMES FOR PRECISION MEDICINE
Elena Piñeiro Yáñez
Madrid, 2020
DEPARTMENT OF BIOCHEMISTRY FACULTY OF MEDICINE
UNIVERSIDAD AUTÓNOMA DE MADRID
DEVELOPING DRUG PRIORITIZATION APPROACHES TO TARGET CANCER GENOMES FOR PRECISION MEDICINE
Doctoral Thesis
Elena Piñeiro Yáñez Graduate in Biology
Thesis Director: Dr. Fátima Al-Shahrour Núñez
Centro Nacional de Investigaciones Oncológicas (CNIO)
ACKNOWLEDGMENTS
I want to begin by thanking Fátima for opening the doors of research and science to me, as well as for the many opportunities she has given me to develop and grow professionally and personally. Thank you for the support, patience, generosity, example and knowledge passed on over all these years. You have taught me a great deal, and I have much to thank you for.
Thanks also to the colleagues who welcomed me so warmly when I arrived at CNIO, for the cordiality and the good atmosphere. I want to make a special mention of Miriam and Jose María, for the privilege of letting me keep their valuable friendship. And also of two survivors from those times: Osvaldo and Gonzalo (who has also helped me enormously with his contributions and suggestions on the thesis). Thank you both for the support, the advice and everything I learn from you.
Thanks to my fellow travelers in the TBU: to Javier for being an exemplary colleague and for all the enriching experiences we shared, to Héctor for all his good ideas and the knowledge he contributed, and to Kevin for his help and great sense of humor.
Thanks to the other colleagues of the great family of the Bioinformatics Unit, members and associates alike. To those who left. To those who are here now: Tomás, Gabriel, Coral, Carlos, Santi, Piti (who has also contributed valuably to this work), Luis, Daniel, Alejandro, Jaime, Fernando, Laura, Michael and Thomas (whom I also thank for the great help with the English of the thesis). Thank you all for the freshness you brought to everyday life, for the camaraderie, and for your wisdom and support.
Thanks to the coauthors of the collaborations that made this work possible, for the opportunity to take part in them and to work with you. Enormous thanks to Daniel and Miguel, from the Universidad de Vigo, for their great contribution, professionalism and closeness.
Thanks to the friends who have been present in many ways in this chapter of my story. Thank you for the adventures, the experiences and the laughter, which have been an oasis during these years. Thanks to Miriam, Silvia and Virginia for always being there through good and bad. To Diana for teaching me that friends are the family one chooses. To Suevia for still checking on how I am doing. To Paula for the fun conversations that cheered me up. To Ilda for the positivity and joy that have so often inspired and infected me.
And finally, thanks to my family for their affection and support. Especially to my grandparents Enrique and Aquilina for their patronage and for the values they sowed, which continue to bear fruit. And to my father, my mother and my sister, for that unconditional love capable of transcending time, space and limitations, which gives me the strength to always keep going.
Thank you all.
TABLE OF CONTENTS
ABSTRACT/RESUMEN
ABBREVIATIONS
INTRODUCTION
1. CANCER: A COMPLEX DISEASE
1.1. Molecular alterations in cancer
1.2. Taxonomy of cancer genomic variants
1.2.1. Somatic and germline variants
1.2.2. Oncogenes and tumor suppressor genes
1.2.3. Drivers and passengers
1.3. Tumor evolution and heterogeneity
2. PRECISION ONCOLOGY
3. GENOMIC PROFILING IN CLINICAL PRACTICE
3.1. Genomic variation and its biological implications
3.1.1. Variant sizes
3.1.2. Variants in the population
3.1.3. Biological consequences of variants
3.2. Clinical relevance of variants
4. CANCER TREATMENT
4.1. Type of drug-based therapies
4.2. Targeted therapies for precision oncology
4.3. Biomarkers of drug response guiding therapeutic prescription
4.4. Limitations in current therapy proposal based on biomarkers
5. CANCER PHARMACOGENOMICS
6. BIOINFORMATICS TOOLS IN PRECISION ONCOLOGY
6.1. Biomedical knowledge bases
6.2. Prioritization of alterations
6.3. In silico drug prescription
OBJECTIVES
MATERIALS AND METHODS AND RESULTS
ARTICLE 1
ARTICLE 2
ARTICLE 3
ARTICLE 4
DISCUSSION
1. DRUG-GENE ASSOCIATIONS DATABASE
2. PRIORITIZATION SYSTEM FOR ALTERATIONS AND THERAPIES
3. THERAPY SELECTION
3.1. Expansion of therapeutic options
3.2. Prescription accuracy
4. COMPARISON WITH OTHER RESOURCES
5. PANDRUGS WEB AS AN INTEGRATIVE TOOL
6. THERAPEUTICAL ACTIONABILITY SPECTRUM
7. DRUG PRESCRIPTION IN CASE STUDIES
8. PROOF OF CONCEPT WITH CLINICAL DATA
9. LIMITATIONS OF IN SILICO PRESCRIPTION APPROACHES AND FUTURE DIRECTIONS
9.1. Multiple layers of information
9.2. Tumor heterogeneity
9.3. Germline mutations and side effects
9.4. Concurrent medical conditions
CONCLUSIONS/CONCLUSIONES
REFERENCES
ANNEX: Scientific Production
LIST OF TABLES
Table 1: Biomarkers for drug indication in Food and Drug Administration (FDA) labeling
LIST OF FIGURES
Figure 1: Precision medicine workflow for treatment selection
Figure 2: Stages for genomic profiling intervention
Figure 3: Stages and components in the discovery and application of pharmacogenomic knowledge
Figure 4: Improvements planned to be integrated in PanDrugs methodology
ABSTRACT
Precision oncology requires the definition of a distinctive molecular fingerprint of the patient, as well as a pharmacological arsenal with the agents capable of reversing the associated pathogenic phenotype. One of the tools for molecular characterization is massive genomic sequencing, used with different objectives, including to provide evidence to identify specific and effective treatments. However, this technology entails several difficulties in its management, for example, distinguishing between harmful and benign alterations, as well as pointing out those that may indicate treatment guidelines. On the other hand, the available pharmacological arsenal is limited, so strategies are required to maximize its utility and the integration of available experimental compounds.
The methodology of in silico prescription developed in this thesis, which we have named PanDrugs, has been conceived to contribute to overcoming these difficulties. For this purpose, we have built an extensive database of drug-gene associations as a search basis and a double prioritization system: (i) of genomic events according to their oncological impact and consequent therapeutic potential, and (ii) of drugs according to their availability and suitability in the detected molecular context.
To determine its capabilities, PanDrugs has been analyzed in several scenarios using different combinations of molecular evidence. To characterize in general terms the potential spectrum of therapeutic action in the different types of tumors, it has been systematically applied to several cases of the TCGA (The Cancer Genome Atlas). It has also been used in individual patients at a higher resolution, dissecting the proposed therapeutic suggestions at a molecular level, where it has been able to prioritize coherent options both at the level of sensitivity and resistance. It has demonstrated its potential for proposing therapeutic alternatives to conventional treatments in a clinical study of T-cell acute lymphoblastic leukemia with previous therapeutic failure. And finally, it has shown predictive capacity in the response to EGFR inhibitors in cases with available clinical information.
RESUMEN (SUMMARY)
Precision oncology requires the definition of a distinctive molecular fingerprint of the patient, as well as a pharmacological arsenal with agents capable of reversing the associated pathogenic phenotype. One of the tools for molecular characterization is massive genomic sequencing, used for various purposes, among them providing evidence to identify specific and effective treatments. However, this technology entails several difficulties in its handling, for example distinguishing between harmful and benign alterations and pointing out those that may indicate treatment guidelines. Moreover, the available pharmacological arsenal is limited, so strategies are required to maximize its use and to integrate the available experimental compounds.
The in silico prescription methodology developed in this thesis, which we have named PanDrugs, has been conceived to help overcome these difficulties. To this end, an extensive database of drug-gene associations has been built as a search basis, together with a double prioritization system: (i) of genomic events according to their oncological impact and consequent therapeutic potential, and (ii) of drugs according to their availability and suitability in the detected molecular context.
To determine its capabilities, PanDrugs has been analyzed in several scenarios using different combinations of molecular evidence. To characterize in broad terms the potential spectrum of therapeutic actionability across tumor types, it has been systematically applied to a range of cases from TCGA (The Cancer Genome Atlas). It has also been used in individual patients at higher resolution, dissecting the proposed therapeutic suggestions at the molecular level, where it was able to prioritize coherent options in terms of both sensitivity and resistance. It has demonstrated its potential for proposing therapeutic alternatives to conventional options in a clinical case of T-cell acute lymphoblastic leukemia with previous therapeutic failure. And finally, it has shown predictive capacity for the response to EGFR inhibitors in cases with available clinical information.
ABBREVIATIONS
API: Application Programming Interface
CCLE: Cancer Cell Line Encyclopedia
CGI: Cancer Genome Interpreter
CKB: Clinical Knowledgebase
CNV: Copy Number Variation
CTRP: Cancer Therapeutics Response Portal
DGIdb: Drug Gene Interaction database
DScore: PanDrugs’ Drug Score
EGFRi: Epidermal Growth Factor Receptor Inhibitors
EHR: Electronic Health Record
ESMO: European Society for Medical Oncology
FDA: Food and Drug Administration
GDC: Genomic Data Commons
GDSC: Genomics of Drug Sensitivity in Cancer
gnomAD: Genome Aggregation Database
GoF: Gain of Function
GScore: PanDrugs’ Gene Score
ICGC: International Cancer Genome Consortium
Indel: small insertion or deletion
I-PREDICT: Investigation of Profile-Related Evidence Determining Individualized Cancer Therapy
LoF: Loss of Function
LUSC: Lung Squamous Cell Carcinoma
NCCN: National Comprehensive Cancer Network
NCI: National Cancer Institute
NCI-MATCH: National Cancer Institute-Molecular Analysis for Therapy Choice
NSCLC: Non Small Cell Lung Cancer
ONC: Oncogene
OncoKB: Precision Oncology Knowledge Base
OS: Overall Survival
PCAWG: Pan-Cancer Analysis of Whole Genomes Consortium
PDX: Patient Derived Xenograft
PFI: Progression-Free Interval
PFS: Progression-Free Survival
PMKB: Precision Medicine Knowledgebase
SNP: Single Nucleotide Polymorphism
SNV: Single Nucleotide Variant
T-ALL: T-cell acute lymphoblastic leukemia
TARGET database: Tumor Alterations Relevant for GEnomics-driven Therapy
TARGET trial: Treatments Against RA and Effect on FDG PET-CT
TCGA: The Cancer Genome Atlas
TMB: Tumor Mutational Burden
TSG: Tumor Suppressor Gene
VCF: Variant Call Format
WES: Whole Exome Sequencing
WGS: Whole Genome Sequencing
INTRODUCTION
1. CANCER: A COMPLEX DISEASE
Cancer is a complex disease triggered by the convergence of several factors, such as genomic and epigenomic alterations, environment and lifestyle. It can affect different tissues and organs where abnormal cells proliferate uncontrollably and generate a cellular mass called a tumor that compromises their proper functioning and can spread out to adjacent tissues. Eventually, these tumoral cells can enter the circulatory system and reach more distant areas of the body, where, if the conditions are favorable, they can set up a new tumoral colony producing so-called metastases (Fares et al. 2020). The ubiquity of this pathology leads to a conventional classification according to the cellular type affected and the tissue or organ of origin.
More than 100 types of cancer are described by the National Cancer Institute (NCI) (Anon 1980) according to this criterion. They differ in their prevalence in the population and together constitute one of the leading causes of death globally.
1.1. Molecular alterations in cancer
The biological processes involved in carcinogenesis are categorized into the so-called hallmarks of cancer (Hanahan & Weinberg 2000; Hanahan & Weinberg 2011). Different types of molecular alterations, such as genomic and epigenomic changes, are responsible for the acquisition of these hallmarks, and different types of tumors have been shown to favor different kinds of events (Ciriello et al. 2013).
Understanding of the disease has been boosted by the emergence of large consortia for cancer research. Initiatives such as The Cancer Genome Atlas (TCGA) project (Cancer Genome Atlas Research Network et al. 2013) or the International Cancer Genome Consortium (ICGC) (International Cancer Genome Consortium et al. 2010) have accomplished the profiling of thousands of tumoral samples of different types in order to collect various strands of molecular evidence, which have revealed important findings about the molecular foundations of cancer and highlighted its extreme complexity and heterogeneity. The Pan-Cancer Analysis of Whole Genomes (PCAWG) was created to analyze more than 2600 whole genomes across 38 tumor types with the objective of expanding current knowledge and characterizing non-coding tumoral factors. The analysis and applications derived from these genomes are being presented in a series of publications (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium 2020).
A remarkable aspect of these projects is that they have made the molecular and clinical data they obtained available to the scientific community, thus providing a valuable resource for generating new hypotheses and supporting studies that further advance the knowledge of the disease and the management of cancer patients.
1.2. Taxonomy of cancer genomic variants
Alterations in DNA are one of the main contributors to cancer development and progression. In the context of cancer, they are categorized and referenced by specific molecular terminology, such as the concepts of oncogenes (ONCs) (Huebner & Todaro 1969) and tumor suppressor genes (TSGs) (Knudson 1983), the distinction between passengers and drivers (Vogelstein et al. 2013), and the classification of events into somatic and germline alterations.
1.2.1. Somatic and germline variants
Germline variants appear in the gametes from which a new individual is developed. They are inheritable and present in all the cells of the organism. Somatic variants, on the contrary, appear in the somatic cells of the body. Therefore, this type of alteration is not inheritable and its presence is limited to the cell it originated in and the cells derived from it by cellular division.
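The definitions above can be illustrated with a minimal sketch. It assumes a common practical setup that the text does not describe: variants called in a tumor sample are compared with those called in a matched normal sample from the same patient, and anything also present in the normal sample is labeled germline. The `Variant` type, the function name and the toy coordinates are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def split_germline_somatic(tumor_calls, normal_calls):
    """Label tumor variants by their presence in the matched normal sample.

    Assumption: a variant seen in both samples is germline (present in all
    cells), while a variant seen only in the tumor is somatic.
    """
    normal = set(normal_calls)
    germline = [v for v in tumor_calls if v in normal]
    somatic = [v for v in tumor_calls if v not in normal]
    return germline, somatic


# Toy calls with placeholder coordinates (not real genomic positions).
tumor = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "C", "T")]
normal = [Variant("chr1", 1000, "A", "G")]

germline, somatic = split_germline_somatic(tumor, normal)
print("germline:", germline)
print("somatic:", somatic)
```

Real pipelines also have to account for sequencing depth, tumor purity and contamination of the normal sample, which this sketch ignores.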
The distinction between somatic and germline alterations in cancer is key to elucidating its etiology and possible treatment. Most analyses of cancers that develop in adulthood focus on the search for somatic alterations, mainly with a therapeutic purpose, because they are the events that trigger the tumor and the ones that can be used to revert it. Nevertheless, evidence is steadily accumulating that emphasizes the need to take germline information into account in the interpretation of tumoral data. Although most germline variants are clinically benign, some are associated with hereditary cancer (Pilié et al. 2017; Fewings et al. 2018), cancer predisposition syndromes (Garber & Offit 2005; Zhang et al. 2015) or drug response (Wang et al. 2011; Menden et al. 2018).
1.2.2. Oncogenes and tumor suppressor genes
Oncogenes and tumor suppressor genes are driver genes that exert their oncogenic effect through their dysregulation. In normal conditions, ONCs are in a proto-oncogene state and their gain of function (GoF) boosts a chain reaction that culminates with the acquisition of hallmarks of cancer. The first ONCs identified in human cancer were those coding for the members of the RAS protein family (KRAS, HRAS or NRAS), which are involved in signaling pathways controlling proliferation, differentiation and cell survival (Malumbres & Barbacid 2003; Fernandez-Medarde & Santos 2011). On the contrary, TSGs in normal conditions adjust signaling processes in order to maintain cellular homeostasis, and a loss of their function (LoF) leads to the oncogenic process. The classical example is TP53, which is responsible for the repression of cell proliferation under different circumstances and is frequently inactivated in many tumor types (Harris & Hollstein 1993).
Although many genes act commonly as ONCs or TSGs, their behavior may depend on the context, and therefore, rather than an inherent quality, it is a role a gene acquires under certain circumstances (Shen et al. 2018).
The two roles present different mutational patterns. ONCs are mostly affected by GoF variations, such as amplifications or activating missense alterations in functional regions of the protein. The sites in the sequence where a change leads to activation are limited, and for that reason activating small-scale variants usually appear with a high frequency at those specific locations. On the other hand, TSGs are deregulated by LoF variations, such as large-scale deletions, frameshift alterations, nonsense variants, splice-site variants or inactivating missense changes. As the LoF can be achieved by truncating the molecular structure in multiple places, small-scale alterations are spread throughout the sequence of TSGs, usually with a lower frequency at each particular position (Vogelstein et al. 2013).
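The contrast between clustered activating changes and scattered inactivating changes can be turned into a simple ratio-based heuristic. The following is a simplified sketch in the spirit of the "20/20 rule" proposed in the cited Vogelstein et al. (2013): a gene looks oncogene-like when more than 20% of its recorded mutations are missense changes at recurrent positions, and tumor-suppressor-like when more than 20% are inactivating. The consequence labels, the definition of "recurrent" as a position seen at least twice, the handling of genes meeting both criteria, and the toy mutation lists are all assumptions made for illustration.

```python
from collections import Counter

# Consequence labels treated as inactivating (assumed vocabulary).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_gene(mutations, threshold=0.20, min_recurrence=2):
    """Classify one gene from a list of (protein_position, consequence) tuples."""
    if not mutations:
        return "unclassified"
    total = len(mutations)
    # Count missense mutations per position to find recurrent sites.
    missense_sites = Counter(pos for pos, csq in mutations if csq == "missense")
    recurrent_missense = sum(n for n in missense_sites.values() if n >= min_recurrence)
    inactivating = sum(1 for _, csq in mutations if csq in INACTIVATING)
    oncogene_like = recurrent_missense / total > threshold
    suppressor_like = inactivating / total > threshold
    if oncogene_like and suppressor_like:
        return "ambiguous"  # tie handling is an assumption of this sketch
    if oncogene_like:
        return "oncogene-like"
    if suppressor_like:
        return "tumor-suppressor-like"
    return "unclassified"


# Toy data: one gene with mutations piled up at two sites, one with scattered truncations.
hotspot_gene = [(12, "missense")] * 6 + [(13, "missense")] * 2 + [(61, "missense"), (117, "missense")]
scattered_gene = [(45, "nonsense"), (102, "frameshift"), (175, "missense"),
                  (213, "nonsense"), (301, "splice_site"), (248, "missense")]

print(classify_gene(hotspot_gene))
print(classify_gene(scattered_gene))
```

As the preceding paragraph on context dependence notes, such a label describes a mutational pattern in a dataset, not an inherent property of the gene.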
1.2.3. Drivers and passengers
Passengers, with a neutral effect, constitute the majority of the alterations acquired by cells during tumor evolution. On the other hand, those alterations that allow the acquisition of a hallmark and the consequent development of the tumor are named drivers (Stratton et al. 2009). The ratio between driver and passenger mutations is variable, and some types of tumors present a larger number of passengers. In general, it is estimated that more than 90% of the mutations that appear in cancer are passengers (McFarland et al. 2017).
Those genes affected by driver alterations are known as driver genes. Many efforts have been made to characterize them. One of the first censuses of driver genes was created by Vogelstein and collaborators (Vogelstein et al. 2013), where 138 genes were identified. Subsequent studies have increased this number, thanks to the increasing availability of data and the development of novel methodologies (Gonzalez-Perez et al. 2013; Ding et al. 2014; Porta-Pardo et al. 2017). These lists of driver genes are valuable for identifying the most frequent key elements in the tumoral process. Still, there are studies (Martincorena et al. 2017) that also show the limitations of relying exclusively on known drivers to identify the causal agents in many contexts and to use them as a basis for therapeutic strategies. In addition, the driver label only represents a potentiality of the gene, and a more precise analysis of its alterations is required, since in some cases the gene may carry only innocuous changes.
1.3. Tumor evolution and heterogeneity
Tumoral cells coexist in a cellular environment analogous to natural ecosystems, where different cellular populations called clones interact with each other and with their environment. The appearance of these clones is guided by a mechanism similar to that happening at the species scale, where the successive emergence of alterations creates a genetic diversity operated on by evolutionary pressure, which in cancer is dominated by positive selection (Martincorena et al. 2017). The co-occurrence and mutual exclusivity of certain alterations provide evidence for the existence of this selective pressure during tumoral evolution (Mina et al. 2017). Each tumor experiences a particular evolutionary process that leads to the characteristic tumoral heterogeneity, largely explained by convergent mechanisms that lead to the acquisition of hallmarks (Cancer Genome Atlas Research Network et al. 2013) and that is shaped by other factors such as microenvironment pressure and treatments.
Tumoral heterogeneity is manifested at several levels (Grzywa et al. 2017) and is shaped by spatial and temporal factors:
- Intratumoral heterogeneity appears between different cells inside the tumoral mass.
- Intertumoral heterogeneity is due to variability among different individual tumors or metastases inside the same patient.
- Interpatient heterogeneity involves differences between tumors of different patients, even with the same tissue or organ of origin (Ciriello et al. 2013).
Due to tumoral heterogeneity, only a small proportion of tumors have an identical profile between cases (Bieg-Bourne et al. 2017). This is manifested in the uneven distribution of the frequencies of somatic events which presents a long right tail with recurrent but not frequent alterations, lacking biological and clinical validation, but with potential for therapeutic action (Yuan et al. 2014; Chang et al. 2016).
2. PRECISION ONCOLOGY
Precision medicine is a procedure for the clinical management of patients - including the prevention, diagnosis, prognosis and treatment - guided by molecular characteristics, environmental conditions and lifestyle (Hodson 2016), in contrast with the classical approach of “one-size-fits-all” where the clinical management is defined for an average patient. The objective of precision medicine is to improve the clinical response and the quality of life of the patients through their stratification according to the characteristics of the individual, associating each stratum with specific guidelines that are more effective in that particular context (Ashley 2016).
Precision medicine is particularly useful in pathologies characterized by a high complexity and heterogeneity like cancer, where the evolution and response to treatments is highly variable among individuals. The application of personalized medicine in the oncological field constitutes the so-called precision oncology.
Prognosis and selection of therapies in cancer have traditionally been guided by the tissue of origin of the tumor, but the consideration of molecular features improves outcome predictions, increases the precision of tumor classification, and should be accounted for when selecting the therapeutic regimen (Hoadley et al. 2014; Yuan et al. 2014). In fact, more personalized approaches are progressively being applied worldwide. But in spite of these advances, and with some exceptions, molecular characterization is currently used to guide therapies only as an alternative approach after the conventional treatments in the medical protocols have failed.
The paradigm of precision oncology based on molecular data for treatment selection follows a set of steps represented in Figure 1 and is being clinically tested to see if it offers an improvement over the classical approach. The NCI-MATCH, SHIVA, TARGET, I-PREDICT and WINTHER trials are examples of these studies, in which variable degrees of success have been achieved. In general, the number of patients who are included in and benefit from a precision medicine approach is small, but the results convey a promising message about the feasibility and potential of these strategies (Tourneau et al. 2019).
Figure 1. Precision medicine workflow for treatment selection. Patients' tumors undergo molecular profiling to extract signatures that allow patient stratification using computational resources. Then candidate therapies are selected based on this information, and ideally tested in functional assays prior to treatment administration.
3. GENOMIC PROFILING IN CLINICAL PRACTICE
The arrival of massive sequencing in 2005 stirred up the experimental ground in biology and clinical practice. Its progressive cost reductions, together with cumulative improvements in the technology, have made it widely applicable to the detection of alterations in tumoral samples. Mutational profiles are the most common data employed in clinical practice, both for tumor classification and for treatment definition (Li et al. 2017; Menden et al. 2018). The search for alterations is done mostly through targeted DNA sequencing, where panels are the preferred method owing to their low cost and high depth of coverage, as well as easier postprocessing and interpretation (Kamps et al. 2017).
3.1. Genomic variation and its biological implications
The first draft of the human genome was obtained in 2001 (International Human Genome Sequencing Consortium 2001). The subsequent emergence of massive sequencing techniques made it possible to precisely determine the sequence and to create a human reference genome, a key tool to identify the existing genomic differences in individuals. There are many mechanisms responsible for the occurrence of genomic differences, such as recombination, or the acquisition of new mutations during the life of the organism. These genomic differences establish the foundation for evolution to work and are responsible for many phenotypes and diseases like cancer.
Variants in DNA can be named and classified according to different criteria, such as their extent, biological consequence, or their frequency in the population.
3.1.1. Variant sizes
DNA variants have a wide range of sizes, affecting from just one nucleotide to an entire chromosome. Small-scale alterations affect small regions of the sequence and are the most common human variants (more than 99.9%) (The 1000 Genomes Project Consortium 2015). They include substitutions - the Single Nucleotide Variant (SNV) being the simplest case - and small insertions and deletions - commonly abbreviated as indels - whose size typically ranges from 1 to 10,000 base pairs (Mills et al. 2006).
Alterations at a greater scale are structural variations and aneuploidies. The form of structural variation that involves changes in the number of copies of DNA regions is called Copy Number Variation (CNV). It can be a deletion or an amplification depending on whether it is a loss or an acquisition of copies of a region, respectively.
Rearrangements are another category of structural variation that cause a change in the DNA structure. They include translocations and inversions. Aneuploidies remove or add an entire chromosome, leading to a different ploidy from that which is characteristic of the species.
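As a minimal sketch of how these size-based categories might be assigned in practice, the functions below label a small-scale variant from its reference and alternate alleles (in the style of the REF and ALT fields of a VCF record) and label a copy-number state as a deletion or an amplification. The function names, the output labels and the diploid baseline of two copies are assumptions made for illustration.

```python
def variant_type(ref: str, alt: str) -> str:
    """Label a small-scale variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "no change"
    if len(ref) == len(alt):
        return "multi-nucleotide substitution"
    # Alleles of different length: bases were gained or lost.
    return "insertion" if len(alt) > len(ref) else "deletion"


def cnv_type(copies: int, baseline: int = 2) -> str:
    """Label a copy-number state relative to an assumed diploid baseline."""
    if copies < baseline:
        return "deletion"
    if copies > baseline:
        return "amplification"
    return "neutral"


for ref, alt in [("A", "G"), ("A", "ATT"), ("ATT", "A")]:
    print(ref, alt, variant_type(ref, alt))
for copies in (0, 2, 6):
    print(copies, cnv_type(copies))
```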
3.1.2. Variants in the population
From the point of view of population genetics, when a variant is present in the population due to hereditary factors it becomes a polymorphism. For SNVs, the commonly used term is Single Nucleotide Polymorphism (SNP), which is the most common type of polymorphism - one per thousand base pairs of the human genome. The frequency threshold that differentiates polymorphisms from rare variants is arbitrary; traditionally, a variant must be present in at least 1% of the population to be called a polymorphism (Cavalli-Sforza & Bodmer 1999; Schildgen & Schildgen 2013). Nonetheless, this threshold is being progressively relaxed with the increasing amount of data and multipopulation analyses (Karki et al. 2015). With the improvement in the efficiency and cost of sequencing technology, several projects emerged with the aim of characterizing common variants in worldwide populations, starting with the 1000 Genomes Project (The 1000 Genomes Project Consortium 2015) and continuing with the recent gnomAD (Karczewski et al. 2020), which explores variants in many thousands of exomes and genomes in several major populations.
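The traditional 1% criterion amounts to a simple cutoff on population allele frequency. The sketch below makes the threshold an explicit parameter, since the text stresses that it is arbitrary; the function name and labels are hypothetical, and the frequency is assumed to come from a population resource such as those mentioned above.

```python
def population_class(allele_frequency: float, threshold: float = 0.01) -> str:
    """Label a variant as a polymorphism or a rare variant by allele frequency."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # "At least 1%" in the traditional definition, hence >= rather than >.
    return "polymorphism" if allele_frequency >= threshold else "rare variant"


for af in (0.25, 0.01, 0.0004):
    print(af, population_class(af))
```

Lowering `threshold` reproduces the more permissive definitions that larger datasets have made possible.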
3.1.3. Biological consequences of variants
The biological impact of variants depends on the affected genomic region, the transcriptional and translational change, and their functional repercussions. Some variants are well characterized and their biological impact is known. However, the functional consequences and the biological implications remain unknown for many of them.
The genomic location and the sequence change give clues about the possible impact of variants. Regulatory, intergenic or intronic regions represent around 98% of the total sequence and host an important fraction of the alterations that can appear in the DNA (Weinhold et al. 2014). These alterations can have important biological repercussions (Zhang & Lupski 2015; Araya et al. 2016; Gloss & Dinger 2018; Rojano et al. 2018). Nevertheless, largely owing to the predominance of targeted sequencing and the scarcity of functional studies of non-coding alterations, the best-known variants are those affecting the coding regions of the DNA, which are the ones usually kept during bioinformatic analysis (Kamps et al. 2017). SNVs located in coding regions can cause synonymous or missense amino acid changes, as well as gains and losses of start and stop codons. Although some synonymous variants can have a biological impact (Supek et al. 2014; Gotea et al. 2015), affecting in some cases the splicing and stability of RNA (Chamary et al. 2006), it is common practice to consider non-synonymous SNVs as more damaging. Depending on the number of affected bases, indels are referred to as in-frame if they preserve the reading frame of the sequence, and frameshift if they shift it. Both SNVs and indels can alter the splicing regions, changing the way exons are combined and subsequently creating nonfunctional proteins. Regarding structural variation, CNVs can lead to a change in the number of copies of a gene, increasing, reducing or even suppressing its biological function. Translocations, for their part, can lead to the appearance of chimeric proteins where two different genes are joined together. These gene fusions can also emerge as a consequence of a CNV that removes the region between two genes.
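The consequence categories for coding SNVs and indels can be sketched with the standard genetic code. The example below is a simplified illustration, not the annotation procedure of any particular tool: it compares the amino acids encoded by a reference and an altered codon, and checks whether an indel changes the sequence length by a multiple of three. Function names and labels are assumptions, and start-codon changes and splicing effects are deliberately left out because they depend on the position within the transcript.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snv_consequence(ref_codon: str, alt_codon: str) -> str:
    """Compare the amino acids encoded before and after a coding SNV."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


def indel_consequence(ref: str, alt: str) -> str:
    """An indel keeps the reading frame only if it changes length by a multiple of 3."""
    shift = abs(len(ref) - len(alt))
    if shift == 0:
        raise ValueError("alleles of equal length do not describe an indel")
    return "in-frame" if shift % 3 == 0 else "frameshift"


print(snv_consequence("GGT", "GAT"))   # glycine to aspartate
print(snv_consequence("GGT", "GGC"))   # glycine kept
print(snv_consequence("TGG", "TGA"))   # tryptophan to stop
print(indel_consequence("A", "ATTT"))  # three bases inserted
print(indel_consequence("AT", "A"))    # one base deleted
```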
Beyond inferring the biological effect of a variant from the severity of the change in the sequence, there are various bioinformatic tools and methodologies that can be used to predict its functional impact. The predictions are mostly oriented to protein disruption, and most of them focus on missense alterations using several criteria, such as the degree of evolutionary conservation of the affected amino acid (Shihab et al. 2013), the physicochemical properties and the location of the alternate amino acids in the protein sequence (Adzhubei et al. 2010; Vaser et al. 2016), or the frequency values in evolutionarily close species (Sundaram et al. 2018). Other methods predict the involvement of variants in splicing using different methodologies such as machine learning or deep learning (Mort et al. 2014; Jaganathan et al. 2019). Despite their utility, predictive methodologies have their caveats, such as a moderate specificity with a tendency to overpredict deleterious impacts (Li et al. 2017; Ernst et al. 2018), or a prediction capacity that varies between predictors and genomic regions, and therefore their verdict must be taken cautiously.
With regard to polymorphisms, they can confer reduced or increased risks for different pathologies, including cancer (Whibley et al. 2009; Cheng et al. 2015), or influence the drug response (Hattinger et al. 2016; Menden et al. 2018), but due to their high frequency in the population, they are mostly considered clinically benign.
3.2. Clinical relevance of variants
Most patients have at least one detected alteration with the potential of being clinically relevant (Bieg-Bourne et al. 2017; Sanchez-Vega et al. 2018) at some point in the tumoral progression (Figure 2). Those are the variants that:
- Change the gene function. This is the case for the activating variants in the residue 12 of the KRAS protein (Prior et al. 2012) that confer oncogenic properties, or the alterations in the gene TP53 (Cole et al. 2017) that remove the tumor suppression capacity.
- Help to establish a diagnosis, such as the presence of the Philadelphia chromosome (oncogenic BCR-ABL fusion), which indicates chronic myeloid leukemia (Hsueh et al. 2013).
- Influence the prognosis. For instance, mutations in genes BRCA1 or BRCA2 increase the probability of a recurrent breast cancer (Nilsson et al. 2014) and alterations in TP53 are associated with lower survival in different types of leukemia (Stengel et al. 2017).
- Guide the selection of therapies because they predict sensitivity, resistance or toxicity to a drug, or serve as a criterion for inclusion in a clinical trial. This is what happens with deletions in exon 19 or the mutation L858R in the gene EGFR that cause sensitivity to treatments with tyrosine kinase inhibitors such as erlotinib or gefitinib in patients with lung cancer (Rosell et al. 2009; Su et al. 2017); or the mutation T790M in the same gene, conferring resistance to those inhibitors (Yun et al. 2008).
- Suggest the use of surveillance measures for prevention or early detection.
For example, the alteration of BRCA1 and BRCA2 genes is a risk factor for breast and ovarian cancer (Kuchenbaecker et al. 2017). The same happens with mutations in APC in colon cancer (Stoffel et al. 2015).
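The therapy-guiding case in the list above can be pictured as a lookup from an alteration to a predicted drug response. The sketch below is only illustrative: its three entries restate the EGFR examples given in the text for lung cancer, and the names `BIOMARKERS` and `annotate` are hypothetical rather than taken from any existing resource.

```python
from typing import Dict, List, Tuple

# (gene, alteration) -> (predicted effect, drugs)
# Entries restate the lung cancer examples from the text; not a curated resource.
BIOMARKERS: Dict[Tuple[str, str], Tuple[str, List[str]]] = {
    ("EGFR", "exon 19 deletion"): ("sensitivity", ["erlotinib", "gefitinib"]),
    ("EGFR", "L858R"): ("sensitivity", ["erlotinib", "gefitinib"]),
    ("EGFR", "T790M"): ("resistance", ["erlotinib", "gefitinib"]),
}


def annotate(alterations: List[Tuple[str, str]]) -> List[Tuple[str, str, str, List[str]]]:
    """Return only the alterations with a known drug-response association."""
    hits = []
    for gene, alteration in alterations:
        key = (gene, alteration)
        if key in BIOMARKERS:
            effect, drugs = BIOMARKERS[key]
            hits.append((gene, alteration, effect, drugs))
    return hits


# Hypothetical patient profile: one annotated and one unannotated alteration
for hit in annotate([("EGFR", "L858R"), ("TP53", "R175H")]):
    print(hit)
```

Alterations absent from the table are simply skipped, which mirrors the situation where a variant has no documented therapeutic implication.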
Figure 2. Stages for genomic profiling intervention. Genomic alterations accumulate throughout life and can generate clonal expansions that drive tumor development according to different fitness pressures. Their interrogation in the different stages can be used in clinical management based on their clinical involvement. Figure adapted from (Nangalia &
Campbell 2019).
4. CANCER TREATMENT
Cancer treatment follows initial rules fixed by guidelines created by health organizations such as the National Comprehensive Cancer Network (NCCN) in the United States or The European Society for Medical Oncology (ESMO) in Europe.
These guidelines are specific for each tumor type, based mostly on the histology, type, grade or stage of the tumor. The most effective treatments are surgery and radiation therapy, but drug therapies are also available for cases where these strategies have not been sufficient to manage the tumor.
4.1. Type of drug-based therapies
There are different types of therapies using drugs, where the choice of therapy depends on the type of cancer and its stage, and frequently involves a combination of compounds. They are:
- Chemotherapy with cytotoxic agents administered alone, or more frequently in combination with other treatments. They have the disadvantage of killing both tumoral and healthy cells, leading to unpleasant side effects. Examples of chemotherapeutic agents are the platinum compounds (cisplatin, carboplatin or oxaliplatin) used in many solid tumors to disrupt DNA replication to cause cell death (Falzone et al. 2018).
- Photodynamic therapy that uses compounds that sensitize tumoral cells to light of a specific wavelength. Porfimer sodium and talaporfin are photosensitizing drugs activated by laser light used in the treatment of esophageal cancer (Inoue & Ishihara 2020).
- Hormonal therapy that interferes with the activity of hormones and that is employed in cancers that depend on them for growth. This is true of the aromatase inhibitors letrozole and anastrozole used in postmenopausal women with estrogen-receptor-positive breast cancers (Smith & Dowsett 2003).
- Immunotherapy that stimulates the immune system to eradicate tumoral cells. It includes immune checkpoint inhibitors such as the monoclonal
antibodies ipilimumab, nivolumab and pembrolizumab used to treat metastatic melanomas and other types of tumors (Falzone et al. 2018). Other immunotherapy strategies are adoptive therapy, in which immune cells activated against tumors are selected and re-infused in the patient, monoclonal antibodies that make cancer cells more visible to the immune system, and therapeutic vaccines.
- Targeted therapy that uses compounds directed to oncogenic drivers or genetic vulnerabilities (Senft et al. 2017), further explained in the next section.
4.2. Targeted therapies for precision oncology
Because of its specificity, targeted therapy constitutes the foundation of precision oncology. Agents used in this therapy are small molecules or monoclonal antibodies that act differently than in immunotherapy. In this case, monoclonal antibodies can directly kill the cells, mark them to be destroyed by the immune system or transport toxins inside the tumoral cells. The therapeutic action of targeted therapy is achieved through the direct binding of the drug to the altered biomolecules (Santos et al.
2017). Most targeted drugs act by inhibiting the products of activated oncogenes, such as the EGFR inhibitors (EGFRi) that counteract activated or over-expressed EGFR. There are also approaches that try to restore the activity of mutated tumor suppressor genes, which has proven to be a difficult task. Alternatives for the remaining untargetable driver alterations would be to act downstream of them or to search for synthetic lethal partners that can be targeted (Morris & Chan 2015).
In synthetic lethal interactions, the inactivation of either element alone leaves the cell viable, whereas the simultaneous inactivation of both kills it. Targeting this kind of pair further increases the specificity because the treatment is only effective in those tumoral cells where the vulnerability is present.
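The logic of a synthetic lethal pair can be written as a small truth table. This is a deliberately simplified sketch that treats inactivation as all-or-nothing; the element names A and B and the functions `is_viable` and `inhibitor_kills` are hypothetical.

```python
def is_viable(a_inactive: bool, b_inactive: bool) -> bool:
    """Viability under a synthetic lethal pair (A, B).

    Simplifying assumption: the cell dies only when both elements are inactive.
    """
    return not (a_inactive and b_inactive)


def inhibitor_kills(cell_has_a_loss: bool) -> bool:
    """Effect of a drug that inactivates B on a cell with or without loss of A."""
    return not is_viable(cell_has_a_loss, True)


# Tumoral cell carrying the A loss versus a healthy cell with A intact
print(inhibitor_kills(True))
print(inhibitor_kills(False))
```

Under these assumptions the B inhibitor affects only the cells that already carry the A loss, which is the source of the specificity mentioned above.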
Despite the advantage of specificity, targeted therapy has some limitations such as the associated resistances, so it usually works better in combination with other types of treatment such as chemotherapy or radiotherapy. Also, its development is difficult and the number of targeted drugs is still limited in comparison with all the potential targets.
4.3. Biomarkers of drug response guiding therapeutic prescription
Precision medicine relies on biomarkers that are biological indicators for risk assessment, diagnosis, prognosis or drug response prediction. Currently, these biomarkers are mostly molecular entities whose discovery has been accelerated by high-throughput technologies. Some of them have been adopted in clinical practice, such as alterations in the ERBB2 gene in breast cancer, which indicate a worse prognosis and guide treatment with lapatinib or trastuzumab.
The biomarkers for drug response can be the targets of the drug or just entities not targeted by the drug whose alterations are somehow involved in a sensitivity or resistance response to it. The number of approved treatments guided by biomarkers is still limited (Kurnit et al. 2018), although it has been progressively increasing, reaching the current figure of 64 according to FDA pharmacogenomics labeling (Table 1) (Center for Drug Evaluation & Research 2020).
Table 1. Biomarkers for drug indication in Food and Drug Administration (FDA) labeling.
4.4. Limitations in current therapy proposal based on biomarkers
The knowledge of carcinogenic mechanisms is increasing in completeness and precision, but its clinical translation is still limited and the identification of the effective treatment for a particular patient is far behind the achieved progress at other levels.
There are many factors that restrict the efficacy of the therapeutic proposal based on biomarkers:
- There are some types of tumors where biomarkers are harder to identify due to a low recurrence of alterations (Sanchez-Vega et al. 2018) in canonical signaling pathways, indicating that many oncogenic mechanisms are still unknown and can involve other molecular alterations in addition to those commonly interrogated.
- The number of validated biomarkers is still limited and mostly restricted to specific tumor types. In addition, some frequent drivers such as KRAS or MYC remain untargetable (Sahai et al. 2017).
- The association between molecular alterations and drug responses that guide therapies usually takes into account the individual oncogenic alterations.
Nevertheless, a positive response to treatment is highly dependent on the context (Senft et al. 2017) and the effect of an alteration varies depending on its co-occurrence with other events (Killock 2017; Mina et al. 2017). One of the reasons is that cancer is ultimately a pathway disease with all the attached complexity and bypass mechanisms. It is therefore essential to take into account the global picture when proposing therapies and to manage alterations in a systematic way.
- Tumoral heterogeneity complicates drug responses. Therapies that act on trunk alterations do not always guarantee success and it is frequent that after a good initial response, subsequently acquired resistances emerge.
Resistance can be genetic, as in the case of the adaptive rewiring of transduction signals, or non-genetic, such as epithelial-mesenchymal transition or histologic transformations. Sometimes, resistances are already
present in the pretreatment tumoral context, as for instance the activating alterations in KRAS codon 12 that confer resistance to the treatment with EGFRi in colorectal cancer. These resistances can be anticipated by the detection of the associated biomarker and contraindicate the use of some compounds.
- The identification of treatments is fundamentally based on somatic alterations.
This has its limitations as germline variants are also crucial in drug metabolism, and therefore in drug effectiveness and toxicity (Menden et al.
2018).
These therapeutic limitations are linked to cancer molecular complexity, making it difficult to select the appropriate therapy from the results of high-throughput molecular testing. Just taking into consideration genomic alterations, incomplete knowledge, together with the variability of outcomes in tumor development and drug response, make them unmanageable without computational help. This could be worsened in the coming years with the increasing number of therapeutic alternatives and the adoption of exome and genome sequencing in routine clinical practice.
5. CANCER PHARMACOGENOMICS
Pharmacogenomics connects pharmacology and genomics to find out the effect of the genome on the drug action (Figure 3). Preclinical studies with in vitro and in vivo models are employed to generate hypotheses in this area. They are molecularly characterized biological frameworks such as cell lines, Patient Derived Xenograft models (PDX) or organoids, which are exposed to a drug intervention where the degree of response is measured. Broad initiatives in cell line exploration have analyzed and collected this type of information. This is the case for the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al. 2012), which tested 24 compounds on around 500 of their genomically characterized cancer cell lines. The second version of the Cancer Therapeutics Response Portal (CTRP) (Rees et al. 2016) used 860 of these cell lines to test 481 targeted drugs. And another project, the Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al. 2013), explored the effect of around 265 anticancer drugs in more than 1000 cancer cell lines.
PDX models make it possible to overcome the limitations of in vitro experiments, while preserving the architecture and tumor heterogeneity, to evaluate cancer evolution dynamics or response to treatments (Byrne et al. 2017). In this strategy, tumor samples are implanted and propagated in immunosuppressed mice and can be used to validate the hypotheses generated by genomic analyses, allowing different therapeutic options to be tested and the most appropriate one selected before being administered to the patient (Garralda et al. 2014).
Basket and umbrella clinical trials aim to take account of molecular features in the process of drug administration, allowing the validation and integration in clinical practice of novel hypotheses. Basket trials evaluate the response to a treatment across different tumor types that share the same biomarker, while umbrella trials enroll patients with a single tumor type and stratify them into subgroups according to their biomarkers, each subgroup receiving a matched treatment (Park et al. 2020). After passing the clinical trials, the patient outcomes of prescribed marketed drugs also provide pharmacogenomic information not detected in previous steps. In relation to this, data on the treatment response in patients genomically characterized in large consortia (e.g. ICGC, TCGA) could be used to explore how those genomic profiles affect the degree of response to the prescribed treatments.
Besides experimental approaches, artificial intelligence can be a shortcut for many discovery tasks. Machine learning algorithms can also be applied to discover new translational biomarkers such as those useful for therapeutic action and drug efficacy (Vamathevan et al. 2019).
6. BIOINFORMATICS TOOLS IN PRECISION ONCOLOGY
Bioinformatics is imperative in precision oncology (Gómez-López et al. 2019;
Carretero-Puche et al. 2020) contributing to the expansion of the current knowledge through the integration of several layers of data, and the implementation of systematic methodology to search for new associations and biomarkers. It also allows the storage and handling of massive omics data coming from individual
molecular characterization analyses, and helps in their interpretation in order to identify the alterations that guide patient management and therapeutic selection (Figure 3).
Figure 3. Stages and components in the discovery and application of pharmacogenomic knowledge. From preclinical models to patient management, the combined information about the drug response and the molecular background of the individual feeds pharmacogenomic discovery, helped in the process by the support of bioinformatics resources.
Bioinformatics tools to interpret genomic data for precision oncology should have two main functions (Mardis 2018): (i) to identify known functional alterations in cancer and interpret variants of unknown significance, and (ii) to add to these findings associated clinical information to identify therapeutic options.
6.1. Biomedical knowledge bases
One of the pillars for precision oncology is the availability in public data repositories of acquired knowledge about variants, drugs and their interconnections. Multiple
databases and resources have been created to collect the different types of findings in each knowledge area. Some examples are those that store the impact of the variation on cancer (Tate et al. 2019); its clinical repercussions (Landrum et al.
2018); its clinico-therapeutic implications (Chakravarty et al. 2017; Griffith et al.
2017; Tamborero et al. 2018); or the existing associations between genes (Kanehisa 2019), between drugs (Liu et al. 2020) or between genes and drugs (Cotto et al.
2017).
Information stored in biomedical databases and resources has different levels of evidence and is very heterogeneous, not only in its content, but also in its layout.
This heterogeneity and the absence of standardization of formats and terminology represent one of the difficulties faced by the processes that need to integrate their content.
6.2. Prioritization of alterations
The prioritization and interpretation of the alterations identified in screening analysis is one of the big challenges of genomics-driven precision oncology (Hyman et al.
2017). The results obtained in genetic tests, especially in those spanning a broader genomic region, are complex to interpret because the number of detected variants can be very high, most of them being of unknown significance. Furthermore, the biological relevance of a variant is not always accompanied by its therapeutic implications. For this reason, it is necessary to collect many types of evidence that give clues about the functional repercussions of the variant, and to associate this information with therapeutic options in an efficient way (Mardis 2018).
This need has fostered the development of methodologies and bioinformatic tools with the objective of making variant interpretation easier in cancer genomics. One such tool is described by Perera-Bel and colleagues (Perera-Bel et al. 2018). These tools rely on annotations that provide layers of information about the known and inferred biological and clinical effects of the variants. There are many tools that integrate annotations from different resources and databases (Wang et al. 2010; McLaren et al. 2016). The integrated annotations, such as genomic location, sequence and functional consequences or population frequency, allow the establishment of rational criteria
to prioritize variants that can use different strategies, the most common being stratification into tiers (Li et al. 2017).
6.3. In silico drug prescription
In silico drug prescription methodologies use the molecular data to identify appropriate therapies. Most of them use DNA variants as input, such as mTCTScan (M. J. Li et al. 2017) or the Cancer Genome Interpreter (CGI) (Tamborero et al.
2018) although some are starting to integrate other layers of omics information (Kalari et al. 2018). They can be applied in individual cases to guide therapy prescription, but also on a large scale to obtain a general view and observe trends in the tumoral therapeutic landscape (Rubio-Perez et al. 2015). Consortia integrated in the ICGC provide a large amount of molecular data, from raw to processed files available for the scientific community where these approaches can be systematically applied.
The precise evaluation of the in silico prescription capabilities can only be made by subsequent in vivo human application, which is infeasible because of the multiple alternatives. This limitation on testing can be circumvented by in vivo models, and also by retrospective analysis in treated patients with molecular characterization like those included in TCGA with therapeutic responses in their clinical records. Clinical data include information about provided therapies, treatment time, therapy response, overall survival or cause of death. Nevertheless, clinical data regarding therapies have remained inaccessible for most cases (Uhlen et al. 2017) and there is a lack of standardization. This situation has recently started to change with initiatives like the NCI Genomic Data Commons (GDC) (Grossman et al. 2016), which has collected and processed clinical data for TCGA and other projects, making them available for the scientific community.
OBJECTIVES
The main objective of this thesis is the development and implementation of a methodology for in silico therapeutic prescription using individual genomic data.
The specific objectives are:
1. The collection and integration in a database of drug-gene associations and related annotations useful to define a search space for therapeutic intervention.
2. The definition of a therapy search logic based on the genomic alterations identified in an individual.
3. The establishment of a prioritization system for the genomic alterations based on the addition and scoring of annotations and predictions about biological and oncological relevance.
4. The establishment of a prioritization system for the therapies linked to the detected alterations that reflects their appropriateness in the particular genomic context.
5. The implementation of a web platform with the developed methodology that allows its execution and the graphical representation and exploration of the results.
6. The exploration of the therapeutic scope of the methodology in large cohorts and its performance in individual cases.
7. The proof of concept of its capacity to predict the outcome of a treatment through the evaluation of the clinical data attached to the patient cohorts.
MATERIALS AND METHODS AND RESULTS