Doctoral Programme in Molecular Biosciences
Computational approaches for the identification of putative therapeutic targets for pancreatic ductal
adenocarcinoma
Javier Perales Patón
Madrid, 2019
Faculty of Medicine
Universidad Autónoma de Madrid
Doctoral thesis submitted for the degree of Doctor in Molecular Biosciences from the Universidad Autónoma de Madrid by the Biology graduate
Javier Perales Patón
Thesis Co-Directors:
Dr. Fátima Al-Shahrour Núñez
Prof. Dr. Alfonso Valencia Herrera
Centro Nacional de Investigaciones Oncológicas
Dr Fátima Al-Shahrour Núñez, Head of the Bioinformatics Unit (Spanish National Cancer Research Centre, Madrid, Spain) and Dr Alfonso Valencia Herrera, Head of the Computational Biology Life Sciences Group (Barcelona Supercomputing Center, Barcelona, Spain)
CERTIFY:
That Mr. Javier Perales Patón, a graduate in Biology from the Complutense University of Madrid (Madrid, Spain), has completed his Doctoral Thesis entitled
“Computational approaches for the identification of putative therapeutic targets for pancreatic ductal adenocarcinoma” under our supervision and meets the necessary requirements to obtain the PhD degree in Molecular Biosciences. To this purpose, he will defend his doctoral thesis at the Universidad Autónoma de Madrid. We hereby authorize its defense in front of the appropriate Thesis evaluation panel.
We issue this certificate in Madrid on September 19th, 2019.
Fátima Al-Shahrour Núñez, Thesis Director
Alfonso Valencia Herrera, Thesis Director
First of all, I would like to thank Fátima Al-Shahrour, my thesis director, for the trust she has placed in me. Her effort and dedication have been fundamental to my development as an independent researcher and to the work I present in this book. Her generosity, patience and good practice represent a school that I intend to carry with me always and to continue wherever I go. I would like to thank Alfonso Valencia, my thesis director, for the great opportunity he gave me to take my first steps in the world of Computational Biology, for his great work spreading knowledge in the field of Bioinformatics, and for his invaluable help and supervision during these years of doctoral work.
Many thanks to Héctor, who has influenced me enormously and taught me to build critical thinking in science. To Elena, for her generosity, and for everything we have learned together since the beginning. To Gonzalo and Osvaldo, for what they represent to all of us who are starting out, for their dedication to making sure we all learn, and for the good atmosphere they keep in the group. To Michael, Tomás and José María, for amazing me with everything they know, and from whom I have been lucky to learn so much. To Kevin, Coral, Guillermo, Carlos, Piti, Laura and Fernando, for representing a generation of bioinformaticians of which we can be proud, and for being so great. To former lab mates Juan, María, Andrés, José Manuel, Miguel and David de Juan, for sharing part of this journey with me. To the former gastrointestinal tumours research group, Spas, Pedro and Manuel Hidalgo, for the opportunity to work with you.
An enormous thank you to my brother, José María, and to my parents, Vicenta and José María, for always being there and giving me all your personal support, for your trust, and for instilling in me the values of work and education that I hold today. To all the friends I have kept since school: Rubén, Enrique, Raúl, Miguel Ángel, José Luis and Carlos. Finally, I would like to thank Heleen for her great support and patience while I was writing this thesis, for her affection and all her love, which has lifted my spirits so much in achieving this goal, and for the great future that awaits us together.
Over the last year and a half, I have travelled a great deal across Germany, the Netherlands and Spain to reconcile my personal life with my passion, which is science. This has been possible thanks to the support of the many people I mention in these lines and of many others. Thank you for making it possible. Thank you all!
Cancer research has achieved great advances in the development of new treatments with greater efficacy and specificity. However, some cancer types such as pancreatic ductal adenocarcinoma (PDAC) still present difficulties for establishing effective pharmacological regimens. Computational biology has defined the landscape of druggable mutations and of PDAC tumour subtypes with a potential differential response to treatments. The latest functional genomics and pharmacogenomics data in cancer cell lines offer the opportunity to identify new therapeutic vulnerabilities.
The main objective of this thesis is the identification of new targets with therapeutic potential in PDAC through computational approaches. To this end, the transcriptome of primary tumours was dissected using computational methods. First, deconvolution methods made it possible to separate the transcriptional signal of the different cell types in pancreatic tissue. This has allowed us to select the samples that are representative of the disease and to redefine the tumour subtypes. Second, hierarchical clustering of pancreatic cancer cell lines together with primary tumours made it possible to transfer pharmacological sensitivity data to the tumour subtypes found in patients. In this way we have defined the landscape of differential sensitivity between subtypes for a possible stratification of patients.
Beyond treatments against the primary tumour, metastasis represents another, complementary therapeutic target, since it is the main cause of death in patients. In this context, we have carried out a pilot study of personalized medicine in a highly metastatic tumour. We have sequenced the transcriptome of single cells from a patient-derived tumour xenograft to characterize the transcriptional programmes that govern metastasis and to propose drugs that interrupt it. Drug repositioning based on the reversion of the transcriptional phenotype has allowed us to propose several inhibitors as possible anti-metastatic treatments, which were later validated in the experimental model itself. Finally, to make the latest data on genetic dependencies in cancer accessible, together with their translational research in cancer therapy, we have developed a new method called vulcanSpot. VulcanSpot integrates large-scale genetic dependency data with computational drug prescription methods, using known and repositioned drug associations, to uncover possible therapeutic vulnerabilities in cancer.
Cancer research has made great advances in developing novel treatments with higher efficacy and specificity. However, some cancer types such as pancreatic ductal adenocarcinoma (PDAC) remain a challenge for the establishment of effective pharmacological regimens. Computational biology has been used to describe the actionable mutational landscape and to discover tumour subtypes in PDAC with potentially differing drug responses. The recently released large-scale functional genomic and pharmacogenomic screens in cancer cell lines offer the opportunity to uncover novel therapeutic vulnerabilities.
The main objective of this thesis is to identify novel therapeutic targets for PDAC using computational approaches. To this end, the transcriptome of primary tumours was dissected using computational methods. First, deconvolution methods were used to separate the transcriptional signals derived from the distinct cell types in pancreatic tissue. This makes it possible to select the samples that best represent PDAC and to redefine the tumour subtypes. Second, hierarchical clustering of pancreatic cancer cell lines with primary tumours enabled the translation of drug sensitivity data from cell lines to the tumour subtypes found in patients. In doing so, we have studied the differing drug sensitivity landscape between tumour subtypes for a potential stratification of patients.
Beyond primary tumours, metastasis represents a complementary therapeutic target, since it is the main cause of cancer death in patients. In this context, we have carried out a pilot study of precision medicine on a PDAC tumour with high metastatic capacity. We have sequenced the transcriptome of single cells to characterize the main transcriptional programmes governing cell migration and tumour spread, and to computationally prescribe candidate drugs to block metastasis. Drug repositioning approaches based on the reversion of the malignant transcriptional phenotype allowed us to propose several inhibitors as potential anti-metastatic treatments, which were subsequently validated in the experimental model.
Finally, to make the latest cancer gene dependency data accessible and to support its translation into cancer therapeutics, we have developed a novel method named vulcanSpot. VulcanSpot integrates large-scale gene dependency data with computational drug prescription methods based on known and repurposed drug associations to uncover therapeutic vulnerabilities in cancer.
List of Figures
List of Tables
Abbreviations
1 Introduction
1.1 Cancer biology
1.1.1 Significance
1.1.2 Hallmarks of cancer
1.1.3 The cancer genome
1.1.4 Therapeutic intervention
1.1.5 Genetic dependencies
1.1.6 Disease models
1.2 Pancreatic ductal adenocarcinoma (PDAC)
1.2.1 Significance
1.2.2 The pancreas and its neoplasms
1.2.3 Cellular origin
1.2.4 Development and progression
1.2.5 Metastatic disease
1.2.6 Pancreatic cancer therapy
1.3 Computational Biology in cancer
1.3.1 Role in cancer research
1.3.2 Tumour molecular profiling
1.3.3 Methodologies for molecular data analysis
1.3.4 Gene expression signatures and signature matching
1.4 Source for large collections of tumour profiles
1.4.1 International cancer consortiums
1.4.2 Public repositories of gene expression profiles
1.4.3 Functional genomics and pharmacogenomics screenings in cancer cell lines
1.5 Translational Bioinformatics in PDAC
1.5.1 Defining the mutational landscape
1.5.2 PDAC subtypes: molecular classes
1.5.3 Genomics-driven precision medicine
1.6 Computational drug prescription
1.6.1 In silico drug prescription based on individualized cancer genomics
1.6.2 Computational drug repositioning in cancer
2 Objectives
3.2 Molecular profiles of Cancer Cell Lines
3.2.1 Genomic alterations and basal gene expression profiles
3.2.3 Catalogue of transcriptional signatures of cellular perturbations
3.2.4 Extraction of consensus transcriptional signatures of cellular perturbations
3.2.5 Gene Dependency
3.2.6 Pharmacological Sensitivity of cancer cell growth inhibition
3.3 Single-cell RNAseq from a highly metastatic PDX model of PDAC
3.4 Gene Expression Analyses: methodology
3.4.1 High dimensionality reduction methods
3.4.2 Deconvolution of bulk tissue transcriptome profiles
3.4.3 Genome-wide differential gene expression and signature extraction
3.4.4 Functional analysis
3.4.5 External gene expression signatures related to PDAC
3.4.6 Tumour subtyping of cancer cell lines
3.5 Drug efficacy studies for the anti-metastatic treatment and experimental assays on the characterization of Survivin protein
3.6 Computational selection of treatments
3.6.1 Known direct drug-gene interactions powered by PanDrugs
3.6.2 Transcriptome signature reversion
3.6.3 KDCP approach: inferring chemical compounds with potential gene depletion effects
Pairwise signature similarity calculation between gene knock-down and drug cellular perturbations
Adjustment by closeness between the compound and the gene target
4 Results
4.1 Computational drug repositioning for the treatment of primary tumours in PDAC
4.1.1 Computational deconvolution of pancreatic cell types in bulk tissue gene expression profiles
4.1.2 Extraction of a transcriptional signature representative of the disease
4.1.3 Treatment for a reversion of PDAC
4.2 The drug sensitivity landscape of PDAC subtypes
4.2.1 Refining previously reported PDAC tumour subtypes
4.2.2 Classification of PDAC cancer cell lines into tumour subtypes
4.2.3 Evaluation of differing drug response to anti-cancer single agents between tumour subtypes
4.3 Transcriptional dissection of cancer cell spread for an anti-metastatic treatment in PDAC using single-cell RNAseq
4.3.1 Patterns of transcriptional heterogeneity during metastasis
4.3.2 Transcriptional reprogramming during cell migration and metastasis
4.4 vulcanSpot: a tool to prioritize therapeutic vulnerabilities in cancer
4.4.1 Development of the method
Step 1. Identification of GDs in tumor-specific genotype contexts
Step 2. In silico drug prescription for vulnerable GDs
Step 3. Therapeutic prioritization
4.4.2 Broad statistics on therapeutic vulnerabilities
4.4.3 Recall of canonical cancer gene dependencies
4.4.4 Web service and access to the database
5 Discussion
5.1 Transcriptional dissection of primary tumours for PDAC therapy
5.1.1 Deconvolutional methods for the correction of non-tumoral cell contamination in PDAC transcriptome analysis
5.1.2 Stratification of PDAC RNA subtypes for PDAC therapy
5.1.3 Computational selection of drugs for transcriptome reversion of PDAC phenotypes
5.2 Single cell transcriptomics of cells involved in metastasis from a highly metastatic PDX model of PDAC
5.2.1 Transcriptional characterization of cell populations during metastasis
5.2.2 Computational selection and efficacy of drug candidates for an anti-metastatic treatment in PDAC
5.3 VulcanSpot: exploiting the notion of gene dependencies for cancer treatment
5.3.1 Fundamentals and scope of vulcanSpot
5.3.2 Identification of gene dependencies: strengths, pitfalls and future directions
5.3.3 Computational drug prescription: filling the gap between the vulnerability and the clinical utility
5.4 Future perspectives
6 Conclusions
Conclusiones
References
Annex I: Supplementary material
Annex II: Scientific production
Conferences
Publications
List of Figures
Figure 1 | Accumulation of somatic mutations within the cell lineage that originates the cancer.
Figure 2 | Canonical examples of targeted therapies on cancer hallmarks.
Figure 3 | Depiction of cancer gene dependencies.
Figure 4 | Depiction of the pancreas.
Figure 5 | Model of the progression from a normal cell to PanIN-3, the precursor of PDAC.
Figure 6 | The Invasion-Metastasis cascade. Carcinoma cells exit the primary site through a local invasion and intravasation to the circulatory or lymphatic systems.
Figure 7 | Core of signaling pathways recurrently altered in PDAC.
Figure 8 | Personalized cancer medicine using avatar models and Translational Bioinformatics analysis.
Figure 9 | The Connectivity Map (CMap) approach.
Figure 10 | Heatmap representing the estimated cell fractions for the 6 main cell types by BSeq-sc from the pancreas tissue across samples from the collection of data sets.
Figure 11 | Impact of the presence of acinar cells in the patient samples on the variability of the gene expression profiles.
Figure 12 | Consistency between transcriptional signatures of PDAC tumours.
Figure 13 | Aggregated ranking of compounds with opposite connections to the three PDAC signatures.
Figure 14 | Correlation matrix between the original transcriptional signatures that define previously described PDAC subtypes and other confounding sources.
Figure 15 | Subtyping of cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) using patients’ data from Collisson et al. 2011.
Figure 16 | Subtyping of cancer cell lines from the Genomics Drug Sensitivity in Cancer (GDSC) using patients’ data from Puleo and Nicolle et al. 2018.
Figure 17 | Mosaic plot illustrating the independence between tumour subtype and treatment outcome with gemcitabine in patients with PDAC (TCGA-PAAD cohort).
Figure 18 | Hypothesis testing of the differing drug sensitivity for the approved drugs for PDAC treatment, Gemcitabine and Erlotinib.
Figure 19 | Drug sensitivity landscape of PDAC subtypes. Screening-wide drug response using the pharmacological data from the GDSC (left) and CCLE (right).
Figure 21 | Differential gene expression between tumour cell populations.
Figure 22 | Single-cell transcriptional programmes.
Figure 23 | Functional characterization of BIRC5 in PDAC. (a) log2-transformed Normalized BIRC5 gene expression across cell populations.
Figure 24 | Pre-ranked GSEA using kinase substrates of AURKA, AURKB and PLK1 of CTC as compared to primary tumour cells.
Figure 25 | vulcanSpot workflow.
Figure 26 | Types of Gene Dependencies (GDs) identified for the method.
Figure 27 | Rational criteria to rank vulcanSpot results.
Figure 28 | Statistics on total and shared GDs across contexts.
Supplementary Figure 1 | In-vivo drug efficacy studies for an anti-metastatic treatment in Panc-265 PDX model.
Supplementary Figure 2 | Molecular characterization of BIRC5 in Panc-265 PDX model.
Supplementary Figure 3 | Depiction of the identification of cancer gene dependencies.
Supplementary Figure 4 | Volcano plot of the direction of enrichment versus the significance from the identification of gene dependencies.
Supplementary Figure 5 | KDCP score distribution across cancer contexts.
Supplementary Figure 6 | Validated examples of therapies to target known genetic dependencies proposed by vulcanSpot.
List of Tables
Table 1 | Overview of the most frequent malignant pancreatic neoplasms.
Table 2 | Key transcriptional programmes that characterize the distinct tumour subtypes described in the literature.
Table 3 | Summary of number of samples from the collection of data sets.
Table 4 | List of external transcriptional signatures related to tumour cell microdissection and tumour subtypes of PDAC.
Table 5 | Data sets with normal pancreas samples and the number of samples that pass the criteria of ductal-like expression.
Table 6 | Summary of data sets (or aggregated data sets) used for the PDAC tumour vs. normal pancreas comparison.
Table 7 | Datasets used to cluster cancer cell lines with primary tumours of patients with PDAC.
Table 8 | CMap statistics on the anti-apoptotic state of CTCs as compared to primary tumour cells.
Table 9 | Pre-ranked GSEA outcome using AURKA, AURKB, PLK1 knock-down signatures obtained from CMap-L1000 Phase I dataset.
Supplementary Table 1 | Compendium of dataset with tumours from patients with PDAC obtained from public repositories.
Supplementary Table 2 | Datasets with experimental samples of PDAC.
Supplementary Table 3 | Resources integrated in vulcanSpot for the identification and drug prescription of therapeutic cancer genetic vulnerabilities.
Supplementary Table 4 | Cell lineages defined by the CCLE and DepMap projects (Broad Institute, Boston) and individual cell lines with perturbation experiments in LINCS1000.
Supplementary Table 5 | Number of consensus transcriptional signatures of drug and gene knock-down perturbations across different context considered in vulcanSpot.
Supplementary Table 6 | Drug connections obtained using the three transcriptional signatures of PDAC.
Supplementary Table 7 | Screening-wide differential drug response between subtypes using the pharmacological data from the GDSC and CCLE.
Supplementary Table 8 | Differentially expressed genes in CTC as compared to PT.
Supplementary Table 9 | Pre-ranked Gene Set Enrichment analysis in CTC as compared to PT using the Hallmarks collection of gene sets.
Supplementary Table 11 | Number of features for statistical testing in Gene Dependencies (GDs).
5-FU 5-fluorouracil
ADEX Aberrantly Differentiated Endocrine-eXocrine
ADM Acinar-to-Ductal Metaplasia
CCLE Cancer Cell Line Encyclopedia
cDNA complementary DNA
CMap Connectivity Map
CNIO Centro Nacional de Investigaciones Oncológicas (Madrid, Spain)
CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
CTC Circulating Tumour Cell(s)
CTRP Cancer Therapeutics Response Portal
DAB 3,3′-diaminobenzidine
DAPI 4′,6-diamidino-2-phenylindole
DepMap Cancer Dependency Map
DNA Deoxyribonucleic acid
ECM Extracellular matrix
EMBL-EBI European Bioinformatics Institute
EMT Epithelial-to-Mesenchymal Transition
FDR False Discovery Rate
GAF Global Allele Frequency
GATK Genome Analysis Toolkit
GD Gene Dependency
GDSC Genomics of Drug Sensitivity in Cancer
GEM Genetically Engineered Mouse
GSM Genome-Scale Metabolic model
GEO Gene Expression Omnibus
GO Gene Ontology
GoF Gain-of-Function
GSEA Gene Set Enrichment Analysis
GSVA Gene Set Variation Analysis
GTEx Genotype-Tissue Expression Project
IC50 Half maximal inhibitory concentration
ICGC International Cancer Genome Consortium
ISCIII Instituto de Salud Carlos III (Madrid, Spain)
HC Hierarchical clustering
KDCP Knock-Down~Compound Perturbation
KEGG Kyoto Encyclopedia of Genes and Genomes
KS Kolmogorov-Smirnov test
LDH Lactate Dehydrogenase
LINCS Library of Integrated Network-Based Cellular Signatures
LM Liver metastasis cells
LoF Loss-of-Function
miRNA Micro RNA
mRNA messenger RNA
MSigDB Molecular Signatures Database
NCBI National Center for Biotechnology Information
OS Overall survival
PAGODA Pathway and Gene Set Overdispersion Analysis
PanIN Pancreatic intraepithelial neoplasia
PanNET Pancreatic Neuroendocrine tumor
PBS Phosphate-buffered saline
PCA Principal Component Analysis
PDAC Pancreatic Ductal AdenoCarcinoma
PDX Patient-Derived Xenograft
PKC Protein Kinase C
PPI Protein-Protein Interaction
PT Primary tumour cells
QM Quasi-Mesenchymal
RMA Robust Multi-array Average
RNA Ribonucleic acid
RNAi RNA interference
RNAseq RNA sequencing
SCDE Single-Cell Differential Expression
sgRNA single guide RNA (for CRISPR-Cas9 editing)
shRNA short-hairpin RNA
SMAD4 SMAD Family Member 4
ssGSEA Single-Sample Gene Set Enrichment Analysis
TCGA The Cancer Genome Atlas
TMM Trimmed Mean of M-values
UCSC University of California, Santa Cruz
UK United Kingdom
UPC Universal exPression Codes
USA United States of America
UV Ultraviolet light
vulcanSpot VULnerable CANcer Spot
WES Whole-Exome sequencing
WGCNA Weighted Gene Correlation Network Analysis
WGS Whole-Genome Sequencing
1 Introduction
1.1 Cancer biology
1.1.1 Significance
Cancer is a generic term for a heterogeneous group of genetic diseases that arise in different parts of the body. Cancer results from a series of cellular changes that transform normal cells into cancerous cells. Cancerous cells are characterized by uncontrolled growth, resistance to programmed cell death, and invasion beyond normal tissue boundaries to metastasize to distant organs. The International Agency for Research on Cancer estimated 9.6 million deaths from cancer worldwide in 2018 (Ferlay et al. 2019). The disease is a global health concern. Cancer research aims to improve diagnosis and treatment, and to decipher the molecular mechanisms driving the disease.
1.1.2 Hallmarks of cancer
The development of tumours comprises multiple steps by which cancer cells hijack cellular processes that are present in normal cells in order to become malignant. These capabilities were termed cancer hallmarks. Cancer hallmarks include sustained proliferative signaling, evasion of growth suppression, active invasion and metastasis, replicative immortality, induction of angiogenesis, resistance to cell death, genome instability and metabolic reprogramming (D. Hanahan and Weinberg 2000; Douglas Hanahan and Weinberg 2011). The interaction of the tumour with non-tumoral cells also gives rise to more complex capabilities from which the tumour benefits, such as tumour-promoting inflammation and evasion of immune destruction (Douglas Hanahan and Weinberg 2011). Together, these general processes make up the pathogenic behaviour of the disease. These malignant mechanisms are driven by the acquisition of somatic mutations and other genetic alterations in the key molecular players that mediate the aberrant cellular mechanisms.
1.1.3 The cancer genome
Cancer is an accelerated evolutionary process at the cell lineage level. Cancer arises as a result of an accumulation of changes in the DNA sequence that increase cellular fitness and pathogenesis. These mutations accumulate along the lineage of cell divisions from the fertilized egg to adulthood. Spontaneous and induced mutagenic processes, such as errors of the intracellular machinery or exposure to certain environmental agents (e.g. UV light or tobacco), damage the DNA of the cells. Consequently, the function of certain important genes is altered or impaired. The cancer precursor cell may eventually acquire novel capacities that promote the inception of the disease. Not all somatic mutations are involved in the development of cancer, but particular combinations of them promote tumorigenesis. In this regard, passenger mutations are those that arise during the tumorigenic process without a direct functional impact on oncogenesis. In contrast, driver mutations are causally implicated in oncogenesis. A driver mutation might not be strictly required at the final stage of the cancer, but it must have been selected at some point because of its role during the development of the cancer. The combination of several driver mutations affecting certain key molecular players is required for cancer inception (Figure 1).
Figure 1 | Accumulation of somatic mutations within the cell lineage that originates the cancer. Normal cells accumulate mutations due to both intrinsic mutational processes and exposure to mutagenic agents from the environment. Passenger mutations do not have any effect, whereas driver mutations contribute to clonal expansion. Eventually, mutations impair intracellular processes such as DNA repair, which accelerates the acquisition of further mutations (mutator phenotype). Exposure to chemotherapy induces mutations, and the cancer cell eventually acquires resistance to the treatment. Figure taken from (Stratton, Campbell, and Futreal 2009).
Driver mutations occur mainly in canonical cancer genes. These include oncogenes and tumour suppressor genes. Oncogenes are genes with the dominant potential to cause cancer when they are activated (by mutation or overexpression). Most normal cells would undergo cell death and tumour suppression when an oncogene is active, but cancer precursor cells have accumulated DNA defects that impair other genes, called tumour suppressor genes, which are involved in controlling such key processes. Therefore, oncogenesis requires a combination of genetic alterations affecting both facets: promoting oncogenesis and disrupting the cell's native tumour suppression mechanisms.
1.1.4 Therapeutic intervention
Cancer is a malignancy that can be deadly in the short term. Thus, time to intervention plays a major role in the treatment outcome. Early diagnosis improves the prognosis. When possible, surgical resection of the localized tumour is one of the most effective treatments. The administration of chemotherapy (i.e. cytotoxic agents) and/or radiotherapy has been shown to be effective against cancer. Chemotherapeutic agents include those that block DNA synthesis (alkylating agents such as cisplatin), interfere with the cell cycle as anti-metabolites (e.g. pyrimidine antagonists such as gemcitabine or 5-FU), or destabilize microtubule assembly (e.g. taxanes such as paclitaxel).
Their effectiveness is based on the fact that these agents disrupt cell growth; thus dividing cells such as cancer cells are sensitive to them. However, other dividing cells such as normal epithelial cells are sensitive as well, resulting in side effects for the patient. In contrast, targeted therapies aim to selectively target the cancer hallmarks that sustain the cancer cells (see Section 1.1.2 and Figure 2), resulting in less toxicity for the patients. Many of these cancer drugs target the proteins encoded by oncogenes, or restore the impaired function of tumour suppressor genes.
Targeted therapies have revolutionized cancer treatment, setting the basis for precision medicine in clinical settings. Precision medicine aims to tailor the treatment to the individual characteristics of the patient’s tumour. Thus, the practice of precision medicine aims to select, for each individual patient, the treatment expected to achieve the best possible outcome.
Figure 2 | Canonical examples of targeted therapies on cancer hallmarks. The inner circle represents the cancer hallmarks, whereas the outer circle shows the rationally designed targeted therapies against them. Figure taken from (Douglas Hanahan and Weinberg 2011).
1.1.5 Genetic dependencies
The innate mutagenic evolution of tumours promotes the progression and survival of cancer cells, but also uncovers collateral gene dependencies (GDs) such as oncogenic addictions and synthetic lethal genes (Fece de la Cruz, Gapp, and Nijman 2015) (Figure 3). These collateral gene dependencies could be exploited to extend the current catalogue of molecularly matched treatments. Cancer research is actively seeking these latent therapeutic vulnerabilities.
Figure 3 | Depiction of cancer gene dependencies. (A) Oncogenic addictions are dependencies on the active forms of oncogenes and their downstream signaling. (B) Synthetic lethality is a relationship between two genes in which the inactivation of either gene alone has no effect on cell viability, but the co-inactivation of both is lethal for the cancer cell. Figure modified from (Fece de la Cruz, Gapp, and Nijman 2015).
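The definition of synthetic lethality given in the caption can be written as a small truth-table check. The following is a minimal sketch for illustration only: the function names and the toy viability rule are assumptions introduced here and are not part of any method used in this thesis.

```python
from itertools import product


def toy_viability(gene_a_active: bool, gene_b_active: bool) -> bool:
    """Hypothetical viability rule: the cell survives if at least one gene is active."""
    return gene_a_active or gene_b_active


def is_synthetic_lethal(viability) -> bool:
    """Check the definition: single inactivations are tolerated, co-inactivation is lethal."""
    return (
        viability(True, True)            # both genes active: viable
        and viability(False, True)       # gene A inactivated alone: viable
        and viability(True, False)       # gene B inactivated alone: viable
        and not viability(False, False)  # both inactivated: lethal
    )


# Print the full truth table of the toy rule.
for a_active, b_active in product([True, False], repeat=2):
    state = "viable" if toy_viability(a_active, b_active) else "lethal"
    print(f"gene A active={a_active!s:5} gene B active={b_active!s:5} -> {state}")

print("synthetic lethal pair:", is_synthetic_lethal(toy_viability))
```

In a tumour where one gene of such a pair is already inactivated by mutation, inhibiting its partner would be expected to kill the cancer cells selectively, which is the rationale for exploiting these dependencies therapeutically.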
1.1.6 Disease models
Different models that represent the disease have been developed as experimental and preclinical platforms to disentangle the cellular mechanisms governing cancer. Among the most widely used models are cancer cell lines. Cancer cell lines are cell lineages established from a tumour through multiple passages in culture medium in a Petri dish. The catalogue of available cancer cell lines is comprehensive and diverse at the molecular level (Barretina et al. 2012); however, the representation of the tumour microenvironment in these models is limited. Animal models such as Genetically Engineered Mouse (GEM) models and Patient-Derived Xenograft (PDX) models were developed to better recapitulate the tumour microenvironment and the interaction with the host body. GEMs are models that carry important cancer mutations whose activation in a specific tissue of interest is triggered by a reporter; thus GEMs are a useful system to study tumour initiation and the role of specific cancer genes. PDX models are human tumours engrafted in immunocompromised mice. PDX models preserve the complexity and intra-tumour heterogeneity of the original human primary tumour. Additionally, the murine host in the PDX model mimics the original patient's body more closely than in vitro cell lines do, but at a higher maintenance cost (Kopetz, Lemos, and Powis 2012).
1.2 Pancreatic ductal adenocarcinoma (PDAC)
1.2.1 Significance
Pancreatic Ductal Adenocarcinoma (PDAC) is an exocrine cancer of the pancreas. PDAC is the most common form of pancreatic neoplasm, accounting for 90% of all cases. PDAC is a lethal disease with a 5-year overall survival of less than 8.2% (Wu et al. 2018). It represents a major cause of cancer-related deaths in Western countries.
Despite its low incidence, it is the fourth leading cause of cancer-related death in the USA, and it is projected to become the second by 2030 (Rahib et al. 2014). In the year 2019, an estimated 56,770 patients will be diagnosed with PDAC, and 45,750 patients will die due to complications in the USA (Siegel, Miller, and Jemal 2019). The close parallelism between incidence and mortality reflects the fatality of pancreatic cancer. The lethality of PDAC stems from two main reasons. First, the early stages of pancreatic cancer are usually asymptomatic; thus, most patients are diagnosed at an advanced stage. Second, there is a lack of effective treatments (Garrido-Laguna and Hidalgo 2015). Cancer research in PDAC aims to improve diagnostics and identify more efficient treatments.
1.2.2 The pancreas and its neoplasms
The pancreas is a complex organ with two major physiological functions: the secretion of hormones (endocrine) and of digestive enzymes (exocrine) (Figure 4). Four critical cell lineages coordinate these functions. Pancreatic islets (endocrine cells), composed of alpha and beta cells, secrete glucagon and insulin respectively, regulating blood sugar and systemic cell metabolism. The exocrine compartment of the pancreas is composed of the other two cell lineages: acinar and ductal cells. Acinar cells secrete digestive enzymes such as trypsin, chymotrypsin, amylase and lipase, whereas ductal cells secrete bicarbonate. Ductal cells are histologically organized into a complex network of ducts that collect and conduct the digestive enzymes to the second part of the duodenum, and they represent 10% of the cell population in the pancreas (Neoptolemos et al. 2010).
Figure 4 | Depiction of the pancreas. The anatomical localization and the composition of the pancreatic tissue are shown.
Neoplasms of the pancreas are classified based on the cellular lineage from which they arise, which determines the biological and pathological features of the tumour. The main neoplasms of the pancreas are listed in Table 1. Exocrine tumours arise from acinar and ductal cells. Although acinar cells represent the most abundant cell population in the pancreas, acinar carcinoma is very uncommon (2%). In contrast, pancreatic ductal adenocarcinoma is the most common and aggressive form of pancreatic cancer (90%). Endocrine tumours are less frequent, and present histopathological characteristics different from those of exocrine tumours.
Histological type | Type (origin) | Freq. | Key features
Pancreatic ductal adenocarcinoma (PDAC) | Exocrine (ductal cells) | 90% | Epithelial tumour with ductal-like characteristics. The most aggressive form of pancreatic cancer (5-year OS less than 8%).
Pancreatic neuroendocrine tumour (PanNET) | Endocrine (islet cells) | 5% | Neuroendocrine differentiation with over-expression of synaptophysin and chromogranin (hormone production). Malignant but less aggressive than ductal adenocarcinoma (5-year OS of 42%).
Solid-pseudopapillary neoplasm | Uncertain | 1-2% | Very rare case of cystic neoplasm. More frequent in women (90%). Malignant, but with a good prognosis (5-year OS of 95%).
Acinar carcinoma | Exocrine (acinar) | 1-2% | High exocrine enzyme production, aggressive (5-year OS of 45%).
Pancreatoblastoma | Exocrine (acinar) | <1% | Associated with Beckwith-Wiedemann syndrome. Mainly presented in childhood (5-year OS of 50%).
Table 1 | Overview of the most frequent malignant pancreatic neoplasms. OS: Overall Survival. (Iacobuzio-Donahue et al. 2012; Neoptolemos et al. 2010; Law et al. 2014; Halfdanarson et al. 2008; Lowery et al. 2011; Dhebri et al. 2004).
1.2.3 Cellular origin
The cellular origin of pancreatic ductal adenocarcinoma is controversial. The fact that pancreatic cancer cells and their precursors present a ductal phenotype led to the assumption that pancreatic ductal adenocarcinoma originates from the malignant transformation of ductal cells. However, it was later described that acinar cells have the potential to transdifferentiate phenotypically into ductal-like cells in response to certain stimuli such as pancreatic inflammation. This transdifferentiation is necessary after tissue injury to regenerate the damaged pancreatic tissue. In the context of carcinogenesis, this cellular transformation was termed Acinar-to-Ductal Metaplasia (ADM). ADM has been extensively studied in GEM models (Guerra et al. 2007). These studies showed that the concomitant overactivation of KRAS and tissue inflammation triggers an irreversible process resulting in the appearance of metaplastic duct lesions of acinar origin characterized by ductal-like features. Therefore, the plasticity of acinar cells could also be involved in the origin of PDAC. Both ductal and acinar-derived metaplasias subsequently lead to the development of PanIN precursor lesions, which are described in the next section.
1.2.4 Development and progression
The model of development of PDAC was established almost 20 years ago (Figure 5) (R. H. Hruban et al. 2000). PDAC evolves from well-defined microscopic precursor lesions named pancreatic intraepithelial neoplasias (PanINs). PanINs are pre-invasive neoplasms that arise within the intralobular ducts of exocrine pancreas tissue and are classified into three progressive stages based on aberrant cytological features. In this model, the malignant transformation of normal epithelial cells resulting in PanIN is driven by a hallmark set of genetic alterations. The transformation is triggered by the acquisition of an oncogenic mutation in KRAS, followed by the inactivation of the tumour suppressor genes CDKN2A, SMAD4 and TP53. The first stage involves PanIN1A (flat type) and PanIN1B (micropapillary type), which show a low grade of dysplasia. PanIN2 shows cell enlargement and loss of polarity, nuclear crowding and hyperchromasia. PanIN3 is an advanced lesion with cell budding invading intralobular ducts, severe nuclear atypia and a highly proliferative phenotype. At this third stage, the PanIN3 lesion has the highest capacity to progress to cancer (Ralph H. Hruban et al. 2004). Advanced PanIN lesions acquire increasing genetic variability and instability, a proliferative phenotype, and eventually the ability to invade adjacent tissue and metastasize to distant organs.
Figure 5 | Model of the progression from a normal cell to PanIN-3, the precursor of PDAC. The oncogenic activation of KRAS and the overexpression of ERBB2 occur early, the inactivation of the CDKN2A gene occurs at an intermediate stage, and the inactivation of TP53, SMAD4 and BRCA2 occurs in late stages. Figure adapted from (R. H. Hruban et al. 2000).
The development of a strong desmoplastic stromal reaction surrounding the lesion is characteristic of PDAC progression. The dense stroma shrinks the tumour vasculature, which limits the supply of nutrients and oxygen, but also blocks drug delivery to the malignant cells.
1.2.5 Metastatic disease
The first stages of pancreatic cancer are usually asymptomatic. Clinical and basic research has established that metastasis occurs during these first stages (Yachida et al. 2010). As a result, up to 60% of patients present with metastatic disease at the time of diagnosis (Gillen et al. 2010). Presently, the only effective treatment for PDAC is resection of the primary tumour. Even after pancreatectomy, between 30% and 60% of patients suffer recurrence in distant organs (Groot et al. 2018). Metastasis to distant organs makes the disease likely incurable and leads to further complications and death. Therefore, the additional blockade of metastatic disease is a promising complementary approach to improve survival (Garrido-Laguna and Hidalgo 2015).
Metastasis occurs when cancer cells spread from the primary site to distant organs through the bloodstream or lymphatic system following different steps (Figure 6). Cancer cells must reprogramme themselves to detach from cell-cell interactions, increase motility and invasiveness, adapt to new environments (rewiring of metabolism and signaling pathways), and avoid inter- and intracellular checkpoints that prevent cell spread (Gillen et al. 2010; Valastyan and Weinberg 2011; Friedl and Alexander 2011; Lambert, Pattabiraman, and Weinberg 2017). Beyond these broad processes, our understanding of metastasis in pancreatic cancer and of its potential anti-metastatic targets is relatively limited.
Circulating Tumour Cells (CTCs) play a key role in cancer cell dissemination. CTCs are cancer cells shed from the primary tumour into the bloodstream, the first step prior to colonization of distant organs and metastasis formation (Fidler 2003). CTCs are currently a focus of research in cancer therapy as a strategy to prevent metastasis, including in the context of pancreatic cancer (Gkountela et al. 2016).
Figure 6 | The Invasion-Metastasis cascade. Carcinoma cells exit the primary site through local invasion and intravasation into the circulatory or lymphatic systems. Cancer cells must adapt to survive in the new environment until they are arrested in distant organs. Then cancer cells start the metastatic colonization. Figure taken from (Valastyan and Weinberg 2011).
1.2.6 Pancreatic cancer therapy
At present, the only curative therapy for PDAC is surgical resection of the tumour. Surgery alone results in recurrence in more than 90% of patients; therefore, surgery is generally combined with (neo-)adjuvant chemotherapy (Neoptolemos et al. 2018). The first adjuvant chemotherapy used was 5-Fluorouracil (5-FU), which was later replaced by gemcitabine because it showed improved survival. Currently, two distinct regimens are used: nab-paclitaxel plus gemcitabine, and FOLFIRINOX (a combination of chemotherapy agents including oxaliplatin, irinotecan, leucovorin and fluorouracil).
The therapy for PDAC leaves substantial room for improvement, since fewer than 20% of patients present with resectable disease and the benefits of adjuvant chemotherapy in these patients are marginal. Substantial effort has been put into the development of targeted therapies, but with limited success so far (Garrido-Laguna and Hidalgo 2015).
Across a total of 14 completed clinical trials investigating drugs against diverse molecular targets, the only targeted therapy showing benefit and currently approved for PDAC treatment is the EGFR tyrosine kinase inhibitor erlotinib. Erlotinib showed a 12-day improvement in median survival when combined with gemcitabine (Moore et al. 2007). It is currently approved for use in patients with metastatic pancreatic cancer, although its clinical use is still limited due to the marginal clinical benefit.
1.3 Computational Biology in cancer
1.3.1 Role in cancer research
Cancer is a complex disease driven by multiple molecular aspects. Large international cancer consortia such as The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Research Network et al. 2013), the International Cancer Genome Consortium (ICGC) and other multi-center coalitions have profiled thousands of tumours to characterize these molecular aspects and to understand the interplay among them and with the clinical outcome of the patients. Computational Biology has played a pivotal role in all these cancer genome studies, from data generation, processing and storage, pattern recognition, clustering and in-silico drug prescription to the development of novel algorithms and integrative data analysis to disentangle the complex biological processes underlying cancer. The resulting insights include the impact of mutations in cancer driver genes (M. H. Bailey et al. 2018), the predisposition conferred by pathogenic germline variants in the patient population (Huang et al. 2018), the relation of tumour subtypes with the immune system (Thorsson et al. 2018), the landscape of oncogenic signaling pathways (Sanchez-Vega et al. 2018), and recurrently altered genes in primary tumours and metastasis (Kandoth et al. 2013; Zehir et al. 2017), among other aspects. Computational Biology has integrated several disciplines such as Statistics, Mathematics, Chemistry and Molecular Biology, together with data sources, to make this advance possible (Vazquez, de la Torre, and Valencia 2012). It has been fundamental to extracting and interpreting all these insights.
1.3.2 Tumour molecular profiling
High-throughput technologies such as DNA microarrays (Brown and Botstein 1999), Next-Generation Sequencing (Schuster 2008), and mass spectrometry (Feng et al. 2008) have allowed the full characterization of tumours at the genomic and postgenomic levels. This characterization covers molecular layers such as the genome, transcriptome, epigenome and proteome of individual tumours. Single observations do not provide enough resolution, but large collections of tumour profiles have been useful to understand the complexity of cancer (Cancer Genome Atlas Research Network et al. 2013).
Whole-Genome Sequencing (WGS) and Whole-Exome Sequencing (WES) have boosted the interpretation of genetic diseases at an unprecedented scale and at low cost (Schuster 2008). This technology represents the most informative source of biological data to decipher the genetic basis of cancer (Sondka et al. 2018). Additionally, cancer transcriptome profiling using RNA sequencing (RNAseq) has been the preferred molecular layer to describe the cellular states and functional phenotypes of tumours (Cieślik and Chinnaiyan 2018).
Single-cell sequencing technology represents the latest advance for tumour profiling. Single-cell analysis has been established in recent years to characterize intra-tumour heterogeneity and the tumour ecosystem (Ren, Kang, and Zhang 2018). Single-cell DNA sequencing, RNA sequencing, epigenomic profiling, or even simultaneous profiling of these layers are now possible. In a few years, this technology has rapidly progressed from fewer than one hundred to thousands of cells derived from a single sample, and multiple platforms are available that trade off gene coverage against the number of single cells (Svensson, Vento-Tormo, and Teichmann 2018; Mereu et al. 2019). Single-cell analysis presents new computational challenges. The presence of new sources of variability across observations, including a mixture of technical variability (drop-out events) and biological variability (heterogeneous cell populations and divergent cellular states across them), has promoted the development of specialized computational methods in single-cell analysis for better performance and interpretation (Kolodziejczyk et al. 2015; Rostom et al. 2017).
1.3.3 Methodologies for molecular data analysis
High-throughput technologies for tumour profiling yield a large amount of data that must be processed, analyzed and interpreted using bioinformatics methods (Ding et al. 2014). This has promoted interdisciplinary bioinformatics research to develop more accurate and efficient methods, as well as the implementation in clinical settings of gold-standard methods with high accuracy and the capacity to reproduce results in patient samples. These methods include format conversion, data normalization, testing for statistical association and the application of complex algorithms for feature extraction, feature annotation, etc. For instance, Bayesian algorithms for genetic variant calling, such as the Genome Analysis Toolkit (GATK) and MuTect (DePristo et al. 2011; Cibulskis et al. 2013), have been established as standard methods to detect germline variants in healthy individuals and somatic mutations in cancer genomes, respectively. These have been adopted by large international consortia such as the 1000 Genomes Project and the TCGA. In the transcriptome setting, the linear models proposed in limma, which incorporate empirical Bayes moderation of the standard errors towards a common value, are widely used for genome-wide differential gene expression analysis (Ritchie et al. 2015).
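As an illustration of the empirical Bayes moderation mentioned above, the moderated t-statistic can be sketched as follows. The notation is an assumption introduced here for clarity and does not come from this thesis: for gene g, s_g^2 is the residual variance with d_g degrees of freedom, s_0^2 and d_0 are the prior variance and prior degrees of freedom estimated from all genes, \hat{\beta}_g is the estimated coefficient (e.g. a log fold change) and v_g its unscaled variance.

```latex
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

The gene-wise variance is thus shrunk towards the common value s_0^2, and under the null hypothesis the moderated statistic is compared against a t-distribution with d_0 + d_g degrees of freedom.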
These standard analyses yield results on the order of hundreds to thousands of individual genes that are mutated or dysregulated in the patient sample, which are difficult to interpret biologically given their large dimension. A knowledge-driven approach to facilitate this task is to boil down the results to a set of significantly overrepresented Gene Ontology (GO) terms and regulatory motifs (Al-Shahrour et al. 2004), signaling pathways statistically enriched in the observed phenotype (Subramanian et al. 2005), estimates of pathway activities using pathway footprints (Schubert et al. 2018), master regulator activities (Alvarez et al. 2016), or modules of interacting genes within biological networks (Creixell et al. 2015).
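The over-representation idea can be sketched with a one-sided hypergeometric test. This is a minimal illustration under stated assumptions, not the implementation of any of the cited tools: the function names, the gene identifiers (g0 to g19) and the gene set "toy_set" are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_in_set, n_hits, n_overlap):
    """P(X >= n_overlap) when n_hits genes are drawn without replacement
    from n_universe genes, of which n_in_set belong to the gene set."""
    total = comb(n_universe, n_hits)
    upper = min(n_in_set, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def over_representation(hits, gene_sets, universe):
    """Return {set_name: (overlap_size, p_value)} for each gene set."""
    universe = set(universe)
    hits = set(hits) & universe
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(hits & members)
        p_value = hypergeom_enrichment_pvalue(
            len(universe), len(members), len(hits), overlap
        )
        results[name] = (overlap, p_value)
    return results


# Toy example with hypothetical gene identifiers.
universe = [f"g{i}" for i in range(20)]
gene_sets = {"toy_set": ["g0", "g1", "g2", "g3", "g4"]}
hits = ["g0", "g1", "g2", "g10"]  # e.g. genes called significant upstream
results = over_representation(hits, gene_sets, universe)
```

In practice the universe would be all genes measured in the experiment, and the resulting p-values would be corrected for the number of gene sets tested.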
Another complementary approach to deal with the high dimensionality of the data is to perform unsupervised feature selection and pattern recognition across a large number of samples. Highly variable genes might be selected before downstream analysis. Dimensionality reduction methods can also be applied to shrink the dimensionality while keeping the main latent factors of biological heterogeneity, which may encode unknown complex biological processes (Stein-O’Brien et al. 2018). Unsupervised hierarchical and partitioning clustering methods can be used to classify samples into molecular classes (so-called tumour subtypes) (Golub et al. 1999).
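The two steps described above, selecting highly variable genes and then partitioning samples, can be sketched as follows. This is a toy illustration under stated assumptions: k-means is used here as one example of a partitioning method, the initial centroids are simply the first k samples, and the gene names (gA, gB, gC) and expression values are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expr, n_top):
    """expr maps gene -> list of values across samples.
    Return the n_top genes with the highest variance across samples."""
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_top]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm. Initial centroids are the first k points,
    a deterministic choice made only to keep this toy example reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy expression matrix: three hypothetical genes measured in four samples.
expr = {
    "gA": [1.0, 9.0, 1.2, 8.8],
    "gB": [2.0, 8.0, 2.1, 8.2],
    "gC": [5.0, 5.1, 5.0, 5.1],
}
selected = top_variable_genes(expr, n_top=2)
samples = [tuple(expr[g][i] for g in selected) for i in range(4)]
labels = kmeans(samples, k=2)
```

The nearly constant gene gC is discarded before clustering, so the partition is driven only by the genes that vary across samples.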
1.3.4 Gene expression signatures and signature matching
Gene expression signatures are defined as a group of genes whose combined expression defines a biological condition (e.g. a disease, a cellular state, or the transcriptome response to a cellular perturbation) or predicts an outcome (e.g. patient prognosis, treatment response, etc). Gene expression signatures can be extracted from differential gene expression analysis (e.g. a simple linear regression model comparing disease versus normal condition), dimensionality reduction methods (latent factors across all samples) and clustering techniques (class centroids). The key value of these signatures is that they provide an enriched representation of the biology with a small set of genes. This property makes them generalizable to other similar contexts that share the same dimensional space; thus, the expression of a defined signature can be matched in external sets of samples.
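Extracting a signature from a two-group comparison can be sketched as follows. For a design with a single 0/1 group indicator, the ordinary least squares slope equals the difference between group means, so the sketch ranks genes by that difference. The function name, gene names and values are hypothetical, and no statistical testing or variance moderation is included.

```python
from statistics import mean


def extract_signature(expr, groups, case, control, n_genes):
    """expr maps gene -> list of values; groups holds one label per sample.
    Return (up, down): up to n_genes genes with the largest positive and
    the largest negative difference in mean expression (case - control)."""
    diffs = {}
    for gene, values in expr.items():
        case_vals = [v for v, g in zip(values, groups) if g == case]
        ctrl_vals = [v for v, g in zip(values, groups) if g == control]
        diffs[gene] = mean(case_vals) - mean(ctrl_vals)
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    up = [g for g in ranked[:n_genes] if diffs[g] > 0]
    down = [g for g in ranked[::-1][:n_genes] if diffs[g] < 0]
    return up, down


# Toy data: three hypothetical genes in two normal and two tumour samples.
expr = {
    "g1": [1.0, 1.0, 5.0, 5.0],
    "g2": [5.0, 5.0, 1.0, 1.0],
    "g3": [3.0, 3.0, 3.0, 3.0],
}
groups = ["normal", "normal", "tumour", "tumour"]
up, down = extract_signature(expr, groups, "tumour", "normal", n_genes=1)
```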
Signature matching is the procedure of calculating the expression of a signature in another set of samples. A common practice is to calculate the enrichment of a signature over the ranking of differentially expressed genes in a given phenotype (Subramanian et al. 2005), or along the ranking of gene expression within a single sample (Barbie et al. 2009). Other approaches have also been proposed, such as the minimum distance to a centroid or the matrix multiplication of linear coefficients by the gene-level expression (Golub et al. 1999; Schubert et al. 2018). Signature matching has been useful for drug response prediction, tumour subtype classification, inference of pathway activities and drug repositioning.
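Two of the matching strategies mentioned above, the product of linear coefficients with gene-level expression and the minimum distance to a centroid, can be sketched for a single sample as follows. This is a toy illustration rather than the procedure of the cited methods: the function names, gene names, weights and centroids are hypothetical, and the sample is assumed to be on the same scale as the data from which the signature was derived.

```python
def linear_score(sample, weights):
    """Weighted sum of expression over the signature genes,
    i.e. the dot product of coefficients and gene-level expression.
    Signature genes missing from the sample are skipped."""
    return sum(w * sample[g] for g, w in weights.items() if g in sample)


def nearest_centroid(sample, centroids):
    """Return the class whose centroid has the smallest squared
    Euclidean distance to the sample over the centroid's genes."""
    def dist2(centroid):
        return sum((sample[g] - v) ** 2 for g, v in centroid.items())
    return min(centroids, key=lambda name: dist2(centroids[name]))


# Toy sample and signature with hypothetical genes and values.
sample = {"g1": 2.0, "g2": -1.0, "g3": 0.5}
weights = {"g1": 1.0, "g2": -1.0}
centroids = {
    "subtype_A": {"g1": 2.0, "g2": -1.0},
    "subtype_B": {"g1": -2.0, "g2": 1.0},
}
score = linear_score(sample, weights)
subtype = nearest_centroid(sample, centroids)
```

Applied across a cohort, the linear score gives one activity value per sample, whereas the nearest-centroid rule assigns each sample to a class.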
1.4 Sources of large collections of tumour profiles
1.4.1 International cancer consortia
The Cancer Genome Atlas (TCGA) is a consortium based in the United States that aims to analyze the molecular profile of primary and metastatic tumours at different molecular layers (genome, transcriptome, epigenome, etc) (Cancer Genome Atlas Research Network et al. 2013). The first release, in 2013, included up to ~3400 tumours belonging to 12 tumour types, and this has been extended to 33 tumour types in the latest releases (2017). The Genomic Data Commons, powered by the NCI, is the official data portal to download the TCGA data (Grossman et al. 2016).