
Functional analysis of the non-coding mouse genome through bioinformatic and CRISPR tools


Academic year: 2023


Facultad de Ciencias

Programa de Doctorado en Biociencias Moleculares

Functional analysis of the non-coding mouse genome through bioinformatic and CRISPR tools

Santiago Josa De Ramos

Madrid, 2019


Facultad de Ciencias

Departamento de Biología Molecular

Doctoral Thesis

Functional analysis of the non-coding mouse genome through bioinformatic and CRISPR tools

Santiago Josa De Ramos

Degree in Biology (Universidad de Navarra, 2013)

Degree in Biochemistry (Universidad de Navarra, 2013)

Thesis supervisor: Lluís Montoliu José

"Animal Models by Genetic Manipulation" Laboratory

Departamento de Biología Molecular y Celular

Centro Nacional de Biotecnología (CNB)

Consejo Superior de Investigaciones Científicas (CSIC)

Madrid, 2019


This doctoral thesis was completed successfully thanks to the funding obtained by the laboratory of Dr. Lluís Montoliu José through the following projects:

• Functional and structural validation of genomic insulators. Ministerio de Economía y Competitividad (MINECO), National R+D+I Plan, Biotechnology Programme, 2013-2015, reference BIO2012-39980. Principal Investigator: Dr. Lluís Montoliu José.

• New animal models to investigate albinism. Ministerio de Economía y Competitividad (MINECO), National R+D+I Plan, Biotechnology Programme, 2016-2019, reference BIO2015-70978R. Principal Investigator: Dr. Lluís Montoliu José.

and the following grants obtained by the predoctoral student:

• Grants for predoctoral contracts for the training of doctors, FPI 2013, Ministerio de Economía y Competitividad (MINECO), 2014-2017, reference: BES-2013-064805. Recipient: Santiago Josa De Ramos.

• Predoctoral mobility grants for short stays at R&D centres 2015. Stay as a visiting researcher in the laboratory of Dr. Bing Ren at the Ludwig Institute for Cancer Research (LICR), University of California - San Diego, La Jolla, San Diego, USA. Ministerio de Economía y Competitividad (MINECO). September - November 2016. Recipient: Santiago Josa De Ramos.

• CSIC contract, 2018. Funded through a CSIC intramural project.

and, as of the publication of this document, it has resulted in two articles in research journals:

• Fernández, A., Josa, S. and Montoliu, L. (2017) ‘A history of genome editing in mammals’, Mammalian Genome, 28(7-8), pp. 237-246. doi: 10.1007/s00335-017-9699-2.

• Josa, S., Seruggia, D., Fernández, A. and Montoliu, L. (2017) ‘Concepts and tools for gene editing’, Reproduction, Fertility and Development, 29(1), pp. 1-7. doi: 10.1071/RD16396.


Acknowledgements

After so many years devoted to this thesis, there are many people I would like to thank for their help, support and company. First of all, I would like to thank Lluís Montoliu for giving me the opportunity to work in this laboratory. Thanks to him, I have learned to develop my own projects and have acquired skills that will help me from now on.

I would like to thank everyone in laboratory 111, past and present, especially Andrea and Marta for their help over the last few months, which allowed me to carry out my final experiments. Finally, I want to thank Cris for the support and help given during all these years by my side.


Summary

Elucidating the genetic basis of human diseases is required to understand their mechanisms and to develop new therapies. Mouse models are used to mimic human genetic alterations, which are present in both coding and non-coding DNA. Most of the human and mouse genomes is composed of non-coding DNA sequences, full of regulatory elements, which comprise enhancers, silencers and insulators, among others. The correct behaviour of these elements allows correct gene function and the absence of disease. The purpose of this PhD work is to uncover the role of two types of non-coding regulatory elements found in the mouse genome: genomic insulators and a potential retinal pigment epithelium-specific Tyr enhancer.

Genomic insulators confine genes and their regulatory elements within the same expression domain. Unlike enhancers or silencers, insulators cannot be defined by their sequence but only through their activity. Two bioinformatic algorithms were previously developed in our laboratory to predict the existence of insulators based on divergent expression patterns of adjacent genes. We have performed an additional functional and structural analysis of these algorithms. HiC data on Topological Associated Domain boundaries have been used to improve the output of the bioinformatic algorithms. Moreover, the combination of both strategies has been used to detect and test new potential DNA sequences with powerful in vitro enhancer-blocking activity.
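As a rough illustration of what "divergent expression patterns of adjacent genes" can mean in practice, the following Python sketch scores each pair of neighbouring genes with a Pearson correlation coefficient and a Euclidean distance, the two measures named in this thesis. It is only a minimal sketch under stated assumptions and not the laboratory's implementation: the function names, the input layout (genes already ordered by genomic position, one expression value per tissue) and the example values are all hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally long expression profiles.

    Raises ZeroDivisionError if either profile is constant (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def euclidean_distance(x, y):
    """Euclidean distance between two equally long expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def score_adjacent_pairs(genes):
    """Score every pair of neighbouring genes.

    `genes` is a list of (name, expression_profile) tuples that is assumed
    to be sorted by genomic position (hypothetical input layout).
    Returns (gene_1, gene_2, correlation, distance) tuples.
    """
    return [
        (name_1, name_2,
         pearson_correlation(expr_1, expr_2),
         euclidean_distance(expr_1, expr_2))
        for (name_1, expr_1), (name_2, expr_2) in zip(genes, genes[1:])
    ]


if __name__ == "__main__":
    # Made-up expression values across four tissues, for illustration only.
    example = [
        ("geneA", [10.0, 0.5, 0.2, 6.0]),
        ("geneB", [0.3, 8.0, 9.5, 0.1]),
        ("geneC", [0.5, 7.0, 9.0, 0.4]),
    ]
    for row in score_adjacent_pairs(example):
        print(row)
```

Under this simplification, a pair with a strongly negative correlation or a large distance would be a candidate for having an insulator between the two genes; any threshold for "divergent" would have to be chosen separately.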

The mouse Tyr locus contains all the elements needed for its correct expression in melanocytes. However, the control of Tyr expression in retinal pigment epithelium cells remains unclear. We have evaluated the functional role of a distal regulatory element, CNS2, as a potential enhancer. CRISPR/Cas9-edited mice were obtained for this purpose. The CNS2 element seems to drive Tyr expression in the retinal pigment epithelium, but it is not essential.

The analysis performed in this PhD work shows the relevance of studying non-coding elements. The CRISPR/Cas9 system has proven its reliability for interrogating non-coding genomic elements.


Resumen

Elucidating the genetic basis of human diseases is a requirement for understanding their mechanisms and developing new therapies. Mouse models are used to mimic human genetic alterations, which are present in both coding and non-coding regions of the DNA. Most of the human and mouse genomes is composed of non-coding DNA sequences, full of regulatory elements, which comprise enhancers, silencers and insulators, among others. The proper behaviour of these elements allows the correct functioning of genes and the absence of disease. The purpose of this Doctoral Thesis is to study the role of two types of non-coding regulatory elements found in the mouse genome: genomic insulators and a possible Tyr enhancer specific to the retinal pigment epithelium.

Genomic insulators confine genes and their regulatory elements within the same expression domain. Unlike enhancers and silencers, insulators cannot be defined by their sequence but only through their activity. Two bioinformatic algorithms were previously developed in the laboratory to predict the existence of insulators based on the divergent expression patterns of adjacent genes. We have performed an additional functional and structural analysis of these algorithms that predict genomic insulators. Data from HiC experiments on Topological Associated Domain boundaries have been used to improve the results of the bioinformatic algorithms. In addition, the combination of both strategies has been used to detect and evaluate new potential DNA regions with powerful enhancer-blocking activity in in vitro studies.

The Tyr locus contains all the elements needed for its correct expression in melanocytes. However, the control of Tyr expression in the cells of the retinal pigment epithelium is not yet fully defined. We have evaluated the functional role of a distal regulatory element called CNS2 as a possible enhancer. Mice edited by CRISPR/Cas9 were obtained for this purpose. The CNS2 element seems to drive Tyr expression in the retinal pigment epithelium, but it is not essential.

The analysis carried out in this doctoral work shows the relevance of studying non-coding elements. The CRISPR/Cas9 system has demonstrated its reliability for analysing non-coding elements of the genome.


Table of Contents

ACKNOWLEDGEMENTS
SUMMARY
RESUMEN
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS

INTRODUCTION
1. ENCODE Project and regulatory elements
2. Genomic insulator elements and their relevance in genomic 3D conformation
2.1. General description and functions
2.2. Mechanisms of insulation
2.3. Topological Associated Domains and boundaries as a mechanism of insulation
2.4. Diseases by insulator disruption
2.5. Algorithms developed by our laboratory for insulator discovery
3. CRISPR as an outstanding genome editing tool for functional non-coding genomics
4. Study of Tyr regulation by gene editing
4.1. Albinism
4.2. Tyr regulatory elements
4.2.1. Proximal regulatory elements of mouse Tyr
4.2.2. Distal Tyr regulation study by transgenic mice
4.3. CRISPR-edited mice for Tyr regulation study

OBJECTIVES

MATERIAL AND METHODS
1. Bioinformatic and online resources
1.1. HiC and PLAC-seq datasets
1.2. Perl and MATLAB scripts to process TAD boundary data
1.3. Pearson Correlation and Euclidean Distance algorithms for insulator prediction
1.4. Comparison between gene pairs with conserved TAD boundaries from the genome and from Pearson Correlation and Euclidean Distance algorithms
2. Oligonucleotides and Primers
3. PCR
4. Cloning Procedures
4.1. Plasmids for insulator analysis and genotyping
4.2. Plasmids for CRISPR/Cas9 experiments
4.3. Plasmid digestion
4.4. Golden Gate cloning
4.5. Gibson Assembly Cloning
4.6. Transformation and bacterial growing
5. Enhancer-Blocking Assay
5.1. Plasmid preparation
5.2. Cell culture and transfection
5.3. Preparation of cellular extracts
5.4. Luciferase activity
5.5. β-Galactosidase activity
5.6. Data analysis
6. CRISPR/Cas9 mouse edition
6.1. CRISPR/Cas9 design
6.2. Cell culture conditions
6.3. Cell culture transfection
6.4. Cas9 and sgRNA in vitro transcription
6.5. Microinjection
7. Mouse handling
8. CRISPR-edited mouse genotyping
8.1. Genotyping strategy
8.2. T7 endonuclease I assay
9. Mouse phenotyping
9.1. Mouse plucking
9.2. Histological analysis
9.3. Melanin content analysis
9.4. Gene expression analysis by RT-qPCR
9.5. Statistical analysis of the data

RESULTS AND DISCUSSION
Functional and Structural Analysis of Genomic Insulators
1. Processing of gene pairs with potential boundaries
2. Study of the presence of TADs in gene pairs described by own algorithms
2.1. TAD boundary datasets from HiC experiments
2.2. Conservation of TAD boundaries between tissues
2.3. Enrichment of algorithms for detecting insulators with TAD boundaries
3. Evaluation of the Enhancer-blocking activity of potential insulators
3.1. Selection of candidates for Enhancer Blocking study
3.1.1. Functional Annotation of the Genes selected for testing insulator activity
3.2. Analysis of the blocking activity of selected elements
3.2.1. Analysis of insulator size in Enhancer Blocking Activity
3.2.2. Analysis of orientation in Enhancer Blocking Activity
4. Relevance of insulators for shielding gene expression
Analysis of CNS2 Regulatory Element at the Tyr Locus
5. CRISPR sgRNA design and validation
6. Mouse CRISPR/Cas9 deletion by microinjection
7. Genetic characterization of founder mice
7.1. Analysis of each Double Strand Break event
7.2. Detection of deletion of interest
7.3. Analysis of the presence of inversions
7.4. Analysis of off-target sequences
7.5. CRISPR/Cas9 deletion performance
8. Functional characterization of homozygous mutant mice
8.1. Selection of four lines of mice of interest
8.2. Gene expression analysis
8.3. Visual and histological analysis
8.4. Melanin content analysis
9. CNS2 drives Tyr expression but is not essential

GENERAL DISCUSSION
1. Are TADs defining gene expression domains?
2. CRISPR is revolutionizing gene editing perspectives

CONCLUSIONS - CONCLUSIONES

BIBLIOGRAPHY

APPENDICES
Appendix 1. MATLAB code for TAD boundary dataset processing
Appendix 2. Primers used in this work
Appendix 3. Detailed view of potential insulator regions
Appendix 4. Sanger sequencing of CNS2 founder mice
Appendix 5. Off-target sequences for CRISPR sgRNAd
Appendix 6. Off-target sequences for CRISPR sgRNAe
Appendix 7. Off-target sequence analysis
Appendix 8. Articles
8.1. "A history of genome editing in mammals"
8.2. "Concepts and tools for gene editing"


List of figures

Figure I.1 DNA elements detected by ENCODE project
Figure I.2 Existing enhancer-promoter interaction models
Figure I.3 Insulator functions
Figure I.4 Proposed models for insulating and enhancer activity
Figure I.5 Gene expression values are used to calculate divergent expression patterns
Figure I.7 CRISPR/Cas9 genome editing technology
Figure I.8 Tyr regulatory landscape
Figure I.9 Transgenic constructs used to study Tyr LCR
Figure I.10 Inactivation of Locus Control Region by genome editing and transgenesis
Figure M.1 HiC experimental procedure
Figure M.2 Map of CMVELuc plasmid
Figure M.3 Enhancer Blocking Assay control elements
Figure M.4 Map of hCas9 plasmid
Figure M.5 Map of MLM3636 plasmid
Figure M.6 Golden Gate cloning
Figure M.7 Gibson Assembly cloning
Figure M.8 In vitro T7 transcription
Figure M.9 Genotyping strategy
Figure M.10 T7 Endonuclease I assay
Figure M.11 Plucking timeline
Figure M.12 Mice plucking
Figure R.1 Representative example of TAD domains and boundaries used for the analysis
Figure R.2 Codification of genomic regions with and without TAD boundaries
Figure R.3 Analysis of presence of TAD boundaries in Pearson Correlation and Euclidean Distance algorithms
Figure R.4 Comparison of Score distribution of pairs coincident with TAD boundaries
Figure R.5 Filtering of gene pairs from Pearson Correlation and Euclidean Distance algorithms to evaluate new potential insulator elements
Figure R.6 Scheme of the constructs used in the in vitro enhancer-blocking assay
Figure R.7 Control elements of the in vitro Enhancer Blocking Assay
Figure R.8 In vitro enhancer-blocking assay in HEK 293 cells of selected regions of interest
Figure R.9 Effect of DNA element size in their enhancer blocking activity
Figure R.10 Relevance of element orientation in enhancer blocking activity
Figure R.11 Tyr locus landscape
Figure R.12 sgRNA validation in N2a cell culture
Figure R.13 T7 Endonuclease I assay of CNS2 CRISPR-deleted founder mice
Figure R.14 Deletion screening of CNS2 CRISPR-deleted founder mice
Figure R.15 CNS2 deletions found in founder mice
Figure R.16 Schema of inversion of CNS element
Figure R.17 Founder mice checked for the presence of a CNS2 element inversion
Figure R.18 Genotype of selected mouse lines for phenotypic analysis
Figure R.19 Scheme of the three stages of hair cycle
Figure R.20 Comparison between unplucked and plucked mouse skin analysis of Tyr
Figure R.21 Gene expression analysis of CNS2 versus wild type mice
Figure R.22 CNS2 selected homozygous mouse lines
Figure R.23 Histological analysis of CNS2 mouse skin
Figure R.24 Histological analysis of CNS2 mouse eye
Figure R.25 Melanin content analysis of CNS2 versus wild type mice
Figure D.1 ENCODE high-throughput techniques used for DNA analysis
Figure D.2 Topological associated domains detected in Tyr locus in this project
Figure D.3 Proposed 3D model of Tyr locus
Figure A.1 Genomic context of selected potential insulators
Figure A.2 Sanger sequencing of positive founder mice for T7EI assay of DSB-d
Figure A.3 Sanger sequencing of positive founder mice for T7EI assay of DSB-e
Figure A.4 Sanger sequencing of positive mice for deletion screening by PCR
Figure A.5 Off-target sequence analysis for sgRNAd
Figure A.6 Off-target sequence analysis for sgRNAe


List of Tables

Table I.1 List of genes associated to albinism and number of mutations described for each one
Table M.1 Primer tails for cloning procedures
Table M.2 PCR conditions
Table M.3 Golden Gate cloning primers
Table M.4 PCR for in vitro transcription template preparation
Table R.1 TAD domains and boundaries obtained from HiC experiments
Table R.2 Rare disease related gene pair candidates for insulator discovery
Table R.3 Albinism related gene pair candidates for insulator discovery
Table R.4 Cancer related gene pair candidates for insulator discovery
Table R.5 Rare disease-related candidates analysed for Enhancer-Blocking Activity
Table R.6 Albinism-related candidates analysed for Enhancer-Blocking Activity
Table R.7 Cancer-related candidates analysed for Enhancer-Blocking Activity
Table R.8 DNA target sequence for CRISPR edited mouse and their genomic location
Table R.9 CRISPR/Cas9 microinjection into B6CBAF2 fertilized oocytes
Table R.10 Off-target sequences selected for PCR and T7EI screening
Table R.11 Summary of mutations found among 81 CNS2 founder mice
Table A.1 Primers used to amplify, clone and analyse potential insulator elements
Table A.2 Primers used for CNS enhancer element analysis
Table A.3 Primers used for general purposes
Table A.4 Off-target sequences for CRISPR sgRNAd
Table A.5 Off-target sequences for CRISPR sgRNAe


Abbreviations

3C: Chromosome Conformation Capture
3D: Three-Dimensional
4C: Chromosome Conformation Capture-on-Chip
5C: Chromosome Conformation Capture Carbon Copy
aGEM: anatomical Gene-Expression Mapping
alfa-MSH: α-Melanocyte-Stimulating Hormone
AP-1: Activator protein 1
bHLH-ZIP: basic Helix-Loop-Helix leucine Zipper
bp: base pairs
cAMP: cyclic Adenosine Monophosphate
Cas: CRISPR associated
cDNA: complementary DNA
ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag Sequencing
ChIP-seq: Chromatin Immunoprecipitation sequencing
CIBER-ER: Centro de Investigación Biomédica en Red de Enfermedades Raras
CMV: Cytomegalovirus
CMVE: Cytomegalovirus Enhancer
CMVmP: Cytomegalovirus minimal Promoter
CNB: National Centre for Biotechnology
CNS: Conserved Non-coding Sequence
CorrCoef: Pearson Correlation Coefficient
CorrList: Correlation List
CREB: cAMP Response Element-Binding
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
crRNA: CRISPR RNA
CSIC: Consejo Superior de Investigaciones Científicas
CTCF: CCCTC-binding factor
CTCF Plac-seq: CTCF Proximity ligation assisted ChIP-seq
DistList: Euclidean Distance List
DistValue: Euclidean Distance Value
DMEM: Dulbecco’s Modified Eagle’s Medium
DMSO: Dimethyl Sulfoxide
DHS: DNase I Hypersensitive Site
DNA: Deoxyribonucleic Acid
DNase-seq: DNase I Hypersensitive Sites sequencing
dNTPs: deoxynucleotide triphosphates
DSB: Double Strand Break
EBA: Enhancer Blocking Assay
EDTA: Ethylenediaminetetraacetic Acid
ENCODE: Encyclopedia of DNA Elements
ES cells: Embryonic Stem Cells
EtBr: Ethidium Bromide
EtOH: Ethanol
F0: Founder transgenic mice
F1: First filial generation of transgenic mice
FAIRE-seq: Formaldehyde-Assisted Isolation of Regulatory Elements
FBS: Fetal Bovine Serum
Fw: Forward
EGFP: Enhanced Green Fluorescent Protein
Grm5: Metabotropic glutamate receptor 5
hCas9: humanized Cas9 nuclease
HDR: Homology-Directed Repair
HEK: Human Embryonic Kidney cells
HEPA: High-Efficiency-Particulate-Air
HEPES: 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
hESC: human Embryonic Stem Cells
HiC: High throughput 3C technique
HR: Homologous Recombination
HS: Hypersensitive Site (to cleavage by DNase I)
INDELs: Insertions and Deletions
Inr: Initiator region
kDa: kilodalton
kb: kilobases
LB: Lysogeny Broth
LCR: Locus Control Region
L-DOPA: Levodopa (L-3,4-dihydroxyphenylalanine)
LINE: Long Interspersed Nuclear Elements
Mb: Megabases
McSC: Melanocyte Stem Cells
mESC: mouse Embryonic Stem Cells
MGI: Mouse Genome Informatics
MIR: Mammalian Interspersed Repeats
MIT: Massachusetts Institute of Technology
MITF: Microphthalmia-Associated Transcription Factor
mNPC: mouse Neural Progenitor Cells
mRNA: messenger RNA
N2a: Neuro-2a cell
NCBI: National Center for Biotechnology Information
NHEJ: Non-Homologous End Joining
Nox4: NADPH oxidase 4
nt: nucleotides
OA: Ocular Albinism
OCA: Oculocutaneous Albinism
OCA1: Oculocutaneous Albinism Type 1
OMIM: Online Mendelian Inheritance in Man
ONPG: Ortho-NitroPhenyl-β-Galactoside
OTX2: Orthodenticle Homeobox 2
PAM: protospacer adjacent motif
PBS: Phosphate-Buffered Saline
PCR: Polymerase Chain Reaction
PFA: Paraformaldehyde
PWA: People With Albinism
RNA: Ribonucleic Acid
RNApolII: RNA polymerase II
RNApolIII: RNA polymerase III
RNA-seq: RNA sequencing
RPE: Retinal Pigment Epithelium
RT-qPCR: Quantitative Reverse Transcription PCR
RT: Room Temperature
Rv: Reverse
SD: Standard Deviation
SDS: Sodium Dodecyl Sulphate
SEM: Standard Error of the Mean
sgRNA: synthetic guide RNA
SINE: Short Interspersed Nuclear Element
SNPs: Single Nucleotide Polymorphisms
T7EI: T7 Endonuclease I
Ta: Annealing temperature
TAD: Topological Associated Domain
TALEN: Transcription Activator-Like Effector Nucleases
TBE: Tris/Borate/EDTA
Te: Elongation Time
TE: Tris-EDTA
TE-1: Tyrosinase Element-1
Tm: Melting temperature
tracrRNA: trans-activating crRNA
TSS: Transcription Start Site
Tyr: Tyrosinase
Tyre-m: Chinchilla mottled mice
UCSC: University of California - Santa Cruz
UTR: Untranslated Region
UV: Ultraviolet light
UVB: Ultraviolet light type B
Wt: wild-type
YAC: Yeast Artificial Chromosomes
ZFN: Zinc-Finger Nucleases


Introduction


1. ENCODE Project and regulatory elements

Elucidating the genetic basis of human disease was a primary goal of medical science in the 20th century. For that reason, the sequencing of the human genome was an important milestone that allowed scientists to link some of the most worrying diseases to their genetic causes. Moreover, the sequencing of the mouse genome has provided an invaluable tool for comparative studies of human diseases and for generating mouse models that mimic those genetic alterations in order to study their phenotypes (Hardouin and Nagy, 2000; Gunn and Canine, 2014). The human genome is composed of around 3,100,000,000 bp and 20,500 coding genes distributed in 23 pairs of chromosomes, whose correct function allows well-being and a healthy lifespan (Collins et al., 2004). The mouse genome contains around 2,700,000,000 bp and 22,500 coding genes, distributed in 20 pairs of chromosomes (Waterston et al., 2002).

However, those approximately 20,000 genes occupy only about 2% of the whole genome (Elgar and Vavouri, 2008). Furthermore, differences in coding regions cannot explain the differences among organisms that arose during evolution, since organisms as different as human and mouse share 80% of their coding DNA. The remaining 98% of the genome was first considered junk DNA, until it was shown to contain important elements that drive the spatial and temporal expression of coding sequences (Yue et al., 2014). Over the years, a research topic of capital interest has been to associate each genetic disease with mutations in the gene or genes responsible for the pathology. A better understanding of the underlying molecular mechanisms would promote the discovery of new therapies (Matsui and Corey, 2017). Understanding why organisms with a similar number of genes and high homology look so different is also of major relevance.
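To give a sense of scale for these percentages, the short Python sketch below turns the figures quoted above into base-pair counts. It is only a back-of-the-envelope calculation: applying the 2% coding fraction to the mouse genome as well as the human one is an assumption made here for illustration, and the per-gene mean is a rough derived quantity rather than a measured value. All names are hypothetical.

```python
# Figures quoted in the text above. Using the same 2 % coding fraction for
# both genomes is a simplifying assumption made only for this illustration.
GENOMES = {
    "human": {"genome_bp": 3_100_000_000, "coding_genes": 20_500},
    "mouse": {"genome_bp": 2_700_000_000, "coding_genes": 22_500},
}
CODING_PERCENT = 2


def coding_bp(genome_bp, coding_percent=CODING_PERCENT):
    """Base pairs covered by coding sequence (integer arithmetic)."""
    return genome_bp * coding_percent // 100


def summarize(species):
    """Coding bp, non-coding bp and mean coding bp per gene for a species."""
    genome = GENOMES[species]
    coding = coding_bp(genome["genome_bp"])
    return {
        "coding_bp": coding,
        "non_coding_bp": genome["genome_bp"] - coding,
        "mean_coding_bp_per_gene": coding / genome["coding_genes"],
    }


if __name__ == "__main__":
    for species in GENOMES:
        print(species, summarize(species))
```

With these inputs, the coding portion of the human genome comes to roughly 62 Mb, leaving about 3,038 Mb of non-coding sequence in which regulatory elements can reside.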

For all these purposes, the ENCODE (ENCyclopedia Of DNA Elements) project was developed to study all the elements (Figure I.1), coding and non-coding, that an organism needs to develop (ENCODE Project Consortium, 2012; Yue et al., 2014). ChIP-seq, RNA-seq and DNase-seq profiles have provided insight into where, when and which elements are needed. Moreover, tissue-specific elements have been described in detail, which can explain developmental and evolutionary differences. The human and mouse ENCODE projects have discovered several elements that can modulate cellular identity.

Among these, proximal (promoters) and distal (enhancers, silencers, insulators) regulatory elements have been described. The presence of specific DNA methylation and histone marks has also been used to help discover regulatory elements. The genome has typically been envisaged as a linear thread of successive nucleotides that bears coded information, but interphase chromatin is in fact organized as a 3D entanglement of genetic material and proteins arranged in loops that allow the emergence of functional elements (Lieberman-Aiden et al., 2009).

A promoter is an element that allows RNA polymerase to begin transcription at the transcription start site (TSS). Some promoters present a consensus binding motif, the TATA box, but most of them are GC- and CpG-enriched regions that lack a TATA box and comprise a broader region within 100 bp of the TSS. Enhancers and silencers are a class of cis-regulatory elements that play a major role in cell-type-specific gene expression, promoting or reducing it. Promoters can drive gene expression on their own, but enhancers and silencers are responsible for determining whether a gene is (enhancers) or is not (silencers) expressed in each tissue and developmental stage.

Enhancers recruit proteins, such as RNApolII, that promote gene expression, and they can regulate genes according to several models (Figure I.2). In the tracking model, a protein loads onto the enhancer and moves along the DNA until it reaches the promoter, where it enhances promoter activity. The linking model suggests that proteins are successively loaded from the enhancer onwards until they reach the promoter. The relocation model proposes that an enhancer relocates the gene to another region of the nucleus where transcription is favoured. Finally, the looping model proposes a direct contact between enhancer and promoter through the formation of a DNA loop, which allows them to interact even if they are megabases apart. This last model seems to be the most widely accepted and is supported by results obtained with 3C-derived techniques (Simonis et al., 2007).

Epigenetic regulation has been defined as an additional layer of regulation on top of the gene sequence. Chromatin can be subdivided into two states: active, accessible euchromatin and repressed heterochromatin. Chromatin marks, such as methylation or acetylation, dictate the corresponding genomic state. Enhancers and silencers can recruit protein complexes that remodel these epigenetic marks and, consequently, activate or repress their target genes (Kolovos et al., 2012).

[Figure I.1 labels: Long-range regulatory elements (enhancers, repressors/silencers, insulators); cis-regulatory elements (promoters, transcription factor binding sites); Transcript; Gene]

Figure I.1 DNA elements detected by ENCODE project. DNA features detected by ENCODE techniques are translated into DNA functional elements. Each technique allows the detection, analysis and annotation of different DNA elements. Figure adapted from Pennisi, 2012.

Finally, enhancers and silencers can be located megabases away from their target genes, so a correct genomic organization is needed to connect regulatory elements to their targets (Matharu and Ahanger, 2015). This function is carried out by insulator elements, which divide the genome into different expression domains, each containing all the genes and regulatory elements needed for the proper functioning of the domain.

2. Genomic insulator elements and their relevance in genomic 3D conformation

2.1. General description and functions

Throughout the genome, genes with different expression patterns are found next to one another. It is common to find that one gene is expressed in a completely different tissue than the adjacent gene. Expression domains are the genomic regions that contain all the genes and regulatory elements needed for their correct expression, flanked by insulator regulatory elements.

[Figure I.2 panels: A, Tracking; B, Linking; C, Relocation; D, Looping]

Figure I.2 Existing enhancer-promoter interaction models. The enhancer (purple hexagon) interacts with the promoter to recruit the polymerase (pink oval) and drive gene expression (blue strip). (A) The tracking model, where the enhancer recruits proteins that track along the DNA fibre towards the promoter and stimulate transcription. (B) The linking model, where a cascade of transcription factors is loaded onto the enhancer until they reach the promoter and recruit the polymerase. (C) The relocation model, where the enhancer provokes changes in the chromatin fibre and the promoter moves to a more active region. (D) The looping model, where the DNA fibre forms a loop that allows the enhancer to directly contact the promoter and recruit the polymerase. Figure adapted from Kolovos et al., 2012.

Insulators are a heterogeneous group of regulatory elements that usually cannot be defined by their DNA sequence (Bell et al., 2001; West et al., 2002), but only through their activity. A genomic insulator has at least two functions (Figure I.3): enhancer-blocking activity and barrier activity (Burgess-Beusse et al., 2002; Ghirlando et al., 2012). Enhancer-blocking activity impairs the interaction between promoters and spurious enhancers, enclosing promoters together with their own regulatory elements. Barrier activity prevents the spreading of silenced heterochromatin into active euchromatin. Thus, insulators can separate repressed and active chromatin and allow the proper functioning of active genes. Moreover, insulators can play an active role in the three-dimensional conformation of the genome in the nucleus, making contacts that structurally organize the genome in 3D. These contacts bring distant regulatory elements close to their targets and allow their activation or silencing. Not all insulators discovered so far fulfil all of these functions, as some may act as enhancer blockers while others act as barriers against heterochromatin spreading (Gaszner and Felsenfeld, 2006).
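The enhancer-blocking function can be reduced to a very simple positional rule, sketched below in Python. This is a deliberately minimal, one-dimensional simplification and not a model used in this work: it assumes that an insulator blocks an enhancer only when it lies between the enhancer and the promoter on the linear sequence, it ignores 3D folding and barrier activity, and the function name and coordinates are hypothetical.

```python
def enhancer_is_blocked(enhancer_pos, promoter_pos, insulator_positions):
    """Return True if any insulator lies strictly between enhancer and promoter.

    Positions are arbitrary linear coordinates (e.g. bp along a construct).
    Assumption: enhancer blocking is purely position-dependent in 1D.
    """
    left, right = sorted((enhancer_pos, promoter_pos))
    return any(left < pos < right for pos in insulator_positions)


if __name__ == "__main__":
    # Hypothetical coordinates: enhancer at 100, promoter at 500.
    print(enhancer_is_blocked(100, 500, [300]))  # insulator in between
    print(enhancer_is_blocked(100, 500, [600]))  # insulator outside the interval
```

The same rule underlies the logic of an enhancer-blocking assay, where a candidate element is tested by placing it between an enhancer and a promoter that drives a reporter gene.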

Barrier

A

Enhancer Blocking

B

Figure I.3 Insulator functions. (A) Insulators may be functioning as barriers preventing the spreading of condensed heterochromatin. (B) Insulators may be acting as an enhancer-blocking element preventing enhancer-promoter interactions. Figure reproduced from Burgess-Beusse et al., 2002.


2.2. Mechanisms of insulation

Several models have been proposed to explain how genomic boundaries insulate domains and block the effect of outside regulatory elements on the protected locus. Although the important role of insulators as delimiters of independent units of gene expression is clear, the specific mechanisms involved are still largely unknown (Figure I.4). The main proposed mechanisms are:

A. In the roadblock model, insulators act as a molecular wall that blocks the movement of regulatory enhancers and silencers towards the promoter (as described in the tracking or linking enhancer models) (Lunyak et al., 2007; Gohl et al., 2011). This can be achieved passively, by the binding of several nucleoproteins to the insulator element, or actively, by changing the activation state of the surrounding chromatin.

Figure I.4 Proposed models for insulating and enhancer activity. (A) Tracking and roadblock. In the tracking model, transcription factors are recruited at the enhancer (En) and then move towards the promoter. The insulator (B) acts as a roadblock that stops the movement of the transcription factors. (B) Looping and sink/decoy. In the looping model, transcription factors bound to the enhancer contact the promoter through a loop of the chromatin fibre. The insulator acts as a decoy that attracts or repels the enhancer, preventing enhancer-promoter binding. (C) Topological domains and independent loops. In the topological loop domain model, boundaries divide the chromosome into topological domains. Regulatory elements contact promoters inside their own domain by sliding along the fibre. There is no contact between enhancers and promoters of different domains. Figure adapted from Gohl et al., 2011.


B. The sink/decoy model proposes that insulators trap looping enhancers (Comet et al., 2011). The enhancer binds to the insulator before it can bind to the promoter, which prevents gene activation. In an alternative version, insulators repel enhancers and thus keep them away from the promoter.

C. The third proposed mechanism, the formation of topological domains, consists of contacts between boundaries that create independent loops. Enhancers or silencers inside a loop can slide along the DNA to access promoters located in the same loop, but they are completely isolated from promoters outside the loop (Wallace and Felsenfeld, 2007).

2.3. Topological Associated Domains and boundaries as a mechanism of insulation

Chromosomes are not randomly distributed in the nucleus. Each chromosome is restricted to a specific region known as a chromosome territory (Cremer and Cremer, 2010). Intrachromosomal contacts are more probable than interchromosomal contacts (Lieberman-Aiden et al., 2009). Moreover, gene-rich regions of chromosomes are usually grouped at the centre of the nucleus rather than at the nuclear membrane. There, genes are closer to the transcription machinery and are more likely to be expressed (Finlan et al., 2008).

Chromosome conformation capture (3C) and derived techniques have made it possible to study long-range interactions between distant loci (Dekker et al., 2002). The genome-wide version of 3C, called HiC, enables the simultaneous identification of contacts between DNA regions across the whole genome, showing interactions between promoters, enhancers, silencers and insulators.

The first HiC maps showed that the nucleus is divided into two compartments or neighbourhoods, A and B. Compartment A regions are early replicating and contain a high density of genes, while compartment B regions are late replicating and overlap with lamina-associated domains (Lieberman-Aiden et al., 2009). These compartments are cell-specific, and regions switch from one compartment to the other during stem cell differentiation (Dixon et al., 2015).

The genome is folded into several domains in a three-dimensional architecture. Most of this spatial organization seems to be linked to gene regulation (Lieberman-Aiden et al., 2009; Rao et al., 2014; Bonev and Cavalli, 2016). These domains are called Topological Associated Domains (TADs). HiC experiments have shown a higher probability of contacts between DNA regions inside the same TAD than with regions outside it. These contacts connect promoters and enhancers.

Boundaries separate different TADs, insulating promoters from other regulatory elements (Dixon et al., 2012; Nora et al., 2012). Contact frequencies between genomic loci inside the same TAD are severalfold higher than between loci in different TADs. TADs are relatively stable across cell types and independent of tissue-specific gene expression. Moreover, they are evolutionarily conserved (Dixon et al., 2015). TAD boundaries seem to be rich in CTCF binding sites and housekeeping genes.
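As an illustration of the statement that contact frequencies are higher within a TAD than between TADs, the following minimal Python sketch compares the mean intra-TAD and inter-TAD contact counts of a binned contact matrix. The function name `tad_contact_enrichment`, the toy matrix and the TAD labels are hypothetical and are not taken from any of the cited studies. Real HiC analyses also normalise for genomic distance and technical biases, which this sketch deliberately ignores.

```python
def tad_contact_enrichment(contacts, tad_labels):
    """Return (mean intra-TAD contacts) / (mean inter-TAD contacts).

    contacts   : square, symmetric list of lists with contact counts per bin pair
    tad_labels : one TAD identifier per bin (hypothetical annotation)
    """
    intra, inter = [], []
    n = len(tad_labels)
    # Visit every pair of distinct bins once (upper triangle of the matrix).
    for i in range(n):
        for j in range(i + 1, n):
            if tad_labels[i] == tad_labels[j]:
                intra.append(contacts[i][j])
            else:
                inter.append(contacts[i][j])
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy 6-bin map with two TADs of three bins each (made-up counts).
toy_map = [
    [0, 9, 8, 2, 1, 1],
    [9, 0, 9, 2, 2, 1],
    [8, 9, 0, 3, 2, 2],
    [2, 2, 3, 0, 9, 8],
    [1, 2, 2, 9, 0, 9],
    [1, 1, 2, 8, 9, 0],
]
toy_labels = ["T1", "T1", "T1", "T2", "T2", "T2"]
ratio = tad_contact_enrichment(toy_map, toy_labels)
```

In this made-up example the ratio is well above 1, which is the qualitative signature of a boundary between the third and fourth bins.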

CTCF (CCCTC-binding factor) is considered a key player in genome organization. CTCF is an 11-zinc-finger protein that is ubiquitously expressed. It was initially described as a transcriptional repressor of chicken c-myc (Klenova et al., 1993). Nowadays, it is considered a regulator of the 3D structure of the genome (Wendt et al., 2008). CTCF acts in combination with the cohesin complex. When CTCF or cohesin is depleted, the TADs delimited by the affected boundary disappear and a larger TAD is formed (Nora et al., 2017; Schwarzer et al., 2017; Wutz et al., 2017).

TADs can be further divided into smaller sub-TADs and chromatin loops (Phillips-Cremins et al., 2013; Rao et al., 2014). Unlike TADs, sub-TADs are less conserved across tissues and appear to be related to cell type-specific expression (Phillips-Cremins et al., 2013). Chromatin loops are often mediated by cohesin and CTCF and occur within the same TAD or sub-TAD.

Two sets of models have been proposed for TAD formation. In the handcuff model, the two distant ends of a domain are brought together by the interaction of CTCF proteins, which recruit the cohesin complex to stabilize the junction (Vietri Rudan and Hadjur, 2015). The second model, termed the extrusion model, proposes that the loop is generated dynamically when two cohesin complexes bound to DNA travel along it in opposite directions until they reach a binding motif. In both cases, CTCF and the cohesin complex play an important role in loop formation (Vietri Rudan and Hadjur, 2015; Cuadrado et al., 2019).
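The extrusion model can be pictured with a toy one-dimensional simulation. The sketch below is only an illustration under strong assumptions, not the model of the cited authors: the fibre is a row of bins, `extrude_loop` and its parameters are hypothetical names, and each of the two anchors simply moves outwards one bin per step until it meets a bin carrying a binding motif or the end of the fibre.

```python
def extrude_loop(loading_site, blocking_sites, fibre_length):
    """Toy 1D extrusion: return (left_anchor, right_anchor) of the final loop.

    loading_site   : bin where the extruding complexes are loaded (assumed)
    blocking_sites : bins carrying a binding motif that halts extrusion
    fibre_length   : number of bins in the simulated fibre
    """
    blocked = set(blocking_sites)
    left = right = loading_site
    left_moving = right_moving = True
    # Both anchors move away from each other, one bin per step,
    # until each is halted by a motif or by the end of the fibre.
    while left_moving or right_moving:
        if left_moving:
            if left in blocked or left == 0:
                left_moving = False
            else:
                left -= 1
        if right_moving:
            if right in blocked or right == fibre_length - 1:
                right_moving = False
            else:
                right += 1
    return left, right


# Made-up example: motifs at bins 5, 20 and 31, loading at bin 12.
anchors = extrude_loop(loading_site=12, blocking_sites=[5, 20, 31], fibre_length=40)
```

In this example the loop ends up anchored at the two motifs that flank the loading site, which is the behaviour the extrusion model predicts for a domain delimited by two boundaries.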

2.4. Diseases by insulator disruption

The correct conformation of TAD boundaries at the precise time and in the precise tissue is of major importance to achieve correct spatio-temporal gene expression. The disruption of these domains can result in an altered state and a reorganization of higher-order chromatin, with switches between A and B compartments during differentiation that silence or activate genes when or where they are not needed (Dixon et al., 2015). This aberrant chromosomal organization is also detected in other tissues, since most TAD boundaries are conserved across tissues (Fraser et al., 2015). When boundaries are lost, enhancers and silencers act outside their proper domain, affecting genes near the breakpoint. Boundary loss also causes the rearrangement of TADs and the appearance of new domains. The resulting ectopic expression can be observed in limb development diseases. Rearrangements of TADs can cause congenital diseases or cancer (Lupiáñez et al., 2015; Franke et al., 2016; Weischenfeldt et al., 2017).

Over the last decades, researchers have dedicated their efforts to understanding the genetic basis of cancer. They have mainly focused on alterations in protein-coding genes, but most disease-associated SNPs lie within non-coding regions (Maurano et al., 2012). Alterations in epigenetic marks, typically found in cancer, can affect the methylation patterns of the CTCF and cohesin binding sites present in TAD boundaries. These hypermethylated states disrupt chromosomal topology and allow aberrant regulatory interactions between adjacent enhancers and the PDGFRA gene, leading to oncogene activation (Taberlay et al., 2014; Flavahan et al., 2016). In addition, copy number variants can be responsible for boundary rearrangements that give rise to new and smaller TADs. The overall spatial organization seems to retain its higher-order characteristics and appears highly similar between normal and cancer cells. However, cancer-specific domains arise, and altered gene expression profiles are present (Taberlay et al., 2016; Wu et al., 2017). The heterogeneity of cancer cells makes it important to obtain a defined pattern of their regulatory landscape in order to understand the overall tumoral process (Wu et al., 2017).

Correct identification of boundaries and of their relationship with the whole set of regulatory elements appears to be critical in current studies of diseases with a genetic basis.

2.5. Algorithms developed by our laboratory for insulator discovery

The classical search for regulatory elements has been based on bioinformatic predictions and high-throughput analyses such as ChIP-seq. However, insulators cannot be described by their sequence, as most of them have been detected according to the regulatory function they display. Even when their DNA sequence is known, it is not always related to insulator function. An unbiased method was therefore needed that could detect insulators genome-wide by their function and not by their DNA sequence. For this purpose, our laboratory previously developed two algorithms based on gene expression data (Vicente-García, 2014). These algorithms use the anatomical Gene-Expression Mapping (aGEM) Platform, a web tool developed by the CNB-CSIC Biocomputing Unit (Jiménez-Lozano et al., 2009, 2012). This platform integrates phenotypic information with spatial and temporal distributions of mouse gene expression. The aGEM tool was used to retrieve expression profile data for all MGI-identified genes, and two algorithms were created to predict potential boundaries: the Pearson's correlation coefficient and the Euclidean Distance algorithms (Figure I.5).

Figure I.5 Gene expression values are used to calculate divergent expression patterns. (A) Expression data from different databases are collected by the aGEM software and classified into three categories: low, moderate and strong expression. These values are used to calculate the Euclidean Distance and the Pearson Correlation Coefficient. Legend entries in the panel: Low, Moderate, Strong, "The structure doesn't exist in the ontology" and "No data available". (B) The Pearson Correlation is calculated between a pair of genes. Only pairs of adjacent genes are of interest. Pairs with highly negative correlation are selected. Figure adapted from Vicente-García, 2014.

Pearson's correlation coefficient was calculated for the expression profiles of each pair of consecutive genes, using the expression values $A_i$ and $B_i$ of genes A and B in each tissue $i$ of the $n$ tissues:

$$r_{A,B} = \frac{\sum_{i=1}^{n} (A_i - \bar{x}_A)(B_i - \bar{x}_B)}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{x}_A)^2} \cdot \sqrt{\sum_{i=1}^{n} (B_i - \bar{x}_B)^2}}$$

where $\bar{x}_A$ and $\bar{x}_B$ are the mean expression values of genes A and B across the tissues.


Pairs of genes whose expression was significantly negatively correlated (correlation value < 0), and which potentially contain insulator elements, were selected for further analysis.
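The selection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the laboratory's actual implementation: the gene names, the numeric coding of the aGEM categories (1 = low, 2 = moderate, 3 = strong) and the function names `pearson` and `divergent_adjacent_pairs` are assumptions made only for the example.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # flat profile: the coefficient is undefined
    return cov / math.sqrt(var_a * var_b)


def divergent_adjacent_pairs(gene_order, profiles):
    """Return (gene_a, gene_b, r) for adjacent genes with r < 0."""
    hits = []
    for gene_a, gene_b in zip(gene_order, gene_order[1:]):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r is not None and r < 0:
            hits.append((gene_a, gene_b, r))
    return hits


# Hypothetical genes in chromosomal order, one value per tissue
# (assumed coding: 1 = low, 2 = moderate, 3 = strong expression).
profiles = {
    "geneA": [3, 3, 1, 1],
    "geneB": [1, 1, 3, 3],
    "geneC": [1, 2, 3, 3],
}
hits = divergent_adjacent_pairs(["geneA", "geneB", "geneC"], profiles)
```

Only the geneA/geneB pair is returned here, because their profiles are mirror images of each other, whereas geneB and geneC are positively correlated.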

Euclidean Distance is calculated by converting aGEM expression into two expression states, 1 when the genes are expressed in the same tissue and 0 when they are not expressed in the same tissue, and then summing over all tissues for each pair of adjacent genes:

$$D(A,B) = \sum_{i=1}^{n} (A_i - B_i)^2$$

Distance value distributions were calculated for each gene, and only pairs in which the distance is larger than the mean plus twice the SEM of the distance distribution of both genes were taken into account.

These algorithms have been validated and new insulators have been detected with them. However, it is still unclear whether the insulators detected by these algorithms share any trait or perform their activity by one mechanism rather than another. For that reason, a more in-depth analysis of the pairs detected by these algorithms was required.

The study of insulator elements would enlarge our knowledge about gene regulation and genomic organization. It is relevant for the study of new mechanisms of known diseases, as well as for the development of new therapies based on the modification of gene expression (Gilbert et al., 2014; Luizon and Ahituv, 2015; Rauch et al., 2019). Moreover, insulators have classically been of interest because of their role as flanking elements for transgenic or gene therapy constructs. There, insulators protect the exogenous sequence from chromosomal position effects, in which heterochromatin spreads and silences the transgene. Furthermore, insulators also prevent regulatory elements inside the construct from altering neighbouring endogenous loci, an effect known as vector-mediated genotoxicity. This effect can result in malignant cellular transformation via the activation of proto-oncogenes or the silencing of gene repressors (Giraldo et al., 2003b). Their use as part of a transgene has been reduced with the emergence of gene editing nucleases.
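To close this section, the Euclidean Distance filter described above can be sketched as follows. This is a hedged illustration, not the original implementation: it assumes that each gene has a binary profile (1 = expressed, 0 = not expressed in a tissue) and that the distance distribution of a gene is the set of distances from that gene to every other gene in the data set. The gene names and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def distance(a, b):
    """D(A, B): sum over tissues of (A_i - B_i)^2 for binary profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sem(values):
    """Standard error of the mean of a list of values."""
    return stdev(values) / math.sqrt(len(values))


def threshold(gene, profiles):
    """Mean + 2 * SEM of the distances from `gene` to all other genes (assumed)."""
    d = [distance(profiles[gene], p) for g, p in profiles.items() if g != gene]
    return mean(d) + 2 * sem(d)


def distant_adjacent_pairs(gene_order, profiles):
    """Adjacent pairs whose distance exceeds the threshold of both genes."""
    selected = []
    for a, b in zip(gene_order, gene_order[1:]):
        d = distance(profiles[a], profiles[b])
        if d > threshold(a, profiles) and d > threshold(b, profiles):
            selected.append((a, b, d))
    return selected


# Hypothetical binary profiles over six tissues, genes in chromosomal order.
profiles = {
    "g3": [0, 0, 0, 0, 0, 1],
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 0, 0, 0],
    "g4": [0, 0, 0, 0, 1, 0],
    "g5": [0, 0, 0, 1, 0, 0],
}
order = ["g3", "g1", "g2", "g4", "g5"]
selected = distant_adjacent_pairs(order, profiles)
```

With these made-up profiles only the g1/g2 pair passes both thresholds, since g1 is expressed everywhere and its neighbour g2 nowhere.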


3. CRISPR as an outstanding genome editing tool for functional non-coding genomics

Genome editing technologies have always been of interest for generating genetically modified animals to study diseases and molecular processes. The first experiments with Embryonic Stem Cells (ESC), using homologous recombination (HR) with a DNA template, achieved a relatively low efficiency (Smithies et al., 1985). Efficiency could be increased if a Double Strand Break (DSB) took place at the homologous endogenous DNA site. Yeast meganucleases were the first attempt to achieve this, with efficiencies 1000-fold higher than those of previous methods (Choulika et al., 1995). They were superseded by the newer Zinc-finger nucleases (ZFN) (Bibikova et al., 2001) and Transcription activator-like effector nucleases (TALEN). These are engineered DNA endonucleases derived from the bacterial restriction enzyme FokI. The nuclease domain is fused to zinc-finger domains with known DNA-binding capacity or to TALE DNA-binding proteins, respectively (Geurts et al., 2009; Miller et al., 2011). However, the process of developing a correct ZFN or TALEN can be long and cumbersome, and in vitro and in vivo validations are needed. For that reason, they have been rapidly replaced by the new CRISPR/Cas system (Seruggia and Montoliu, 2014).

CRISPR (for Clustered Regularly Interspaced Short Palindromic Repeats) and Cas (for CRISPR-associated) proteins were first discovered and studied as an adaptive immune system of archaea and bacteria more than 20 years ago (Mojica et al., 1993; Mojica and Montoliu, 2016). When a virus infects a bacterium, the bacterium defends itself by destroying the viral genome, and some portions of the cleaved viral DNA are stored in the bacterial genome in a CRISPR-spacer array. This CRISPR-spacer array is constitutively expressed and processed into crRNA (CRISPR RNA), complementary to the viral genome. The natural CRISPR locus also comprises the Cas proteins and a trans-activating crRNA (tracrRNA) that links the nuclease (Cas9) to the crRNA. When the virus tries to infect the bacterium again, the crRNA-tracrRNA-Cas9 complex recognizes the viral DNA and binds to it. Then, Cas9 makes a DSB in the viral genome, preventing a new infection. As the CRISPR elements that lead to the degradation of the viral genome are stored in the bacterial genome, they are inherited by the descendants, which thus also inherit the defensive ability (Seruggia and Montoliu, 2014; Mojica and Montoliu, 2016).

This highly simple and evolved system has been engineered into a gene-editing tool for the modification of any genome (Jinek et al., 2012). The crRNA and tracrRNA have been fused into a synthetic guide RNA (sgRNA) that guides the Cas9 nuclease to the target site in the genome. Then, Cas9 makes a DSB and the endogenous machinery of the cell repairs the cut by one of the two main repair pathways: the error-prone Non-Homologous End Joining (NHEJ), which can lead to INDELs and often to gene disruptions; or the less efficient Homology-Directed Repair (HDR), which allows a precise mutation to be inserted at the correct site (Figure I.7). Since the first edited mammalian cells in culture (Cong et al., 2013; Mali et al., 2013b) and the first edited mice (Wang et al., 2013), its popularity has risen until it has become a technique used daily (Seruggia and Montoliu, 2014; Mojica and Montoliu, 2016).

ZFN and TALEN must be engineered to achieve the best protein-DNA binding contact. Unlike them, CRISPR is a natural and highly optimized tool based only on RNA-DNA binding through Watson-Crick base pairing, which is more stable than the electrostatic interactions between DNA and proteins. Furthermore, it relies only on the binding of an sgRNA to a 20 nt DNA region, which makes each sgRNA easy to create, reproducible and robust.
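Because target recognition reduces to a 20 nt sequence next to a PAM, candidate target sites can be listed with a simple scan. The Python sketch below is only an illustration: it assumes the NGG PAM of the commonly used Streptococcus pyogenes Cas9, which is not specified in the text above, and the function names and the toy sequence are hypothetical. It does not score off-targets or guide efficiency.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every site followed by NGG.

    `start` is the 0-based position of the first protospacer base on the
    strand that was scanned ("+" = given sequence, "-" = reverse complement).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Made-up 26 nt sequence: a 20 nt protospacer, a TGG PAM and three extra bases.
toy_sequence = "ATCATCATCATCATCATCAT" + "TGG" + "ATA"
targets = find_targets(toy_sequence)
```

Scanning both strands matters in practice, since a usable site on the reverse strand appears as CCN followed by the protospacer on the given strand.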

Figure I.7 CRISPR/Cas9 genome editing technology. (A) The Cas9-sgRNA complex is directed to the DNA target and makes a Double Strand Break (DSB). The proto-spacer adjacent motif (PAM) is depicted in green. (B) The Double Strand Break can be repaired by two pathways. The error-prone Non-Homologous End Joining (NHEJ) pathway introduces insertions and deletions (INDELs) that can cause a gene disruption. The Homology-Directed Repair (HDR) pathway uses an extra template DNA for the repair, allowing precise gene editing. Figure adapted from Fernández et al., 2017 and Josa et al., 2017.


CRISPR systems can be used to successfully generate a large variety of genomic alterations, such as small INDELs for knocking out genes, substitutions, insertions, deletions or even inversions. Genome editing of regulatory elements can be tricky, as most of them are located next to repetitive elements. Previous gene editing technologies sometimes failed because it was difficult to create an efficient and powerful guided nuclease or to provide a homology region in the target DNA. As CRISPR/Cas9 only needs a 20 nt DNA sequence to be targeted by Cas9, it is easy to design an approach for practically any genomic location, making this tool essential for the study of non-coding DNA. In our laboratory, the CRISPR/Cas9 system has allowed us to develop mouse models to study regulatory regions of albinism-related genes that were previously impossible to analyse using classical transgenic strategies (Seruggia et al., 2015).

4. Study of Tyr regulation by gene editing

4.1. Albinism

Albinism is a heterogeneous rare genetic condition with a prevalence of 1:17,000 people in North America and Europe (Montoliu et al., 2014; Montoliu and Marks, 2016). It is characterized by visual deficits and a variable hypopigmentation of the skin, hair and eyes (oculocutaneous albinism) or only of the eyes (ocular albinism) (Schiaffino et al., 2002; King and Oetting, 2006). Visual alterations include foveal hypoplasia, reduced pigmentation of retinal pigment epithelium cells, rod photoreceptor deficiency, misrouting of the optic nerves at the chiasm and reduced pigmentation of the iris (King et al., 2001; King and Oetting, 2006), which can lead to reduced visual acuity, impaired stereoscopic vision, photophobia, iris transillumination or poor night vision. Albinism can be caused by mutations in at least one of 20 genes related directly or indirectly to the melanin synthesis pathway (Montoliu and Marks, 2016). It can be present in isolation (non-syndromic) or in combination with additional abnormalities outside the skin and the eye (syndromic) (Table I.1).

Oculocutaneous Albinism Type 1 (OCA1), the most common type of albinism among Caucasian patients with a prevalence of 1:40,000 people, is caused by mutations in the tyrosinase gene (TYR) (Grønskov et al., 2007; Martínez-García and Montoliu, 2013). TYR is a copper-dependent oxidase and the rate-limiting enzyme of the melanin synthesis pathway, which catalyses the oxidation of L-tyrosine and L-DOPA to form dopaquinone (Cooksey et al., 1997). It is a type I melanosomal membrane glycoprotein of 529 amino acids (533 amino acids in mouse) and 60 kDa. Mutations in TYR lead to a lack of or decrease in protein activity, mainly due to retention of the protein in the endoplasmic reticulum, and hence to a lack of melanin and L-DOPA, the latter presumably being the key molecule behind the visual problems (Lavado et al., 2006).

Human TYR is located on chromosome 11q14.3 and is composed of 5 exons and 4 introns spanning about 125 kb of genomic DNA (Ponnazhagan et al., 1994). Mouse Tyr is located on chromosome 7qD3 and has 86% cDNA homology and 90% protein homology with human TYR (Fryer et al., 2003). This gene is historically known as the albino or c-locus (Beermann et al., 1990). Its expression is restricted to neural crest-derived melanocytes of the skin, iris, choroid and many other places in the body, and to optic cup-derived retinal pigment epithelium (RPE) cells (Beermann et al., 1992).

Mutations in the TYR coding region are responsible for the clinical traits encountered in people with albinism (PWA). However, between 15 and 30% of people diagnosed with albinism do not show coding mutations in one or both TYR alleles (King et al., 2003), which suggests that some of them may carry mutations in regulatory non-coding regions. The study of non-coding regulatory elements has been increasing since the development of the ENCODE project (ENCODE Project Consortium, 2012; Yue et al., 2014), and their malfunction is in the spotlight as putatively responsible for some genetically undiagnosed albinism cases. As stated at the beginning of this chapter, the ENCODE project has shown that only 2% of the genome corresponds to coding regions, while the remaining 98% corresponds to repetitive regions and regulatory elements. The malfunction of any of these regulatory elements can lead to diseases similar to those originating from coding mutations, so their study is of major relevance for better comprehension, diagnosis and future personalized therapies. The study of mouse Tyr regulation is essential for bench-to-bedside research.

Table I.1 List of genes associated with albinism and number of mutations described for each one. Updated from Montoliu et al., 2014.

Mouse      Human      Albinism   Mutations (HGMD)

Melanocytes
Tyr        TYR        OCA1       395
Oca2       OCA2       OCA2       235
Tyrp1      TYRP1      OCA3       37
Slc45a2    SLC45A2    OCA4       116
??         4q24       OCA5       1
Slc24a5    SLC24A5    OCA6       13
Lrmda      LRMDA      OCA7       8
Gpr143     GPR143     OA1        148
Slc38a8    SLC38A8    FHONDA     11

Melanosomes/lysosomes
Lyst       LYST       CHS        88
Hps1       HPS1       HPS1       45
Ap3b1      AP3B1      HPS2       31
Hps3       HPS3       HPS3       14
Hps4       HPS4       HPS4       18
Hps5       HPS5       HPS5       27
Hps6       HPS6       HPS6       26
Dtnbp1     DTNBP1     HPS7       3
Bloc1s3    BLOC1S3    HPS8       2
Bloc1s6    BLOC1S6    HPS9       1
Ap3d1      AP3D1      HPS10      2

HGMD 2017.4. Montoliu et al., PCMR 2014; Montoliu & Marks, PCMR 2016.

4.2. Tyr regulatory elements

4.2.1. Proximal regulatory elements of mouse Tyr

Tyr minigene constructs driven by various pieces of its own promoter and 5' upstream sequences (from 270 bp up to 5.5 kb) have been reported to rescue the albino phenotype of the recipient mice. Mouse Tyr shows basal expression thanks only to the activity of the promoter and the proximal upstream region (Kluppel et al., 1991). A 270 bp upstream region is enough to achieve a close to wild-type phenotype. However, the resulting expression is weak, position-dependent and independent of copy number, although it mimics the developmental profile of endogenous Tyr (Beermann et al., 1992).

The consensus transcription start site (TSS) is located 80 bp upstream of the ATG, and a TATA-like binding site is located 32 bp upstream of the TSS (Figure I.8). The promoter lacks a canonical TATA motif but, instead, there is an initiator region (Inr) and an SP1 binding site that overlaps with an E-box, whose mutation drastically reduces Tyr expression. The E-box element is recognised by two different basic helix-loop-helix leucine zipper (bHLH-ZIP) transcription factors: Upstream Transcription Factor 1 (USF) and Microphthalmia-Associated Transcription Factor (MITF). These proteins depend on dimerization for DNA binding and for exerting their effects, and affect melanocytes and RPE cells when mutated (Ferguson and Kidson, 1997).

A non-consensus SP1-binding site has been identified 43 nt upstream of the Inr region in the human TYR promoter. Although it is not conserved in other mammals, a protected footprint has been detected by mobility shift assay in the mouse promoter. The Inr motif and the SP1-like sequence may play a role in the basal transcription of mouse Tyr (Ganss et al., 1994b; Kaufmann and Smale, 1994). Another regulatory element is an M-box, located 200 bp upstream of Tyr (Ferguson and Kidson, 1997). The M-box is shared with the promoters of the Tyrp1 and Tyrp2 genes (located in the brown and slaty loci). All three genes have a 40% sequence identity and show cell-specific expression. The M-box is an 11-bp sequence motif conserved between mouse and human TYR (Lowings et al., 1992) and is a positive regulatory element. Mutations in these regions resulted in a 50-fold decrease in expression. The E-box in the Inr is essential, whereas the M-box is not, as it is not able to mediate tissue-specific Tyr expression by itself and only enhances its activation.

There is also another upstream activator element at -245/-230 nt, which overlaps a putative melanocyte-specific enhancer element termed tyrosinase element-1 (TE-1) (Ponnazhagan et al., 1994). A negative regulatory element has also been found at -193/-125 nt; it is relevant only in melanin-producing cells and responds to external factors such as UV radiation or alpha-MSH.

Tyr regulation is strictly coupled with melanocyte regulation. Melanocyte stem cells are in synchrony with hair follicle stem cells. Melanoblasts originate from neural crest cells and migrate to the epidermis and the hair follicle bulge, where they differentiate into Melanocyte Stem Cells (McSC). Wnt3a is important for the differentiation of neural crest cells into melanoblasts instead of neurons (Dorsky et al., 1998). McSC differentiate into melanocyte precursors, which express c-kit, DCT, MITF and TYRP1. Wnt3 activates MITF and KITL, which promotes MITF expression and thus the activation of Tyr, Tyrp1 (tyrosinase related protein 1) and Tyrp2 (tyrosinase related protein 2). Wnt3a signalling is activated during the growth phase of the hair (anagen), and then promotes melanogenesis through the upregulation of MITF and its downstream genes Tyr and Tyrp1 (Guo et al., 2012). The switch between eumelanin (blackish pigment) and pheomelanin (reddish pigment) synthesis is regulated by alpha-melanocyte-stimulating hormone (alpha-MSH), which acts via cAMP. Elevation of intrinsic cAMP leads to enhanced eumelanin synthesis (Burchill et al., 1993; Fuller et al., 1993). In Retinal Pigment Epithelium, human TYR is activated by OTX2 directly and indirectly via

Figure I.8 Tyr regulatory landscape. The five Tyr exons (ex1 to ex5) are depicted as black boxes and span a region of 66 kb. Three Mitf binding sites (Inr, M-box and TDE), coloured as yellow circles, are shown on the Tyr promoter. The LCR, shown as a red box, is located 12 kb upstream of the Tyr promoter and contains the B, A and G boxes. Two CTCF binding sites were described upstream and downstream of Tyr. A LINE1 element, depicted as a black box, was identified upstream of the LCR and was found methylated in L929 and B16 mouse cell cultures. The CNS-2 element described by Murisier et al., 2007, located 48 kb upstream of the Tyr promoter, is depicted as a blue box. Grm5, Nox4 and the EcoRI and Xba I restriction sites are also labelled. Figure adapted from Seruggia, 2014.
