
Chapter 7

Future work and directions

 

                                                           

The research in this thesis attempted to identify rare variation predisposing to CD. The current dataset leads to the conclusion that rare variation does not contribute to disease risk in this family-based cohort, but future work building on these results may lead to a different outcome. Below is a list of proposals for a future PhD student:

• Test EPAS1 in a larger case-control sample: since the P value in the 4,608-sample dataset was 0.007, it may be worth testing this gene in a larger sample for significant rare-variant disease associations. A custom TaqMan genotyping assay containing all EPAS1 coding SNPs is the simplest experiment that would quickly answer this question.

• Resequence CUBN: this gene was too large to incorporate into the Fluidigm targeted resequencing assay. A single-gene resequencing experiment in a medium-sized case-control sample to begin with (approximately 500 cases and controls) would test whether there is an excess of rare variation in coeliac cases compared to controls. Fluidigm resequencing technology can still be used here, but a 48-plex assay would suffice.
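As an illustration of the comparison proposed for CUBN, the following minimal sketch computes a one-sided Fisher's exact P value for an excess of rare-variant carriers in cases. It is only one simple way to test for such an excess and is not the analysis used in this thesis; the function name and all carrier counts are hypothetical, and 500 cases and 500 controls are assumed purely for the example.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact P value for an excess of carriers in cases.

    Sums the hypergeometric probabilities of every 2x2 table that has at
    least as many carriers among the cases as were observed, keeping the
    row and column totals fixed.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    max_case_carriers = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, max_case_carriers + 1)
    )
    return numerator / denominator


# Hypothetical counts: 20 of 500 cases and 5 of 500 controls carry a rare variant.
p_value = fisher_one_sided(20, 500, 5, 500)
print(f"one-sided P = {p_value:.4g}")
```

In practice, gene-based rare-variant tests usually also weight variants by frequency or predicted function; the carrier count used here is the simplest possible collapsing scheme.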

If one were to continue the search for rare variation in CD using familial samples to enrich for disease mutations and to account for familial clustering of disease, another design would be to exome sequence every affected individual from many coeliac pedigrees and compare them to a matched population control dataset, e.g. the UK10K exome dataset. This would provide a highly annotated dataset of every coding mutation in all sequenced individuals; however, it may also lead to some data being discarded due to the sharing of chromosomal regions within families. Additionally, up to thousands of samples may be required to achieve the statistical power needed in a complex disease, although in terms of sample design many different approaches can be applied to achieve the best statistical result. It has been shown that exome sequencing trios and then performing a family-based association test may be particularly useful for rare variants, since the sample set would be robust to population stratification and Mendelian errors can be checked to reduce the false positive rate (De, Yip et al. 2013). Furthermore, there is evidence of increased sensitivity to detect lower effect sizes with the use of an enriched trio (one sibling from an ASP) in gene-based tests (Preston and Dudbridge 2013 in press). Since the study here utilized a family design and a case-control design on candidate genes, it provides a clue that the search for heritability may yield positive results if focused elsewhere. The following section discusses future research in the field of genetics that can be applied to CD, if one were to move away from attempting to locate rare disease variation.
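The Mendelian error check mentioned above for trio data can be sketched as follows. This is a minimal illustration and not the procedure of the cited studies: it assumes autosomal, biallelic sites with genotypes coded as the number of alternate alleles (0, 1 or 2), and all function names and trio genotypes are hypothetical. Note that a genuine de novo mutation would also be flagged as an error under this simple rule.

```python
def transmissible_alleles(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can be formed from one allele of each parent."""
    return any(
        paternal + maternal == child
        for paternal in transmissible_alleles(father)
        for maternal in transmissible_alleles(mother)
    )


def mendelian_error_rate(trio_genotypes):
    """Fraction of (father, mother, child) genotype triples that violate Mendelian inheritance."""
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trio_genotypes)
    return errors / len(trio_genotypes)


# Hypothetical genotypes at four sites in one trio: (father, mother, child).
sites = [(0, 1, 1), (1, 1, 2), (0, 0, 1), (2, 2, 2)]
print(mendelian_error_rate(sites))  # the third site is inconsistent
```

Sites or samples with an elevated error rate would typically be excluded before association testing, which is how the check reduces false positives.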

 

7.1 Further research in the field of coeliac disease genetics

 

Immunochip findings in CD show that most of the association signals are localized around transcription start sites and 3’ UTR regions (Trynka, Hunt et al. 2011). Additionally, ENCODE findings revealed that most disease variants lie in regulatory regions, and that significant activity in these areas, affecting how much of a protein is produced rather than modifying its structure, shows that there is much more occurring in non-coding regions than previously thought (Schaub, Boyle et al. 2012). For further genetic studies in CD, it may be worthwhile to revisit findings from GWAS and fine-mapping studies and attempt to link variant signals with a causal variant, including signals that do not reach GWAS significance, as these probably fall under the umbrella of undetected loci. Studies have shown that SNPs associated with common traits are enriched for expression quantitative trait loci (eQTL) (Lango Allen, Estrada et al. 2010; Nica, Montgomery et al. 2010; Nicolae, Gamazon et al. 2010), and the most recent CD GWAS found significant eQTLs in 20/38 non-HLA coeliac loci (Dubois, Trynka et al. 2010). The best example is the SORT1 gene associated with plasma LDL concentration, where the associated variant modifies a CEBPB transcription factor binding site located in an enhancer, directly altering the expression of SORT1 (Musunuru, Rader et al. 2010). Since SNPs associated with common traits may act by altering gene regulatory regions, assessing cell subtypes with phenotypic associations might identify true causal variants. The ENCODE project revealed that SNPs associated with a disease phenotype were also associated with a specific cell type or transcription factor (Dunham, Kundaje et al. 2012). A study by Trynka et al. identifying chromatin marks in cell types supports this finding (Trynka, Sandor et al. 2013). They show that chromatin peaks overlap with SNPs associated with common traits, e.g. 31 SNPs from RA regions overlap with chromatin marks in CD4+ regulatory T cells. Their findings highlight that cell-type-specific chromatin marks associated with phenotype can identify causal cell types. Looking deeper into immune cell subtypes in CD-associated loci may therefore be the next step to further elucidate specific causal pathways.
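The overlap between trait-associated SNPs and chromatin peaks described above can be illustrated with a simple interval lookup. This is a minimal sketch under stated assumptions, not the method of Trynka, Sandor et al. 2013: peaks are assumed to be non-overlapping half-open intervals (start, end) on a single chromosome, and the function name and all coordinates are hypothetical.

```python
from bisect import bisect_right


def count_snps_in_peaks(snp_positions, peaks):
    """Count SNPs that fall inside any chromatin peak.

    peaks: non-overlapping (start, end) intervals, start inclusive and
    end exclusive, all on the same chromosome as the SNPs.
    """
    sorted_peaks = sorted(peaks)
    starts = [start for start, _ in sorted_peaks]
    hits = 0
    for position in snp_positions:
        # Find the last peak whose start is at or before the SNP position.
        index = bisect_right(starts, position) - 1
        if index >= 0 and position < sorted_peaks[index][1]:
            hits += 1
    return hits


# Hypothetical coordinates for two peaks and seven SNPs.
peaks = [(100, 200), (300, 400)]
snps = [50, 100, 150, 200, 350, 399, 400]
print(count_snps_in_peaks(snps, peaks))  # 4
```

Repeating the count for the peak set of each cell type, and comparing it with the count expected for matched background SNPs, would give a crude indication of which cell types are most relevant to the associated loci.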

Methods for single-cell analysis can be applied to enable deeper resolution of cell types. Earlier methods employed whole-genome amplification (WGA) of single cells (Zhang, Cui et al. 1992) and degenerate oligonucleotide PCR-based methods, but the latter technique generates short products that are not useful for many applications (Telenius, Carter et al. 1992). Multiple displacement amplification using hexamer primers and Phi29 DNA polymerase generates much larger products (>10 kb) (Dean, Nelson et al. 2001) and is used for genotyping SNPs on Illumina chips, for example (Barker, Hansen et al. 2004). New methodologies are continuously being published to increase the coverage required for single-cell sequencing. A recent study reported a new WGA method named MALBAC, which reduces the amplification bias associated with previous WGA methods (Zong, Lu et al. 2012). The authors designed primers to anneal randomly to single-cell DNA molecules, performed PCR with a DNA polymerase with displacement activity to create semi-amplicons, and then used these as templates to produce full amplicons (Figure 7.1). With this technique, they were able to identify SNVs from MALBAC amplicons with no false positives and to measure mutation rates of cancer cell lines.

             

Figure 7.1: MALBAC single-cell WGA to decrease amplification bias

 

   

MALBAC = multiple annealing and looping-based amplification cycles. Taken from Zong, Lu et al. 2012.

 

Advances in NGS have now enabled direct analysis of single-cell genomes. A recently published study applied single-cell RNA sequencing to dendritic cells from mouse bone marrow to investigate heterogeneity in the response of these cells to lipopolysaccharide (Shalek, Satija et al. 2013). The study revealed variation across single cells, such as bimodal splicing patterns with one isoform having a distinct function, differential activity in clusters of genes (e.g. in antiviral regulatory genes, where co-variation in different cell transcripts helped to identify the antiviral cell circuit), and variation in expression patterns reflecting different cell developmental states. If such variation is observed across immune cells, there is further scope for linking disease genotypes to single-cell phenotypes.

Commercial companies, such as Fluidigm, have also moved into single-cell genomics. Fluidigm’s integrated microfluidics system has been developed for the preparation of hundreds of cDNA libraries from single-cell samples for mRNA sequencing, enabling single-cell gene expression profiling. The technology performs 96 cDNA library preparations in parallel on an array (Figure 7.2). The amplified cDNA samples are then subjected to library preparation for Illumina sequencing. Fluidigm’s Research and Development group has shown that the method produces high-quality sequencing libraries, and has also confirmed transcriptional heterogeneity within homogeneous cell populations (Shug, Chen et al. 2013). Using this technology to assess single-cell expression in CD might reveal whether there are specific variations within cells at CD-associated immune loci.

 

Figure 7.2: Fluidigm IFC cell capture illustration

 

   

The IFC array performs single-cell cDNA library preparations in tiny compartments. Taken from Shug, Chen et al. 2013.

 

To summarize, the proposals outlined at the start of this chapter can be undertaken to progress the search for rare variation in CD: EPAS1 and CUBN might hold key genetic variants predisposing to CD and are likely candidate genes based on their function and the findings in this thesis. If these experiments do not

