
CHAPTER II: THEORETICAL FRAMEWORK

2.5 EPISTEMIC FOUNDATIONS

2.5.1 Basis of the Exclusionary Rule

The neighborhood of cas genes (comprising more than 20 genes) was initially identified and characterized by Makarova et al. in 2002 by genomic context analysis, but it was wrongly predicted to be a novel DNA repair system specific for thermophiles, as no connection with CRISPR was detected at the time. Almost simultaneously, Jansen et al. identified by in silico analysis four genes located in the vicinity of CRISPR loci, which were designated CRISPR-associated (cas1-4; Jansen et al. 2002). The first protein found to bind to CRISPR loci was a genus-specific uncharacterized protein in Sulfolobus species corresponding to sso454 (Peng et al. 2003), which recognizes double and single repeat DNA sequences and produces an opening on the opposite side. Haft et al. in 2005 identified a guild of 45 Cas protein families by Hidden Markov models, a categorization that Makarova et al. refined in 2006 by taking genomic context information into account, resulting in 25 Cas protein families (Makarova et al. 2006). These families are proposed to be involved in the generation, expansion, maintenance, transfer between genomes and function of the CRISPR elements.

With the rapid growth of experimental characterisation and the identification of novel CRISPR systems in more prokaryotic genomes, it became apparent that existing CRISPR/Cas classification systems had become increasingly inadequate and did not reflect the emerging phylogenetic relationships between the system components. Moreover, with the elucidation of many Cas protein structures from different families and the analysis of an increasing number of gene sequences, previously undetected homologous relationships emerged, which enabled the unification of certain Cas families and the identification of novel ones (Makarova et al. 2011b). As a result, Makarova and colleagues (2011a) recently proposed an updated, polythetic classification of CRISPR/Cas systems based on gene composition, operon organisation and the phylogenetic and functional relationships between Cas genes. According to this classification, CRISPR/Cas systems are organised into three phylogenetically distinct types (I-III), and each major type can be further divided into individual subtypes (Makarova et al. 2011a and b). This classification is summarised in figure 1.7 and the distribution of subtypes in table 1.1.

Figure 1.7: Outline of the main types and subtypes of the CRISPR/Cas systems and their phylogenetic relations

The most common composition and arrangement of cas genes is shown for each subtype, but gene order may vary in each organism. Gene families are color-coded and the family name can be seen under each gene. Signature genes for each main type are highlighted in green, and for each subtype in red. The star in gene cas10d indicates a putative inactivated polymerase - HD domain. The letters above certain genes stand for: RE: processing endonuclease for crRNA maturation; L: large subunits of effector complexes mediating interference; S: small subunits of effector complexes; R: subunits of effector complexes that belong to the RAMP superfamily (Repeat Associated Mysterious Proteins; described in chapter 3). Dashed genes in type III systems may not be part of the same operon. Adapted from Makarova et al. 2011a.

The three main CRISPR/Cas types share a common core of two genes, cas1 and cas2, which are highly conserved and are found in almost all CRISPR-containing species. Cas1 is a highly conserved, basic protein that belongs to COG1518 (all COG groups mentioned in this text refer to the analysis performed by Makarova et al. 2002). Comparative sequence analysis and certain conserved residue patterns indicate that it might be a novel nuclease and/or integrase (Makarova et al. 2002). Metal-dependent, non-sequence-specific nuclease activity on ss/ds DNA was confirmed by Wiedenheft et al. (2009), along with the elucidation of the Cas1 structure from P. aeruginosa, which revealed a unique fold (figure 1.8). Additionally, Cas1 from S. solfataricus exhibited a high binding affinity for ss/ds DNA, ss/ds RNA and DNA-RNA hybrids, as well as strand annealing activity (Han et al. 2009).

Figure 1.8: Crystal structure of Cas1

Cartoon representation of the P. aeruginosa Cas1 homodimer (adapted from Wiedenheft et al. 2009). The N-terminal domain of chain A is colored in yellow, and the C-terminal α-helical domain which contains the active site in gray. Chain B is colored in light blue. Conserved residues making up the active site are in red. Three of the residues (E190, H254 and D268) coordinate a manganese ion (green sphere).

The cas2 gene encodes a small protein (80-120 aa) that is a member of COG1343. Distant similarities were found between members of this COG and a class of sequence-dependent, single-strand RNA nucleases called PIN-domain nucleases (after their identification in the N-terminus of the pilin biogenesis PilT protein), leading to the speculation that Cas2 might also possess ribonuclease activity (Makarova et al. 2006). The structure of Cas2 from S. solfataricus was solved by Beloglazova et al. (2008), revealing an RRM-like domain (RNA recognition motif; a structural motif consisting of four β-strands and two helices arranged in an α/β sandwich) (figure 1.9), while the protein exhibited metal-dependent ssRNAse activity. The universal distribution of this gene pair, along with experimental evidence discussed in subsequent paragraphs, has led to the assumption that Cas1 and Cas2 mediate the integration of novel spacer sequences into the CRISPR loci (reviewed in Sorek et al. 2008; van der Oost et al. Sontheimer, 2010; Deveau et al. 2010; Al-Attar et al. 2011). The role of these core proteins in the current scheme of the CRISPR mode of action will be discussed later.

Figure 1.9: Crystal structure of Cas2

Structure of Cas2 from S. solfataricus, solved by the SSPF (PDB code: 2IVY). The active conformation is a homodimer, with the interface formed by the tandem β-sheets in each monomer that make up the RRM motif. Conserved residues are located on the loops at the edge of the central cleft, at the bottom of the structure.

Type I systems are characterised by the presence of cas3 (COG1203), a gene encoding a protein with conserved superfamily II helicase motifs and an additional HD-nuclease domain, which is encoded separately in certain subtypes (Makarova et al. 2002). Type I systems also contain multiple representatives of the RAMP superfamily (Repeat Associated Mysterious Proteins), which are suggested to form large heteromeric complexes and take part in invader silencing (Brouns et al. 2008). The RAMP superfamily encompasses a large variety of protein families with ferredoxin-like folds that are predicted to have RNA-binding activity (Makarova et al. 2002, 2006; Haft et al. 2005); it will be discussed in more detail in chapter 3. Characteristic RAMP families associated with type I subtypes include the Cas5, Cas6 and Cas7 (COG1857) protein families (Makarova et al. 2011a). Cas6 has been shown to possess metal-independent, sequence-specific RNAse activity. In every type and subtype it is associated with, it is the processing endonuclease that generates the mature interfering RNA units (referred to as crRNAs from now on) from the primary CRISPR transcript. An additional protein found in four out of six type I subtypes and a type II subtype is Cas4 (COG1468), a member of the RecB exonuclease family (Jansen et al. 2002, Makarova et al. 2002). A number of studies have concluded that the targets of type I systems are DNA viruses and plasmids (among others Brouns et al. 2008; Marraffini et al. 2008, Garneau et al. 2010).

Type II systems have been found only in bacteria and consist of just the signature gene cas9 (COG3513), the core cas1/cas2 genes and either cas4 or csn2, a modular gene (Makarova et al. 2011a). Cas9 family members are predicted to be large (about 1000 residues), multidomain proteins that include an N-terminal RuvC-like domain (RuvC is a Holliday junction resolvase that belongs to the RNase H fold; Aravind et al. 2000) and an HNH nuclease domain, common in restriction endonucleases (Makarova et al. 2002). Targeting of plasmid and phage DNA was demonstrated in vivo for this system, and Cas9 is implicated in the interference stage, although no biochemical characterisation has been presented (Barrangou et al. 2007; Garneau et al. 2010).

Type III systems are characterised by the presence of cas10 (COG1353). The identified domains of this large multidomain protein (~1000 residues) include a permuted HD-superfamily hydrolase near the N-terminus, a globular uncharacterised α+β domain, a zinc ribbon (a well-known nucleic acid interacting domain) and the core palm domain of DNA/RNA polymerases and nucleotide cyclases near the C-terminus (Makarova et al. 2002, 2006). The function of this protein is yet to be elucidated, but it has been shown to form multimeric complexes with the additional RAMP Cas proteins in type III-B operons, and these complexes can effectively target RNA in vitro (Hale et al. 2009). Targeting of DNA has also been demonstrated in vivo for type III-A systems (Marraffini and Sontheimer, 2008). cas6 is also part of type III systems. The core cas1 and cas2 genes are occasionally missing from type III operons, but in these cases they are found to co-exist with other CRISPR/Cas systems (type I or type II) encoding cas1 and cas2 in the same genome. This supports the theory that cas1 and cas2 are involved in a different stage of CRISPR functioning, and that co-regulation is not necessary (Makarova et al. 2011a). Mechanistic details of each stage in every CRISPR/Cas type will be discussed in detail in subsequent sections.
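
The signature-gene rule described above (cas3 for type I, cas9 for type II, cas10 for type III, with cas1 and cas2 as the shared core) can be illustrated with a minimal Python sketch. This is a deliberate simplification: the classification of Makarova et al. (2011a) is polythetic and also weighs operon organisation and phylogeny, none of which is modelled here. The function name classify_locus, the constant names and the input format (a plain list of gene names) are hypothetical and chosen only for illustration.

```python
# Signature genes for the three main types, as stated in the text.
# Assumption: gene names are matched exactly, so variants such as
# "cas10d" are NOT treated as the type III signature gene cas10.
SIGNATURE_GENES = {"cas3": "I", "cas9": "II", "cas10": "III"}

# Core genes shared by the three main types.
CORE_GENES = {"cas1", "cas2"}


def classify_locus(genes):
    """Return (types, has_core) for the cas gene names found in one locus.

    types    -- sorted list of main types whose signature gene is present
    has_core -- True only if both cas1 and cas2 are present
    """
    names = {g.strip().lower() for g in genes}
    types = sorted({SIGNATURE_GENES[g] for g in names if g in SIGNATURE_GENES})
    has_core = CORE_GENES <= names
    return types, has_core


if __name__ == "__main__":
    # Hypothetical type I-like locus with the core genes present.
    print(classify_locus(["cas1", "cas2", "cas3", "cas5", "cas6", "cas7"]))
    # Hypothetical type III-like locus lacking cas1 and cas2.
    print(classify_locus(["cas10", "cas6"]))
```

In the second call the core genes are reported as absent, which mirrors the observation that some type III operons lack cas1 and cas2 and rely on another system in the same genome.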