In contrast to the first stage of CRISPR functioning, processing of the CRISPR RNA transcript has been largely clarified in all three major CRISPR systems. In each system this procedure is mediated by a single Cas processing endonuclease, and in at least one case host RNases have been shown to participate. These enzymes must carry out two specific functions: first, recognition of the precursor transcript and cleavage at a single site within each repeat to generate the mature crRNAs (figure 1.11); and second, retention of the processed mature crRNA for subsequent use by the effector proteins or complexes that mediate interference.
Figure 1.11: Outline of the second stage of CRISPR functioning
[Figure: the precursor CRISPR transcript is processed by the system endonucleases into mature crRNAs, each consisting of a 5’ handle, a spacer and a 3’ handle.]
In CRISPR/Cas types I and III, a single superfamily of endonucleases, Cas6, is responsible for processing the primary transcript of the CRISPR locus into mature crRNA units that comprise a complete spacer flanked by parts of the repeat
sequence (Carte et al. 2008, 2010). Cas6 family members are associated with
subtypes I-A, I-B, I-D, III-A and III-B, while distinct families are found in subtypes I-E (names used in the literature: CasE/Cse3/Cas6e) and I-F (Csy4/Cas6f). These proteins belong to the RAMP (Repeat-Associated Mysterious Proteins) superfamily and have been shown to contain tandem or single ferredoxin-like folds, which carry the RNA recognition
motifs (RRMs) used to bind the target pre-crRNA (Carte et al. 2010; Haurwitz et al. 2010;
Wang et al. 2011). Despite their shared fold and structural topology, the distinct
families associated with each subtype use remarkably different mechanisms to recognise and cleave their target RNA, although the final products are similar. This functional versatility reflects the specific repeat family associated with each subtype, as
identified by Kunin et al. (2007): the propensity of each repeat sequence to form
stable secondary structures (typically a stem-loop, depending on how palindromic the repeat sequence is) influences how it is recognised and bound by the respective Cas proteins. Representatives of the three Cas6 families have been characterised biochemically and structurally, and their modes of action are briefly described here.
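The link between a palindromic repeat and its ability to fold into a stem-loop can be illustrated with a deliberately crude sketch. The function below is a hypothetical helper, not the method used by Kunin et al. (2007): it searches a repeat for the longest perfectly paired hairpin stem, counting only Watson-Crick pairs and ignoring G-U wobble pairs, stacking energies and internal mismatches. Real secondary structure predictions rely on thermodynamic folding models instead.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored in this sketch.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def longest_hairpin_stem(repeat: str, min_loop: int = 3) -> tuple[int, int]:
    """Return (stem_length, loop_length) of the longest perfectly paired hairpin.

    A crude proxy for how palindromic a CRISPR repeat is: for every candidate
    loop, the stem is extended outwards while the flanking bases still pair.
    No folding energies are computed.
    """
    seq = repeat.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(seq)
    best = (0, 0)
    for i in range(n):  # i: last base of the 5' arm of the stem
        for j in range(i + min_loop + 1, n):  # j: first base of the 3' arm
            k = 0
            # Extend the stem outwards from the loop while the bases pair.
            while i - k >= 0 and j + k < n and (seq[i - k], seq[j + k]) in PAIRS:
                k += 1
            if k > best[0]:
                best = (k, j - i - 1)
    return best
```

A repeat with a long, perfectly paired stem scores high, whereas an unstructured repeat of the kind associated with the Cas6 subtypes would return at most a short stem that arises by chance.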
In types I-E and I-F systems, the processing endonucleases (Cse3 and Csy4
respectively) are also subunits of the large multiprotein effector complexes that
mediate interference. The first identified complex of this type was characterised in E.
coli (type I-E) and termed CRISPR-Associated Complex for Antiviral Defence
(acronym: CASCADE) (Brouns et al. 2008; Wiedenheft et al. 2011). The repeat
sequences associated with this system are predicted to form a stable hexanucleotide
stem with a tetranucleotide loop. The structure of Cse3 from T. thermophilus (Gesner
et al. 2011; Sashital et al. 2011) is composed of a double ferredoxin-like fold, with a
four-stranded antiparallel β-sheet forming the central positively charged cleft of the
protein, where the phosphate backbone of the 3’ strand of the stem-loop is bound. Upon binding to RNA, the protein undergoes a conformational change whereby a
previously disordered accessory β-hairpin recognises the major groove of the RNA
helix, and a previously disordered loop interacts with the base of the stem-loop, positioning the scissile phosphate in the active site (figure 1.12 A). The protein interacts specifically with four residues located on either side of the stem-loop, and cleavage occurs at a G-A bond at the 3’ base of the stem-loop. Mature crRNAs in this system,
as sequenced from E. coli during the characterisation of CASCADE, consist of a
complete spacer sequence flanked by 8 nt of repeat-derived sequence at the 5’ end and the remaining 21 nt of the repeat, containing the stem-loop, at the 3’ end. Whereas a degree of heterogeneity was observed at the 3’ end, the 5’
handle (also termed the 5’ psi-tag) was conserved, highlighting its likely importance for protein recognition and possibly for
self-nonself discrimination (Brouns et al. 2008; Jore et al. 2011). In type I-F systems,
the C-terminal domain of Csy4 adopts an extended conformation, although its basic
secondary structure connectivity again resembles a ferredoxin-like fold (Haurwitz et al.
2010). The N-terminal domain is a typical ferredoxin-like fold. The stem-loop structure of the repeat interacts extensively with an arginine-rich helix in the C-terminal domain, while the ssRNA-dsRNA junction is positioned in the positively charged cleft between the two domains (figure 1.12 B).
Figure 1.12: Structures of processing endonucleases Cse3 and Csy4
(A) Superimposition of two T. thermophilus Cse3 structures (in orange and blue) bound to synthetic CRISPR repeat RNA. The arrow indicates the conformational change occurring upon RNA binding. RNA is illustrated as a light orange tube and the scissile phosphate as an orange sphere (adapted from Sashital et al. 2011). (B) Ribbon diagram and electrostatic surface representation of the structure of Csy4 from P. aeruginosa bound to the RNA CRISPR repeat substrate. The RNA backbone is represented with orange sticks. Blue shading indicates positively charged regions and red shading negatively charged regions. Adapted from Haurwitz et al. (2010).
Sequence-specific hydrogen bonds tether the substrate in the active site so that cleavage takes place immediately downstream of the hairpin, 8 nt upstream of the spacer sequence. Both proteins remain bound to the cleavage products through base-specific and electrostatic interactions with the RNA, which enables the subsequent use of the mature crRNAs by CASCADE and the analogous Csy complex.
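The single-cut logic shared by these processing endonucleases can be sketched as string slicing over a hypothetical precursor. The sketch assumes an idealised array (leader omitted, identical repeats) and places one cut per repeat, `handle_len` nt upstream of the following spacer; `process_pre_crrna` is a hypothetical name. The 29 nt repeat in the example reproduces the 8 nt + 21 nt split reported for E. coli, but the sequences themselves are invented for illustration.

```python
def process_pre_crrna(repeat: str, spacers: list[str], handle_len: int = 8) -> list[str]:
    """Cut an idealised precursor R-S1-R-S2-...-R once per repeat.

    Each cut falls handle_len nt before the end of a repeat, i.e. handle_len nt
    upstream of the next spacer. The fragment between two consecutive cuts is one
    crRNA: 5' handle (last handle_len nt of a repeat), a complete spacer, and the
    remaining repeat-derived 3' end. Trimming of the 3' end is not modelled.
    """
    if not 0 < handle_len < len(repeat):
        raise ValueError("handle_len must lie inside the repeat")
    precursor = repeat + "".join(spacer + repeat for spacer in spacers)

    # Record the cut position inside every repeat, including the last one.
    cuts = []
    pos = 0
    for spacer in spacers + [""]:
        pos += len(repeat)             # end of the current repeat
        cuts.append(pos - handle_len)  # single cut handle_len nt upstream of the spacer
        pos += len(spacer)             # skip over the following spacer

    # Fragments flanked by two cuts are the crRNA units; the 5' remnant of the
    # first repeat and the 3' remnant of the last repeat are discarded.
    return [precursor[a:b] for a, b in zip(cuts, cuts[1:])]


repeat = "ACGUACGUACGUACGUACGUACGUACGUA"  # hypothetical 29 nt repeat
spacers = ["G" * 32, "C" * 32]            # hypothetical 32 nt spacers
for crrna in process_pre_crrna(repeat, spacers):
    # 5' handle and length of the repeat-derived 3' end
    print(crrna[:8], len(crrna) - 8 - 32)  # prints "CGUACGUA 21" for each crRNA
```

With a 30 nt repeat and the same `handle_len`, the sketch yields the 8 nt handle and 22 nt 3’ end described for P. furiosus, since the only parameter that changes is the repeat length.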
A representative of the Cas6 family associated with subtypes I-A, I-B, I-D, III-A and III-B has been characterised in Pyrococcus furiosus (Carte et al. 2008,
2010; Wang et al. 2011). Although this protein also consists of two
ferredoxin-like domains, its molecular mechanism for recognising and cleaving the pre-crRNA has evolved to accommodate the unstructured
repeats predicted to associate with these subtypes (Kunin et al. 2007). The
conserved, positively charged central cleft between the two ferredoxin-like domains interacts with single-stranded RNA: conserved residues contact specific conserved nucleotides near the 5’ terminus of the CRISPR repeat, anchoring the RNA in position for the cleavage reaction, which takes place on the opposite surface of the protein (figure 1.13). Mutational analysis confirmed that the catalytic site and the binding site are physically distinct, with the intervening substrate interacting weakly or transiently with the signature Gly-rich loop. Metal-independent cleavage of the pre-crRNA transcript occurs 8 nt upstream of each spacer, producing the conserved 5’ handle (psi-tag) present in the mature crRNA and a 22 nt repeat-derived sequence at the 3’ end.
Figure 1.13: Crystal structure of PfuCas6
(A) Ribbon diagram of the apo structure of PfuCas6. Helices and strands are numbered from N to C terminus. The G-rich loop is highlighted in red and catalytic residues in green. (B) Electrostatic surface potential of the RNA-bound PfuCas6 illustrating the path of the bound RNA from the binding site to the catalytic site (His46), via the G-loop region. The RNA is represented with red sticks, and the numbers correspond to nucleotides. Figures adapted from Carte et al. (2008) and Wang et al. (2011).
The product remains bound to Cas6 until it is transferred to the respective effector
complex (the Cmr complex or an archaeal version of CASCADE in the case of P. furiosus,
which contains both type I and type III systems). In P. furiosus, the 3’ end of the mature crRNA
is processed further in vivo by an unknown nuclease, but this appears to vary between
organisms (e.g. S. solfataricus). Cas6 family proteins have not been found to
associate tightly with any effector Cas protein or complex, which gives them the flexibility to associate with multiple subtypes that may differ at the interference stage.
The catalytic mechanism used by all three types of processing endonucleases
appears to rely on a histidine and a tyrosine residue in the active site, along with a variable lysine or serine, all of which are necessary for acid-base catalysis. Moreover, the glycine-rich loop characteristic of RAMP proteins may be involved in correct substrate orientation. However, the three proteins use distinct sequence- and structure-specific recognition mechanisms to select their respective substrates, illustrating the versatility of the duplicated ferredoxin-like fold characteristic of RAMPs and providing a mechanistic illustration of the coevolution of CRISPR repeat
sequences and Cas proteins (Shah et al. 2010). To summarise, biogenesis of mature
crRNAs in type I and III systems proceeds through a single cleavage event within each repeat, 8 nt upstream of the start of the spacer. The resulting crRNA therefore consists of three elements: i) the strictly conserved repeat-derived 5’ handle, predicted to mediate recognition and binding by the CASCADE-like effector complexes and/or to contribute to target recognition as a self-nonself discrimination mechanism (discussed later); ii) the spacer sequence, responsible for target recognition by base pairing; and iii) a heterogeneous repeat-derived 3’ end ranging
from 0 to 22 nt (Brouns et al. 2008; Hale et al. 2009; Carte et al. 2008;
Haurwitz et al. 2010; Lintner et al. 2011). The processing events that trim
the 3’ end remain unidentified, as does the functional significance (if any) of this heterogeneity.
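The three-part composition summarised above can be expressed as a small parsing sketch. It assumes that the spacer length is known in advance (in practice it would be read from the genomic CRISPR array) and uses the 8 nt 5’ handle and the 0 to 22 nt range of the 3’ end given in the text; `CrRNAParts` and `split_mature_crrna` are hypothetical names.

```python
from typing import NamedTuple, Optional


class CrRNAParts(NamedTuple):
    handle5: str  # conserved repeat-derived 5' handle
    spacer: str   # guide sequence used for target recognition
    tail3: str    # heterogeneous repeat-derived 3' end


def split_mature_crrna(crrna: str, spacer_len: int, handle_len: int = 8,
                       max_tail: int = 22,
                       expected_handle: Optional[str] = None) -> CrRNAParts:
    """Split a type I/III mature crRNA into 5' handle, spacer and 3' end."""
    if len(crrna) < handle_len + spacer_len:
        raise ValueError("crRNA is shorter than 5' handle plus spacer")
    handle5 = crrna[:handle_len]
    spacer = crrna[handle_len:handle_len + spacer_len]
    tail3 = crrna[handle_len + spacer_len:]  # may be empty (0 nt)
    if len(tail3) > max_tail:
        raise ValueError(f"3' end of {len(tail3)} nt exceeds {max_tail} nt")
    # The 5' handle is expected to be invariant, so a mismatch flags an unusual read.
    if expected_handle is not None and handle5 != expected_handle:
        raise ValueError("5' handle does not match the expected sequence")
    return CrRNAParts(handle5, spacer, tail3)
```

Because the 5’ end is fixed by the single cleavage event while the 3’ end is trimmed to variable lengths, the sketch anchors the split at the 5’ end and only bounds the length of the remainder.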
A remarkable mechanism of CRISPR RNA maturation takes place in type
II systems, as discovered in Streptococcus pyogenes by Delcheva et al. (2011). In this
system, a novel RNA species was found in high copy number and identified as the
transcript of the opposite strand of a region upstream of the cas operon
and the CRISPR array. Interestingly, this transcript, termed tracrRNA
(trans-activating CRISPR RNA), contains a 25 nt region that is complementary, with only one mismatch, to the S.
pyogenes repeat sequence, which is predicted to be unstructured. It was demonstrated that an RNA duplex formed by the tracrRNA and a repeat sequence in the pre-crRNA is sufficient to guide cleavage of both strands at specific positions within the duplex region by the host RNase III, producing unit-length crRNA intermediates that consist of a complete spacer sequence flanked by partial repeats. Further processing
at the 5’ end of the spacer sequence by a still unidentified nuclease
yields the mature crRNA form for this system (figure 1.14; Delcheva et al. 2011).
The mature crRNA consists solely of a 20 nt spacer-derived sequence at the 5’ end and a 19-22 nt repeat-derived sequence at the 3’ end. This composition differs strikingly from that of mature crRNAs in types I and III in that it lacks the characteristic repeat-derived 5’ handle. This feature could indicate a distinct mechanism for crRNA recognition by the proteins mediating interference, and potentially for interference itself. The only Cas protein implicated in this stage is Cas9 (Csn1), although its exact function is unknown. In the model proposed by the authors, Cas9 enables duplex formation between the tracrRNA and the pre-crRNA prior to recognition and cleavage of both strands by the host RNase III, a process termed
trans RNA-mediated activation of crRNA maturation (Delcheva et al. 2011). Cas9
contains a McrA/HNH-nuclease domain and a RuvC-like (RNase H-like) domain
(Makarova et al. 2006), making it a suitable candidate for the second cleavage event.
There is no indication that the role of Cas9 is restricted to this stage; it may
also participate in the interference mechanism, as deletion of cas9 in S.
thermophilus resulted in loss of phage resistance (Barrangou et al. 2007). To date, this is the first example of a host factor implicated in CRISPR function, highlighting the exceptional economy and versatility of this system.
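The tracrRNA:repeat pairing described above can be sketched as a scan for antiparallel duplexes within a mismatch budget. The 25 nt window and the single-mismatch tolerance come from the text; everything else is an assumption of this hypothetical sketch and not the analysis performed by Delcheva et al. (2011): only Watson-Crick pairs are counted (G-U wobble pairs, which do occur in RNA duplexes, are scored as mismatches), bulges and gaps are not allowed, and no folding energy is computed.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are treated as mismatches.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalise(seq: str) -> str:
    """Upper-case the sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(normalise(rna)))


def duplex_mismatches(anti: str, target: str) -> int:
    """Count unpaired positions when anti (5'->3') pairs antiparallel with target (5'->3')."""
    if len(anti) != len(target):
        raise ValueError("segments must have equal length")
    # anti[k] faces target[-1 - k] in an antiparallel duplex.
    return sum(COMPLEMENT[a] != t for a, t in zip(anti, reversed(target)))


def find_anti_repeats(tracr: str, repeat: str, length: int = 25,
                      max_mismatches: int = 1) -> list[tuple[int, int, int]]:
    """Return (tracr_start, repeat_start, mismatches) for every qualifying window pair."""
    tracr, repeat = normalise(tracr), normalise(repeat)
    hits = []
    for i in range(len(tracr) - length + 1):
        window = tracr[i:i + length]
        for j in range(len(repeat) - length + 1):
            mm = duplex_mismatches(window, repeat[j:j + length])
            if mm <= max_mismatches:
                hits.append((i, j, mm))
    return hits
```

In this simplified picture, a hit marks a candidate anti-repeat region of the tracrRNA, i.e. the kind of duplex that RNase III is reported to cleave within.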
Figure 1.14: Model for CRISPR RNA processing in type II systems
In the first processing event, base pairing between the tracrRNA and the repeats (black) in the precursor CRISPR transcript (spacers in green) leads to site-specific cleavage by RNase III within the repeat sequence, generating repeat-spacer units. The second, still unidentified processing event takes place within the spacer sequence, generating the mature crRNA units of type II systems (adapted from Delcheva et al. 2011).