Human mitochondrial transcription factor A induces a U-turn structure in the light strand promoter

Anna Rubio-Cosials1, Jasmin F Sidow1,5, Nereida Jiménez-Menéndez1,2, Pablo Fernández-Millán1, Julio Montoya3, Howard T Jacobs4, Miquel Coll1,2, Pau Bernadó2 & Maria Solà1

Human mitochondrial transcription factor A, TFAM, is essential for mitochondrial DNA packaging and maintenance and also has a crucial role in transcription. Crystallographic analysis of TFAM in complex with an oligonucleotide containing the mitochondrial light strand promoter (LSP) revealed two high-mobility group (HMG) protein domains that, through different DNA recognition properties, intercalate residues at two inverted DNA motifs. This induced an overall DNA bend of ~180°, stabilized by the interdomain linker. This U-turn allows the TFAM C-terminal tail, which recruits the transcription machinery, to approach the initiation site, despite contacting a distant DNA sequence. We also ascertained that structured protein regions contacting DNA in the crystal were highly flexible in solution in the absence of DNA. Our data suggest that TFAM bends LSP to create an optimal DNA arrangement for transcriptional initiation while facilitating DNA compaction elsewhere in the genome.

1Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona, Spain. 2Structural Biology Program, Institute for Research in Biomedicine, Barcelona, Spain. 3Departmento de Bioquímica y Biología Molecular y Celular, Universidad de Zaragoza–CIBER de Enfermedades Raras (CIBERER), Zaragoza, Spain. 4Institute of Biomedical Technology and Tampere University Hospital, University of Tampere, Tampere, Finland. 5Present address: Department of Bioanalytics, R&D Protein Analytics, Biologics Research, Pharma Research and Early Development (pRED) Penzberg; Roche Diagnostics GmbH, Penzberg, Germany.
Correspondence should be addressed to M.S. ([email protected]).
Received 10 June; accepted 13 September; published online 30 October 2011; doi:10.1038/nsmb.2160
© 2011 Nature America, Inc. All rights reserved.

Human mitochondrial DNA (mtDNA) is a circular, double-stranded (ds) 16.5-kilobase (kb) molecule that encodes two ribosomal RNAs (rRNAs) and 22 transfer RNAs (tRNAs), as well as 13 of the ~80 subunits participating in oxidative phosphorylation. The two mtDNA strands, termed heavy (H) and light (L), are transcribed into genome-length polycistronic transcripts from two respective promoters, HSP2 and LSP, which are located in the control region of the genome1,2. The H strand is further transcribed from an additional promoter, HSP1 (ref. 1), which generates a shorter transcript comprising two rRNAs and two tRNAs3. The mtDNA is organized in nucleoprotein entities, termed nucleoids, which contain from two to ten copies of the genome4,5, plus proteins that regulate DNA transactions6.

The architecture, packaging, copy number and maintenance of mtDNA depend upon the mitochondrial-specific, nuclear-encoded mitochondrial transcription factor A (TFAM or mtTFA)7–10. In mouse, Tfam is an essential gene11, and in some cell types, TFAM is present in amounts sufficient to coat the entire genome12,13. Footprinting studies in organello have revealed phased binding at regular intervals in the control region14. TFAM bends and packs mtDNA through nonspecific binding8,9,13, as shown by atomic force microscopy studies in which recombinant human TFAM induced bending and compaction of nonspecific DNA into nucleoprotein structures reminiscent of mitochondrial nucleoids7. Other studies revealed that TFAM preferentially binds cruciform15 and oxidized mtDNA16 and is involved in base-excision DNA repair17.

To determine the binding sites of TFAM to DNA, footprinting studies were conducted with the protein in the presence or absence of the mtRNA polymerase (MTRPOL), which forms a complex with transcription factor B2 (TFB2M)18. TFB2M is a bona fide transcription factor that forms a transient complex with MTRPOL, melts the DNA at the promoter and interacts with the priming substrate19. Notably, it has a paralog, the ‘transcription factor’ B1 (TFB1M), and both are ancestrally related to methyltransferases20, but they have different functions, as TFB1M is a ribosome small-subunit dimethylase21. Footprinting assays revealed that TFAM weakly protects HSP1 over the 23 base pairs (bp) between positions −35 and −13 upstream of the transcription initiation site2. Footprinting studies with LSP, recombinant TFAM, MTRPOL and TFB2M revealed protection by TFAM of 23 bp from −38 to −15 at LSP22. Additional analyses showed that the MTRPOL–TFB2M complex covers the region immediately downstream of the TFAM-binding site, viz. from bp −14 to +10, with TFB2M putatively positioned close to TFAM19. This is consistent with earlier suggestions that TFB2M contacts the C-terminal tail of TFAM, thus facilitating binding of MTRPOL–TFB2M to the LSP promoter and bridging the transcription machinery to the preinitiation complex18,23,24. LSP and HSP1 are located close together, each containing two partially conserved segments of 10 and 12 bp separated by 6 bp. In vitro studies have shown that, in the absence of TFAM, the MTRPOL–TFB2M complex initiates transcription from HSP1 on the joint LSP–HSP1 template25. Addition of small amounts of TFAM shifts the balance of initiation toward LSP, whereas a large excess of TFAM restores initiation at HSP1 (ref. 25). When overexpressed, TFAM appears to trigger a paradoxical suppression of transcription, combined with effects on mtDNA replication, which have been suggested to be due to increased mtDNA compaction from nonspecific binding2,18,25–27. TFAM may therefore have a key role in the interplay between different mtDNA transactions in vivo.
TFAM is a high-mobility group protein of type B (HMGB), which contains two tandem HMG-box domains, HMG1 and HMG2, separated by a linker and followed by a C-terminal tail that confers specific recognition for LSP and mediates interaction with transcription factor TFB2M10,23,24,28,29. In general, HMGB proteins may contain a single HMG domain, like the sequence-specific transcription factors LEF-1, TCF-1 and SRY; or the non–sequence-specific proteins Drosophila HMG-D and yeast NHP6A and NHP6B. Other HMG-box representatives contain two HMG domains, like the chromatin-binding proteins HMGB1, HMGB2 and yeast ABF2p; or even more domains, like human transcription factor UBF; all these domains are similar at the sequence and structural levels30. In particular, TFAM is a protein containing two HMG-box domains with sequence-specific DNA-binding capability that also participates in events that involve nonspecific DNA binding. Structural analyses of HMGB domains have revealed that they have an L shape, the short L-arm consisting of two short antiparallel α-helices (helix 1 and 2) and the long L-arm comprising an elongated segment of about six to seven residues from the N terminus of the domain, packed against a C-terminal α-helix (helix 3). The structures solved in complex with DNA revealed that contacts occur through the internal, concave surface of the L, into which DNA fits with pronounced bending, varying from 61°, as in rat HMG1 protein bound to modified DNA31, to 117° for mouse transcription factor LEF-1 (ref. 31). DNA bending is stabilized by several polar and nonpolar interactions and, importantly, by a characteristic intercalation of nonpolar residues from either or both helices 1 and 2, which disrupt base-pair stacking. Such an L-shape was earlier reported for the HMG2 domain of TFAM29. An HMG-box structure is also predicted for HMG1 (ref. 32), but the details and overall organization of the full-length protein, as well as the molecular basis of its interaction with DNA, are unknown.
The importance of understanding TFAM function has been highlighted by recent reports showing that it influences various pathological states in the mouse33,34. In addition, it can apparently operate as either a tumor promoter35 or suppressor36. To shed light on its function, we analyzed the crystal structure of full-length TFAM in complex with a double-stranded oligonucleotide encompassing the LSP sequence, and we assessed the flexibility of the free protein in solution by small-angle X-ray scattering (SAXS).
RESULTS
Both HMG boxes of TFAM bend the LSP-22 sequence
Full-length mature TFAM did not crystallize without DNA, and the best crystals were obtained with an LSP oligonucleotide of 22 bp (LSP-22, Fig. 1a–d and Table 1), comprising the sequence fully protected in previous footprint assays2,22. Structure determination by experimental methods, which included X-ray data from crystals modified by seleno-methionine (SeMet-TFAM–LSP-22) or bromouracil (TFAM–LSP-22Br), gave rise to electron density maps that allowed unambiguous protein and DNA sequence assignment (Supplementary Fig. 1). A detailed analysis of protein-DNA interactions showed that the structures from crystals grown in the presence of SeMet-TFAM or LSP-22Br did not deviate substantially from that of the native protein bound to LSP. TFAM is an all-α modular protein, comprising two HMG-box domains (HMG1 and 2), each of which spans ~75 residues (residues 44–120 and 153–225, respectively, Fig. 1c, Supplementary Fig. 1d,e). These are connected by a linker of helical conformation (residues 124–152) and are followed by a C-terminal tail in almost extended conformation (residues 226–237). In both HMG domains the three helices fold into an L-shaped arrangement typical of HMG boxes (see above and Fig. 2a), where the L-‘inner’ surface contacts the DNA minor groove by polar and nonpolar interactions, with specific residues partially or fully intercalating between bases (see below).
The two HMG boxes have a head-to-head orientation, with the short L-arms, comprising helices 1 and 2, oriented toward the central twofold axis of the double HMG-domain structure (Fig. 1c). In contrast, a ‘tail-to-tail’ orientation, with the two short L-arms oriented outward with respect to the central axis, was found for the synthetic didomain protein SRY.B, designed to create a stable complex of a nonspecific HMG box bound to DNA37. TFAM HMG1 contacts LSP region 1, which comprises the sequence T1A2A3C4A5G6 (numbered as in Fig. 1a), whereas HMG2 contacts LSP region 2, of sequence C14C15A16A17C18T19A20A21. In these regions the protein markedly flattens and widens the DNA minor groove by separating the phosphate backbones of the two strands (Fig. 1c).
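The strand numbering used in the text can be made explicit with a short sketch. It assumes that the LSP-22 coding strand (chain C) reads 5′-TAACAGTCACCCCCCAACTAAC-3′, as transcribed from the sequence in Fig. 1a, and that chain D is numbered 5′ to 3′ along the complementary strand, so that position i of chain C pairs with position 23 − i of chain D. The function names are illustrative only and are not part of the original work.

```python
# Sketch only: LSP-22 coding strand as read from Fig. 1a (chain C, 5'->3').
LSP22_CHAIN_C = "TAACAGTCACCCCCCAACTAAC"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the complementary strand written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def base_pair(i, chain_c=LSP22_CHAIN_C):
    """Label the base pair formed by 1-based position i of chain C."""
    chain_d = reverse_complement(chain_c)
    j = len(chain_c) + 1 - i  # partner position on chain D (assumed numbering)
    return f"{chain_c[i - 1]}{i}-{chain_d[j - 1]}{j}"


if __name__ == "__main__":
    for position in (4, 5, 16, 17):
        print(base_pair(position))
```

Under these assumptions the labels reproduce pairs quoted in the text, such as C4–G19 and A16–T7.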
Overall, the DNA undergoes two sharp ~90° kinks, each caused by one HMG box (Fig. 1d), giving rise to an overall U-turn of ~180°.
Within the kinked regions the three base steps at sequence A3C4A5G6 in LSP region 1 and C14C15A16A17 in region 2 show high positive rolls (with maxima of 50° and 60°, Fig. 1d) due to a sharp bend toward the major groove, whereas the twist value of the central steps (DNA steps are indicated by a slash, /) C4A5 / T18G19, and C15A16 / T7G8 decreases (with minima of 10°) because of flattening of the minor groove (Fig. 1d). On the inside face of the bends, at the DNA major groove of both regions 1 and 2, the highly rolled base pairs undergo regular Watson-Crick interactions and bifurcated interstrand bonds38. In addition, water molecules are coordinated by the base atoms (Figs. 2 and 3a,b). On the outside face of the bends, at the minor groove contacting the HMG boxes, the oxygen atoms from the phosphate backbone are stabilized by polar contacts and positive charges provided by residues from the three helices of either HMG box, most notably the highly conserved Trp88 and Trp189 from HMG1 and HMG2, respectively (Figs. 2b and 3a,b; Supplementary Fig. 2a,b).
The HMG boxes recognize DNA by different means
The two HMG domains share the same fold (r.m.s. deviation 0.95 Å), but they are not identical. Notably, helix 1 of HMG2 is one turn shorter than its HMG1 counterpart, a feature that is not modified upon DNA binding (Fig. 2a,c) and that differentiates this domain from any other reported HMG-box domain. This shorter HMG2 helix 1 is found in TFAM of all metazoans analyzed (Supplementary Fig. 2). Despite this structural difference between HMG1 and HMG2, they both intercalate nonpolar residues into the DNA, though by different means.
In HMG1, Leu58 from helix 1 intercalates between bases A3 and C4 (Figs. 1 and 2b), thus hampering their stacking and strongly contributing to the high positive roll of the corresponding base-pair step (Fig. 1d). This distortion is stabilized by a highly conserved neighboring residue, Tyr57 of helix 1 (Fig. 3a, Supplementary Fig. 2a), which partially intercalates between bases T20 and G19 of chain D (Figs. 1, 2 and 3a) and makes a hydrogen bond with the latter guanine. Helix 2 also contributes to DNA distortion by partially intercalating Thr77, Thr78 and Ile81 (Figs. 1a, 2b and 3a). HMG2 interacts differently with DNA region 2 (Figs. 1a, 2b and 3b). At the position equivalent to HMG1-box Leu58, a polar residue, Asn163, does not intercalate but makes hydrogen bonds with both chain D T6 and T7 of DNA region 2, inducing a shear to base pair A17–T6. The side chain of the preceding residue, Tyr162, highly conserved and part of the HMG2 hydrophobic core, showed alternate orientations in the unbound form, which was suggested to reflect a lower stability of the hydrophobic core29,32. In the present structure, Tyr162 has a single conformation that makes a hydrogen bond with chain C A16, in the same fashion as the topologically equivalent residue in HMG1, Tyr57, does with chain D G19. The most marked distortion within DNA region 2 is conferred by a hydrophobic residue from HMG2 helix 2, Leu182, which intercalates between C15–G8 and A16–T7 (Figs. 1 and 3b). In contrast, its topological counterpart in HMG1, Ile81, only partially intercalates (see above). Leu182 also has alternate conformations in the DNA-free HMG2 structure29, whereas in our structure one of the two conformers is selected for base intercalation during DNA binding (Fig. 2c). Additionally, partially intercalating residues and hydrogen bond interactions that stabilize the distortion at DNA region 2 are shown in Figure 2b. Note that the highly conserved Tyr200, found in different orientations in the unbound HMG2 structure29,32, shows a single orientation in which the aromatic ring contributes to the HMG2 hydrophobic core and the hydroxyl group points toward a surface not facing the DNA.
Previous sequence and structural analyses of HMG-box domains identified that the type (polar or nonpolar) of (partially) intercalating residue at positions equivalent to HMG1 Leu58 (or Asn163 in HMG2) in helix 1 (site ‘X’37, Supplementary Fig. 1e), and Thr77–Thr78 (or HMG2 Pro178–Gln179) in helix 2 (site ‘Y’, Supplementary Fig. 1e), dictated the specific versus nonspecific DNA-binding mode, and were defined as ‘specificity determinants’ (refs. 30,39 and references therein). This position in helix 1 was defined as the primary ‘intercalation wedge’40. In nonspecific HMG domains, nonpolar residues, chiefly Met or Phe, are found at the primary intercalation wedge, which is preceded by an aromatic residue (typically Phe or Tyr)30. At site Y, nonspecific HMGs likewise have a nonpolar residue30. In contrast, the DNA-specific boxes may harbor a polar residue in either of the two positions at the primary wedge30, although all structurally characterized complexes with cognate DNA show Phe–Met or Phe–Ile doublets, for example in Sox2 (PDB 1O4X)39, LEF1 (PDB 2LEF)41 and SRY (PDB 1J46)42, whereas a polar residue (usually Asp) is found at site Y, engaged in specific DNA recognition30. In TFAM HMG1, Tyr57–Leu58 constitutes the hydrophobic wedge at site X and Thr77 is found at site Y (Supplementary Fig. 1e). Together, these residues are consistent with a specific DNA-recognition nature for domain HMG1. In contrast, in HMG2 neither of these sites intercalates the dsDNA. Instead, intercalation is done by Leu182, four positions ahead of site Y. Structural analysis of HMGB domains suggested classifying this position as a ‘specificity determinant’ (ref. 39 and references therein), but phylogenetic covariation analysis of several HMGB sequences30 did not show a substantial correlation with DNA recognition mode. The TFAM HMG2 domain is the first case where an intercalating wedge is found at this position.

Figure 1 TFAM–LSP-22 complex. (a) Crystallized light strand promoter (LSP-22) coding sequence (dark blue, chain C in the PDB), with mtDNA numbering, and complementary sequence (cyan, chain D). Orange, green and yellow boxes symbolize HMG1, HMG2 and linker (L) domains contacting DNA (contacted base pairs in brackets). TFAM residues intercalate (pink background) or contact (domain color-code) the high rolled base pairs, which like the inverted motif are framed (black and pink outlined boxes, respectively), and central low-twisted steps are arrowed. Dashed lines represent interstrand bifurcated hydrogen bonds. (b) Alignment of rolled regions 1 and 2. The alignments are based on structurally equivalent residues (left) or on intercalating leucines (right). (c) Top, representation of TFAM domains along the sequence, with intercalating and last-traced residues indicated. Bottom, ribbon plot of the TFAM–LSP-22 crystal structure. Domains HMG1 (in orange) and HMG2 (green); respective helices 1 to 3, the central linker (yellow), flanking segments (gray), and N- and C-terminal ends are indicated. The intercalating and kink-stabilizing residues are shown in sticks. In B-DNA, the distance across the minor groove between phosphates of complementary strands spaced at four base pairs is 11.7 Å. In DNA region 1, this distance is 22.4 Å between T7 (chain C) and T20 (chain D), whereas in region 2 it is 21.9 Å, between A17 and G10. (d) Two views of LSP-22 representation, showing its U-turn shape. Intercalation sites are arrowed; deviation from a straight dsDNA axis is depicted on top. The right panel shows the roll and twist angles, and intercalated steps along LSP-22.
TFAM intercalates residues at a DNA inverted motif
As explained above, HMG1 and HMG2 intercalate residues from different helices into the DNA (Fig. 2a, Supplementary Fig. 1e): HMG1 intercalates Leu58 from helix 1 between the first two base pairs in the A3↓C4A5G6 sequence, whereas HMG2 does so with Leu182 from helix 2 between the second and the third base pairs of the C14C15↓A16A17 sequence. Therefore, either residue disrupts the stacking at different DNA steps of the respective highly rolled regions (Fig. 1a,b). Accordingly, superimposition of the two HMG boxes (schematically represented in Fig. 1b, left panel) shows an overall spatial coincidence of the amino acids contacting the DNA (for example, Tyr57 structurally aligns with Tyr162). However, it shows a shift in the intercalating residues Leu58 and Leu182, and intercalated DNA steps. If the schematic alignment is done based on the intercalating residues (Fig. 1b, right panel), then DNA sequence A2A3↓C4 aligns with C15↓A16A17. If the latter is inverted to A17A16↓C15, it matches the former as both show the AAC pattern and are intercalated at the second step. This defines an inverted motif, AA↓C–10 bp–C↓AA, that follows the symmetry of the HMG boxes (Fig. 1c). The alignment of LSP with HSP1 and other 28-bp TFAM-binding sites, termed X and Y2, shows that the number of base pairs between the trinucleotide inverse repeats is systematically 10 bp. This suggests that the mode of TFAM binding to these sequences could be topologically similar. However, whereas the first AAC is identical among the four binding sites, the second trinucleotide is less conserved. Moreover, the nucleotide content of the intervening 10 bp varies considerably. We suggest that, whereas the HMG boxes intercalate similarly at the inverse repeats, these different sequence contexts could account for the differences in protein binding and DNA bending between LSP and other sites.
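The inverted motif described above, AAC, ten intervening base pairs, then CAA, can be located in a sequence with a simple pattern search. The following is a minimal sketch, not the authors' procedure. It assumes the LSP-22 coding strand reads 5′-TAACAGTCACCCCCCAACTAAC-3′ (transcribed from Fig. 1a); the function name and the strict requirement for an exact CAA second repeat are illustrative choices.

```python
import re

# Assumed coding-strand sequence of LSP-22 (chain C, 5'->3'), read from Fig. 1a.
LSP22 = "TAACAGTCACCCCCCAACTAAC"

# AAC, any 10 bases, then CAA. The lookahead lets overlapping hits be reported.
INVERTED_MOTIF = re.compile(r"(?=(AAC[ACGT]{10}CAA))")


def find_inverted_motifs(seq):
    """Return (1-based start, matched text) for each AAC-N10-CAA occurrence."""
    return [(m.start() + 1, m.group(1)) for m in INVERTED_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    for start, hit in find_inverted_motifs(LSP22):
        print(start, hit)
```

With this sequence the single hit starts at A2 and ends at A17, matching the A2A3C4 and C15A16A17 repeats. Because the text notes that the second trinucleotide is less conserved at the other binding sites, a search of those sites would likely need a relaxed second repeat; that relaxation is not attempted here.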
The linker adds contacts to the DNA
In the structure, the linker connecting HMG1 and HMG2 compensates for the repulsion of the backbone phosphates brought closer by the DNA U-turn (Fig. 1c). In particular, a set of positive residues at the C-terminal end of the linker makes three types of contact with the DNA. Two of these contacts are made by the highly conserved residues Arg140 and Lys146 (Fig. 3c and Supplementary Fig. 2). Arg140 faces the major groove of DNA region 1, and Lys146 fronts the same groove at region 2. These residues interpose their positive side chain ends between the oxygens of two phosphates from complementary strands, thereby stabilizing the DNA kink at the major-groove side (Figs. 1c and 3c).
The third contact of the linker is made at a segment of the minor groove between DNA regions 1 and 2, comprising A16 to G13 (chain D) and A9 to C11 (chain C), which adopts canonical B-DNA conformation (Fig. 1c). The tetramethylene side chains of Lys139 and Lys147, together with the nearby electronegative phosphate backbones, lock the side chain of Met143, which points straight to bases G13 and T14 from DNA chain D (Fig. 3c). The two lysines and neighboring residues make additional contacts, mostly electrostatic and involving conserved residues (Fig. 3c, Supplementary Fig. 2), which stabilize the interaction without any specific recognition of the base atoms. In summary, the linker fits into the minor groove and stabilizes the two kinks. By passing perpendicularly over the DNA, it connects the two HMG domains at either side of the double helix, thus contributing to the overall U-turn shape of the dsDNA. In contrast, in artificial didomain protein SRY.B37, the basic linker is docked loosely along the minor groove between the HMG boxes, stabilizing the overall interaction but without contributing to DNA bending, which, in this case, is only 101°.
Previous biophysical studies by fluorescence anisotropy, a sensitive technique for detection of macromolecular interactions in solution, demonstrated the contribution of the linker in DNA binding32. In these studies, the weak affinity of HMG2 for DNA was markedly stimulated by addition of residues from the linker to the construct, which included Lys146 and Lys147 (ref. 32). Activity studies with the yeast TFAM homolog Abf2p showed that replacement of its HMG2 box by a sequence including TFAM linker, HMG2 and the C-terminal tail had a stronger effect on specific DNA binding and transcription activation than a construct lacking the linker28, pointing to the importance of the helical linker in contacting the DNA and influencing the affinity of HMG2 for DNA.

Table 1 Crystallographic data processing and refinement statistics

                            TFAM-Br in complex with LSP-22
Data collection
  Space group               P2₁2₁2
  Cell dimensions
    a, b, c (Å)             113.9, 117.2, 56.53
    α, β, γ (°)             90, 90, 90
  Resolution (Å)            40.84–2.45 (2.58–2.45)
  Rsym (%)*                 8.0 (46.7)
  I/σI                      17.5 (4.2)
  Completeness (%)          99.8 (100)
  Redundancy                7 (7.4)
Refinement
  Resolution (Å)            40.84–2.45
  No. reflections           28,483
  Rwork / Rfree             18.2 / 22.8
  No. atoms
    Protein                 3,238
    DNA                     1,797
    Water                   185
  B-factors(a)
    Protein                 42.5
    Ligand/ion              40.8
    Water                   40.2
  R.m.s. deviations
    Bond lengths (Å)        0.012
    Bond angles (°)         1.4

(a) B factors as after Refmac5/Phenix refinement including TLS.
The C-terminal tail
The C-terminal tail is required for specific recognition of the DNA28,29 and for interaction with transcription factor B from the transcription initiation complex23. This tail is packed antiparallel to helix 3 up to Arg232, where the guanidinium group of the latter bridges to the side chain of Glu219 from helix 3 and to the DNA 3′-phosphate group of A21 (chain C) (Fig. 2d). Notably, and crucial for the interpretation of the structure (see below), the tail contacts the DNA on the other side of LSP from the transcription initiation site. However, by virtue of the U-turn imposed on the DNA, the C-terminal tail is nevertheless brought into close proximity with the sequence immediately upstream of the initiation site. The importance of Arg232 was highlighted by previous transcription activation and footprinting analyses, which showed that excision of the C-terminal residues Arg232–Cys246 impaired LSP binding and transcription activation. The same deleterious effect was caused by the single-point mutation R232C28. From position Arg232 to the last residue traced, Lys237, the C-terminal tail passes over the phosphates without making any specific interaction with bases, possibly because the DNA that crystallized does not include the sequence required.
In mammals the C-terminal tail is conserved up to Lys236, which is itself fully conserved (Supplementary Fig. 2). The last residue traced in the structure, Lys237, is exposed to the solvent (Fig. 2d), and the residues Gln238–Cys246 were not visible because of crystallographic disorder, which could be due to intrinsic flexibility, a feature compatible with the availability of this last segment for interaction with other proteins. These last residues show low conservation, possibly reflecting species-specific adaptation to the targeted molecule(s). Whereas the length of the C-terminal tail is conserved among mammals, markedly longer tails (by ~20 amino acids) are found in Xenopus laevis, Salmo salar and Anopheles darlingi (Supplementary Fig. 2). In contrast, the putative TFAM ortholog in Caenorhabditis elegans has only four residues after the last helix of HMG2, suggesting that this protein might function in DNA packing but not in transcription initiation43.
Particular regions of TFAM are intrinsically flexible
Figure 2 Comparison of HMG1 and HMG2 boxes and their contacts with DNA. (a) Superposition of HMG1 (orange) onto HMG2 (green). The HMG1-contacting DNA is shown as gray sticks. The L-shaped long and short arms, helices 1 to 3, and N- and C-terminal ends are indicated. Note the shorter length of helix 1 in HMG2. Leu58 and Leu182, located in different helices, are framed. (b) Scheme of the protein-DNA contacts. Residues are framed according to the domain color-code (see Fig. 1); contacts with phosphates or bases are shown by red or black arrows, and (partially) intercalating residues are above the contacted bases. The inverted AAC motif and bases forming interstrand bonds are framed in pink and blue outlined boxes, respectively. (c) Structural superimposition of HMG2 in complex with DNA (green) onto the unbound HMG2 (violet; PDB 3FGH29). The HMG2 overall shape does not substantially change upon DNA binding. The inset shows the two conformations of Leu182 in the violet unbound structure and the one selected upon DNA binding as found in the TFAM–LSP-22 complex (in green). (d) Close-up view of the C-terminal tail (in gray) packing against HMG2 helix 3 (in green) and the linker (yellow). Side chains participating in the hydrophobic core or in DNA interactions mentioned in the text are shown as sticks; oxygen and nitrogen atoms are in red and blue, respectively; salt bridges and hydrogen bonds are shown as dotted lines. The position of Lys237 is indicated.

Previous studies based on UV CD spectropolarimetry showed an increase in the α-helix content of TFAM upon DNA binding32. The crystal structure shows that TFAM and LSP intimately intertwine, indicating that both molecules structurally rearrange upon binding, by mutual induced fitting. To characterize TFAM structurally in its free state we conducted SAXS analysis of the protein in solution. The analysis of the scattering curve unambiguously showed the presence of a particle with an apparent molecular mass of 24 kDa, in good agreement with a TFAM monomer (25.6 kDa) and ruling out the presence of TFAM multimers. This particle presents a radius of gyration (Rg) with a value of 32.0 ± 0.3 Å, larger than expected for a 24-kDa protein (about 18 Å, according to the Flory equation Rg = 3 × N^0.33, where N is the number of residues). The corresponding pairwise distribution function, p(r), of the curve, which reflects the distribution of intraparticle distances, shows a smooth decrease toward a large maximum (Dmax) of 135 ± 5 Å (Supplementary Fig. 3a). These features indicate that unbound TFAM is highly flexible and that conformations of different dimensions coexist in solution. Another very informative analysis of the data is the Kratky representation of the experimental SAXS curve (Supplementary Fig. 3b), which yielded a profile corresponding to a globular protein (implying folded domains) mixed with a flatter profile typical for unfolded proteins, substantiating a partial structural disorder for TFAM44.
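The comparison between the measured Rg and the value expected for a compact protein can be reproduced with a short calculation. This is a sketch under stated assumptions: the scaling law is taken as Rg ≈ 3 × N^0.33 Å (our reading of the Flory equation quoted in the text), and N = 204 residues is assumed for mature TFAM from the residue range 43–246 shown in Fig. 1c. Neither the function name nor the chosen N comes from the original analysis.

```python
# Sketch: compact-chain estimate Rg ~ 3 * N**0.33 (angstroms) versus the SAXS value.

def expected_rg_compact(n_residues, prefactor=3.0, exponent=0.33):
    """Flory-type scaling estimate of Rg (angstroms) for a compact chain."""
    return prefactor * n_residues ** exponent


N_RESIDUES = 204    # assumption: mature TFAM, residues 43-246
RG_MEASURED = 32.0  # angstroms, SAXS value reported in the text

rg_expected = expected_rg_compact(N_RESIDUES)
ratio = RG_MEASURED / rg_expected

if __name__ == "__main__":
    print(f"expected Rg for a compact chain: {rg_expected:.1f} A")
    print(f"measured / expected: {ratio:.2f}")
```

Under these assumptions the estimate falls between 17 and 18 Å, in line with the "about 18 Å" quoted in the text, so the measured value is nearly twice that of a compact particle of the same size.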
To describe the SAXS curve as an ensemble of coexisting TFAM conformations we subsequently applied the ensemble optimization method (EOM, see Online Methods). We generated a pool of 10,000
structures based on the crystallographic protein coordinates, in which complete conformational freedom was allowed for in the linker and in the C-terminal tail. These calculations identified a subensemble of 50 conformations that, collectively, were in perfect agreement with the SAXS curve (χ2 = 0.59) (Fig. 4a, left panel). The corresponding Rg distribution was broad and similar to the one obtained from the initial
[Figure 3 panels: (a) HMG1-DNA, region 1; (b) HMG2-DNA, region 2; (c) linker. Views are rotated by 90° and 180°; residue and nucleotide labels omitted.]
Figure 3 Close-up views of three TFAM areas contacting LSP-22 (see top scheme for reference). (a) Contacts between HMG1 and DNA region 1; (b) between HMG2 and DNA region 2; and (c) between the helical linker and the LSP minor groove. In all left panels, side chains intercalating (Leu58 in a, Leu182 in b), half-intercalating, hydrogen-bonding or salt-bridging to DNA are shown as sticks and colored according to the domain color code of Figure 1. Water molecules and oxygen atoms are represented in red; nitrogen, sulfur and phosphate atoms are in dark blue, green and gray, respectively; polar interactions are shown as black dashed lines, except for DNA interstrand hydrogen bonds, which are shown in cyan and involve adenines A3 or A5 (chain C) with G19 (chain D), or A16 or C14 (chain C) with G8 (chain D); LSP-22 sequence is shown as in Figure 1a. The middle panels show electrostatic potential surfaces (blue, positive; red, negative) mapped on the TFAM Connolly surface. The right panels depict residue conservation across metazoan TFAM molecules (see Supplementary Fig. 2a). Identical residues are shown in red; higher to lower similarity values gradually vary from dark orange (90 to 99%) to light yellow (30 to 50%); lower values, in white.
pool of random conformations (Fig. 4a, right panel), indicating both the impossibility of describing TFAM with a single conformation and the high plasticity of both the linker segment and the C-terminal tail (Fig. 4b). Accordingly, the interdomain distance distribution of the selected subensemble was also similar to that obtained for the initial pool (Supplementary Fig. 3c). The possibility that the TFAM linker formed a stable α-helix was explored by generating a new pool of 2,000 conformations in which HMG1, HMG2 and the helical linker were assumed to be rigid bodies linked by flexible hinges. The computed SAXS profile of this model did not agree with the experimental data (χ² = 2.93). In summary, unbound TFAM has two unstructured regions, the linker and the C-terminal tail.
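As a rough illustration of how an ensemble fit of this kind can be scored, the sketch below averages the theoretical curves of the selected conformers, fits a single scale factor by least squares and computes a reduced χ². The function names, the toy intensities and the exact form of χ² (one commonly used in ensemble-fitting approaches) are assumptions for illustration and are not taken from the Online Methods.

```python
from typing import List, Sequence


def ensemble_average(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average the theoretical intensities of all conformers, point by point."""
    n = len(curves)
    return [sum(point) / n for point in zip(*curves)]


def scale_factor(i_exp: Sequence[float], i_calc: Sequence[float],
                 sigma: Sequence[float]) -> float:
    """Least-squares scale mu minimizing sum(((mu * calc - exp) / sigma) ** 2)."""
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    return num / den


def reduced_chi2(i_exp: Sequence[float], i_calc: Sequence[float],
                 sigma: Sequence[float]) -> float:
    """Reduced chi-square between scaled model and experiment (assumed form)."""
    mu = scale_factor(i_exp, i_calc, sigma)
    k = len(i_exp)
    total = sum(((mu * c - e) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return total / (k - 1)


# Toy data (hypothetical): two conformers, three s-points each.
conformer_curves = [[4.0, 2.0, 1.0], [2.0, 2.0, 1.0]]
averaged = ensemble_average(conformer_curves)

# A toy "experimental" curve that is exactly twice the ensemble average,
# so the scaled model matches it and chi-square is essentially zero.
experimental = [6.0, 4.0, 2.0]
errors = [0.1, 0.1, 0.1]
chi2 = reduced_chi2(experimental, averaged, errors)
```

In this form, a value near 1 indicates agreement within the experimental errors, which is the sense in which the flexible-ensemble fit (χ² = 0.59) and the rigid-linker fit (χ² = 2.93) are contrasted above.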
In conclusion, the SAXS analysis unambiguously shows that TFAM is a monomer in the experimental conditions tested. In addition, it shows that the protein is intrinsically highly flexible and that flexibility is not evenly distributed along the sequence but affects the linker and the C-terminal tail. Comparison with the crystal structure shows that the linker folds into an α-helix upon DNA fitting and binding, and supports a model in which the two macromolecules intertwine as TFAM adopts a fixed structure.
DISCUSSION
Working model for DNA recognition, binding and bending
The crystal structure of TFAM bound to LSP-22 shows an intertwined molecular arrangement that cannot result from a direct contact between rigid molecules. Molecular intertwining can be explained by a partial interaction that stimulates a conformational change, leading progressively to a full contact. In this case, each molecule induces a conformational change in the other. A similar case was previously found for bacterial integration host factor and its target DNA, which also give rise to a DNA U-turn45. Based on our SAXS data, unbound TFAM has two folded HMG boxes linked by an unfolded segment (Fig. 5a). This, in addition to the much higher affinity of HMG1 for DNA than that of HMG2 (refs. 29 and 32), makes simultaneous binding of the two boxes to two separate DNA regions highly unlikely and suggests that HMG1 binds first (Fig. 5b).
This would induce a DNA bend of about 70–110°, as predicted from previous structures of HMGBs bound to DNA (for example, PDB 1CKT31, 1J5N30 and 1J46, see ref. 42). This structure would then be stabilized by the linker, which contacts the minor groove by adopting an α-helical conformation (Fig. 5c). This arrangement would place HMG2 on the opposite side of the double helix and close to it,
Figure 4 SAXS analysis of unbound TFAM. (a) Left panel, experimental scattering-intensity curve (black line) represented in a logarithmic scale as a function of the momentum transfer, s = 4π sin(θ)/λ (2θ, scattering angle; λ = 1.5 Å, X-ray wavelength). The fitted EOM (ensemble optimization method, see Online Methods) curve (red curve) describes the complete s-range. Right panel, radius of gyration (Rg) distributions of both the subensemble of conformations selected by EOM (red curve) and that of the starting 10,000 conformations of the pool (in black). (b) Molecular representation of a subensemble of 50 models that describes the data, superimposed by their HMG2 domains (green surface); both side and bottom views (left and right panels, respectively) show that the HMG1 domain (orange ribbon) can be found in a wide range of orientations.
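The momentum-transfer definition used in the Figure 4 caption can be evaluated directly. The following minimal sketch converts between the scattering angle 2θ and s for the stated wavelength of 1.5 Å; the function names are hypothetical and chosen only for illustration.

```python
import math

WAVELENGTH_ANGSTROM = 1.5  # X-ray wavelength quoted in the Figure 4 caption


def momentum_transfer(two_theta_deg: float,
                      wavelength: float = WAVELENGTH_ANGSTROM) -> float:
    """Return s = 4*pi*sin(theta)/lambda in 1/Angstrom.

    two_theta_deg is the full scattering angle (2*theta) in degrees.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def scattering_angle_deg(s: float,
                         wavelength: float = WAVELENGTH_ANGSTROM) -> float:
    """Inverse relation: the scattering angle 2*theta (degrees) for a given s."""
    theta = math.asin(s * wavelength / (4.0 * math.pi))
    return math.degrees(2.0 * theta)


# Scattering angle corresponding to the upper end of the plotted s-range.
angle_at_s_max = scattering_angle_deg(0.5)
```

The inverse function shows that the whole plotted s-range (0 to 0.5 Å⁻¹) corresponds to small scattering angles, as expected for a small-angle experiment.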
[Figure 5 schematic, panels a–d: TFAM (HMG1, HMG2), TFB2M and MTRPOL on the LSP DNA; marked positions are –43, –35, –15, –14, –7, +6, +14 and the start site.]
Figure 5 Working model for the role of TFAM in transcriptional activation at LSP. (a) TFAM presents two HMG box domains (labeled) that move freely with respect to one another (arrows).
Below the DNA double helix, the binding sites for TFAM (in black), transcription factor B2 (TFB2M, yellow, based on ref. 19) and mitochondrial RNA polymerase (MTRPOL, in black19) at the light strand promoter (LSP) are indicated. (b) TFAM HMG1 is the first to contact the DNA minor groove and induces a first kink in the DNA. (c) The linker segment contacts the DNA minor groove while adopting a helical conformation, which stabilizes the first DNA kink. (d) Binding of the linker leads domain HMG2 to bind the minor groove on the opposite side of the double helix, introducing a second DNA kink and causing a DNA U-turn that positions the TFAM C-terminal tail (gray coil) close to the 5′ end of the LSP. The red arrow indicates the hypothesis that the TFAM C terminus may interact with transcription factor B of the transcription machinery to initiate transcription.
[Figure 4 plot axes: log I(s), relative, versus s (Å–1), 0 to 0.5; Rg (Å), 15 to 55. Legend: experimental curve, EOM selected, pool. Panel b views are related by a 90° rotation.]
thus conferring a high probability for it to contact DNA, despite its low intrinsic affinity, and to induce a second bend in the DNA. By successive intercalation of residues into the DNA AAC motifs, the two HMG domains cooperatively induce an overall U-turn of ~180° stabilized by the linker (Fig. 5d). Importantly, we infer the key contribution of TFAM at LSP to be in bending the DNA and bringing the C-terminal tail close to the transcription initiation start site for TFB2M to enforce specific melting.
Another important activity of TFAM is DNA packaging, which was deduced from experiments in which the protein induced negative supercoiling in relaxed plasmids8,46. Supercoiling results from variation in twist (T, number of helical turns in the DNA double helix) and writhe (W, number of times the double helix crosses over itself), whose sum gives the linking number (L). Our results show that, despite TFAM inducing a strong unwinding at two specific base-pair triplets, it results in an overall increment of twist of only 11 bp per DNA turn. Such a modest overall unwinding would be insufficient to explain the negative supercoiling observed8. Therefore, in agreement with the aforementioned studies, we posit that DNA supercoiling is due to an increase in writhe, resulting from an accumulation of sharp kinks generated by TFAM binding.
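To make the twist argument concrete, the sketch below compares the number of helical turns in a short duplex at two helical repeats and applies the relation L = T + W. Only the 11 bp per turn value and the 22-bp fragment length come from the text; the reference repeat of 10.5 bp per turn for relaxed B-DNA, the function names and the example linking-number change are assumptions for illustration.

```python
def twist_turns(n_bp: int, bp_per_turn: float) -> float:
    """Number of helical turns (T) in a duplex of n_bp base pairs."""
    return n_bp / bp_per_turn


def writhe_change(delta_l: float, delta_t: float) -> float:
    """From L = T + W, a change in linking number splits as dL = dT + dW."""
    return delta_l - delta_t


N_BP = 22              # length of the LSP-22 fragment
RELAXED_REPEAT = 10.5  # assumption: typical repeat of relaxed B-DNA (bp per turn)
BOUND_REPEAT = 11.0    # overall repeat quoted in the text for bound DNA

# Change in twist on binding, in turns and in degrees (negative = unwinding).
delta_t = twist_turns(N_BP, BOUND_REPEAT) - twist_turns(N_BP, RELAXED_REPEAT)
delta_t_deg = delta_t * 360.0

# Hypothetical example: if one negative supercoil (dL = -1) had to be absorbed,
# most of it would have to appear as writhe rather than twist.
delta_w = writhe_change(-1.0, delta_t)
```

Under these assumed numbers the twist changes by less than a tenth of a turn per 22 bp, so nearly all of a unit change in L would have to be accommodated as writhe, in line with the interpretation given above.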
Several studies suggest that TFAM binds to LSP or to nonspecific sequences as a dimer and cooperatively7,29,32. Based on the crystal structure, a second molecule would not fit on LSP-22, even if the fragment were extended to 30 bp. TFAM is in monomer-dimer equilibrium in solution7,29,32, and cooperative binding on LSP is conceivable through protein-protein interactions stimulated by a previously formed 1:1 protein–LSP complex (for example, if the protein–DNA complex stabilized a protein-protein interaction surface). After dimerization, the additional HMGs would allow formation of DNA loops in addition to DNA bends, thus agreeing with atomic force microscopy studies that demonstrated the ability of TFAM to cooperatively introduce bends and loops into linearized and circular plasmids7. Cooperativity on large DNA molecules would arise from binding of successive monomers, each generating a more favorable substrate for binding of the next protein by virtue of the structural distortions created on the DNA.
The proposed binding model accounts for the dual function of TFAM in transcriptional initiation and DNA compaction. Generation of highly bent structures like U-turns, within fragments as short as 22 bp, provides a tool for the required compaction, as has been postulated for integration host factor in bacterial DNA structures such as the relaxosome47. Binding of TFAM to mtDNA, initiated by HMG1, is progressive.
This raises the additional possibility that the linker and/or the HMG2 domain, together with the C-terminal tail, may participate elsewhere in the genome in the recruitment of other proteins, such as members of the MTERF family with otherwise nonspecific binding properties48.
METHODS
Methods and any associated references are available in the online version of the paper at http://www.nature.com/nsmb/.
Accession codes. Protein Data Bank: coordinates and structure factors have been deposited for human TFAM–LSP-22Br with the accession code 3TQ6.
Note: Supplementary information is available on the Nature Structural & Molecular Biology website.
ACKNOWLEDGMENTS
We thank C. Silva and J. Colom for technical support. This study was supported by the Ministerio de Ciencia e Innovación (grants BFU2006-09593 to M.S., BFU2009-07134 to M.S., BFU2008-02372 to M.C., CSD2006-00023), Generalitat de Catalunya (SGR2009-1366 to M.S., SGR2009-1309 to M.C., SGR2009-1352 to P.B.), the European Union (FP7-HEALTH-2010-261460 to M.S., FP7-BioNMR-2010-261863 to P.B.), and Instituto de Salud Carlos III-FIS-PI 10/00662. The Centro de Investigación Biomédica en Red de Enfermedades Raras is an initiative of the Instituto de Salud Carlos III. A.R.-C., J.F.S., N.J.-M. and P.F.-M. hold or held fellowships from Consejo Superior de Investigaciones Científicas, MICINN and Cusanuswerk-Bischöfliche Studienförderung. H.T.J. is supported by Academy of Finland, Tampere University Hospital Medical Research Fund and Sigrid Juselius Foundation. We also thank the European Molecular Biology Laboratory (EMBL)-Grenoble and EMBL-Hamburg Outstations, the European Synchrotron Radiation Facility in Grenoble and the Automated Crystallography Platform (Barcelona Science Park) for their support.
AUTHOR CONTRIBUTIONS
A.R.-C. and J.F.S. contributed to cloning, protein production and crystallization; A.R.-C., N.J.-M., P.F.-M. and P.B. conducted the SAXS studies; A.R.-C. and M.S. contributed to X-ray structure solution; A.R.-C. and M.S. contributed to figure preparation. Together with the rest of the authors, M.C., J.M. and H.T.J. participated in manuscript writing, provision of materials and infrastructure, and discussion. M.S. designed and supervised the project.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Published online at http://www.nature.com/nsmb/.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
1. Brandon, M.C. et al. MITOMAP: a human mitochondrial genome database—2004 update. Nucleic Acids Res. 33 (Database issue), D611–3 (2005).
2. Fisher, R.P., Topper, J.N. & Clayton, D.A. Promoter selection in human mitochondria involves binding of a transcription factor to orientation-independent upstream regulatory elements. Cell 50, 247–258 (1987).
3. Montoya, J., Gaines, G.L. & Attardi, G. The pattern of transcription of the human mitochondrial rRNA genes reveals two overlapping transcription units. Cell 34, 151–159 (1983).
4. Legros, F., Malka, F., Frachon, P., Lombes, A. & Rojo, M. Organization and dynamics of human mitochondrial DNA. J. Cell Sci. 117, 2653–2662 (2004).
5. Iborra, F.J., Kimura, H. & Cook, P.R. The functional organization of mitochondrial genomes in human cells. BMC Biol. 2, 9 (2004).
6. Bogenhagen, D.F., Rousseau, D. & Burke, S. The layered structure of human mitochondrial DNA nucleoids. J. Biol. Chem. 283, 3665–3675 (2008).
7. Kaufman, B.A. et al. The mitochondrial transcription factor TFAM coordinates the assembly of multiple DNA molecules into nucleoid-like structures. Mol. Biol. Cell 18, 3225–3236 (2007).
8. Fisher, R.P., Lisowsky, T., Parisi, M.A. & Clayton, D.A. DNA wrapping and bending by a mitochondrial high mobility group-like transcriptional activator protein. J. Biol. Chem. 267, 3358–3367 (1992).
9. Ekstrand, M.I. et al. Mitochondrial transcription factor A regulates mtDNA copy number in mammals. Hum. Mol. Genet. 13, 935–944 (2004).
10. Kanki, T. et al. Architectural role of mitochondrial transcription factor A in maintenance of human mitochondrial DNA. Mol. Cell. Biol. 24, 9823–9834 (2004).
11. Larsson, N.G. et al. Mitochondrial transcription factor A is necessary for mtDNA maintenance and embryogenesis in mice. Nat. Genet. 18, 231–236 (1998).
12. Takamatsu, C. et al. Regulation of mitochondrial D-loops by transcription factor A and single-stranded DNA-binding protein. EMBO Rep. 3, 451–456 (2002).
13. Alam, T.I. et al. Human mitochondrial DNA is packaged with TFAM. Nucleic Acids Res. 31, 1640–1645 (2003).
14. Ghivizzani, S.C., Madsen, C.S., Nelen, M.R., Ammini, C.V. & Hauswirth, W.W. In organello footprint analysis of human mitochondrial DNA: human mitochondrial transcription factor A interactions at the origin of replication. Mol. Cell. Biol. 14, 7717–7730 (1994).
15. Ohno, T., Umeda, S., Hamasaki, N. & Kang, D. Binding of human mitochondrial transcription factor A, an HMG box protein, to a four-way DNA junction. Biochem. Biophys. Res. Commun. 271, 492–498 (2000).
16. Yoshida, Y. et al. Human mitochondrial transcription factor A binds preferentially to oxidatively damaged DNA. Biochem. Biophys. Res. Commun. 295, 945–951 (2002).
17. Canugovi, C. et al. The mitochondrial transcription factor A functions in mitochondrial base excision repair. DNA Repair (Amst.) 9, 1080–1089 (2010).
18. Falkenberg, M. et al. Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat. Genet. 31, 289–294 (2002).
19. Sologub, M., Litonin, D., Anikin, M., Mustaev, A. & Temiakov, D. TFB2 is a transient component of the catalytic site of the human mitochondrial RNA polymerase. Cell 139, 934–944 (2009).
20. Cotney, J. & Shadel, G.S. Evidence for an early gene duplication event in the evolution of the mitochondrial transcription factor B family and maintenance of rRNA methyltransferase activity in human mtTFB1 and mtTFB2. J. Mol. Evol. 63, 707–717 (2006).