2.7 HYDROTHERMAL ALTERATIONS
2.7.2 TYPES OF HYDROTHERMAL ALTERATION
The 597 bp ORF of C4l encodes a 199 amino acid polypeptide (fig 11 B) with a deduced molecular weight of 22381Da. This calculated Mr is slightly higher than that determined for C4l protein by SDS-PAGE (Shapland et al, 1988), but is in good agreement with the observation that C4l is approximately 0.5KDa smaller than transgelin (22594Da) in REF cells (Shapland et al, 1988). The difference between the calculated Mr of the cDNA-derived C4l polypeptide and that observed for cell extracts fractionated by SDS-PAGE may be due to N-terminal processing, in which the residues MANR could be removed from the mature molecule in vivo (as discussed above, section C. 5). If these amino acids are removed from the calculation the Mr value becomes 21908Da, which is in reasonable agreement with the value of 21KDa determined by electrophoresis (Shapland et al, 1988). The difference between the observed and calculated Mr of C4l may also be due to the non-uniform migration through SDS gels seen for many proteins, which is caused by incomplete denaturation or inconsistent association of SDS with the polypeptide chain; for example, caldesmon migrates as a doublet with an apparent Mr of 140-142KDa on SDS gels, whereas cDNA sequencing gives molecular weights of 86.9-88.7KDa (see Marston & Redwood, 1991 for review).
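The effect of removing the four N-terminal residues can be checked with a short calculation. The sketch below is illustrative only: it assumes standard average residue masses (the mass table used for the original calculation is not stated), and the names `RESIDUE_MASS` and `removed_mass` are hypothetical.

```python
# Average residue masses in Da (free amino acid minus one water).
# Assumed standard values; the original mass table is not stated.
RESIDUE_MASS = {
    "M": 131.193,
    "A": 71.079,
    "N": 114.104,
    "R": 156.188,
}


def removed_mass(residues: str) -> float:
    """Mass lost from a polypeptide when the given residues are cleaved off."""
    return sum(RESIDUE_MASS[aa] for aa in residues)


delta = removed_mass("MANR")
print(round(delta, 1))          # 472.6

# Adding the lost mass back to the processed value quoted in the text
# gives the approximate mass of the unprocessed polypeptide.
unprocessed = 21908 + delta
print(round(unprocessed))       # 22381
```

Removing MANR therefore lowers the calculated mass by roughly 0.47KDa, which accounts for the processed value of 21908Da.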
2. Isoelectric Point
Knowing the amino acid composition of a protein, it is possible to estimate its isoelectric point (pI) by statistically averaging the pKa values of all the charged amino acids. By use of the equation:
$$pI = \frac{1}{n}\left\{(X \times pK_{R1}) + (X \times pK_{R2}) + (X \times pK_{R3}) + \dots + (X \times pK_{Ri})\right\}$$
an estimate of the pI can be calculated. pKR is the pKa of the side chain group for amino acid types 1 to i, X is the number of residues of that type in the protein, and n is the total number of charged residues being considered.
Using this equation the pI of C4l can be estimated at 8.09. This calculated value is higher than the value of 7.0 derived using non-equilibrating pH gradient gel electrophoresis (NEPHGE) (Shapland et al, 1988). This discrepancy may arise because, under the experimental conditions used to purify protein C4 from REF cells (affinity column purification on anti-C4 monoclonal antibody conjugated to Affi-Gel), there is a much greater recovery of transgelin than of C4l (Shapland et al, 1988); upon NEPHGE analysis it is therefore difficult to observe a C4l 'spot'. The pI of 8.09 is in close agreement with the observation that native C4l is eluted from a pH 7-9.6 chromatofocusing column at a pH of 8.0 (fig 5).
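The averaging approach can be sketched in a few lines. This is a minimal illustration, not the original calculation: the side chain pKa values are assumed textbook values, the residue counts are those given under Charge Distribution, and the names `PK_R`, `COUNTS` and `weighted_mean_pk` are hypothetical.

```python
# Assumed textbook side-chain pKa values; the values used for the
# original estimate are not stated in the text.
PK_R = {"K": 10.53, "R": 12.48, "H": 6.00, "D": 3.65, "E": 4.25}

# Charged-residue counts reported for the polypeptide (table 4).
COUNTS = {"K": 12, "R": 11, "H": 0, "D": 10, "E": 11}


def weighted_mean_pk(counts: dict, pk: dict) -> float:
    """Average the side-chain pKa values, weighted by residue count."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no charged residues to average")
    return sum(counts[aa] * pk[aa] for aa in counts) / n


estimate = weighted_mean_pk(COUNTS, PK_R)
print(f"{estimate:.2f}")  # 7.88
```

With these assumed values the average is about 7.9 rather than 8.09. The difference presumably reflects the pKa set and the ionisable groups included in the original calculation, neither of which is stated.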
3. Charge Distribution
The C4l polypeptide contains twelve positively charged lysine (K) residues, eleven positively charged arginine (R) residues and no histidine (H) residues, together with ten negatively charged aspartic acid (D) residues and eleven negatively charged glutamic acid (E) residues (see table 4). There are therefore 23 positive groups and 21 negative groups, giving a net charge of +2 at pH 7.0. These charged residues are evenly distributed throughout the molecule. C4l shows neither the cluster of five negatively charged groups nor the cluster of five positively charged groups found within the transgelin molecule (Prinjha et al, 1994). A positively charged cluster may be of significance, as such a region has been implicated in the binding of molecules to actin filaments (Shapland et al, 1993; Muhlrad, 1991).
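The charge tally, and a simple scan for charged clusters, can be sketched as follows. The counts are those quoted in the text. The scan treats a "cluster" as an uninterrupted run of like-charged residues, which is an assumption, and the test string and the name `longest_charge_run` are hypothetical.

```python
# Charged-residue counts quoted in the text (table 4).
counts = {"K": 12, "R": 11, "H": 0, "D": 10, "E": 11}

positive = counts["K"] + counts["R"] + counts["H"]
negative = counts["D"] + counts["E"]
net_charge = positive - negative
print(positive, negative, net_charge)  # 23 21 2


def longest_charge_run(sequence: str, residues: str) -> int:
    """Length of the longest uninterrupted run of the given residues.

    Assumption: a 'cluster' means consecutive like-charged residues.
    """
    best = run = 0
    for aa in sequence.upper():
        run = run + 1 if aa in residues else 0
        best = max(best, run)
    return best


# Hypothetical test string, not the real sequence.
print(longest_charge_run("GAKKRKKLSDEA", "KRH"))  # 5
```

A run length of five or more for either residue set would flag a cluster of the kind reported for transgelin.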
4. Cysteine Distribution
[...] indicates an absence of any intra-chain disulphide bonds (Shapland et al, 1988). Upon sequencing of C4l cDNA, the derived polypeptide can be seen to contain three cysteine residues, at positions 38, 63 and 124 (fig 11 B). This indicates that C4l has the potential to form one intramolecular disulphide bond, or intermolecular disulphide bonds that cross-link monomers into dimers or oligomers (see Bardwell & Beckwith, 1993). The presence of a free cysteine within C4l raises the possibility that it could bind to the actin cytoskeleton as a higher order structure (e.g. a dimer or trimer), as does dematin (Rana et al, 1993). Free cysteine residues within C4l may also form a disulphide bond directly with a free cysteine within actin (there are six cysteine residues in rat γ-actin (see Sheterline & Sparrow, 1994)); such disulphide cross-linking has been observed in vitro between caldesmon (which contains two cysteine residues) and actin (Graceffa & Jancso, 1991). However, a disulphide cross-link is unlikely to be a major contributor to the energy of association between C4l and actin, since REF C4l is completely soluble in 0.1% Triton X-100 in the absence of reducing agents (Shapland et al, 1988).
5. Hydropathy Plot
Measuring the hydropathy of a molecule allows one to identify the polarity of different regions, i.e. their hydrophilicity and hydrophobicity. The most stable state of a protein folding in water is that in which the maximum number of polar amino acid groups are orientated to the surface of the molecule, whilst the maximum number of non-polar side chains are folded within the molecule away from the surface (Doolittle, 1986). The hydropathic value at each amino acid is calculated by averaging the index values for six adjacent amino acids (values from the PEP program, IntelliGenetics). The window is moved along one amino acid at a time until overlapping values are obtained for the entire protein, and these can then be displayed graphically (Kyte & Doolittle, 1982). The hydropathy plot for C4l is presented in figure 17. By use of this plot, regions of interest can be examined; for example, areas of the protein that may be involved in interactions with other proteins should be present in hydrophilic stretches.
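The sliding-window averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the PEP implementation: the index values are the published Kyte & Doolittle (1982) scale, which may differ from the table PEP used, the window of six follows the text, and the function name and test string are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy index; positive = hydrophobic.
# Assumption: the PEP program used this scale or one close to it.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 6) -> list:
    """Mean hydropathy of each window, moving one residue at a time."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by a basic one.
profile = hydropathy_profile("IIIIIIKKKKKK")
for start, value in enumerate(profile, start=1):
    print(start, round(value, 2))
```

Each value is reported here against the first residue of its window. Plotting it against the window centre instead only shifts the curve along the sequence axis.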
The plot representing C4l displays no highly hydrophobic regions of the kind characteristic of transmembrane proteins. Proteins that are transported across the plasma membrane out of the cell contain, in many cases, a highly hydrophobic N-terminal leader sequence that is proteolytically cleaved during or after transportation (Alberts et al, 1994). For example, the Ca2+-activated F-actin severing protein brevin (plasma gelsolin) possesses a hydrophobic N-terminal leader sequence. This allows it to be secreted into blood plasma and circulate around the body, possibly severing actin filaments that have been released from damaged cells (Kwiatkowski et al, 1986). C4l possesses a hydrophilic N-terminal region, which would indicate that it is neither a secreted nor a membrane-associated protein. This prediction is in complete agreement with the non-membranous distribution of protein C4 seen in mesenchymal cells (Shapland et al, 1988) and of C4l in SV40-transformed 3T3 cells (this study: fig 1).
A possible actin-binding domain (residues 96-103) for C4l (discussed below, section E. 1) exists within a hydrophilic region, and may therefore be ’available’ for interactions with actin. The three cysteine residues of C4l (residues 38, 63 and 124) all lie within hydrophobic regions of the molecule and would therefore not be present at the surface of the folded protein. This seems to correlate with the hypothesis that C4l does not interact with other proteins, namely the actin cytoskeleton, via disulphide cross-links (discussed above, section D. 4). Within the actin cross-linking protein dematin, the free cysteine lies within a hydrophilic region and is therefore ’available’, and is indeed known, to form disulphide cross-linkages creating a trimeric structure (Rana et al, 1993). A putative nuclear localisation signal (residues 152-158), a basic sequence that allows targeting to the nucleus (discussed below, section E. 4), lies within the most hydrophilic region of C4l, indicating an orientation on the surface of the folded molecule.
The potential phosphorylation sites (residues 160-163 and 180-185; discussed below, section E. 2) exist within the highly hydrophilic C-terminal region of C4l, which would be accessible to kinase enzymes. Phosphorylation of these sites may alter the hydropathy of the region and induce conformational changes in the surrounding area, potentially in the putative nuclear localisation signal (152-158), modulating the ability of C4l to target to the nucleus. Indeed, in the case of the actin-binding protein (ABP) cofilin, a nuclear localisation signal becomes 'unmasked' when dephosphorylation of adjacent sites causes a conformational change (Ohta et al, 1989). Conformational change due to protein phosphorylation has also been seen to modulate the ability of a number of ABPs to interact with actin (e.g. MARCKS (Hartwig et al, 1992)).
6. Secondary Structure Prediction
Many thousands of protein sequences are known, and this wealth of biological information has highlighted the need for accurate and automatic methods to predict protein conformation and function from primary structure. A prediction of the secondary structure of C4l was carried out using a multiprediction program (Zvelebil et al, 1987) by Dr. M. Zvelebil at the Ludwig Institute for Cancer Research, London. The prediction utilises information available from a family of homologous sequences (see fig 19). The approach is based both on averaging the Garnier et al (1978) secondary structure propensities for aligned residues and on the observation that insertions and high sequence variability tend to occur in loop regions between secondary structures. Accordingly, an algorithm first aligns a family of sequences, and a value for the extent of sequence conservation at each position is obtained. This value modifies a Garnier et al prediction on the averaged sequence to yield the predicted secondary structure for the molecule in question. However useful they may be in providing insights into the possible structure of globular proteins, secondary structure predictions are subject to significant errors and should be treated with caution (Kabsch & Sander, 1983).
The results of the predictive program for C4l are presented in figure 18. The results show that the protein starts with a turn that leads into an α-helical region (amino acids 10-37), which is followed by the first short β-sheet region (amino acids 60-69). The central portion of the protein (amino acids 81-138) is essentially characterised by long regions of α-helix. A turn is followed by a further short α-helix (amino acids 139-164), further turns and short β-sheet regions (amino acids 172-177). The molecule ends in a β-sheet, followed by a turn. While β-sheets provide considerable structural stability, interactions between proteins often occur by the co-alignment of α-helices (e.g. see hisactophilin (Habazettl et al, 1992)). It may therefore be of significance that the potentially important region of C4l that may be involved in binding to actin (amino acids 96-103; discussed in section E. 1) occurs in a region likely to exist as α-helix.
E. C4l SEQUENCE MOTIFS