SWISSPROT code   Protein                                      Adjustment
GUX1_TRIRE       exoglucanase                                 cellulose binding domain only
HEMA_IANT6       haemagglutinin                               deleted because it is comprised of two chains
CBPY_YEAST       carboxypeptidase Y                           deleted because the connectivity is incorrect
CO7_HUMAN        complement factor 7                          deleted because the connectivity is incomplete
GHR_HUMAN        growth hormone receptor                      extracellular domain only
VSM2_TRYBB       variant surface glycoprotein                 first domain only
API_ACHLY        antiproteinase I                             deleted because of incomplete connectivity
ALBU_BOVIN       serum albumin                                second domain only
GUX_CELFI        exoglucanase                                 catalytic domain only
ALK1_HUMAN       antileukoproteinase                          first domain only
APOH_BOVIN       β2 glycoprotein I                            fourth and fifth domains only
ATNB_CANFA       Na+/K+ ATPase β1 chain                       extracellular domain only
IOVO_MELGA       ovomucoid                                    first domain only
LMP1_HUMAN       membrane glycoprotein I                      first luminal domain only
PIGR_HUMAN       polymeric immunoglobulin receptor            fourth and fifth domains only
PRIO_MESAU       major prion protein                          core domain only
SFP1_BOVIN       seminal plasma protein PDC-109               first domain only
TF_HUMAN         tissue factor                                extracellular domain only
CD8_HUMAN        T-cell surface glycoprotein CD8 α chain      extracellular domain only
CD4_MOUSE        T-cell surface glycoprotein CD4              extracellular domain only
Table 4.1: Adjustments to the data set resulting from examination of the original connectivity determination references. There were four deletions: two for incomplete connectivity, one for incorrect connectivity and one for multiple chains not declared as such in the SWISSPROT entry. Seventeen members were reduced to constituent distinct domain connectivities as indicated.
adjustments to the data set resulting from examination of the original connectivity determination papers. In general, although some sequences in the data set have sequence identity above the threshold, either (a) the disulphide connectivity is unconserved or (b) the FASTA initn and opt scores of the relevant sequence alignments are not significant.
No arbitrary lower bound for sequence length was used to distinguish proteins and peptides. The smallest peptides included in the data set are 12 residues long. The smallest sequence with multiple disulphides is the 15-residue HST2_ECOLI, which has two.
The resultant data set, termed the SWISSV25D set, is listed by SWISSPROT code in Appendix I with a brief description of the function of the protein. It contains 186 sequences between 12 and 4536 residues long (the two next longest sequences have 634 and 879 residues), of which 64 are single-disulphide and 122 multiple-disulphide. These sequences comprise 479 disulphide bridges. Thirty-three proteins, determined to <3 Å resolution and refined or determined by 2D NMR, and reported in SWISSPROT as deposited in the PDB (Bernstein et al., 1977), form a subset of data with 76 disulphides. Their PDB codes are given in Appendix I.
4.2.2: Terminology and representation
Disulphide connectivities are composed of disulphide cross-linked loops. The representation of disulphide connectivity as (a) a standard diagram and (b) the equivalent graph-theoretic network is shown in fig. 4.2. There are three possible relationships that can occur between any two cross-linked loops (see chapter 2, fig. 2.7): (a) independence, where the two loops share no residues; (b) overlap, where the two loops share a number of residues; and (c) enclosure, where one loop is contained wholly within the other loop sequentially. Overlap and enclosure are collectively referred to as interference.
The cross-linked loop length (N) is defined as the separation in Cα-Cα virtual bonds between two paired half-cystines. That is, a cross-link between residues i and i+N forms a loop of N such bonds. Independent loops are those not involved in overlap or enclosure. Any other loop is defined as an interfering loop. Connectivities containing only independent loops are termed independent connectivities. The connectivity length (denoted M) is the span of sequence from the most N-terminal half-cystine in the connectivity to the most C-terminal (fig. 4.2). Disulphide-bridged loops are numbered according to the position in the sequence of the N-terminal half-cystine, as illustrated. The span of a connectivity is the number of interfering loops in the connectivity. A connectivity of maximum span has all loops interfering with each other. The illustrative connectivity in fig. 4.2 has span 3.
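The definitions above can be illustrated with a minimal Python sketch. It assumes each disulphide is given as a pair of residue numbers; the function names and the example positions are hypothetical and do not come from the data set. The span is computed under one reading of the definition, namely the number of loops that interfere with at least one other loop.

```python
from itertools import combinations


def loop_length(bridge):
    """Loop length N: separation in virtual bonds between the paired half-cystines."""
    i, j = sorted(bridge)
    return j - i


def relationship(a, b):
    """Classify two loops as 'independence', 'overlap' or 'enclosure'."""
    # Order the loops so that `first` starts nearer the N-terminus.
    first, second = sorted([tuple(sorted(a)), tuple(sorted(b))])
    if first[1] < second[0]:
        return "independence"  # the loops share no residues
    if second[1] < first[1]:
        return "enclosure"     # second lies wholly within first
    return "overlap"           # the loops share some, but not all, residues


def connectivity_length(bridges):
    """Connectivity length M: most N-terminal to most C-terminal half-cystine."""
    positions = [p for bridge in bridges for p in bridge]
    return max(positions) - min(positions)


def interfering_loops(bridges):
    """Loops involved in at least one overlap or enclosure (assumed reading of 'span')."""
    involved = set()
    for a, b in combinations(bridges, 2):
        if relationship(a, b) != "independence":
            involved.update([tuple(sorted(a)), tuple(sorted(b))])
    return sorted(involved)


# Hypothetical four-disulphide connectivity, for illustration only.
bridges = [(3, 40), (10, 25), (30, 55), (60, 70)]
print([loop_length(b) for b in bridges])
print(connectivity_length(bridges))
print(len(interfering_loops(bridges)))
```

In this made-up example the loop (10, 25) is enclosed by (3, 40), the loop (30, 55) overlaps (3, 40), and (60, 70) is independent of all the others, so three loops are interfering.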
Figure 4.2: Representation of disulphide connectivity as a standard diagram and as a graph-theoretic network. The loop length (N) for disulphide bd and the connectivity length M are indicated. The edge ab is in bold in both. Half-cystines are labelled a to f.
4.2.3: Models for connectivity classification
Cross-linkage theory is used to rank native connectivities, with two models for weighting connectivity arrangements, termed the diffusive and the entropic models. The diffusive model describes the probability of the native connectivity in terms of the probability of diffusive contact in a randomly coiled unfolded state, and was originated by Kauzmann (1955). The entropic model describes the probability of the native connectivity as arising from its contribution to the stability of the folded state from the entropic effect of cross-linkage (as discussed in chapter 2). These mutually inverse models are described below in detail (sections 4.2.3.2 and 4.2.3.3).
4.2.3.1: Generation of all possible connectivity arrangements
There is a set of positions for half-cystines for each protein sequence. Each set is rearranged computationally to generate all possible connectivity arrangements. For any particular set of half-cystines (number = 2x), the number of possible connectivity arrangements (C_x) of x cross-links is given by:
$$C_x = \frac{(2x)!}{2^{x}\,x!} \qquad (4.1)$$
Although not considered here, for an odd number of such residues, (2x+1), the number of possible connectivities is given by C_{x+1} (free cysteine is so infrequent in the present data set that it is ignored in this investigation). For each arrangement a weighting is calculated according to the model in use, and the native connectivity is ranked in a list along with the other possible arrangements, rank 1 for the least diffusively probable or most entropically stabilising, and so on. This ranking characterises the native connectivity. The procedure is illustrated in fig. 4.3.
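The enumeration step can be sketched as follows. This is an illustrative Python sketch, not the program used in this work: the function names and the half-cystine positions are hypothetical. It generates every pairing of an even number of half-cystines and checks the count against equation 4.1.

```python
from math import factorial


def count_arrangements(x):
    """Number of connectivity arrangements of x cross-links: (2x)! / (2^x x!)."""
    return factorial(2 * x) // (2 ** x * factorial(x))


def all_arrangements(positions):
    """Yield every way of pairing an even number of half-cystine positions."""
    positions = sorted(positions)
    if len(positions) % 2:
        raise ValueError("an even number of half-cystines is required")
    if not positions:
        yield []
        return
    first, rest = positions[0], positions[1:]
    # Pair the first half-cystine with each possible partner in turn,
    # then pair up whatever remains recursively.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in all_arrangements(remaining):
            yield [(first, partner)] + tail


# Hypothetical positions of four half-cystines (x = 2).
for arrangement in all_arrangements([5, 12, 30, 41]):
    print(arrangement)
print(count_arrangements(2), count_arrangements(3), count_arrangements(4))
```

For four half-cystines the three arrangements printed are the independent, overlapping and enclosed pairings. The odd case quoted in the text can be checked in the same way: choosing which of 2x+1 residues stays free and pairing the rest gives (2x+1) C_x arrangements, which equals C_{x+1}.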
4.2.3.2: Model 1, the diffusive weighting
This model is based on that originally derived by Kauzmann (1955). He characterised the conformations of polypeptides containing cystine by calculating relative probabilities for the diffusive formation of the different possible arrangements during folding. The probability of a connectivity arises from the likelihood of diffusive contact in the unfolded state. Half-cystines generally pair with a higher probability with residues that are closer in sequence. Connectivities which can be separated into further smaller connectivities and/or independent loops would tend to be more heavily weighted.
For example, for two-disulphide connectivities, the overlapping arrangement is always the least probable, regardless of the positions of the residues along the sequence. Of the other two possible arrangements, enclosed and independent, either can be the most probable depending on the ratios of the lengths of the three sequence segments between the half-cystines.
The multiple-loop probability (here denoted P_i^d for the ith arrangement) given by equation 2.20 is used to calculate the weighting for each possible arrangement. This weighting (W_i^d) is given by:
(a)