3. Theoretical framework
3.10 Roles of the family members
To calculate its X-ray and neutron scattering curves, an atomic coordinate
model must first be converted to a sphere model consisting of many small, overlapping
spheres. The atomic coordinates are placed within a three-dimensional array of cubes, and every cube that contains
a minimum number of atoms has a sphere, of volume equal to that of the cube, placed at its
centre. The minimum number of atoms required to assign a sphere is determined empirically
by trial and error against the length of the cube side, so that the resulting sphere model
has a volume close to the unhydrated volume calculated for the protein. As a rule, the
unhydrated volume is calculated from the actual protein amino acid composition rather
than from the model composition, in order to compensate for any discrepancies between
these two values. Such discrepancies may occur if the crystal structure is incomplete
due to disorder, or if homologous domain structures are used in the model as opposed
to the actual domain structure, or if linker regions between domains are omitted. The
length of the cube side that is used to produce an unhydrated sphere model is typically
about 0.6 nm, which is much less than the maximum resolution of a normal scattering
curve. The resolution d is calculated from 2π/Q (which is derived from λ = 2d sin θ and
Q = 4π sin θ/λ). For a scattering curve with a maximum Q of 2.0 nm⁻¹, the maximum
structural resolution is 3.1 nm. Once a grid size has been determined, it is kept fixed
during an automated curve fit search. During a search, it is usually necessary to fix the
position of the origin of the grid in order to ensure consistency of the grid conversion of
coordinates into cubes. The dry models do not have a hydration shell and are used for
neutron curve modelling, as neutron scattering observes unhydrated glycoprotein
structures (Ashton et al., 1997; Smith et al., 1990; Perkins et al., 1993). X-ray curve
modelling requires hydrated structures, and the dry volume was increased to allow for
a hydration shell. This shell is well represented by 0.3 g of H2O/g glycoprotein and an
electrostricted volume of 0.0245 nm³ per bound water molecule, which corresponds to a water
monolayer surrounding the protein surface (Perkins, 1986), the volume of a free water
molecule being 0.0299 nm³. The simplest way to hydrate the cube models is to increase
the length of the cube side to match the volume increase. This procedure is satisfactory
for globular proteins of compact structure. However, it will significantly distort the
macromolecular structure if this contains a void space at its centre. In the case of
serum amyloid P component, an alternative algorithm, HYPRO (Ashton et al., 1997),
was written to add a layer of hydration spheres evenly over the protein surface
in order to reach the required hydrated volume (Perkins et al., 1998). In this second
method, spheres corresponding to water molecules are added evenly over the surface of
the unhydrated sphere model, so that the desired hydration volume is achieved. The
method of converting an atomic coordinate model into unhydrated and hydrated sphere
models, followed by the calculation of neutron and X-ray scattering curves, was initially
tested using solution scattering data for β-trypsin and α1-antitrypsin, for which crystal
structures were known (Smith et al., 1990; Perkins et al., 1993). Subsequently the
methodology has been evaluated more rigorously using the crystal structure for
pentameric serum amyloid P component (Ashton et al., 1997).
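The grid conversion described above can be illustrated with a minimal Python sketch. It is not the FORTRAN software used in this work. The function names, the use of nanometre units and the choice of a simple cubic grid with a user-supplied origin are assumptions made for illustration only. The sketch shows the counting of atoms per cube, the placement of one sphere per sufficiently populated cube, the cube-side scaling used for the simple hydration procedure, and the resolution d = 2π/Q.

```python
import math
from collections import Counter


def coords_to_spheres(coords, cube_side, min_atoms, origin=(0.0, 0.0, 0.0)):
    """Return the centres (nm) of all grid cubes holding at least min_atoms atoms."""
    # Count atoms per cube; the cube index is the floor of the scaled coordinate.
    counts = Counter(
        tuple(math.floor((x - o) / cube_side) for x, o in zip(atom, origin))
        for atom in coords
    )
    # One sphere per sufficiently populated cube, placed at the cube centre.
    return [
        tuple(o + (i + 0.5) * cube_side for i, o in zip(index, origin))
        for index, n_atoms in sorted(counts.items())
        if n_atoms >= min_atoms
    ]


def model_volume(n_spheres, cube_side):
    """Total model volume (nm^3): each sphere has the volume of one cube."""
    return n_spheres * cube_side ** 3


def hydrated_cube_side(cube_side, dry_volume, hydrated_volume):
    """Cube side that scales the dry model volume up to the hydrated volume."""
    return cube_side * (hydrated_volume / dry_volume) ** (1.0 / 3.0)


def resolution(q_max):
    """Structural resolution d = 2*pi/Q (nm) for Q given in nm^-1."""
    return 2.0 * math.pi / q_max


if __name__ == "__main__":
    # Three hypothetical atoms: two share a cube, the third sits alone.
    atoms = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.7, 0.1, 0.1)]
    spheres = coords_to_spheres(atoms, cube_side=0.6, min_atoms=2)
    print(len(spheres), model_volume(len(spheres), 0.6), resolution(2.0))
```

In a search of this kind, `min_atoms` and `cube_side` would be varied by trial and error until `model_volume` approaches the unhydrated volume calculated from the composition, and `origin` would then be held fixed.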
Synchrotron X-ray cameras utilise a pin-hole configuration that does not produce
geometrical distortion of the beam, so a calculated X-ray curve does not have to be
corrected before it is compared with an experimental curve. Although neutron cameras
such as those at the ILL and the RAL also use pin-hole geometries, their
dimensions are larger than those of X-ray cameras, and they also use longer wavelengths
to maximise the available neutron flux. For these reasons, instrumental corrections are
applied to the model neutron scattering curves. For the neutron scattering cameras D11
and D17 at the ILL in Grenoble, a Gaussian function based on a 16% wavelength spread
Δλ/λ (full width at half maximum) at λ of 1.0 or 1.1 nm and a beam divergence Δθ of
0.016 radians has been applied to model neutron curves as an empirical correction. The
theoretical values of Δλ/λ are 8% and 10% respectively for these two cameras, while that
for Δθ depends on both the beam aperture (0.7 × 1.0 cm²) and the size of the detector
cells (1 cm²) and is approximately 0.01 radians. A re-evaluation of Δλ/λ for D17 data for
serum amyloid P component gave 10%, in good agreement with the theoretical value, although
Δθ was larger at 0.024 radians (Ashton et al., 1997). The neutron fits deteriorate at large
Q, and this may indicate a small residual flat background that arises from incoherent
scatter from the protons in the protein. The values used to correct model neutron curves
for comparison with D17 data are also used for comparisons with D22 data. On LOQ,
wavelengths in the range from 0.2 to 1.0 nm are used simultaneously (time-of-flight
techniques provide the necessary monochromatisation) and, although this
complicates the beam corrections, a Gaussian function as for D17 data has been applied
to model neutron curves to obtain reasonable comparisons with the experimental data.
Values of 16% for Δλ/λ for a putative λ of 1.0 nm and 0.016 radians for Δθ were initially
used (Mayans et al., 1995), but a later re-evaluation showed that values of 10% for Δλ/λ
for a putative λ of 0.6 nm and 0.016 radians for Δθ gave the best agreement between the
model neutron curves and LOQ data (Ashton et al., 1997).
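The exact form of the Gaussian correction is not given in the text, so the following Python sketch only illustrates the general idea of smearing a calculated curve with a Q-dependent Gaussian. It assumes that both Δλ/λ and Δθ are full widths at half maximum, that the two contributions add in quadrature, and that Δθ contributes a Q-independent width of (4π/λ)Δθ. These choices, and the function names, are assumptions for illustration and may differ from the correction actually applied.

```python
import math

# Conversion from a full width at half maximum to a Gaussian standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_q(q, wavelength, dlam_over_lam, dtheta):
    """Assumed Gaussian width in Q (nm^-1) from wavelength spread and divergence."""
    s_lam = q * dlam_over_lam * FWHM_TO_SIGMA
    s_theta = (4.0 * math.pi / wavelength) * dtheta * FWHM_TO_SIGMA
    return math.hypot(s_lam, s_theta)  # add the two terms in quadrature


def smear(q_values, intensities, wavelength, dlam_over_lam, dtheta):
    """Replace each I(Q) by a Gaussian-weighted average over the calculated curve."""
    smeared = []
    for q in q_values:
        s = sigma_q(q, wavelength, dlam_over_lam, dtheta)
        weights = [math.exp(-0.5 * ((qj - q) / s) ** 2) for qj in q_values]
        total = sum(weights)
        smeared.append(sum(w * i for w, i in zip(weights, intensities)) / total)
    return smeared


if __name__ == "__main__":
    q = [0.1, 0.2, 0.3, 0.4, 0.5]            # nm^-1, hypothetical grid
    i_cal = [1000.0, 800.0, 500.0, 250.0, 100.0]  # hypothetical model curve
    # Parameters quoted in the text for D11/D17: 16% spread, 1.0 nm, 0.016 rad.
    print(smear(q, i_cal, wavelength=1.0, dlam_over_lam=0.16, dtheta=0.016))
```

Because the weights are renormalised at each Q value, a flat curve is left unchanged, and the discrete sum is only a reasonable approximation when the Q grid is fine compared with the Gaussian width.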
The procedure for calculating scattering curves from an atomic coordinate model
is implemented by a suite of FORTRAN computer programs (S. J. Perkins, unpublished
software), and these have been incorporated into automatic strategies for the generation
of atomic coordinate models (Figure 3.2; Mayans et al., 1995; Beavil et al., 1995;
Perkins et al., 1998a). Nested loops written in the Biosym Command Language of
INSIGHT II and associated programs (Accelrys, San Diego, USA) (Figure 3.2) are easily set up
to generate hundreds or thousands of models based on two or more degrees of rotational
and/or translational freedom between the domains or subunits in question. Typically,
an initial model of the protein is generated from models of its constituent domain
structures. After consideration of information relevant to the association of domains
within the protein structure, moveable protein fragments are defined so as to consider
the least flexible model. A moveable fragment could consist of a small single domain
fold, but if domains are known to associate into well-defined structures, e.g. the
association of serum amyloid P component domains into a pentameric ring structure
(Ashton et al., 1997), the moveable fragments may be much larger. A set of axes is
defined for each moveable fragment, and a Biosym Command Language macro is written
to move the fragments systematically, either by translations, by rotations, or to form
an assembly rendered by calling random linker structures, and thereby generate a series
of models. Once trial curve fits indicated that analysis was possible, automated solution
scattering modelling searches were performed on a Silicon Graphics INDY
R5000SC Workstation with a 150 MHz processor, 160 Mb of memory and a 4 Gb hard
disk. Run times were up to several weeks, but the use of an OCTANE reduces this to
two days.
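The nested-loop strategy can be sketched in Python as follows. The searches described here were written as Biosym Command Language macros within INSIGHT II, so this is only an analogue. The choice of a single rotation about the z-axis combined with a translation along it, and all function and key names, are hypothetical.

```python
import itertools
import math


def rotate_z(coords, angle_deg):
    """Rotate a list of (x, y, z) points about the z-axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Translate a list of (x, y, z) points by the vector shift."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def generate_models(fragment, angles_deg, z_shifts):
    """Yield one model per combination of rotation angle and z translation."""
    # itertools.product plays the role of the nested loops over each
    # degree of freedom of the moveable fragment.
    for angle, dz in itertools.product(angles_deg, z_shifts):
        moved = translate(rotate_z(fragment, angle), (0.0, 0.0, dz))
        yield {"angle_deg": angle, "dz": dz, "coords": moved}


if __name__ == "__main__":
    fragment = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # hypothetical coordinates, nm
    models = list(generate_models(fragment, range(0, 360, 90), [0.0, 0.5, 1.0]))
    print(len(models))
```

Recording the geometrical parameters alongside each coordinate set keeps every model traceable to the steps that produced it, which the later filtering and sorting rely on.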
Each model is evaluated using several criteria. (i) The creation of models can
result in physically unreasonable steric overlap between the subunits. Accordingly, the
number of spheres in each model was compared with that calculated from the composition,
and the model was retained if the total was within 95%
of that expected. (ii) Next, models were retained if the modelled RG and RXS values were
within 5% of the experimental values. (iii) Models were then assessed using a
goodness-of-fit R-factor:
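\[ R = \frac{\sum \bigl|\, |I(Q)_{\mathrm{exp}}| - |I(Q)_{\mathrm{cal}}| \,\bigr|}{\sum |I(Q)_{\mathrm{exp}}|} \times 100\,\% \tag{Eq. 3.4} \]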
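As an illustration, the R-factor can be computed with a few lines of Python. The sketch assumes the definition R = 100 × Σ| |I(Q)exp| − |I(Q)cal| | / Σ|I(Q)exp|, with both curves already on the same scale and sampled at the same Q values. The function names and the optional `q_max` cut-off are assumptions for illustration.

```python
def r_factor(i_exp, i_cal):
    """Goodness-of-fit R-factor (%) between experimental and calculated curves."""
    if len(i_exp) != len(i_cal):
        raise ValueError("curves must be sampled at the same Q values")
    numerator = sum(abs(abs(e) - abs(c)) for e, c in zip(i_exp, i_cal))
    denominator = sum(abs(e) for e in i_exp)
    return 100.0 * numerator / denominator


def r_factor_to(q_values, i_exp, i_cal, q_max):
    """R-factor restricted to the data points with Q <= q_max."""
    kept = [(e, c) for q, e, c in zip(q_values, i_exp, i_cal) if q <= q_max]
    return r_factor([e for e, _ in kept], [c for _, c in kept])


if __name__ == "__main__":
    q = [0.2, 0.4, 0.6, 0.8]                  # nm^-1, hypothetical
    i_exp = [100.0, 50.0, 20.0, 10.0]         # hypothetical experimental curve
    i_cal = [90.0, 55.0, 22.0, 8.0]           # hypothetical model curve
    print(r_factor(i_exp, i_cal), r_factor_to(q, i_exp, i_cal, 0.4))
```

Because the sums run over whichever points are supplied, the value changes with the Q range and the number of points, so values from different ranges are not directly comparable.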
The use of R-factor values is analogous to the evaluation of crystallographic models, and
it has been applied successfully to the evaluation of solution scattering modelling
procedures (Smith et al., 1990; Beavil et al., 1995). Note that the R-factor will depend
on the Q range in use and the number of data points in that Q range, and should be
normalised against I(Q)cal for a given curve fitting exercise. For the purpose of automating
the curve fit procedure, the R-factor was initially used in the low Q range out to 0.5 nm⁻¹
in order to determine the scaling factor to match the experimental and calculated I(Q)
curves. Note that this is the Q range used for RG and RXS determinations. To define a
working scale for curve comparisons, I(0)cal was arbitrarily set as 1000. The quality of
the curve fits from each model in the search was then determined by computing the
R-factor for successive Q ranges out to 0.8-2.0 nm⁻¹ in 0.2 nm⁻¹ steps (denoted R0.8 to
R2.0). While R-factors are not comparable between different curve fitting exercises, and are
primarily influenced by the large I(Q) values at low Q, they provide a useful filter of
models. A full list is prepared of each model including the geometrical steps used to
define it, the number of spheres in it, its RG and RXS values, its R0.8 to R2.0
values, and the frictional (F) and sedimentation coefficients (s°20,w). The list is
imported into the PC-based spreadsheet Microsoft Excel, which is used to set the cut-off filters, sort the
models in order of their R-factors, and identify the best curve fits (Figures 3.4, 3.5 and
Table 3.1).
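The filtering and sorting carried out in the spreadsheet can be mimicked with a short Python sketch. It assumes that "within 95%" means a sphere count of at least 95% of the expected total and that "within 5%" is a relative deviation from the experimental value. The dictionary keys and function names are hypothetical.

```python
def within_fraction(value, target, tolerance):
    """True if value deviates from target by no more than tolerance (a fraction)."""
    return abs(value - target) <= tolerance * abs(target)


def filter_and_rank(models, expected_spheres, rg_exp, rxs_exp,
                    min_sphere_fraction=0.95, tolerance=0.05):
    """Apply the sphere-count, RG and RXS filters, then sort by R-factor."""
    kept = [
        m for m in models
        # (i) reject models that lost too many spheres through steric overlap
        if m["n_spheres"] >= min_sphere_fraction * expected_spheres
        # (ii) keep models whose RG and RXS are close to experiment
        and within_fraction(m["rg"], rg_exp, tolerance)
        and within_fraction(m["rxs"], rxs_exp, tolerance)
    ]
    # (iii) the best curve fits have the lowest R-factor
    return sorted(kept, key=lambda m: m["r_factor"])


if __name__ == "__main__":
    # Hypothetical search results; values are for illustration only.
    models = [
        {"name": "A", "n_spheres": 98, "rg": 3.0, "rxs": 1.00, "r_factor": 5.0},
        {"name": "B", "n_spheres": 90, "rg": 3.0, "rxs": 1.00, "r_factor": 2.0},
        {"name": "C", "n_spheres": 99, "rg": 3.1, "rxs": 1.02, "r_factor": 3.0},
        {"name": "D", "n_spheres": 100, "rg": 3.3, "rxs": 1.00, "r_factor": 1.0},
    ]
    best = filter_and_rank(models, expected_spheres=100, rg_exp=3.0, rxs_exp=1.0)
    print([m["name"] for m in best])
```

Applying the structural filters before sorting prevents a model with a low R-factor but an unreasonable sphere count or radius of gyration from being ranked as a best fit.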
If carbohydrate is present, the oligosaccharide chains were represented by a
suitable structure adapted from the Protein Data Bank (Boehm et al., 1996) and added
to Asn residues on the protein surface. For the analyses of single multidomain proteins,
the domains were constrained in their relative positions by reasonable stereochemical