D. I. Svergun

European Molecular Biology Laboratory, Hamburg Outstation, EMBL c/o DESY,
Notkestrasse 85, D-22603 Hamburg, Germany,

and Institute of Crystallography,
Russian Academy of Sciences, Leninsky pr. 59, 117333 Moscow, Russia.

E-mail: Svergun@EMBL-Hamburg.DE

Information content in solution scattering data is usually estimated with the
Shannon sampling theorem (Shannon & Weaver, 1949). A scattering curve I(s)
is the Fourier image of the spherically averaged Patterson function of the
particle, P(r)=<P(**r**)>, which vanishes beyond r=D_{max}, where
D_{max} is the maximum particle size. I(s) is therefore an analytical function.
The sampling theorem states that the number of parameters (Shannon channels)
required to represent an analytical function on an interval [s_{min}, s_{max}] is
equal to N_{s} = D_{max}(s_{max} - s_{min})/π. In practice, solution scattering curves decay
rapidly with s and are normally recorded only at low resolution (not better than 1
nm), so that the typical number of Shannon channels does not
exceed 10 to 15.
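As a quick numerical illustration, the estimate of N_s can be computed directly. The sketch below is a minimal Python version; the function name `shannon_channels` is hypothetical, and it assumes that s is the modulus of the scattering vector in the convention s = 4π sin θ/λ (2θ being the scattering angle), in which the sampling distance is π/D_max, with s in nm⁻¹ and D_max in nm.

```python
import math


def shannon_channels(d_max, s_min, s_max):
    """Number of Shannon channels N_s = D_max (s_max - s_min) / pi.

    d_max: maximum particle size in nm; s_min, s_max: limits of the measured
    range of s in nm^-1 (assumed convention s = 4 pi sin(theta) / lambda).
    """
    if d_max <= 0 or s_max <= s_min:
        raise ValueError("need d_max > 0 and s_max > s_min")
    return d_max * (s_max - s_min) / math.pi


# A hypothetical 10 nm particle measured from s = 0.1 to 4.0 nm^-1
print(f"N_s = {shannon_channels(10.0, 0.1, 4.0):.1f}")  # N_s = 12.4
```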

In keeping with the low resolution of solution scattering studies, the data
are usually interpreted in terms of homogeneous bodies. The homogeneous
approximation reduces the number of free parameters N_{p} needed to describe the
particle, and, as the information content of solution scattering is low, an *ab initio*
shape determination procedure should require as few parameters as
possible. Let us represent the particle envelope by a two-dimensional angular
function r=F(ω) describing the particle boundary in spherical coordinates (r,
ω). This function is conveniently parameterized as

$$F(\omega) = \sum_{l=0}^{L} \sum_{m=-l}^{l} f_{lm}\, Y_{lm}(\omega) \qquad (1)$$

where Y_{lm}(ω) are spherical harmonics, the multipole coefficients f_{lm} are complex
numbers and the truncation value L defines the resolution of the
representation. The particle density distribution in the homogeneous approximation
can be written as

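$$\rho(\mathbf{r}) = \begin{cases} 1, & 0 \le r < F(\omega) \\ 0, & r \ge F(\omega) \end{cases} \qquad (2)$$

where the step at r = F(ω) is smoothed over a particle-solvent interface of width Δ, which for dissolved
macromolecules can be taken as Δ = 0.3 nm to account for the first hydration shell.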
The particle envelope is thus represented by (L+1)^{2} numbers f_{lm} at a spatial
resolution δr ≈ πR_{0}/(L+1), where R_{0} is the radius of the equivalent sphere.
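The parameterization (1) can be evaluated pointwise once the f_lm are known. The following self-contained Python sketch is illustrative only (names such as `ylm` and `envelope` are hypothetical and not taken from the author's program); it builds orthonormal complex spherical harmonics from the standard associated Legendre recurrence, assuming θ polar and φ azimuthal, and also evaluates δr ≈ πR_0/(L+1). Since Y_00 = 1/√(4π), a sphere of radius R_0 corresponds to f_00 = √(4π) R_0 with all other coefficients zero, which is the starting point of the search.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, with the Condon-Shortley phase."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pmmp1
    for ll in range(m + 2, l + 1):  # upward recurrence in the degree
        pll = (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1


def ylm(l, m, theta, phi):
    """Orthonormal complex spherical harmonic; theta is polar, phi azimuthal."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    # Y_{l,-m} = (-1)^m conj(Y_{lm})
    return y if m >= 0 else (-1) ** am * y.conjugate()


def envelope(f, theta, phi):
    """Eq. (1): F(omega) = sum f_lm Y_lm(omega), with f a dict {(l, m): f_lm}.

    A real envelope needs f[(l, -m)] == (-1)**m * conj(f[(l, m)]);
    the imaginary rounding residue is discarded.
    """
    return sum(c * ylm(l, m, theta, phi) for (l, m), c in f.items()).real


def multipole_resolution(r0, lmax):
    """Spatial resolution delta_r ~ pi * R_0 / (L + 1) of the truncated series."""
    return math.pi * r0 / (lmax + 1)


r0 = 2.5  # nm, illustrative value
sphere = {(0, 0): math.sqrt(4 * math.pi) * r0}
print(envelope(sphere, 0.7, 1.9))   # ≈ 2.5 in every direction
print(multipole_resolution(r0, 4))  # ≈ 1.57 nm for L = 4
```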

Solution scattering intensity is I(s) = <I(**s**)>_{Ω} = <|**F**
[ρ(**r**)]|^{2}>_{Ω}, where **F** denotes the Fourier transform, <>_{Ω}
stands for the average over the solid angle Ω in reciprocal space, and
**s**=(s, Ω) is the scattering vector. Expanding ρ(**r**) in spherical
harmonics

$$\rho(\mathbf{r}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \rho_{lm}(r)\, Y_{lm}(\omega) \qquad (3)$$

the scattering intensity is expressed as (Stuhrmann, 1970a)

$$I(s) = 2\pi^{2} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |A_{lm}(s)|^{2} \qquad (4)$$

where the partial amplitudes A_{lm}(s) are the Hankel transforms of the radial
functions

$$A_{lm}(s) = i^{l} \sqrt{2/\pi} \int_{0}^{\infty} \rho_{lm}(r)\, j_{l}(sr)\, r^{2}\, dr \qquad (5)$$

and j_{l}(sr) are the spherical Bessel functions.
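To check the normalization of (4) and (5) numerically, one can take the simplest body, a homogeneous sphere of radius R, for which only ρ_00(r) = √(4π) (r ≤ R) is non-zero. The sketch below assumes that form and integrates (5) with the trapezoidal rule; it is a brute-force check, not the closed power-series route mentioned in the text, and the function names are illustrative. It should reproduce the familiar sphere form factor, whose value at s = 0 is the squared volume.

```python
import math


def j0(x):
    """Spherical Bessel function j_0(x) = sin(x)/x, with j_0(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def partial_amplitude(l, rho_lm, j_l, s, r_max, n=2000):
    """Eq. (5) by the trapezoidal rule on [0, r_max]:
    A_lm(s) = i^l sqrt(2/pi) * integral rho_lm(r) j_l(s r) r^2 dr."""
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = r_max * k / n  # hits r_max exactly at k = n
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * rho_lm(r) * j_l(s * r) * r * r
    return (1j ** l) * math.sqrt(2.0 / math.pi) * total * h


def intensity(amplitudes):
    """Eq. (4): I(s) = 2 pi^2 * sum over (l, m) of |A_lm(s)|^2 at a single s."""
    return 2.0 * math.pi ** 2 * sum(abs(a) ** 2 for a in amplitudes)


def sphere_intensity(s, radius):
    """Closed-form intensity of a homogeneous sphere with unit contrast."""
    if s == 0.0:
        return (4.0 * math.pi * radius ** 3 / 3.0) ** 2
    x = s * radius
    return (4.0 * math.pi * (math.sin(x) - x * math.cos(x)) / s ** 3) ** 2


R = 3.0  # nm, illustrative


def rho00(r):
    """Radial function rho_00 of a homogeneous sphere of radius R."""
    return math.sqrt(4.0 * math.pi) if r <= R else 0.0


for s in (0.0, 0.5, 1.0):
    numeric = intensity([partial_amplitude(0, rho00, j0, s, R)])
    print(s, numeric, sphere_intensity(s, R))
```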

Inserting (2) and (3) into (5) and using the power series expansion of j_{l}(sr), one obtains a
closed expression for the partial amplitudes *via* the f_{lm} coefficients,
which allows the scattering intensity (4) to be evaluated rapidly from a
given envelope (Stuhrmann, 1970b; Svergun & Stuhrmann, 1991; Svergun,
1997). Using this approach, an algorithm for *ab initio* determination of
the low resolution envelopes of biopolymers in solution from their experimental
scattering curves has been developed. Starting from a spherical shape (for which all
coefficients but f_{00} are equal to zero), it finds the f_{lm} coefficients
that minimize the discrepancy between the experimental [I_{exp}(s_{k}), k=1,...,N]
and calculated curves

(6)

with the weighting factor W(s_{k}) = s_{k}^{2}[σ(s_{k})/I_{exp}(s_{k})], where σ(s_{k}) is the standard
deviation of the k-th point. Details of the shape determination algorithm are
presented elsewhere (Svergun *et al.*, 1996; 1997a).

A natural question is whether the low resolution shape determination is
unique, in other words, whether, apart from the trivial case of the
enantiomorphic envelope, different shapes exist at the same level of resolution
(i.e. at the same L) that yield identical scattering curves. This problem was
considered by Svergun *et al.* (1996) using computer simulations on model
bodies whose envelope functions are exactly represented by a finite
series (1) of spherical harmonics. The scattering intensity was calculated
from a model envelope, and the particle shape was restored from this intensity with
the above algorithm. Both error-free curves and those containing statistical
noise were simulated in different angular intervals.

The results indicated that the shape restoration from error-free data is unique,
even when using very limited ranges of the simulated curves. In the presence of
errors, the ambiguity of the shape determination depends on the relation between
the number of model parameters N_{p} and that of the Shannon channels N_{s}. The
shape restoration was found to be practically independent of the initial
approximation and stable with respect to random errors if N_{p} ≤ 1.5 N_{s}.

Experimental solution scattering curves usually cover about 10 to 15 Shannon
channels, thus allowing one to use 15 to 20 variables in the shape description. The
number of independent parameters in series (1) is N_{p}=(L+1)^{2}-6 (here,
the reduction by six variables is due to arbitrary rotations and displacements
of the particle, which do not alter the scattering curve). This means that in
practice multipole resolution up to L=4 can be used.
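The bookkeeping behind the choice of L can be sketched as follows, assuming the stability criterion reads N_p ≤ 1.5 N_s and using N_p = (L+1)² − 6; the helper names are hypothetical.

```python
def free_parameters(lmax):
    """N_p = (L+1)^2 - 6: coefficients of series (1) minus three rotations and three translations."""
    return (lmax + 1) ** 2 - 6


def max_multipole_order(n_shannon, ratio=1.5):
    """Largest L whose N_p still satisfies N_p <= ratio * N_s."""
    lmax = 1  # N_p is only positive from L = 2 on
    while free_parameters(lmax + 1) <= ratio * n_shannon:
        lmax += 1
    return lmax


for n_s in (10, 13, 15):
    L = max_multipole_order(n_s)
    print(n_s, L, free_parameters(L))
```

For N_s = 13 this gives L = 4 (19 parameters), while at the lower end, N_s = 10, the same criterion would already limit the expansion to L = 3.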

Practical implementation of the shape determination algorithm required several extensions to account for the deviations from the ideal model:

(i) When using raw X-ray scattering data, homogeneous approximation may not be
valid in the outer parts of the scattering curves where the scattering from the
inhomogeneities of the polypeptide chain can no longer be neglected, especially
for proteins of low (less than 20 kDa) molecular mass. This effect is taken into
account as follows. From the inner part of the scattering curve (first three
Shannon channels), the best-fit triaxial ellipsoid is found. Scattering from
the internal inhomogeneities I_{s}(s) inside the ellipsoidal envelope is evaluated
using the method of Svergun (1994), and this curve is subtracted from the
experimental data so that the difference I_{exp}(s) - I_{s}(s) at higher angles
follows the asymptotic behavior s^{-4} according to Porod's law for
homogeneous particles (Feigin & Svergun, 1987).
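Porod behavior at higher angles can be checked with a simple linear fit. The sketch below does not implement the ellipsoid-based estimate of I_s(s) described above (Svergun, 1994); as a plainly simpler stand-in it assumes that the excess high-angle scattering is a constant B and fits s⁴I(s) = K + Bs⁴ by ordinary least squares, so that I(s) − B decays as K/s⁴ if the assumption holds. The function name is hypothetical.

```python
def porod_fit(s, intensity):
    """Fit s^4 I(s) = K + B s^4 by ordinary least squares in x = s^4, y = s^4 I.

    Returns (K, B): the Porod constant and a constant offset B. This is a
    constant-background stand-in, not the ellipsoid-based I_s(s) correction.
    """
    x = [si ** 4 for si in s]
    y = [xi * ii for xi, ii in zip(x, intensity)]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    k = my - b * mx
    return k, b


s = [1.5 + 0.1 * k for k in range(20)]          # nm^-1, high-angle part only
i_obs = [2.0 / si ** 4 + 0.01 for si in s]      # synthetic curve: K = 2, B = 0.01
print(porod_fit(s, i_obs))                      # ≈ (2.0, 0.01)
```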

(ii) The model envelope is represented by a finite set of harmonics, whereas
real particles would require an infinite series. To reduce the truncation
effect, the best-fit ellipsoidal envelope is expanded into spherical
harmonics, and its shape representation (1) is truncated at the same L
value as that used in the shape determination (usually, L=4). The ratio
w(s)=I_{L}(s)/I_{el}(s) is calculated, where I_{el}(s) is the scattering curve from the
ellipsoid and I_{L}(s) that from its truncated representation. The experimental intensity
is then multiplied by this "ellipsoidal filter" w(s), and the resulting curve
J_{exp}(s) = w(s)[I_{exp}(s) - I_{s}(s)] enters the shape determination.
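Once the three auxiliary curves are available on the experimental s grid, step (ii) is a pointwise operation. A minimal sketch, assuming I_s(s), I_L(s) and I_el(s) have already been computed (their computation is not shown, and the function name is hypothetical):

```python
def filtered_intensity(i_exp, i_s, i_l, i_el):
    """J_exp(s) = w(s) [I_exp(s) - I_s(s)] with w(s) = I_L(s) / I_el(s), point by point.

    All four curves are assumed to be sampled on the same s grid.
    """
    if not (len(i_exp) == len(i_s) == len(i_l) == len(i_el)):
        raise ValueError("all curves must be sampled on the same s grid")
    return [(il / iel) * (ie - isub)
            for ie, isub, il, iel in zip(i_exp, i_s, i_l, i_el)]


print(filtered_intensity([10.0, 8.0], [1.0, 1.0], [4.0, 3.0], [5.0, 6.0]))  # ≈ [7.2, 3.5]
```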

(iii) When minimizing functional (6), the calculated intensity I(s) at each function evaluation is multiplied by the scaling factor

(7)

which provides the currently best least squares fit to the experimental curve. The shape determination can therefore be directly applied to raw experimental data on a relative scale.
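The explicit form of the scaling factor (7) is not reproduced here, but for a discrepancy of weighted least-squares type its value follows from setting the derivative with respect to the scale to zero. The sketch below assumes the form Σ_k W(s_k)[I_exp(s_k) − c I(s_k)]², which may differ in detail from (6); the function names are hypothetical.

```python
def best_scale(i_exp, i_calc, weights):
    """Scale c minimising sum_k W_k (I_exp,k - c I_k)^2 (closed form from d/dc = 0)."""
    num = sum(w * e * ic for w, e, ic in zip(weights, i_exp, i_calc))
    den = sum(w * ic * ic for w, ic in zip(weights, i_calc))
    if den == 0.0:
        raise ValueError("calculated curve is identically zero")
    return num / den


def scaled_discrepancy(i_exp, i_calc, weights):
    """Weighted sum of squares after rescaling the calculated curve by best_scale."""
    c = best_scale(i_exp, i_calc, weights)
    return sum(w * (e - c * ic) ** 2 for w, e, ic in zip(weights, i_exp, i_calc))


# Data on an arbitrary relative scale: the fit recovers the factor 2.5.
print(best_scale([2.5, 5.0, 7.5], [1.0, 2.0, 3.0], [1.0, 0.5, 2.0]))
```

Because c is recomputed at every function evaluation, the minimizer only has to search over the shape parameters.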

The *ab initio* shape determination program with the above extensions runs
on IBM-PC and on major UNIX platforms (Svergun *et al.*, 1997a). Its
implementation on a SUN Sparc-20ZX workstation is coupled with a
three-dimensional rendering program ASSA allowing the user to monitor the
process of the shape determination (Kozin, Volkov & Svergun, 1997).

The program has been tested on several proteins with known atomic structures
in the crystal (X-ray solution scattering patterns were collected as parts of
ongoing projects at the EMBL Outstation in Hamburg). Figs 1 and 2 present the
shape determination of two proteins, monomeric hexokinase and HIV-1 reverse
transcriptase (molecular masses 52 and 105 kDa, respectively). In both cases,
particle envelopes up to L=4 (19 free parameters) were directly restored from
the experimental data starting from a spherical initial approximation. The
envelopes are displayed in Fig. 2 along with the atomic structures of
hexokinase (Bennett & Steitz, 1980) and of the reverse transcriptase (Wang
*et al.*, 1994) deposited in the Protein Data Bank (Bernstein *et
al.*, 1977; entries 1HKG and 3HVT, respectively). As the orientation of
the restored models is arbitrary, they and their enantiomorphs were rotated so
as to minimize the deviation

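$$R_{F} = \left\{ \frac{\int [F(\omega) - F_{cryst}(\omega)]^{2}\, d\omega}{\int F_{cryst}^{2}(\omega)\, d\omega} \right\}^{1/2} \qquad (8)$$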

where F_{cryst}(ω) is the envelope function evaluated for the atomic structure at
the same L using the program CRYSOL (Svergun, Barberato & Koch, 1995). As
seen from the comparison, the *ab initio* restoration provides an adequate
low resolution description of the protein envelopes.

The R_{F} factors are equal to
0.20 and 0.22 for the hexokinase and the reverse transcriptase,
respectively.
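If the deviation (8) is taken as the normalized root-mean-square difference between the two envelope functions over all directions, orthonormality of the Y_lm (Parseval's identity) lets it be computed directly from the coefficients. The sketch below assumes that form, and assumes the restored and crystallographic coefficient sets have already been brought into a common orientation; the rotation and enantiomorph search is not shown, and the function name is hypothetical.

```python
import math


def r_factor(f, f_cryst):
    """R_F from two coefficient sets {(l, m): f_lm} in a common orientation.

    Parseval for orthonormal Y_lm:
    integral |F - F_cryst|^2 d omega = sum |f_lm - f_cryst_lm|^2.
    """
    keys = set(f) | set(f_cryst)
    num = sum(abs(f.get(k, 0.0) - f_cryst.get(k, 0.0)) ** 2 for k in keys)
    den = sum(abs(v) ** 2 for v in f_cryst.values())
    return math.sqrt(num / den)


print(r_factor({(0, 0): 10.0, (2, 0): 1.0}, {(0, 0): 10.0}))  # ≈ 0.1
```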

The shape determination program was also used to restore the envelopes of other
proteins with known atomic structures (lysozyme, ribonucleotide reductase,
pyruvate decarboxylase, enolpyruvyl transferase, *etc.*). In all these
cases the restored shapes agreed well with the atomic structures, with R_{F}
factors ranging from 0.10 to 0.25. Of course, the program is aimed at the shape
determination of proteins with unknown atomic structures; the above tests
were done to check the reliability of the method on real experimental data.

Particle symmetry imposes restrictions on the multipole coefficients f_{lm} in
series (1) and the information about the symmetry, if available, can improve
the reliability of the *ab initio* shape restoration by reducing the
number of parameters to be determined. Consider, for example, a homodimeric
particle with a twofold symmetry axis along z. In this case, all f_{lm}
coefficients with odd m vanish, and the particle shape at L=4 is described by
12 independent parameters instead of 19 for a non-symmetric case.

The higher the symmetry, the more multipole coefficients can be omitted, and
this allows one to enhance the resolution of the restoration. Figs 3 and 4
present the shape determination of the homotetramer of pyruvate oxidase
(molecular mass 260 kDa) assuming the 222 point symmetry. The multipole
expansion up to L=6 for this symmetry group requires only 13 free parameters.
The restored envelope displays good agreement (R_{F}=0.15) with the crystal
structure (Muller & Schulz, 1993, PDB entry 1POW).
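The parameter counts quoted above can be reproduced with a standard group-theoretical argument: the number of independent real harmonics of degree l that are invariant under a group of proper rotations equals the group average of the character χ_l(α) = Σ_m cos(mα) of a rotation by α. The sketch below assumes the groups are given simply as lists of rotation angles (point group 2: {0, π}; 222: {0, π, π, π}); the group labels and function names are illustrative.

```python
import math

# Point groups given as the rotation angles of their elements (proper rotations only).
GROUPS = {
    "1": [0.0],
    "2": [0.0, math.pi],
    "222": [0.0, math.pi, math.pi, math.pi],
}


def invariant_count(l, angles):
    """Dimension of the degree-l subspace invariant under the group.

    chi_l(alpha) = sum_{m=-l}^{l} cos(m alpha) is the character of a rotation by
    alpha in the (2l+1)-dimensional representation; the count is its group average.
    """
    total = sum(math.cos(m * a) for a in angles for m in range(-l, l + 1))
    return round(total / len(angles))


def envelope_coefficients(lmax, group):
    """Number of independent real coefficients of series (1) up to degree lmax."""
    return sum(invariant_count(l, GROUPS[group]) for l in range(lmax + 1))


print(envelope_coefficients(4, "1") - 6)  # asymmetric particle, L = 4 -> 19
print(envelope_coefficients(4, "2"))      # one twofold axis, L = 4 -> 13
print(envelope_coefficients(6, "222"))    # 222 point symmetry, L = 6 -> 13
```

For a single twofold axis at L = 4 the sketch returns 13 coefficients before any further reduction; the figure of 12 quoted above presumably reflects discarding a motion compatible with the axis (for instance, a rotation about it), which the sketch does not model.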

The quaternary structure of symmetric particles can also be restored in terms
of the envelope function of the asymmetric unit. Thus, scattering from a
symmetric homodimer is readily expressed *via* the shape of a monomer and
the distance d between the monomers. The shape determination is performed as
described above with a single additional parameter d. This approach has already
been successfully used in practice (Schmidt *et al.*, 1995; Svergun *et
al.*, 1997a).


The first question to address is why it is possible at all to restore the
three-dimensional envelope from a one-dimensional curve using more parameters
than predicted by the theory. The answer is that the estimate of N_{s} reflects
only one (and the most often quoted) part of the sampling theorem. The other part
states that full information about the entire analytical function is contained in
any finite contiguous portion of it. An oversampled scattering curve measured
with an angular increment much smaller than the sampling distance π/D_{max} can be
analytically extrapolated beyond the experimental range (so-called
superresolution). As experimental solution scattering curves are always heavily
oversampled, they are able to provide more parameters than N_{s}.
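The degree of oversampling is easy to quantify by comparing the sampling distance π/D_max with the experimental step in s. A minimal sketch, assuming a uniformly sampled curve with s in nm⁻¹ and D_max in nm (the function name is hypothetical):

```python
import math


def shannon_oversampling(d_max, s_values):
    """Return (N_s, measured points per Shannon channel) for a uniform s grid.

    d_max in nm; s_values in nm^-1, ascending with a constant step.
    The sampling distance is pi / d_max.
    """
    if len(s_values) < 2:
        raise ValueError("need at least two points")
    step = s_values[1] - s_values[0]
    n_s = d_max * (s_values[-1] - s_values[0]) / math.pi
    return n_s, (math.pi / d_max) / step


grid = [0.1 + 0.01 * k for k in range(391)]  # 0.1 ... 4.0 nm^-1
print(shannon_oversampling(10.0, grid))      # ≈ (12.4, 31.4)
```

For this hypothetical 10 nm particle the curve spans only about 12 Shannon channels, yet each channel is covered by some 31 measured points.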

Limitations of the model (1) used to describe the particle envelope should be
mentioned. First, as F(ω) is assumed to be single-valued, complicated
(*e.g.* U-like) shapes or those containing internal holes cannot be
exactly represented. Second, the omission of the higher harmonics with l>L is
compensated in the fitting procedure by an artificial enhancement of the lower
ones. This effect is partially corrected by the ellipsoidal filtering described above
and thus produces only marginal distortions for globular particles,
but it can still be significant for anisometric structures because of the slow
convergence of series (1). The remaining deviations between the restored envelopes
and the crystal structures in Fig. 2 give an idea of the magnitude of the
truncation effect (it is worth noting that both proteins are rather
anisometric, with axial ratios of the approximating ellipsoid equal to 2.8
and 3.6 for the hexokinase and reverse transcriptase, respectively).

What is the relation between the solution scattering and crystallographic data?
The latter clearly contain more information and provide much higher resolution.
However, test runs of the shape determination using simulated reflections
instead of solution scattering curves encountered difficulties because of the
high multimodality of the goal function. The reason for the multimodality is
that the crystallographic data, contrary to the solution scattering curves, are
undersampled: the separation between the reflections is twice the sampling distance
required to describe the three-dimensional scattering intensity as the Fourier
image of the density in the unit cell (*e.g.* Baker, Krukowski &
Agard, 1993). Solution scattering data therefore provide complementary
information, and their use can improve the efficiency of *ab initio*
phasing procedures. Low resolution experimental envelopes can be positioned in
the crystal cell using molecular replacement and further refined against both
solution scattering and the crystallographic data.

Measurements in solution also make it possible to model the structure and structural transitions of complex macromolecules by rigid body movements of their crystallographically known domains (subunits) so as to fit the experimental scattering from the complex (Svergun, 1991; 1994; 1997). Thus, in a solution scattering study of the classical allosteric enzyme aspartate transcarbamylase (Svergun *et al.*, 1997), the overall changes accompanying the T→R transition in solution were found to be about 50% larger than those in the crystal (Kantrowitz & Lipscomb, 1988). This approach is now being used in several ongoing projects at the EMBL Outstation in Hamburg to study multidomain proteins in solution.

Bennett, W.S. Jr. and Steitz, T.A.

Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F. Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M.

Feigin, L.A. and Svergun, D.I.

Kantrowitz, E.R. and Lipscomb, W.N.

Kozin, M.B., Volkov, V.V. and Svergun, D.I.

Muller, Y.A. and Schulz, G.E.

Shannon, C.E. and Weaver, W.

Stuhrmann, H.B.

Stuhrmann, H.B.

Schmidt, B., König, S., Svergun, D., Volkov, V., Fischer, G. and Koch, M.H.J.

Svergun, D.I.

Svergun, D.I.

Svergun, D.I.

Svergun, D.I. and Stuhrmann, H.B.

Svergun, D.I., Barberato, C. and Koch, M.H.J.

Svergun, D.I., Volkov, V.V., Kozin, M.B. and Stuhrmann, H.B.

Svergun, D.I., Volkov, V.V., Kozin, M.B., Stuhrmann, H.B., Barberato, C. and Koch, M.H.J. (1997).

Svergun, D.I., Barberato, C., Koch, M.H.J., Fetler, L. and Vachette, P.

Wang, J., Smerdon, S.J., Jaeger, J., Kohlstaedt, L.A., Friedman, J., Rice, P.A. and Steitz, T.A.