Basic Maths for Protein Crystallographers

Phasing

An X-ray experiment allows us to measure all I(**h**) to some resolution
limit. If we knew both |F(**h**)| and the phase
α(**h**),
then we could generate a map of the unit cell having peaks at, and only at, each atom position using
the Fourier summation

ρ(**x**) = (1/V) Σ_{**h**} |F(**h**)| e^{iα(**h**)} e^{-2πi **h**·**x**}

This (the presence of peaks) is the fundamental property of crystal diffraction which underpins all structure solution methods.
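As an illustration of this property, here is a minimal sketch of the Fourier summation for a hypothetical one-dimensional crystal with unit cell length 1. The atom positions, the point-atom scattering factors and the resolution limit `H_MAX` are assumptions made up for the example, and the 1/V scale factor is dropped.

```python
import cmath
import math

# Hypothetical 1D crystal: (fractional coordinate, point-atom scattering factor)
atoms = [(0.20, 6.0), (0.55, 8.0)]
H_MAX = 30  # assumed resolution limit: reflections -H_MAX..H_MAX are used


def structure_factor(h):
    """F(h) = sum_j f_j exp(2 pi i h x_j)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)


def density(x):
    """rho(x) = sum_h |F(h)| exp(i alpha(h)) exp(-2 pi i h x)."""
    total = 0j
    for h in range(-H_MAX, H_MAX + 1):
        F = structure_factor(h)
        amplitude, alpha = abs(F), cmath.phase(F)
        total += amplitude * cmath.exp(1j * alpha) * cmath.exp(-2j * math.pi * h * x)
    return total.real


n_grid = 200
grid = [i / n_grid for i in range(n_grid)]
rho = [density(x) for x in grid]

# Keep the two highest local maxima of the map (the cell is periodic)
maxima = [i for i in range(n_grid)
          if rho[i] > rho[i - 1] and rho[i] > rho[(i + 1) % n_grid]]
top_two = sorted(maxima, key=lambda i: rho[i], reverse=True)[:2]
peak_positions = sorted(grid[i] for i in top_two)
print(peak_positions)  # the two highest peaks sit at the atom positions
```

With both amplitudes and phases available the highest peaks coincide with the atoms; the smaller ripples elsewhere in the map come from cutting the series off at `H_MAX`.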

But the phases cannot be measured directly and have to be inferred from differences between sets of intensity measurements. The experimental techniques to find them are loosely labelled as MIR, MIRAS, SIR, SIRAS, MAD, SAD:

- SIR
- single isomorphous replacement. Measurements are taken from a "native" protein and one "derivative" where some additional atoms have been incorporated into the lattice.
- MIR
- multiple isomorphous replacement. Measurements are taken from a "native" protein and several derivatives.
- SIRAS
- single isomorphous replacement plus anomalous differences. As above, but the anomalous
measurements of F(**h**) and F(-**h**) for the derivative are also used.
- MIRAS
- multiple isomorphous replacement plus anomalous differences.
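- SAD
- single-wavelength anomalous dispersion. The "native" crystal contains some atoms which scatter anomalously, and the resulting differences between F(**h**) and F(-**h**) are used in a similar way to the SIR treatment.
- MAD
- multi-wavelength anomalous dispersion. As for SAD, but measurements are taken at several wavelengths.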

We need to consider the structure factor equation in more detail before discussing these.

In fact the full scattering factor of atom j is:

f(j,**S**) + f'(j) + i f"(j)

where f' and f" describe the anomalous scattering from the inner electron shells, which varies as a function of
the wavelength but is more or less constant at all
resolutions (*i.e.* f"(j,**S**) = f"(j)).
For many elements (C, N, O in particular) f' and f" are very small at all
accessible wavelengths. Others, such as S and Cl, have a small but detectable component at
CuKα (f" ~ 0.5).
Elements such as Se and Br have an observable f" (f" ~ 3-4)
at short wavelengths. Metals and other heavy elements such as Hg, Pt, I *etc.* have quite
large f" and f' contributions at most accessible wavelengths (at
CuKα, f"_{Hg} ~ 8).

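It helps to re-write the F_{H}(**h**) or F_{A}(**h**) component
like this:

F_{H}(**h**) = Σ_{j} {f(j,**h**) + f'(j)} e^{2πi **h**·**x**_{j}} + i Σ_{j} f"(j) e^{2πi **h**·**x**_{j}}

The second sum, F"_{H}(**h**), is the anomalous contribution. For each atom it is always 90 degrees in ADVANCE of the real contribution.
When all the anomalous scatterers are of the same type, the ratio |F"_{H}|/|F_{H}| = f"(j,h)/{f(j,h) + f'(j,h)}, where |F_{H}| here means the magnitude of the first (real) sum.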

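Now write F_{H}(**h**) = A(**h**) + i B(**h**), where A is the real contribution and B = F"_{H} the anomalous one. Then

F_{H}(-**h**) = A*(**h**) + i B*(**h**), whereas the complex conjugate of F_{H}(**h**) is A*(**h**) - i B*(**h**),

which means that, although the magnitudes of F_{H}(**h**) and F_{H}(-**h**)
are equal (for a single type of anomalous scatterer), their phases are no longer simply opposite in sign, and F_{H}(-**h**) is no longer the complex conjugate of
F_{H}(**h**).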

And since F_{PH}(**h**) = F_{H}(**h**) + F_{P}(**h**),
and F_{PH}(-**h**) = F_{H}(-**h**) + F_{P}(-**h**),
it follows that the magnitudes |F_{PH}(**h**)| and |F_{PH}(-**h**)| are no longer
equal,
nor is the phase α_{PH}(**h**)
equal to -α_{PH}(-**h**).
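The breakdown of Friedel's law can be checked numerically. The sketch below uses a hypothetical one-dimensional structure: the coordinates, the heavy-atom values f + f' = 70 and f" = 8, and the light-atom scattering factors are all invented for illustration.

```python
import cmath
import math


def F(h, atoms):
    """Structure factor for atoms given as (x, scattering factor) pairs."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)


# Hypothetical values: two identical anomalous scatterers, f + f' = 70, f" = 8
heavy = [(0.13, 70 + 8j), (0.41, 70 + 8j)]
# Light "protein" atoms with negligible anomalous scattering (real f only)
protein = [(0.05, 6), (0.22, 7), (0.37, 8), (0.61, 6), (0.78, 7), (0.90, 8)]

h = 3
FH_plus, FH_minus = F(h, heavy), F(-h, heavy)
FPH_plus = FH_plus + F(h, protein)
FPH_minus = FH_minus + F(-h, protein)

# |F_H(h)| and |F_H(-h)| agree, but F_H(-h) is not the conjugate of F_H(h)
print(abs(FH_plus), abs(FH_minus))
print(abs(FH_minus - FH_plus.conjugate()))

# For the derivative, both the magnitudes and the phases break Friedel's law
print(abs(FPH_plus), abs(FPH_minus))
print(math.degrees(cmath.phase(FPH_plus)), math.degrees(cmath.phase(FPH_minus)))
```

Setting the imaginary part of the heavy-atom scattering factor to zero restores F(-**h**) = F*(**h**) for every term, which is a quick way to convince yourself that f" alone is responsible.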

These differences cannot be turned into phases unless we can position the heavy (or anomalous) atoms.

If they are known, the vector F_{H}(**h**) can be calculated, and from the knowledge
of the three magnitudes |F_{H}(**h**)|, |F_{P}(**h**)| and
|F_{PH}(**h**)| plus the phase of F_{H}(**h**), it is
easy to show from a phase triangle that
α_{P} will have
to equal α_{H} ± α_{diff}.
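The phase triangle can be sketched in a few lines, assuming F_{PH} = F_{P} + F_{H} and solving the cos rule for the angle between F_{H} and F_{P}. The function name `phase_candidates` and the test values are hypothetical, chosen only to show the twofold ambiguity.

```python
import cmath
import math


def phase_candidates(FP_mag, FPH_mag, FH):
    """Return the two protein phases (radians) allowed by the phase triangle.

    FP_mag, FPH_mag: measured magnitudes |F_P| and |F_PH|
    FH: calculated heavy-atom structure factor (complex)
    """
    FH_mag, alpha_H = abs(FH), cmath.phase(FH)
    cos_diff = (FPH_mag**2 - FH_mag**2 - FP_mag**2) / (2 * FH_mag * FP_mag)
    # Measurement errors can push the value slightly outside [-1, 1]
    cos_diff = max(-1.0, min(1.0, cos_diff))
    alpha_diff = math.acos(cos_diff)
    return alpha_H + alpha_diff, alpha_H - alpha_diff


# Hypothetical reflection: |F_P| = 100 at 40 degrees, |F_H| = 20 at 110 degrees
FP_true = cmath.rect(100.0, math.radians(40.0))
FH = cmath.rect(20.0, math.radians(110.0))
FPH = FP_true + FH

candidates = [math.degrees(a)
              for a in phase_candidates(abs(FP_true), abs(FPH), FH)]
print(candidates)  # approximately 180 and 40 degrees
```

One of the two candidates is the true phase of 40 degrees; a single derivative cannot say which, and that is why a second derivative or anomalous differences are needed to break the ambiguity.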

This is often represented with "phase circles" (or phasing diagrams) or "phase triangles".

Since there are usually only a few heavy atoms associated with many protein atoms, they
can usually be positioned using Pattersons or direct methods. Both these techniques require
only an estimate of the **magnitude** of the F_{H}(**h**).

It may be worth summarising here the theory behind difference Pattersons.

F_{PH}(**h**) = F_{H}(**h**) + F_{P}(**h**).

The cos rule gives:
|F_{PH}(**h**)|² =
|F_{H}(**h**)|² + |F_{P}(**h**)|² +
2 |F_{H}(**h**)| |F_{P}(**h**)|
cos α_{diff}

where α_{diff} is the phase difference between
vector F_{H}(**h**) and vector F_{P}(**h**).
From this we can approximate:

|F_{PH}(**h**)| = {|F_{H}(**h**)|² + |F_{P}(**h**)|² + 2 |F_{H}(**h**)| |F_{P}(**h**)| cos α_{diff}}^{½}

= |F_{P}(**h**)| {1 + 2 (|F_{H}(**h**)|/|F_{P}(**h**)|) cos α_{diff} + (|F_{H}(**h**)|/|F_{P}(**h**)|)²}^{½}

The binomial theorem gives (1+x)^{½} ~ 1 + x/2 when x is small, so

|F_{PH}(**h**)| ~ |F_{P}(**h**)| {1 + (|F_{H}(**h**)|/|F_{P}(**h**)|) cos α_{diff} + ½(|F_{H}(**h**)|/|F_{P}(**h**)|)²}

= |F_{P}(**h**)| + |F_{H}(**h**)| cos α_{diff} + ½|F_{H}(**h**)|²/|F_{P}(**h**)|

So |F_{PH}(**h**)| - |F_{P}(**h**)| ~
|F_{H}(**h**)| cos α_{diff} +
an even smaller term, providing |F_{H}(**h**)| is small compared to
|F_{P}(**h**)|.
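A quick numerical check of this approximation, under the assumption |F_{H}|/|F_{P}| = 0.1 (the magnitudes 100 and 10 are arbitrary illustration values, with F_{P} placed along the real axis):

```python
import cmath
import math

FP_mag, FH_mag = 100.0, 10.0  # hypothetical magnitudes, |F_H| << |F_P|

errors = []
for deg in (0, 45, 90, 135, 180):
    alpha_diff = math.radians(deg)
    # F_P lies along the real axis; F_H makes the angle alpha_diff with it
    FPH_mag = abs(FP_mag + cmath.rect(FH_mag, alpha_diff))
    exact = FPH_mag - FP_mag
    approx = FH_mag * math.cos(alpha_diff)
    errors.append(abs(exact - approx))
    print(f"{deg:3d} deg  exact {exact:8.4f}  approx {approx:8.4f}")

# The neglected term is at most (1/2) |F_H|^2 / |F_P|
bound = 0.5 * FH_mag**2 / FP_mag
print(max(errors), bound)
```

The largest discrepancy occurs near α_{diff} = 90 degrees, where the cosine term vanishes and only the small quadratic term is left.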

A Patterson with coefficients
(|F_{PH}(**h**)| - |F_{P}(**h**)|)² is therefore approximately equivalent to one
with coefficients (|F_{H}(**h**)|
cos α_{diff})² =
½|F_{H}(**h**)|²
(1 + cos 2α_{diff})
(remember: cos²(x) = (1+cos(2x))/2).

The summation of ½|F_{H}(**h**)|²
will give the normal Patterson distribution of vectors between related atoms, at half weight,
while the summation of ½|F_{H}(**h**)|²
cos 2α_{diff} will generate only noise.
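The half-weight result can be illustrated with a toy one-dimensional Patterson. This is only a sketch under stated assumptions: two equal point heavy atoms separated by a made-up distance `d`, reflections h = 1..N, and α_{diff} modelled as a uniformly random angle because it is unrelated to the heavy-atom vectors.

```python
import math
import random

random.seed(0)
N = 500    # number of reflections h = 1..N (hypothetical)
d = 0.237  # assumed separation of two equal heavy atoms (fractional)

# |F_H(h)|^2 for two unit point atoms separated by d
FH2 = [2 + 2 * math.cos(2 * math.pi * h * d) for h in range(1, N + 1)]
# alpha_diff is modelled as random: it carries no heavy-atom information
alpha_diff = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N)]


def patterson(coeffs, u):
    """P(u) = sum_h coeff(h) cos(2 pi h u), with coeffs[0] belonging to h = 1."""
    return sum(c * math.cos(2 * math.pi * h * u)
               for h, c in enumerate(coeffs, start=1))


# True heavy-atom Patterson peak at the interatomic vector u = d
full = patterson(FH2, d)
# Difference-Patterson coefficients (|F_H| cos alpha_diff)^2
diff_coeffs = [f2 * math.cos(a) ** 2 for f2, a in zip(FH2, alpha_diff)]
approx = patterson(diff_coeffs, d)

ratio = approx / full
print(ratio)  # close to one half; the scatter comes from the cos(2 alpha_diff) term
```

Increasing `N` shrinks the scatter around one half, which mirrors the statement that the cos 2α_{diff} term contributes only noise.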

Similar equations explain why a Fourier summation gives full weight peaks at the atomic positions
which have been included in the phasing, and peaks at about half the expected height for atoms excluded
from the phasing. Say F_{PH}(**h**) =
F_{P}(**h**) + F_{H}(**h**) where
F_{H} is much smaller than F_{P}; *i.e.* only a few atoms are excluded from the
phasing. Then as above

|F_{PH}| ~ |F_{P}| + |F_{H}| cos(α_{P}-α_{H}) + small terms

The Fourier summation then uses the coefficients

|F_{PH}| e^{iα_{P}} ~ |F_{P}| e^{iα_{P}} + |F_{H}| cos(α_{P}-α_{H}) e^{iα_{P}}

Since cos(x) = (e^{ix} + e^{-ix})/2

cos(α_{P}-α_{H}) e^{iα_{P}} = ½ (e^{iα_{H}} + e^{i(2α_{P}-α_{H})})

and the second term becomes

½ |F_{H}| (e^{iα_{H}} + e^{i(2α_{P}-α_{H})})

giving the Fourier map for the atoms contributing to F_{H} at half weight, plus noise, since
the phase 2α_{P}-α_{H} is not related
to these atoms at all.
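The same effect can be seen numerically. The sketch below assumes a hypothetical one-dimensional structure of 40 randomly placed unit atoms used for phasing plus one unit atom at `x_H` that is left out. To isolate the second term it sums the coefficients (|F_{PH}| - |F_{P}|) e^{iα_{P}}, i.e. the full synthesis with the |F_{P}| e^{iα_{P}} part subtracted, and evaluates the result at the omitted atom.

```python
import cmath
import math
import random

random.seed(1)
N = 400  # reflections h = 1..N (hypothetical resolution limit)
protein = [random.random() for _ in range(40)]  # unit atoms used for phasing
x_H = 0.3137  # assumed position of the one unit atom omitted from the phasing

height = 0.0
for h in range(1, N + 1):
    FP = sum(cmath.exp(2j * math.pi * h * x) for x in protein)
    FPH = FP + cmath.exp(2j * math.pi * h * x_H)  # all atoms in the crystal
    alpha_P = cmath.phase(FP)
    # Observed amplitude difference combined with the model phase
    coeff = (abs(FPH) - abs(FP)) * cmath.exp(1j * alpha_P)
    # Contribution of this reflection to the map value at x_H
    height += (coeff * cmath.exp(-2j * math.pi * h * x_H)).real

# A unit atom with correct phases would contribute exactly 1 per reflection
ratio = height / N
print(ratio)  # roughly one half
```

The peak for the omitted atom comes out at about half its true weight, with the remainder spread through the map as noise carrying the phase 2α_{P}-α_{H}.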