Search of the optimal strategy for refinement of atomic models
P. Afonine^{§,*}, V.Y. Lunin^{#,*} & A. Urzhumtsev^{*}
^{§} Centre Charles Hermite, LORIA, Villers-lès-Nancy, 54602 France
^{#} IMPB, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
^{*} LCM3B, UPRESA 7036 CNRS, Université Henri Poincaré, Nancy 1, B.P. 239, Faculté des Sciences, Vandoeuvre-lès-Nancy, 54506 France
email: afonine@lcm3b.uhp-nancy.fr
Recently it has been shown (Afonine et al., 2001; Lunin et al., 2002) that the approximation of the maximum-likelihood (ML) criterion by a quadratic functional (Lunin & Urzhumtsev, 1999) helps to understand the features of ML-refinement and its advantages with respect to traditional least-squares (LS) refinement. In the latter, the magnitudes F_{calc}(s) of the structure factors calculated from the current atomic model are fitted to the observed structure factor magnitudes F_{obs}(s) by minimisation of
\sum_{s} w(s) (F_{calc}(s) - F_{obs}(s))^{2} . (1)
The weights w(s) may reflect the accuracy of the observed magnitudes or some other effects, but most frequently unit weights are used.
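In the procedure that is usually referred to as ML-refinement, the minimised criterion is the negative logarithm of the likelihood, the model-dependent part of which may be presented as
\sum_{s} Ψ(F_{calc}(s); F_{obs}(s), ε(s), α(s), β(s)) (2)
with
Ψ = α^{2} F_{calc}^{2} / (ε β) - \ln I_{0}(2 α F_{calc} F_{obs} / (ε β)) for acentric reflections,
Ψ = α^{2} F_{calc}^{2} / (2 ε β) - \ln \cosh(α F_{calc} F_{obs} / (ε β)) for centric reflections. (3)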
For every reflection, the parameter ε depends on the reflection indices and the particular space group, and the statistical parameters α and β, being functions of the resolution, reflect the precision of the atomic parameters and the completeness of the model (see for example Lunin & Urzhumtsev, 1984; Read, 1986; Lunin & Skovoroda, 1995; Skovoroda & Lunin, 2000).
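As a rough illustration of how criteria (1) and (2,3) differ, the sketch below evaluates both for a list of reflections in plain Python. It assumes the usual Rice-type form of the model-dependent part of the negative log-likelihood (ln I_0 for acentric and ln cosh for centric reflections). The function names and the tuple layout are hypothetical and are not taken from CNS or any other program.

```python
import math

def log_i0(x):
    """ln I0(x) from the power series of the modified Bessel function.
    Adequate for moderate |x| only; the series sum overflows for very large arguments."""
    quarter_x2 = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= quarter_x2 / (k * k)
        total += term
    return math.log(total)

def log_cosh(x):
    """Numerically stable ln cosh(x)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)

def ml_term(f_calc, f_obs, eps, alpha, beta, centric=False):
    """Assumed model-dependent part of -ln(likelihood) for one reflection."""
    scale = eps * beta
    if centric:
        return (alpha * f_calc) ** 2 / (2.0 * scale) - log_cosh(alpha * f_calc * f_obs / scale)
    return (alpha * f_calc) ** 2 / scale - log_i0(2.0 * alpha * f_calc * f_obs / scale)

def ls_target(reflections, weights=None):
    """Criterion (1); unit weights unless others are given."""
    if weights is None:
        weights = [1.0] * len(reflections)
    return sum(w * (r[0] - r[1]) ** 2 for w, r in zip(weights, reflections))

def ml_target(reflections):
    """Sum of the per-reflection terms, as in (2)."""
    return sum(ml_term(*r) for r in reflections)

# Hypothetical reflections: (f_calc, f_obs, eps, alpha, beta, centric)
reflections = [(12.0, 10.0, 1.0, 0.9, 16.0, False), (3.0, 5.0, 2.0, 0.9, 16.0, True)]
print(ls_target(reflections), ml_target(reflections))
```

Unlike (1), the likelihood term is not minimal at F_{calc} = F_{obs}; where its minimum lies depends on α, β and ε.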
The approximation of the criterion (2,3) by a quadratic functional means its substitution by a functional
\sum_{s} w*(s) (F_{calc}(s) - F*(s))^{2} , (4)
where the target values F*(s) are no longer the observed magnitudes and the non-unit weights w*(s) are crucial for a successful refinement. We will call the minimisation of this functional LS*-refinement.
Previously (Lunin et al., 2002) we have discussed that F* and w* in (4) may be represented as
F* = (\sqrt{ε β} / α) m(p) , w* = (α^{2} / (ε β)) n(p) , with p = F_{obs} / \sqrt{ε β} , (5)
where m(p) and n(p) are functions defined in Lunin et al. (2002) whose behaviour explains the features of ML-refinement.
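The construction of the target values and weights can be sketched numerically for an acentric reflection. The sketch assumes the scaled variables t = α F_{calc} / \sqrt{ε β} and p = F_{obs} / \sqrt{ε β}, in which the acentric likelihood term becomes ψ(t) = t^{2} - ln I_0(2pt). The functions `mu` and `nu` are hypothetical stand-ins for m(p) and n(p): `mu` locates the minimum of ψ and `nu` is half its second derivative there. The exact definitions and normalisation used in Lunin et al. (2002) may differ.

```python
import math

def bessel_ratio(x):
    """R(x) = I1(x) / I0(x) for x >= 0, from the power series (moderate x only)."""
    half = 0.5 * x
    t0, t1 = 1.0, half            # k = 0 terms of the I0 and I1 series
    s0, s1, k = t0, t1, 0
    while t0 > 1e-17 * s0 or t1 > 1e-17 * s1:
        k += 1
        t0 *= half * half / (k * k)
        t1 *= half * half / (k * (k + 1))
        s0 += t0
        s1 += t1
    return s1 / s0

def mu(p):
    """Position t >= 0 of the minimum of psi(t) = t^2 - ln I0(2 p t)."""
    if p <= 1.0:
        return 0.0                # psi is non-decreasing for t >= 0 when p <= 1
    lo, hi = 0.0, p               # psi'(t) < 0 just above 0 and psi'(p) > 0 when p > 1
    for _ in range(100):          # bisection on psi'(t) / 2 = t - p R(2 p t)
        mid = 0.5 * (lo + hi)
        if mid - p * bessel_ratio(2.0 * p * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nu(p):
    """Half the second derivative of psi at its minimum."""
    t = mu(p)
    if t == 0.0:
        return 1.0 - p * p
    x = 2.0 * p * t
    r = bessel_ratio(x)
    return 1.0 - 2.0 * p * p * (1.0 - r / x - r * r)   # uses R'(x) = 1 - R/x - R^2

def quadratic_target(f_obs, eps, alpha, beta):
    """Target value F* and weight w* of the quadratic approximation (acentric case)."""
    root = math.sqrt(eps * beta)
    p = f_obs / root
    return root / alpha * mu(p), alpha ** 2 / (eps * beta) * nu(p)

print(quadratic_target(50.0, 1.0, 1.0, 100.0))   # strong reflection, p = 5
print(quadratic_target(5.0, 1.0, 1.0, 100.0))    # weak reflection, p = 0.5
```

Under these assumptions a strong reflection gets a target slightly below F_{obs} / α with a weight close to α^{2} / (ε β), while a reflection with p ≤ 1 gets a zero target and a reduced weight.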
Formula (5) shows that the parameters α and β play the key role in the estimation of F* and w* and therefore in the whole refinement. In this article we discuss the best choice of α and β.
2. Estimation of α and β
Several approaches can be suggested to estimate the parameters α and β. If there is a probabilistic hypothesis about the irremovable errors in the atomic model (for example, about a missing part of the model), then for several particular cases these parameters may be calculated explicitly (Urzhumtsev et al., 1996). In particular, for an incomplete model, if the absent atoms are supposed to be distributed uniformly in the unit cell, these parameters may be calculated as
α(s) = 1 and β(s) = \sum_{k} f_{k}^{2}(s) , (6)
where f_{k}(s) are the atomic scattering factors of the absent atoms and the sum runs over these atoms. It should be noted that in practice the number of missing atoms and their scattering factors can be known only approximately (for example, it is difficult to know the exact number of missing ordered solvent molecules).
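The explicit estimate for uniformly distributed absent atoms can be sketched as follows, assuming that β(s) is the sum of the squared scattering factors of the absent atoms and α = 1. The scattering factor used here is a deliberately crude, hypothetical stand-in (a point atom with Z electrons and an isotropic B-factor); real calculations would use tabulated form factors, which also decrease with resolution.

```python
import math

def scattering_factor(z, b_factor, s):
    """Crude stand-in for f_k(s): z electrons damped by an isotropic B-factor.
    s = 1/d in 1/Angstrom; the resolution fall-off of the form factor is ignored."""
    return z * math.exp(-b_factor * s * s / 4.0)

def beta_uniform(absent_atoms, s):
    """beta(s) as the sum of f_k(s)^2 over the absent atoms; alpha is 1 in this model."""
    return sum(scattering_factor(z, b, s) ** 2 for z, b in absent_atoms)

s = 1.0 / 2.2                              # resolution limit used in the tests
carbons_180 = [(6, 28.0)] * 180            # 180 absent atoms, all treated as carbon
carbons_225 = [(6, 28.0)] * 225            # the same with the number overestimated by 25%
print(beta_uniform(carbons_180, s), beta_uniform(carbons_225, s))
```

In this simple model β is proportional to the assumed number of absent atoms, so a 25% error in that number changes β by 25% at every resolution, whereas the assumed B-factor changes its resolution dependence.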
Another way is to use likelihood-based estimates of these parameters, obtained by comparing the observed structure factor magnitudes with those calculated from a starting atomic model (Lunin & Urzhumtsev, 1984; Read, 1986). It is important to note that only the test-set reflections (Brünger, 1992) should be used (Lunin & Skovoroda, 1995; Skovoroda & Lunin, 2000). These estimates can also be recalculated iteratively during refinement.
These different ways to estimate α and β have been tested by comparing LS, ML and various LS*-refinement approaches in order to suggest the best refinement strategy.
3. Models and programs used for tests
As in the previous work (Afonine et al., 2001; Lunin et al., 2002), the tests were carried out with the CNS complex (Brünger et al., 1998) using the model of the Fab fragment of a monoclonal antibody (Fokine et al., 2000), which consists of 439 amino acid residues and 213 water molecules, 3593 atoms in total. The crystal belongs to the space group P2_{1}2_{1}2_{1} with unit cell parameters a = 72.24 Å, b = 72.01 Å, c = 86.99 Å and one molecule per asymmetric unit.
For test purposes the values of F_{obs} at 2.2 Å resolution were simulated by the corresponding values calculated from the complete exact model and were used for all refinements. The errors in the atomic coordinates were introduced randomly and independently. Incomplete models were obtained by random deletion of atoms, both from the macromolecule and from the solvent.
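The preparation of such test models can be sketched as below. The text only states that the errors were random and independent and that the atoms were deleted at random, so the details here are assumptions: shifts are drawn with random directions and rescaled so that their mean length equals the requested error, and the number of deleted atoms is the rounded fraction of the model size. The function names are hypothetical.

```python
import math
import random

def perturb(coords, mean_error, rng):
    """Shift every atom independently; the mean shift length equals mean_error (in Angstrom)."""
    shifts = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in coords]
    lengths = [math.sqrt(sum(c * c for c in shift)) for shift in shifts]
    scale = mean_error * len(coords) / sum(lengths)
    return [tuple(x + scale * d for x, d in zip(xyz, shift))
            for xyz, shift in zip(coords, shifts)]

def delete_random(atoms, fraction, rng):
    """Remove a random subset of atoms; returns the kept atoms and the number removed."""
    n_delete = round(fraction * len(atoms))
    removed = set(rng.sample(range(len(atoms)), n_delete))
    kept = [atom for i, atom in enumerate(atoms) if i not in removed]
    return kept, n_delete

rng = random.Random(2002)                                   # fixed seed for reproducibility
model = [(float(i), 0.0, 0.0) for i in range(3593)]         # placeholder coordinates
incomplete, n_deleted = delete_random(model, 0.005, rng)
start_model = perturb(incomplete, 0.5, rng)
print(n_deleted, len(start_model))
```

With 3593 atoms, fractions of 0.5% and 3.0% correspond to 18 and 108 deleted atoms.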
4. Choice of F* and w*
Several refinement strategies, based on different choices of F* and w* through different estimations of α and β, have been compared.
First of all, the parameters α and β have been calculated using the technique described previously (Lunin & Skovoroda, 1995; Skovoroda & Lunin, 2000) through the comparison of the F_{obs} magnitudes with the structure factor magnitudes F_{calc} calculated from the starting model. These values were kept for the whole refinement process, which consisted of 800 cycles.
Secondly, the same method of estimating α and β has been applied, but their values were recalculated every 400 or 200 refinement cycles, depending on the test.
Alternatively, the refinement was carried out using the estimates (6). In these tests the exact number of missing atoms and their scattering factors were supposed to be known.
Finally, the refinement was carried out with mixed parameter values: α = 1 for all reflections as in (6), and β estimated from the comparison of F_{obs} with F_{calc}.
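The protocols with fixed and with periodically recalculated parameters differ only in how often the estimation step is called. A minimal sketch of such a schedule is given below; `estimate_alpha_beta` and `run_cycles` are hypothetical callables standing for the parameter estimation and for a block of refinement cycles, and do not correspond to actual CNS commands.

```python
def refine_with_updates(model, total_cycles, block_size, estimate_alpha_beta, run_cycles):
    """Run total_cycles of refinement, re-estimating (alpha, beta) before each block.

    block_size = 800, 400 or 200 with total_cycles = 800 reproduces the
    1*800, 2*400 and 4*200 schedules.
    """
    if total_cycles % block_size != 0:
        raise ValueError("total_cycles must be a multiple of block_size")
    for _ in range(total_cycles // block_size):
        alpha, beta = estimate_alpha_beta(model)          # e.g. from test-set reflections
        model = run_cycles(model, alpha, beta, block_size)
    return model

# Toy usage: the "model" is just a counter of the cycles performed so far.
estimated_at = []
final = refine_with_updates(
    0, 800, 200,
    estimate_alpha_beta=lambda m: (estimated_at.append(m), (1.0, 2.0))[1],
    run_cycles=lambda m, alpha, beta, n: m + n,
)
print(final, estimated_at)
```

For the strategies that use the explicit estimates (6), the estimation step does not depend on the current model, so a single block of 800 cycles is sufficient.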
The starting models, with mean coordinate errors of 0.5 and 0.7 Å and with 0.5% and 3.0% incompleteness, were optimised using the LS* criterion (4). For comparison, the corresponding LS and ML refinements were also carried out. The results of these tests are shown in Table 1. As discussed previously (Afonine et al., 2001; Lunin et al., 2002), even a small number of absent atoms can strongly influence the quality of the refined model.
Table 1. Mean coordinate errors (in Å) in the model after refinement using different criteria. Starting models have mean coordinate errors of D_{st}. The incompleteness D_{abs} of the models of 0.5% and 3.0% corresponds to 18 and 108 atoms deleted, respectively. The number of cycles indicates the frequency with which the parameters of the corresponding criterion were recalculated (the frequency of parameter updating is not definitely known for ML). α_{F} and β_{F} stand for the parameters estimated from the magnitude comparison and β_{C} stands for values calculated from (6). The final coordinate errors shown in italic indicate the cases where this error is higher than the starting error. The numbers in bold indicate the best refinement protocol for the given model.

criterion        LS* α_{F},β_{F}           LS* α=1,β_{F}             LS* α=1,β_{C}   LS       ML
No of cycles     1*800   2*400   4*200     1*800   2*400   4*200     1*800           1*800    800*1?
D_{st}  D_{abs}  final error
0.5 Å   0.5%     0.320   0.140   0.103     0.358   0.156   0.127     0.111           0.212    0.108
        3.0%     0.453   0.345   0.397     0.475   0.353   0.311     0.247           0.375    0.305
0.7 Å   0.5%     0.784   0.636   0.468     0.633   0.491   0.388     0.284           0.397    0.353
        3.0%     0.803   0.711   0.592     0.700   0.599   0.527     0.404           0.530    0.537
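The two markings described in the caption of Table 1 (final errors above the starting error, and the best protocol for each model) can be recomputed directly from the tabulated values. The short sketch below does this; the protocol labels are ad hoc ASCII names for the table columns.

```python
PROTOCOLS = [
    "LS* alpha_F,beta_F 1*800", "LS* alpha_F,beta_F 2*400", "LS* alpha_F,beta_F 4*200",
    "LS* alpha=1,beta_F 1*800", "LS* alpha=1,beta_F 2*400", "LS* alpha=1,beta_F 4*200",
    "LS* alpha=1,beta_C 1*800", "LS 1*800", "ML",
]

# Keys: (starting error in Angstrom, incompleteness in %); values: final errors from Table 1.
TABLE_1 = {
    (0.5, 0.5): [0.320, 0.140, 0.103, 0.358, 0.156, 0.127, 0.111, 0.212, 0.108],
    (0.5, 3.0): [0.453, 0.345, 0.397, 0.475, 0.353, 0.311, 0.247, 0.375, 0.305],
    (0.7, 0.5): [0.784, 0.636, 0.468, 0.633, 0.491, 0.388, 0.284, 0.397, 0.353],
    (0.7, 3.0): [0.803, 0.711, 0.592, 0.700, 0.599, 0.527, 0.404, 0.530, 0.537],
}

def summarise(table):
    """For each model: best protocol, its final error, and protocols that made the model worse."""
    summary = {}
    for (d_st, d_abs), errors in table.items():
        best = min(range(len(errors)), key=errors.__getitem__)
        worse = [PROTOCOLS[i] for i, e in enumerate(errors) if e > d_st]
        summary[(d_st, d_abs)] = (PROTOCOLS[best], errors[best], worse)
    return summary

for model, (best, error, worse) in summarise(TABLE_1).items():
    print(model, best, error, worse)
```

On these values the LS* refinement with α = 1 and β_{C} gives the lowest final error for three of the four starting models; for the 0.5 Å, 0.5% model the 4*200 schedule with α_{F}, β_{F} is marginally better (0.103 against 0.111).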
5. Influence of errors in the estimation of β
Table 2. Mean coordinate errors (in Å) after LS*-refinement with the estimates (6) for different atom types assigned to the missing atoms; CNO stands for the exact (mixed) type of atoms. Starting models have mean coordinate errors of D_{st} (in Å). D_{abs} is the incompleteness of the models in percent; the number in parentheses is the corresponding number of deleted atoms. The final coordinate errors shown in italic indicate the cases where this error is higher than the starting error.

D_{st}   Type   D_{abs} =
                0.5 (18)   1.0 (36)   3.0 (108)   5.0 (180)   7.0 (252)   9.0 (325)
0.5 Å    CNO    0.105      0.133      0.256       0.343       0.447       0.513
         O      0.111      0.138      0.247       0.357       0.450       0.521
         C      0.113      0.136      0.256       0.343       0.439       0.499
0.7 Å    CNO    0.289      0.321      0.422       0.498       0.579       0.649
         O      0.284      0.278      0.404       0.468       0.598       0.645
         C      0.285      0.334      0.425       0.494       0.609       0.656
Table 3. Mean coordinate errors (in Å) for different values <B> of the mean temperature factor assigned to the missing atoms. Starting models have mean coordinate errors of D_{st} (in Å). D_{abs} is the incompleteness of the models in percent; the number in parentheses is the corresponding number of deleted atoms. The final coordinate errors shown in italic indicate the cases where this error is higher than the starting error.

D_{st}   <B>, Å^{2}   D_{abs} =
                      0.5 (18)   1.0 (36)   3.0 (108)   5.0 (180)   7.0 (252)   9.0 (325)
0.5 Å    5            0.085      0.129      0.281       0.386       0.528       0.582
         15           0.083      0.120      0.259       0.342       0.455       0.508
         25           0.109      0.144      0.258       0.347       0.440       0.506
         35           0.144      0.167      0.272       0.353       0.449       0.483
         45           0.170      0.207      0.290       0.380       0.470       0.507
0.7 Å    5            0.178      0.233      0.378       0.508       0.626       0.693
         15           0.264      0.274      0.377       0.474       0.565       0.610
         25           0.304      0.356      0.431       0.522       0.605       0.655
         35           0.374      0.432      0.494       0.595       0.677       0.703
         45           0.517      0.581      0.599       0.719       0.774       0.781
To study the influence of the estimated temperature factor on the minimisation process, the B-factors of the missing atoms (following the results of the previous test, all these atoms were assigned to be carbons) were all set to the same value, which was varied from 5 to 80 Å^{2} in a series of runs, while the actual mean temperature factor of the deleted atoms was in the range 27-29 Å^{2}. Table 3 shows that varying the estimated temperature factor of the missing atoms by ±15 Å^{2} around the mean value does not seriously affect the quality of the refined model.
Finally, the influence of a wrong estimate of the number of missing atoms has been studied. For this purpose a starting model with 5.0% (180) of the atoms deleted and an introduced coordinate error of 0.5 Å was generated. Different estimates of the number of missing atoms were used to obtain the β values and the corresponding F* and w*. An error in this number of the order of at least 25% had practically no influence on the final coordinate errors.
6. Conclusions
The quadratic approximation of the maximum-likelihood-based criterion helps to understand better the features of ML-based refinement and its advantages. Moreover, this approximation makes it possible to choose a better refinement strategy and to build a new quadratic functional whose minimisation leads to better models than those obtained by either traditional LS or ML-based refinement.
In this quadratic functional, the target values F* and the weights w* are calculated using formulas (6) for the parameters α and β of the variable part of the likelihood function (2,3). These formulas give estimates of α and β for the ideally refined model without direct knowledge of its parameters, and therefore make it possible to build the quadratic approximation of (2,3) at the point of its minimum and thus to improve the refinement criterion.
These estimates of α and β are quite insensitive to the choice of the type of the atoms supposed to be missing, to the estimate of their mean B-factor and to the estimate of the number of such atoms, which makes the new refinement strategy quite robust.
The work was partially supported by RFBR grants 00-04-48175 and 01-07-90317, by CNRS, UHP and Région Lorraine. The authors thank C. Lecomte and E. Dodson for their interest in the project.
References
Afonine, P., Lunin, V.Y. & Urzhumtsev, A.G. (2001). CCP4 Newsletter on Protein Crystallography, 39, 52-56.
Brünger, A.T. (1992). Nature, 355, 472-474.
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T. & Warren, G.L. (1998). Acta Cryst., D54, 905-921.
Fokine, A.V., Afonine, P.V., Mikhailova, I.Yu., Tsygannik, I.N., Mareeva, T.Yu., Nesmeyanov, V.A., Pangborn, W., Li, N., Duax, W., Siszak, E. & Pletnev, V.Z. (2000). Rus. J. Bioorgan. Chem., 26, 512-519.
Lunin, V.Y., Afonine, P.V. & Urzhumtsev, A. (2002). Acta Cryst., A, in press.
Lunin, V.Y. & Skovoroda, T.P. (1995). Acta Cryst., A51, 880-887.
Lunin, V.Y. & Urzhumtsev, A. (1984). Acta Cryst., A40, 269-277.
Lunin, V.Y. & Urzhumtsev, A. (1999). CCP4 Newsletter on Protein Crystallography, 37, 14-28.
Skovoroda, T.P. & Lunin, V.Y. (2000). Crystallography Reports, 45, part 2, 195-198.
Urzhumtsev, A.G., Skovoroda, T.P. & Lunin, V.Y. (1996). J. Appl. Cryst., 29, 741-744.