by Alexandre G. URZHUMTSEVa* and Vladimir Y. LUNINab
a LCM3B, UPRESA 7036 CNRS, Faculte des Sciences, Universite Henri Poincare, Nancy I, 54506 Vandoeuvre-les-Nancy, France
b Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
Recently, several groups (Murshudov et al., 1997; Templton, 1999; Tronrud, 1999) reported some advances in calculation of an approximate matrix of the second derivatives (Hessian or normal matrix) for crystallographic criteria which, in particular, is used in optimisation methods. In fact, the gradient methods need only the product of this matrix by a given vector which can be calculated rapidly and directly, without explicit matrix calculation (Lunin & Urzhumtsev, 1985). The second-order minimisation methods need an inverted Hessian matrix with an inversion being very sensible to approximations; as a consequence, algorithms for the calculation of the exact Hessian matrix [H] must be developed in this case. Moreover, it is important to know to calculate such matrix for combined minimisation criteria, with a mixture of different structure-factor depending criteria (crystallographic criterion in what follows), geometric criteria and others. This knowledge can influence the choice of the optimisation methods which vary in computation time per iteration, in the number of iterations necessary to arrive to a similar result, and in radius of convergence.
The direct calculation of the normal matrix of a crystallographic criterion needs of order of MN² operations (derivative of the contribution of every reflection with respect to every couple of parameters). Here N is the number of atomic parameters, M is the number of structure factors. In order to gain CPU time, some authors use a sparse matrix (Sussman et al., 1977; Hendrickson & Konnert, 1980; Dodson, 1981; Guillot et al., 2001). To estimate the full matrix of the second derivatives for the weighted least-squares fit of the magnitudes Templton (1999) suggested a statistical approach. Murshudov et al. (1997) and Tronrud (1999) developed the FFT technique suggested for this goal by Agarwal (1978). In particular, in the case of atomic parameters q as independent variables, Tronrud (1999) proposed a practical algorithm which needs
THT = C1N² + C2M ln M (1)
operations to calculate the principal part of the matrix [H]
traditionally neglecting the second term
Similar formulae were derived by Murshudov et al. (1997) for the case of the maximum likelihood criterion. At the same time, the use of Fast Differentiation Algorithm (Baur & Strassen, 1983; Kim et al., 1984; for crystallographic applications see Lunin & Urzhumtsev, 1985) allows to calculate rapidly the exact matrix of the second derivatives with respect to atomic parameters for both these criteria and to generalise this result for other cases.
Traditionally, it is considered that the gradient methods are much more time consuming than the methods without derivatives, and that generally every computation of a gradient needs about N times CPU more than a single value of the function, N being the number of variables. However, several groups (e.g., Baur & Strassen, 1983; Kim et al., 1984) have shown the following result and its conclusions:
For any function of N variables, calculation of a single value of which takes the time T, an algorithm exists to calculate its exact gradient faster than in 4T independently on the number N of variables.
For any function of N variables, calculation of a single value of which takes the time T, an algorithm exists to calculate its exact derivative along a given direction faster than in 4T independently on the number N of variables and without a gradient calculation.
For any function of N variables, calculation of a single value of which takes the time T, an algorithm exists to calculate the exact product of the matrix of the second derivatives by a given direction faster than in 20T independently on the number N of variables.
In practice, when some common expressions can be saved in memory, the time to calculate all 4 objects, namely the function f itself, its gradient Ñf, the function derivative along a given direction s and the product [H]s of the Hessian matrix [H] by the same direction, can be estimated rather by 4T where T stands for the time of a single calculation of the function value (Urzhumtsev et al., 1989).
FDA shows that the crucial point in the fast optimisation of crystallographic functional is a fast calculation of structure factors. A fast way of their calculations (Sayre, 1951; Ten Eyck, 1977) needs about
Tf = C1N + C2M lnM (4)
operations (C1 and C2 are some constants), instead of CNM operations for a direct calculation.
FDA gives the same amount of operations for the gradient calculation being applied to a least-squares criterion depending on atomic parameters (Lunin, 1978, unpublished; Lifchitz, in Agarwal, 1981) or to other crystallographic criteria (Lunin & Urzhumtsev, 1985; Urzhumtsev et al, 1989).
Moreover, because the n-th column of the matrix [H] of the second derivatives represents the product of this matrix by the direction (0, 0, … , 0, 1, 0, …, 0) where 1 is in the n-th position, the whole exact matrix of the second derivatives composed from N columns can be calculated by
THE = C1N² + C2 NM lnM (5)
operations. However, a faster and more general way to calculate the exact Hessian matrix can be suggested.
Let a function R(y1(x1, …, xN), …, yM(x1, …, xN)) depend on M variables y1, …, yM with every ym depending in turn on N variables x1, …, xN . The chain rule formula:
can be rewritten as
Here is a tensor composed of M matrices , , ... . If the gradient [ÑyR] is supposed to be known (the calculations are carried out accordingly to the FDA) then the operations needed to get are the operations to calculate the matrix products in (7) plus those to calculate
Formula (7) can be simplified for a number of special cases.
Special case 1: If the criterion R is additive, i.e. can be presented by a sum of individual contributions, not necessarily quadratic, from the components of the vectors y
then the matrix becomes diagonal. An example is the least-squares fit of the calculated data to the experimental ones.
Special case 2: When the variables y depend linearly on the variables x , y = [A] x , the second term in the formula is absent and (7) becomes
An example of such linear dependence is the relation between the electron density and its structure factors.
Special case 3: Quite often variables x contribute to variables y locally, i.e. each of x1, x2, ... xN contribute only to a small number Cy of variables yk, Cy << M . Complementary, every yk may depend only on a small number of Cx parameters xj, Cx << N . The matrix [dy / dx] becomes sparse giving an essential reduction in the number of calculations. An example of such dependence is the calculation of any field (e.g., an electron density) from an atomic model where atoms have a limited radius of contribution and are separated each from others.
It is easy to see that the calculation of normal matrix for crystallographic criteria can profit the features of all particular cases discussed above and it can be shown that the total amount of operations necessary to calculate the exact matrix of the second derivatives of the crystallographic least-squares criterion with respect to the atomic parameters is
THC = C1M + C2M lnM + C3N² ~ C12 M lnM + C3N² (10)
where C1, C2, C3 and C12 are some constants which do not depend neither on the number of structure factors M nor on the number of atoms N. This estimate is the same as (1) for an approximate matrix where the second-order terms are neglected from the beginning and is better than (5) which is obtained for the exact matrix by simple N-times calculation of the product of a matrix by a direction following a coordinate axis.
The same amount of operations is enough to calculate the matrix for the intensity least-squares criterion (Sheldrick & Schneider, 1997), for the phase criterion (Lunin & Urzhumtsev, 1985; Pannu et al., 1998), for the maximum likelihood criterion (Bricogne & Irwin, 1996; Pannu & Read, 1996; Read, 1997; Murshudov et al., 1997; Pannu et al.,1998).
In the general situation when the criterion cannot be represented by a sum of contributions from many small subsets of structure factors, the total number of computer operations becomes :
THG = C6M² lnM + C3N² (11)
If the independent parameters are not atomic ones (e.g., parameters for a rigid groups refinement), the matrix with respect to independent parameters can be calculated in the same way.
The formula (7) gives also an idea that for some special cases the inverted matrix of the second derivatives can be obtained directly without calculation the Hessian matrix itself. Indeed, if the transformation y(x) is linear, y = [A] x, then
If [A] corresponds to the Fourier transformation, the inverse operation is the inverse Fourier transform for which the matrix [A-1] can be written immediately. The matrix can be easily calculated for many crystallographic criteria, in particular for such important criteria as the least-squares or maximum likelihood functionals. Therefore, when the independent parameters are density values at the grid points the inverse Hessian matrix can be easily and directly calculated for these criteria as
where all elements of the matrix (13) are presented by the same function calculated as a Fourier series at different points r . In order to calculate this function at a grid compatible with the number of Fourier coefficients M, estimating K ~ M , the number of operations needed is about C2M lnM (Cooley & Tukey, 1965; Ten Eyck, 1973). This results shows that the minimisation methods of simple iteration, usually applied for density modification procedures, can be replaced not only by the gradient methods (Sayre, 1972, and Sayre & Toupin, 1975, for the particular Sayre criterion; Lunin, 1985, for the general case) but even by the methods of the second order. In this case the computational expenses are practically the same as those for the simple iteration methods.
In the case of the crystallographic least-squares or maximum likelihood refinement of atomic model, the exact matrix of second derivatives can be calculated by
THC = C12 M lnM + C3N² (14)
Operations where M is the number of structure factors, N is the number of atomic parameters and C12 and C3 are some constants which do not depend neither on M nor on N. This algorithm suggests step-by-step recalculation of the matrix with respect to variables of different levels of the molecular models (structure factors, density, atomic parameters etc.). The same iterative calculations are basic steps for the fast gradient calculation (Lunin & Urzhumtsev, 1985). It should be noted that such way of calculation allows easily to add the contribution from any other criteria of the same type or of any other type of models, e.g., phase criterion, stereochemical criteria, criteria depending on the electron density etc. Therefore, the formulae which give the expression of the gradient (or of the Hessian matrix) of a crystallographic criterion directly in terms of atomic parameters can be useful for understanding but may be rather misleading algorithmically.
The full material with details of corresponding derivations will be reported elsewhere (paper in preparation for Acta Crystallographica).
The work was supported by RFBR grant 00-04-48175 (VYL). The authors thank A.G. Kushnirenko and K.V. Kim who attracted their attention to the fast differentiation algorithm and C. Lecomte and Centre Charles Hermite (Nancy) for their support of the work
Agarwal, R.C. (1978) Acta Cryst., A34, 791-809.
Agarwal, R.C. (1981) In Refinement of Protein Stryctures: Proceeding of the Daresbury Study Weekend 15-16 November, 1980; compiled by P.A.Machin, J.W.Campbell and M.Elder, pp. 24-28. Warrington : Science and Engineering Research Council, Daresbury Laboratory.
Baur, W. & Strassen, V. (1983). Theoretical Computer Science, 22, 317-330.
Bricogne, G., Irwin, J. (1996) In Macromolecular Refinement: Proceedingof the CCP4 Study Weekend, E.Dodson, M.Moore, A.Ralph & S.Bailey, eds., pp.85-92. Warrington : Daresbury Laboratory.
Brünger, A.T. (1992) Nature, 355, 472-474.
Cooley, J.W. & Tukey, J.W. (1965) Math.Comput., 19, 297-301
Cowtan, K. & Ten Eyck, L.F. (1998) ECM-18 Abstracts, XVIIIth European Cryst.Meeting, 16-20 August 1998, Prague, Republic Czeck, E5-P7, Bulletin of the Czeck and Slovak Crystallographic Association, 5A, 136
Dodson, E. (1981) In Macromolecular Refinement: Proceeding of the CCP4 Study Weekend, E.Dodson, M.Moore, A.Ralph & S.Bailey, eds., pp. 85-92. Warrington : Daresbury Laboratory.
Guillot, B., Viry, L., Guillot, R., Lecomte, C., Jelsch, C. (2001) J.Appl. Cryst., 34, in press.
Hansen, N.K. & Coppens, P. (1978) Acta Cryst., A34, 909-921.
Hendrickson, W.A. & Konnert, J.H. (1980) In Biomolecular Structure, Function, Conformation and Evolution, edited by R.S.Srinivasan, Vol. 1, 43-57. Oxford : Pergamon.
Hestenes, M.R. & Stiefel, E. (1952) J.Res.Nat.Bur.Standards, 49, 409-436.
Kalinin, D.I. (1981) Kristallographiya, 25, 535-544.
Kim, K.M., Nesterov, Yu.E. & Cherkassky, B.V. (1984) Dokl.Acad.Nauk SSSR, 275, 1306-1309.
Lanczos, C. (1952) J.Res.Nat.Bur.Standards, 49, 33-53.
Lunin, V.Yu. (1985). Acta Cryst. A41, 551-556.
Lunin, V.Yu. & Skovoroda, T.P. (1995). Acta Cryst. A51, 880-887.
Lunin, V.Yu. & Urzhumtsev, A.G. (1985) Acta Cryst., A41, 327-333
Lunin, V.Yu. & Urzhumtsev, A.G. (1999) CCP4 Newsletter on Protein Crystallography, 37, 14-28
Murshudov, G.N., Vagin, A.A. & Dodson, E.J. (1997) Acta Cryst. D53, 240-255.
Pannu, N.S., Murshudov, G.N., Dodson, E.J. & Read, R.J. (1998) Acta Cryst. D54, 1285-1294
Pannu, N.S. & Read, R.J. (1996) Acta Cryst. A52, 659-668.
Read, R.J. (1997) In Methods in Enzymology, Academic Press, San Diego., C.W.Carter, Jr., R.M.Sweet, eds., 277B, 110-128.
Sheldrick, G.M. & Schneider, T.R. (1997). In Methods in Enzymology, Academic Press, San Diego., C.W.Carter, Jr., R.M.Sweet, eds., 277B, 319-343.
Sayre, D. (1951) Acta Cryst., 4, 362-367
Sayre, D. (1972) Acta Cryst., A28, 210-212
Sayre, D. & Toupin, R.A. (1975) Acta Cryst., A31, S20
Sussman, J.L., Holbrook, S.R., Church, G.M. & Kim, S.-H. (1977) Acta Cryst., A33, 800-804.
Templton, D. (1999) Acta Cryst., A55, 695-699.
Ten Eyck, L.F. (1973) Acta Cryst., A29, 183-191.
Ten Eyck, L.F. (1977) Acta Cryst., A33, 486-492.
Ten Eyck, L.F. (1999) IU Collected Abstracts, XVIIIth IUCr Congress & General Assembly, 4-13 August 1999, Glasgow, Scotland, 97.
Tronrud, D.L. (1992) Acta Cryst., A48, 912-916.
Tronrud, D.L. (1999) Acta Cryst., A55, 700-703.
Urzhumtsev, A.G., Lunin, V.Yu. & Vernoslova, E.A. (1989) J.Appl.Cryst., 22, 500-506