*------------ CCP4 Newsletter - January 1997 ------------*

MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 4RP,UK

pre@mrc-lmb.cam.ac.uk

A new version of Scala with many changes will be included in the next CCP4 release. This article describes some of the main features, and changes from earlier versions.

Scala now includes the averaging of multiple observations of reflections, and supersedes the program Agrovata. Many of the statistical analyses from Agrovata have been retained, and some new ones added. The program typically does four passes through the data, although the steps may be performed separately (steps (3) & (4) always go together):

1. an analysis to get initial scale factors
2. a scaling pass (several cycles), determining the scales according to the chosen scaling model
3. an analysis pass to analyse discrepancies, and by default to correct the standard deviations
4. a final pass to calculate scales, analyse agreement & write the output file. By default, this step now mimics the program Agrovata and writes a file of averaged intensities for input to the program Truncate. There are two other output options: OUTPUT SEPARATE, the scaled observations are written out, as in the earlier versions of Scala; and OUTPUT UNMERGED, the observations are scaled, and partials are summed, but multiple observations are not merged. This last option may be useful for the MADSYS approach to MAD phasing, in which the phasing calculations are done on unmerged data.

Scale & relative B-factor: for data from synchrotrons, it is usually best to use a separate scale factor for each image (SCALES BATCH), since the incident beam intensity can change discontinuously between images. However, the relative B-factor is essentially a correction for absorption and radiation damage, and depends on the crystal rather than on the beam, so it is likely to vary smoothly. The program now has an option to vary the B-factor smoothly, while giving each image its own scale factor (e.g. SCALES BATCH BROTATION SPACING 5).
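As a rough illustration of the idea behind SCALES BATCH BROTATION SPACING 5, the Python sketch below gives every image its own scale factor while the relative B-factor is interpolated between nodes spaced 5 degrees apart in rotation. The function names, the linear interpolation, and the form k * exp(-2 B (sin(theta)/lambda)**2) with its sign convention are assumptions made for illustration; they are not taken from the Scala source, which may use a different smoothing scheme.

```python
import math

def smooth_b(phi, b_nodes, spacing=5.0, phi0=0.0):
    """Linearly interpolate the relative B-factor at rotation angle phi (degrees).

    b_nodes[i] is a hypothetical refined B value at phi0 + i * spacing;
    at least two nodes are required.
    """
    x = (phi - phi0) / spacing
    i = min(max(int(math.floor(x)), 0), len(b_nodes) - 2)
    t = min(max(x - i, 0.0), 1.0)  # outside the node range, use the end value
    return (1.0 - t) * b_nodes[i] + t * b_nodes[i + 1]

def scale_factor(batch, phi, s2, k_batch, b_nodes, spacing=5.0):
    """Scale for one observation: per-image k times a smoothly varying B term.

    s2 is (sin(theta)/lambda)**2; k_batch maps batch number to its own scale,
    so k may jump between images while B changes only gradually.
    """
    b = smooth_b(phi, b_nodes, spacing)
    return k_batch[batch] * math.exp(-2.0 * b * s2)
```

With this arrangement a sudden change in beam intensity is absorbed by one entry of k_batch, while the number of B parameters depends only on the rotation range and the spacing, not on the number of images.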

There remains a need for an anisotropic scaling option to cope with crystals whose diffracting power falls off anisotropically. This version contains a new anisotropic scale option, but it is not very satisfactory: unless a reference dataset is used, it is usually ill-determined, for the same reasons that the 3-dimensional scaling is ill-determined. More work is needed on this.

Programs such as Mosflm and Denzo which integrate each slice of a reflection separately leave the scaling program with a problem: given a collection of integrated parts, on successive images, how do we know when we have got all of them? This is particularly a problem with Denzo, which normally refines all the prediction parameters for each image, so that different parts of the same reflection may be predicted with different parameters (Mosflm normally refines parameters using data from more than one image, though different parameters are still used for each image). There would be no problem if all parts were predicted with the same parameters, though this would not cope with slipping crystals.

Scala now offers a selection of options for deciding when all parts are present, based on the predicted fraction passed from the integration program (FRACTIONCALC). Partial reflections will be accepted if their total predicted fraction lies between e.g. 0.95 and 1.05. These limits may be set depending on your confidence in the fractions, as a compromise between completeness and reliability. Mosflm also passes a flag (MPART) which records Mosflm's calculation of which part is which (e.g. MPART = 43 means this observation is part 3 of 4). Scala will check these flags for consistency as an alternative to checking the total fraction.
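The two acceptance tests can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the decoding of MPART as a two-digit number (total parts in the tens, part number in the units) is inferred from the single example MPART = 43 given above, so it says nothing about how reflections with more than 9 parts are flagged.

```python
def total_fraction_ok(fractions, lower=0.95, upper=1.05):
    """Accept a set of partials if their summed predicted fraction is within limits."""
    return lower <= sum(fractions) <= upper

def decode_mpart(mpart):
    """Split a two-digit MPART flag into (part, total); 43 gives part 3 of 4."""
    total, part = divmod(mpart, 10)
    return part, total

def mpart_consistent(flags):
    """Check that the flags describe parts 1..N of one N-part reflection."""
    decoded = [decode_mpart(f) for f in flags]
    totals = {total for _, total in decoded}
    if len(totals) != 1:
        return False  # the parts disagree about how many parts there are
    n = totals.pop()
    return sorted(part for part, _ in decoded) == list(range(1, n + 1))
```

For example, mpart_consistent([41, 42, 43, 44]) accepts a complete four-part reflection, while a set with part 3 missing is rejected however close its summed fraction happens to be to 1.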

The default in scaling is to omit weak reflections (EXCLUDE SDMIN 6). This seems to speed convergence considerably, but may cause problems with very weak data. The default value may need to be changed in the light of experience.

Many protein crystals show marked diffuse scattering, which is seen as long tails on spots in the "phi" direction, so that reflections often appear on the image before they are predicted. If the mosaicity is increased to include these tails, too many reflections may be rejected as overlaps. Fully-recorded reflections are integrated over a smaller phi width than partials, so more of the tails are chopped off for fulls than for partials. This leads to the typical negative partial bias, with partials systematically larger than equivalent fulls.

A correction has been introduced which attempts to correct for the different truncation of diffuse scattering tails, using a simple model of thermal diffuse scattering, expressed as 2 or 3 parameters over the whole data set. This correction reduces the partial bias substantially, and seems to improve the data generally, though sometimes the parameter refinement can be a little unstable.

Normal probability analysis (see for example D.Smith and L.Howell, J.Appl.Cryst (1992) 25, 81-86: D.Smith, CCP4 Study Weekend (1993) 99-106) compares the normalized deviations (Chihl) with a normal distribution:

Chihl = (Ihl - <Iothers>)/sqrt(sigma(Ihl)**2 + sigma(others)**2)

where Ihl is a measured intensity, and <Iothers> is the mean of the other observations of the same or equivalent reflections. If the measured intensities Ihl do indeed follow a normal distribution, and the estimated errors sigma(Ihl) are correct, then the Chihl will follow a normal distribution with mean 0 and standard deviation 1. From a sorted list of Chihl, we can predict for each entry the expected Chi corresponding to its rank in the list, to give a set of Chiobs, Chicalc pairs. Plotting Chiobs against Chicalc should then give a straight line of slope = 1.0. This plot has a number of useful properties. It shows clearly if the errors do not follow a normal distribution, as is commonly the case. The program splits the Ihl data into classes by "run" and for fully recorded and partial reflections, comparing each to the <Iothers> for all observations, so if the normal probability plots are different for different classes of reflection, this indicates a systematic difference between the classes.
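A minimal Python sketch of the normal probability calculation is given here for illustration. The function names are hypothetical, and the plotting position (rank - 0.5)/n used to obtain the expected normal score is an assumption (it is a common choice, not necessarily the one used in Scala); the slope is fitted by ordinary least squares over all points rather than over the central part only.

```python
import math
from statistics import NormalDist

def normalized_deviation(i_hl, sig_hl, i_others, sig_others):
    """Chihl = (Ihl - <Iothers>)/sqrt(sigma(Ihl)**2 + sigma(others)**2)."""
    return (i_hl - i_others) / math.sqrt(sig_hl**2 + sig_others**2)

def normal_probability_pairs(chi_values):
    """Return (Chicalc, Chiobs) pairs: expected normal score vs sorted observed value."""
    chi_obs = sorted(chi_values)
    n = len(chi_obs)
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    chi_calc = [unit_normal.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(chi_calc, chi_obs))

def slope(pairs):
    """Least-squares slope of Chiobs against Chicalc."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx
```

A slope greater than 1 would suggest that the estimated standard deviations are too small, and a slope less than 1 that they are too large; computing the pairs separately for each class of reflection allows the classes to be compared.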

Because sd(I) estimates from integration programs are often poor, Scala, like Agrovata, compares the observed scatter of multiple observations with their estimated sd, and applies a simple correction model:

sd(I)' = Sdfac * sqrt[sd(I)**2 + (Sdadd * I)**2]

Scala now estimates the multiplier factor Sdfac automatically, by making the slope of the central part of the normal probability analysis of the scatter equal to 1.0. Automatic calculation of Sdadd is more difficult, and not done at present.
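The correction model is simple enough to write out directly. The Python function below is a sketch of the formula only; the function name and the numerical values in the example are hypothetical and are not recommended settings.

```python
import math

def corrected_sd(intensity, sd, sdfac, sdadd):
    """sd(I)' = Sdfac * sqrt[sd(I)**2 + (Sdadd * I)**2]."""
    return sdfac * math.sqrt(sd**2 + (sdadd * intensity) ** 2)

# Worked example with illustrative values: I = 1000, sd(I) = 30,
# Sdfac = 1.2, Sdadd = 0.04 gives 1.2 * sqrt(900 + 1600) = 1.2 * 50 = 60.
example = corrected_sd(1000.0, 30.0, 1.2, 0.04)
```

For weak reflections the Sdadd term is negligible and the correction is essentially a multiplication by Sdfac, whereas for strong reflections the term proportional to I dominates.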

Planned additions to the program include improved handling and analysis of anomalous data, spherical harmonic parameterization of the scaling, and multiplicity-weighted statistics as suggested by Diederichs and Karplus (personal communication).

This version should be in the next CCP4 distribution, 3.3. In the meantime, it is available in beta test form from ftp://ftp.mrc-lmb.cam.ac.uk/pub/scala_2.2.2.tar.gz and will probably be numbered version 2.2.3 by the time of the release.