Phil Evans, MRC Laboratory of Molecular Biology, Hills Road, Cambridge
pre@mrclmb.cam.ac.uk
POINTLESS reads one or more files containing unmerged intensities, and prepares them for scaling (SCALA) in various ways. It accepts files produced by MOSFLM or COMBAT in MTZ format, and can also read unmerged files from XDS (XDS_ASCII) and unmerged SCALEPACK output.
Its principal functions are:
If multiple files are input, it will force the batch numbers to be unique (a longstanding irritation), and ensure consistent indexing in cases where there are alternatives. Dataset names may be edited. It can thus be used to prepare files for input to Scala, as a simpler alternative to SORTMTZ.
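As an illustration of the batch renumbering, here is a minimal sketch in Python. The rule used (shift each file by the smallest multiple of 1000 that clears the batches already in use) and the function name make_batches_unique are assumptions made for the example, not a description of what POINTLESS actually does internally.

```python
def make_batches_unique(files, step=1000):
    """Offset the batch numbers of each file so that no number is reused.

    files: list of lists of positive batch numbers, one list per input file.
    The offset rule (a multiple of `step`) is an assumption for illustration.
    """
    highest_used = 0
    result = []
    for batches in files:
        lowest = min(batches)
        offset = 0
        if lowest <= highest_used:
            # smallest multiple of step that lifts this file above used numbers
            offset = ((highest_used - lowest) // step + 1) * step
        shifted = [b + offset for b in batches]
        highest_used = max(shifted)
        result.append(shifted)
    return result


renumbered = make_batches_unique([[1, 2, 3], [1, 2, 3], [2, 3]])
print(renumbered)
```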
An early version of the program was described in Evans (2006), but more recent versions have additional options and some different scoring schemes. The program uses the clever facilities of Ralf Grosse-Kunstleve's cctbx library (Grosse-Kunstleve et al. 2002) for its symmetry handling, as well as Clipper (Cowtan 2003) and the CCP4 libraries.
The true space group cannot be definitely known until the structure has been solved and satisfactorily refined, since it is easy to be misled by pseudosymmetry or twinning. When the diffraction images are indexed, a lattice & crystal class is chosen based only on the cell dimensions, which is not a reliable guide to the true symmetry (it is perfectly possible, for example, to have an orthorhombic cell with a nearly equal to b). However, by examining how well intensities which are related by potential symmetry agree with each other, we can make an estimate of the likelihood of possible Laue groups. The Laue group has the symmetry of the diffraction pattern, and is the rotational part of the space group plus a centre of symmetry (Friedel symmetry) and any lattice centering (this is essentially the same as the Patterson group). For chiral space groups, i.e. for all macromolecules, there is only one point group corresponding to each Laue group. (POINTLESS is general for all space groups, including non-chiral and centrosymmetric ones, but by default will choose only chiral groups.) The hierarchy of symmetry determination is: crystal class (lattice symmetry); Laue group (diffraction symmetry); point group; space group (including translational symmetry such as screw axes).
Normalisation. In order to do any scoring of agreement between intensities, we need to put the reflections at least approximately on the same scale. Proper scaling requires knowledge of the symmetry, which is what we are trying to determine, but we can do a rough job by normalising intensities to E^{2}, by making <E^{2}> = 1 over all resolution ranges. This is done by fitting a B-factor to resolution bin averages, then smoothing the residual deviations from <E^{2}> = 1 for each resolution bin with a spline function. Since radiation damage generally increases the apparent B-factor, a simple-minded correction is made by fitting a B-factor as a linear function of "time" (at present using the batch or image number as a proxy for time): this makes a substantial improvement in the scores in cases with marked radiation damage. The scale & B-factors are determined separately for each "run" (i.e. sweep of contiguous images).
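The first step of the normalisation can be sketched as follows. This is a simplified illustration, not the POINTLESS code: it fits a single scale k and B-factor to resolution-bin averages using ln<I> = ln(k) - B s^{2}/2 with s = 1/d, and it leaves out the spline smoothing and the time-dependent B-factor. The function names and the synthetic test data are invented for the example.

```python
import math
import random


def fit_wilson(s2, intensity, n_bins=10):
    """Fit ln<I> = ln(k) - B*s2/2 to resolution-bin averages; return (k, B)."""
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_bins or 1.0
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]  # sum(s2), sum(I), count
    for x, i in zip(s2, intensity):
        b = min(int((x - lo) / width), n_bins - 1)
        bins[b][0] += x
        bins[b][1] += i
        bins[b][2] += 1
    # one point per populated bin: (mean s2, ln of mean intensity)
    pts = [(sx / n, math.log(si / n)) for sx, si, n in bins if n and si > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return math.exp(my - slope * mx), -2.0 * slope


def normalise(s2, intensity, k, b):
    """Divide each intensity by the fitted <I> at its resolution to give E^2."""
    return [i / (k * math.exp(-b * x / 2.0)) for x, i in zip(s2, intensity)]


def synthetic(n=20000, k=1000.0, b=30.0, seed=0):
    """Invented test data: acentric (exponential) E^2 with a Wilson fall-off."""
    rng = random.Random(seed)
    s2 = [rng.uniform(0.01, 0.25) for _ in range(n)]
    inten = [k * math.exp(-b * x / 2.0) * rng.expovariate(1.0) for x in s2]
    return s2, inten


s2, inten = synthetic()
k_fit, b_fit = fit_wilson(s2, inten)
e2 = normalise(s2, inten, k_fit, b_fit)
print(round(b_fit, 1), round(sum(e2) / len(e2), 2))
```

With the synthetic data the fitted B should come out close to the value of 30 used to generate it, and the mean of the normalised values close to 1.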
Scoring functions. The main scoring function used in matching intensities is the correlation coefficient (CC), since it is relatively insensitive to the unknown scale. Probability estimates are then derived from the correlation coefficients. Multiplicity-weighted R-factors are calculated, but these are sensitive to the unknown scales, so are not used in ranking. Correlation coefficients do assume that the data all arise from the same distribution, which is why the raw intensities need to be normalised to E^{2}, otherwise a correlation will be observed just from the variation of <I>, in particular with resolution. This can be seen by noting that the correlation coefficient between e.g. x_{i} & y_{i} is just the least-squares slope of a plot of all the (x_{i}, y_{i}) points (for variables of equal variance): since <I> is larger at low than at high resolution, if we plot pairs of intensities which are potentially but not truly symmetry-related, which are necessarily at the same resolution, then we will see an apparent correlation due to e.g. a strong low resolution intensity matching another intensity which is likely to be strong, and so on.
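The need for normalisation can be demonstrated with a small simulation. In this sketch (all numbers invented), pairs of completely independent intensities are generated at the same resolution: the raw intensities show a clear positive correlation purely from the fall-off of <I> with resolution, while the normalised E^{2} values do not.

```python
import math
import random


def pearson_cc(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
raw_a, raw_b, e2_a, e2_b = [], [], [], []
for _ in range(5000):
    s2 = rng.uniform(0.01, 0.25)                   # 1/d^2, about 10 to 2 A
    mean_i = 1000.0 * math.exp(-30.0 * s2 / 2.0)   # <I> falls with resolution
    ea, eb = rng.expovariate(1.0), rng.expovariate(1.0)  # unrelated E^2 values
    raw_a.append(mean_i * ea)
    raw_b.append(mean_i * eb)
    e2_a.append(ea)
    e2_b.append(eb)

cc_raw = pearson_cc(raw_a, raw_b)    # spurious correlation from <I>(resolution)
cc_norm = pearson_cc(e2_a, e2_b)     # close to zero after normalisation
print(round(cc_raw, 2), round(cc_norm, 2))
```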
Symmetry elements. The highest possible lattice symmetry compatible with the cell dimensions, within quite generous limits (by default 2° and the equivalent on lengths), is chosen as the test symmetry, ignoring the symmetry in the input file. Each rotation axis in the lattice group is scored separately using all pairs of observations related by that rotation. A probability of the axis being present is estimated from the correlation coefficient, using an error estimate σ(CC) derived from the distribution of correlation coefficients between unrelated pairs, proportional to 1/√N_{pairs}: this allows for a larger uncertainty if we only have a small sample of reflections. The distribution is modelled as a Lorentzian function, centred on an expectation or "ideal" value estimated as an average of (i) the CC for the identity or Friedel operator and (ii) an estimate of E(CC) allowing for the observed error estimates, E(CC) = Var(E^{2}) / (Var(E^{2}) + <σ^{2}(E^{2})>) (Read, personal communication). To allow for the possibility of pseudosymmetry, the expected value of CC if the symmetry is not present is not assumed to be 0, but is modelled as a declining probability from P(CC | no symmetry) = 1 at CC = 0 to P(CC | no symmetry) = 0 at CC = E(CC | symmetry present), and integrated out. Normalising this probability, P(symmetry | CC) = P(CC | symmetry) / [P(CC | symmetry) + P(CC | no symmetry)], gives a reasonably robust scoring of the likelihood of each possible symmetry element, without too much danger of overconfidence from an accidental high score with a very few observations. This means that it is often possible to get a reasonable estimate of the Laue group even from a small wedge of data.
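The scoring of a single symmetry element can be sketched as below. Several details are assumptions made for the illustration: the shape of the "declining probability" is taken to be linear, the proportionality constant in σ(CC) (sigma_unit) is set to 1, and the ideal value E(CC) is simply given as 0.95. The real program estimates these quantities from the data.

```python
import math


def lorentzian(x, centre, width):
    """Normalised Lorentzian (Cauchy) density."""
    return width / (math.pi * (width ** 2 + (x - centre) ** 2))


def p_cc_given_absent(cc, e_cc, sigma, steps=200):
    """P(CC | no symmetry): integrate over the unknown expected CC (mu).

    The weight on mu is assumed to decline linearly from 1 at mu = 0
    to 0 at mu = e_cc (midpoint rule, weights normalised to sum to 1).
    """
    total = norm = 0.0
    for j in range(steps):
        mu = (j + 0.5) * e_cc / steps
        weight = 1.0 - mu / e_cc
        total += weight * lorentzian(cc, mu, sigma)
        norm += weight
    return total / norm


def p_symmetry(cc, n_pairs, e_cc=0.95, sigma_unit=1.0):
    """P(symmetry | CC) for one element; e_cc and sigma_unit are assumed."""
    sigma = sigma_unit / math.sqrt(n_pairs)   # larger error for small samples
    p_yes = lorentzian(cc, e_cc, sigma)
    p_no = p_cc_given_absent(cc, e_cc, sigma)
    return p_yes / (p_yes + p_no)


for cc, n in [(0.95, 1000), (0.02, 1000), (0.95, 4)]:
    print(cc, n, round(p_symmetry(cc, n), 3))
```

The third case shows the intended behaviour with few observations: the same high CC from only 4 pairs gives a much less confident answer than from 1000 pairs.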
Laue groups. The list of possible Laue groups, subgroups of the lattice group down to the minimum P-1, can be generated from all possible pairs of symmetry elements, including the identity. An estimate of the likelihood of each group is calculated using the combination of the probabilities of each symmetry element which is either present or absent in each subgroup. In each subgroup, each potential symmetry element i is either present, e_{i} = true, or absent, e_{i} = false, and we have a measured CC_{i} and P(CC_{i} | e_{i}) for e_{i} = true or false. Then for each group, P(CC | group) = Π_{i} P(CC_{i} | e_{i}). This probability is used to rank the possible Laue groups. Various other scores are also listed for each subgroup: correlation coefficient, a "net Z-score" from the CCs, R-factors and a measure of the lattice distortion from the original unit cell (in cases where the test lattice is higher symmetry than the original assignment). If the crystal class is different from that used in the integration, you should reprocess with the correct symmetry, i.e. with the correct cell constraints.
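The combination of element probabilities into a group score can be sketched as follows, for a lattice with three potential dyads. The likelihood values are invented, and the final normalisation over the listed groups is an assumption of this example.

```python
def rank_groups(element_likelihoods, groups):
    """Rank groups by the product over elements of P(CC_i | e_i).

    element_likelihoods: {element: (P(CC | present), P(CC | absent))}
    groups: {group name: set of elements present in that group}
    Returns a list of (normalised probability, group name), best first.
    """
    scores = {}
    for name, present in groups.items():
        p = 1.0
        for element, (p_yes, p_no) in element_likelihoods.items():
            p *= p_yes if element in present else p_no
        scores[name] = p
    total = sum(scores.values())
    return sorted(((p / total, name) for name, p in scores.items()),
                  reverse=True)


# Subgroups of an orthorhombic lattice group (the identity is always present)
GROUPS = {
    "P -1": set(),
    "P 2/m (h)": {"2-fold h"},
    "P 2/m (k)": {"2-fold k"},
    "P 2/m (l)": {"2-fold l"},
    "P m m m": {"2-fold h", "2-fold k", "2-fold l"},
}

# Invented likelihoods: all three dyads fit well
all_dyads = {"2-fold h": (8.0, 0.05), "2-fold k": (8.0, 0.05),
             "2-fold l": (8.0, 0.05)}
# Invented likelihoods: only the dyad along k fits
one_dyad = {"2-fold h": (0.02, 1.4), "2-fold k": (8.0, 0.05),
            "2-fold l": (0.02, 1.4)}

print(rank_groups(all_dyads, GROUPS)[0][1])
print(rank_groups(one_dyad, GROUPS)[0][1])
```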
Systematic absences & space groups. These arise from translational symmetry operators, notably screw axes which lead to absences on axial reflections (in non-chiral crystals, glide planes lead to absences in 2-dimensional zones). They can thus be used to distinguish between different space groups within a chosen point group. However, they are not always a reliable guide to the true space group, because there are relatively few axial reflections, and axes lying close to the spindle rotation axis may be only partly sampled or missing from the dataset altogether, so the information from the absences should be treated with caution. POINTLESS uses a Fourier analysis of I/σ values to estimate the probability of the translational element being present or not. For example, if the chosen point group might have a 2_{1} screw dyad along the a axis, this would be indicated by presence of h00 reflections only when h is even (= 2n). Then the one-dimensional Fourier transform of I/σ (or I) should peak at 1/2 in Fourier space, and the peak height at 1/2 relative to the origin is a measure of the strength of the screw. A probability of the presence of the screw is then calculated, using an error estimate derived from samples of the same number of observations of non-axial reflections, and again a Lorentzian distribution centred on the ideal value of 1. A similar analysis applies to 3-fold screws, and to glide planes, but 4-fold and 6-fold screws are more complicated, since there are multiple possible Fourier peaks, at 1/4 & 1/2 for a 4-fold, or 1/6, 1/3 & 1/2 for a 6-fold, and these are not independent. These can be treated by using a distribution based on a single deviation from all the 2 (4-fold) or 3 (6-fold) ideal peak values, considering the deviation as a "distance" in 2 or 3 dimensions.
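The Fourier test on axial reflections can be sketched in a few lines. The sketch computes the one-dimensional cosine transform of I/σ along an axis, relative to its value at the origin; the I/σ values are invented, and the conversion of the peak height into a probability is not shown.

```python
import math


def fourier_peak(values, x):
    """Height of the 1-D Fourier transform at x, relative to the origin.

    values: {axial index h: I/sigma}. A value near 1 at x = 1/2 means that
    only reflections with h = 2n are present.
    """
    numerator = sum(v * math.cos(2.0 * math.pi * h * x)
                    for h, v in values.items())
    return numerator / sum(values.values())


# Invented I/sigma values along an axis, indices 1..20
no_screw = {h: 12.0 for h in range(1, 21)}
screw_2_1 = {h: (12.0 if h % 2 == 0 else 0.2) for h in range(1, 21)}
screw_4_2 = dict(screw_2_1)                       # present when l = 2n
screw_4_1 = {h: (12.0 if h % 4 == 0 else 0.2) for h in range(1, 21)}

print(round(fourier_peak(no_screw, 0.5), 3))
print(round(fourier_peak(screw_2_1, 0.5), 3))
# A 4-fold screw needs both peaks: 4(1) is high at 1/4 and at 1/2,
# 4(2) is high only at 1/2
print(round(fourier_peak(screw_4_1, 0.25), 3),
      round(fourier_peak(screw_4_1, 0.5), 3))
print(round(fourier_peak(screw_4_2, 0.25), 3),
      round(fourier_peak(screw_4_2, 0.5), 3))
```

The pair of peak heights for the 4-fold case is what the text treats as a point in 2 dimensions, to be compared with the ideal values for each screw type.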
In many cases, combining the probabilities from the rotational symmetry and from the systematic absences gives a unique choice of space group, but often several different space groups may need to be tried in the structure determination. POINTLESS tries to avoid overconfidence in its assignment of likelihood, which works in most cases, but it is occasionally fooled by close pseudosymmetry.
If the Laue symmetry is lower than the lattice symmetry, there are alternative
indexing schemes which are different but equally valid, related by the
rotational symmetry operators present in the lattice but not in the Laue group.
These are the same conditions which allow merohedral twinning. For example, in
Laue group P-3 (point group P3) there are four possible indexing schemes:
(h,k,l); (-h,-k,l); (k,h,-l); (-k,-h,-l). As well as these exact cases,
ambiguities may arise accidentally for special values of cell dimensions: for
example, a monoclinic cell with β = 90° will appear
orthorhombic, leading to an alternative indexing as (-h,-k,l). Less obvious
cases can occur with special relationships involving cell diagonals. For some
cases can occur with special relationships involving cell diagonals. For some
examples, see http://www.ccp4.ac.uk/dist/html/reindexing.html
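A reindexing operation is just a 3x3 integer matrix applied to (h,k,l). The sketch below applies the four alternative schemes for point group P3, taken here to be (h,k,l), (-h,-k,l), (k,h,-l) and (-k,-h,-l); the names reindex and P3_SCHEMES are invented for the example.

```python
def reindex(hkl, matrix):
    """Apply a 3x3 reindexing matrix (tuple of rows) to a Miller index."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in matrix)


# The four alternative indexing schemes for point group P3
P3_SCHEMES = {
    "h,k,l": ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "-h,-k,l": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    "k,h,-l": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "-k,-h,-l": ((0, -1, 0), (-1, 0, 0), (0, 0, -1)),
}

alternatives = {name: reindex((1, 2, 3), m) for name, m in P3_SCHEMES.items()}
for name, hkl in alternatives.items():
    print(name, hkl)
```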
For the first crystal (or indexing), you are free to choose any of the alternatives, but subsequent indexing must match the original "reference" scheme. POINTLESS can check which scheme matches best in two ways: you can give a reference file HKLREF (which can now be either merged or unmerged), in which case the test data (HKLIN) will be checked against the reference, and its space group will be assumed to be correct; or if you give multiple test data files (HKLIN) and no HKLREF file is defined, the first one will be treated as a reference for alternative indexing, but the combined data will still be tested for Laue group symmetry.
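Matching a test dataset to a reference can be sketched as a search over the alternative schemes for the one giving the best correlation with the reference intensities. This is a toy illustration under stated assumptions: the data are random numbers with no real symmetry, the schemes are those for point group P3, and the function best_scheme is invented for the example; POINTLESS itself works with normalised intensities and likelihoods rather than a bare CC.

```python
import math
import random

SCHEMES = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}


def pearson_cc(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_scheme(reference, test):
    """Reindex the test data by each scheme and score it against reference."""
    scores = {}
    for name, op in SCHEMES.items():
        pairs = [(reference[op(*hkl)], i) for hkl, i in test.items()
                 if op(*hkl) in reference]
        scores[name] = pearson_cc([p[0] for p in pairs],
                                  [p[1] for p in pairs])
    return max(scores, key=scores.get), scores


rng = random.Random(2)
indices = [(h, k, l) for h in range(-4, 5)
           for k in range(-4, 5) for l in range(-4, 5)]
reference = {hkl: rng.expovariate(1.0) for hkl in indices}
# Test crystal: same intensities with up to 10% error, indexed as (k,h,-l)
test = {SCHEMES["k,h,-l"](*hkl): i * rng.uniform(0.9, 1.1)
        for hkl, i in reference.items()}

chosen, scores = best_scheme(reference, test)
print(chosen, {name: round(cc, 2) for name, cc in scores.items()})
```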
Two examples were given in Evans (2006) and they remain valid even though the scoring system has changed. Most crystals give a clear answer: uncertainties generally arise through pseudosymmetry, including twinning, and the difficult case (2) illustrated here is not typical.
(1) Discriminating between orthorhombic groups with systematic absences
This case (Parker, unpublished) gave a clear indication of orthorhombic symmetry, Laue group Pmmm. Fourier analysis of the axial reflections gave a definitive suggestion that the space group was P22_{1}2_{1} (standard setting P2_{1}2_{1}2 with the reindexing operation (k,l,h)) (table 1).
| Axis | Number | Peak height at 1/2 | SD | Probability | Reflection condition |
|---|---|---|---|---|---|
| 2(1) [a] | 39 | 0.234 | 0.242 | 0.000 | h00: h=2n |
| 2(1) [b] | 27 | 0.997 | 0.176 | 0.970 | 0k0: k=2n |
| 2(1) [c] | 87 | 0.993 | 0.109 | 0.988 | 00l: l=2n |

Table 1. Systematic absence analysis
(2) Pseudosymmetry from incomplete pseudo-merohedral twinning
The true space group in this case (Sanchez Barrena, unpublished) is P2_{1}2_{1}2_{1}, but
all three cell lengths are about the same (79.2, 81.3, 81.2 Å) and the crystals
have variable amounts of twinning into the apparent point group 422 (twinning
operator k,h,l). Table 2 shows the scores for the possible cubic symmetry
operators for two crystals, a native crystal with about 20% twinning (refined
twin fraction), and a more highly twinned SeMet crystal. Table 3 shows their
scores for the different Laue groups: the native crystal gives the correct Pmmm
group, but the program is fooled by twinning in the SeMet data into preferring
Laue group P4/mmm.

| Symmetry operator | Native Likelihood | Native CC | Native R_{meas} | SeMet Likelihood | SeMet CC | SeMet R_{meas} |
|---|---|---|---|---|---|---|
| Identity | 0.953 | 0.97 | 0.066 | 0.949 | 0.97 | 0.076 |
| 2-fold (1 0 1) | 0.058 | 0.22 | 0.466 | 0.055 | 0.04 | 0.605 |
| 2-fold (1 0 1) | 0.059 | 0.23 | 0.451 | 0.054 | 0.14 | 0.516 |
| 2-fold (0 1 1) | 0.063 | 0.01 | 0.667 | 0.056 | 0.03 | 0.653 |
| 2-fold (0 1 1) | 0.062 | 0.01 | 0.671 | 0.054 | 0.05 | 0.636 |
| 2-fold (1 1 0) | 0.052 | 0.04 | 0.639 | 0.713 | 0.83 | 0.155 |
| 2-fold k (0 1 0) | 0.947 | 0.96 | 0.103 | 0.921 | 0.93 | 0.104 |
| 2-fold (1 1 0) | 0.051 | 0.05 | 0.631 | 0.562 | 0.78 | 0.175 |
| 2-fold h (1 0 0) | 0.944 | 0.95 | 0.138 | 0.916 | 0.92 | 0.111 |
| 2-fold l (0 0 1) | 0.943 | 0.95 | 0.156 | 0.931 | 0.94 | 0.108 |
| 3-fold (1 1 1) | 0.059 | 0.00 | 0.766 | 0.058 | 0.02 | 0.639 |
| 3-fold (1 1 1) | 0.058 | 0.01 | 0.776 | 0.057 | 0.03 | 0.800 |
| 3-fold (1 1 1) | 0.058 | 0.01 | 0.731 | 0.053 | 0.06 | 0.673 |
| 3-fold (1 1 1) | 0.058 | 0.01 | 0.798 | 0.059 | 0.02 | 0.838 |
| 4-fold h (1 0 0) | 0.066 | 0.02 | 0.731 | 0.064 | 0.00 | 0.686 |
| 4-fold k (0 1 0) | 0.056 | 0.21 | 0.463 | 0.054 | 0.05 | 0.586 |
| 4-fold l (0 0 1) | 0.052 | 0.04 | 0.619 | 0.539 | 0.77 | 0.175 |

Table 2. Scores for potential symmetry elements for native and SeMet crystals. Both crystals show the dyads for orthorhombic symmetry (2-fold h, k and l), but the SeMet crystal, which is more highly twinned (perhaps ~35%) than the native (~20%), also shows the tetragonal operators (the two 2-fold (1 1 0) rows and 4-fold l).
| Native rank | Laue group | Reindex | Native Prob | Native ZCC | Native CC+ | Native CC- | Native R | SeMet Prob | SeMet ZCC | SeMet CC+ | SeMet CC- | SeMet R | SeMet rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Pmmm | [h,k,l] | 0.988 | 9.05 | 0.96 | 0.05 | 0.11 | 0.209 | 7.24 | 0.94 | 0.22 | 0.09 | 2 |
| 2 | P 1 2/m 1 | [k,l,h] | 0.004 | 7.91 | 0.96 | 0.17 | 0.08 | 0.002 | 6.57 | 0.96 | 0.30 | 0.08 | 5 |
| 3 | P 1 2/m 1 | [l,h,k] | 0.003 | 7.96 | 0.96 | 0.16 | 0.10 | 0.001 | 6.65 | 0.95 | 0.28 | 0.09 | 7 |
| 4 | P 1 2/m 1 | [h,k,l] | 0.003 | 7.87 | 0.96 | 0.17 | 0.11 | 0.001 | 6.49 | 0.95 | 0.31 | 0.08 | 6 |
| 5 | P 4/mmm | [l,k,h] | 0.000 | 4.58 | 0.53 | 0.07 | 0.32 | 0.000 | 2.61 | 0.54 | 0.27 | 0.29 | 13 |
| 6 | P 4/m | [l,k,h] | 0.000 | 4.57 | 0.63 | 0.18 | 0.25 | 0.000 | 4.13 | 0.71 | 0.30 | 0.18 | 11 |
| 7 | P 4/mmm | [k,h,l] | 0.000 | 6.53 | 0.66 | 0.01 | 0.24 | 0.778 | 8.32 | 0.87 | 0.04 | 0.13 | 1 |
| 8 | P 4/m | [k,h,l] | 0.000 | 5.31 | 0.70 | 0.17 | 0.19 | 0.002 | 6.31 | 0.89 | 0.26 | 0.11 | 4 |
| 9 | P -1 | [h,k,l] | 0.000 | 7.52 | 0.97 | 0.22 | 0.07 | 0.000 | 6.38 | 0.97 | 0.33 | 0.08 | 10 |
| 10 | P 4/m | [h,l,k] | 0.000 | 4.35 | 0.62 | 0.18 | 0.25 | 0.000 | 3.01 | 0.63 | 0.33 | 0.23 | 12 |
| 11 | P 4/mmm | [h,l,k] | 0.000 | 5.07 | 0.56 | 0.05 | 0.30 | 0.000 | 3.43 | 0.60 | 0.25 | 0.26 | 14 |

Table 3. Laue group rankings for the same crystals as in table 2. For the native crystal, the top rank solution is the correct Pmmm. For the SeMet crystal, P4/mmm is ranked higher because of the pseudosymmetry from the twinning. Prob is the likelihood estimate, ZCC is the net "Z-score" for CC, CC+ is for symmetry operators present in the Laue group, CC- for symmetry operators present in the cubic lattice but not in the Laue group, & R is the R-factor R_{meas}.
In most cases, POINTLESS will give an unambiguous assignment of the Laue group, and often a good indication of the space group. Nevertheless, the results should always be treated with some caution, because of the possibility of pseudosymmetry, which is not uncommon. In difficult cases, careful examination of the scores may lead to a decision different to that given by the program.
The option to combine multiple input files (from version 1.2.0) provides a more convenient method than the previous use of SORTMTZ, since it ensures that the files are on the same indexing system, and it adjusts the batch numbers if necessary to ensure that they are unique. POINTLESS is available from the CCP4 pre-release site or by anonymous ftp from ftp.mrclmb.cam.ac.uk/pub/pre/, and will be in future releases of CCP4. The program is under active development, the ultimate aim being to replace and extend all the scaling functions of Scala.
I have been helped in the development of POINTLESS by many useful discussions with many people, including George Sheldrick, Ralf Grosse-Kunstleve, Airlie McCoy, Randy Read, Eleanor Dodson, Kevin Cowtan, Andrew Leslie, and Graeme Winter.
The Clipper C++ libraries for X-ray crystallography, Cowtan, K. (2003) IUCr Computing Commission Newsletter, 2, 4-9
Scaling & assessment of data quality, Evans, P.R. (2006) Acta Cryst. D 62, 72-82
The Computational Crystallography Toolbox: crystallographic algorithms in a reusable software framework, Grosse-Kunstleve, R.W., Sauter, N.K., Moriarty, N.W. & Adams, P.D. (2002) J. Appl. Cryst. 35, 126-136