AIMLESS (CCP4: Supported Program)¶
NAME¶
aimless - scale together multiple observations of reflections
SYNOPSIS¶
aimless HKLIN foo_in.mtz HKLOUT foo_out.mtz
DESCRIPTION¶
This program scales together multiple observations of reflections, and merges multiple observations into an average intensity: it is a successor program to SCALA.
Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of Phi (the rotation angle ROT), or expressed as BATCH (image) number (strongly deprecated). In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction expanded as spherical harmonics. The secondary beam correction is related to the absorption anisotropy correction described by Blessing (Blessing, 1995).
The merging algorithm analyses the data for outliers, and gives detailed analyses. It generates a weighted mean of the observations of the same reflection, after rejecting the outliers.
The program does several passes through the data:
initial estimate of the scales
first round scale refinement, using strong data using an I/sigma(I) cutoff
first round of outlier rejection
if both summation and profile-fitted intensity estimates are present (eg from Mosflm), then the cross-over point is determined between using profile-fitted for weak data and summation for strong data.
first analysis pass to refine the “corrections” to the standard deviation estimates
final round scale refinement, using strong data within limits on the normalised intensity E2
final analysis pass to refine the “corrections” to the standard deviation estimates
final outlier rejections
a final pass to apply scales, analyse agreement & write the output file, usually with merged intensities, but alternatively as file with scaled but unmerged observations, with partials summed and outliers rejected, for each dataset
Anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv).
Running the program¶
Aimless will often be run from the CCP4 GUI, but may also be run from a script. In a script the input and output files may be assigned on the command line, or some of them (marked with an asterisk in the list below) may be assigned as keyworded input commands. The option switch “–no-input” forces the program to run immediately with default options, without waiting for input commands, using file assignments from the command line.
Input files:
Output files:
HKLOUT*, XMLOUT*, SCALES*, ROGUES*, TILEIMAGE
Plot
files also represented in XMLOUT:
ROGUEPLOT, NORMPLOT, ANOMPLOT, CORRELPLOT
These separate XMgrace files may be suppressed with the command PLOT NOXMGR: the XML representation is still written to XMLOUT. Explicit file assignments may be given for the optional output reflection files, otherwise their names are generated from HKLOUT: HKLOUTUNMERGED, SCALEPACK, SCALEPACKUNMERGED
Scaling options¶
The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help. For most purposes, my normal recommendation is the default:
scales rotation spacing 5 secondary bfactor on brotation spacing 20
Other hints:
Only use the SCALE BATCH option if every image is different from every other one, i.e. off-line detectors (including film), or rapidly or discontinuously changing incident beam flux. This is rarely the case for synchrotron data, but is appropriate for serial data (eg XFEL). This mode may be VERY slow if there are many batches.
If there is a discontinuity between one set of images and another (e.g. change of exposure time), then flag them as different RUNs. This will be done automatically if no runs are specified.
The SECONDARY correction is recommended and is the default: this provides a correction for absorption. It should always be restrained with a TIE SURFACE command (this is the default), in which case it is reasonably stable under most conditions. The ABSORPTION (crystal frame) correction is similar to SECONDARY (camera frame) in most cases, but may be preferable if data have been collected from multiple alignments of the same crystal.
Use a B-factor correction unless the data are only very low-resolution. Traditionally, the relative B-factor is a correction for radiation damage (hence it is a function of time), but it also includes some other corrections eg absorption.
When trying out more complex scaling options, it is a good idea to try a simple scaling first, to check that the more elaborate model gives a real improvement.
When scaling multiple MAD data sets they should all be scaled together in one pass, outliers rejected across all datasets, then each wavelength merged separately. This is the default if multiple datasets are present in the input file.
Other options are described in greater detail under the Keyworded Input.
Control of flow through the program¶
The ONLYMERGE flag skips the scaling (often in conjunction with RESTORE to read in previously determined scales), calculates statistics and outputs the data.
Partially recorded reflections¶
See appendix 1
The different options for the treatment of partials are set by the PARTIALS command. Partials may either be summed or scaled: in the latter case, each part is treated independently of the others.
Summed partials [default]:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.
Scaled partials:
In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5]. This only works well if the calculated fractions are accurate, which is not usually the case.
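The scaled-partials correction can be sketched as follows (hypothetical helper, not the Aimless implementation):

```python
def scale_partial(i_obs, fraction, minimum_fraction=0.5):
    """Scale a partial observation up by the inverse FRACTIONCALC.

    Returns the corrected intensity, or None if the observed
    fraction does not exceed the acceptance threshold.
    Illustrative sketch only.
    """
    if fraction <= minimum_fraction:
        return None          # fraction too small: observation not used
    return i_obs / fraction  # inverse-FRACTIONCALC scaling
```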
Scaling algorithm¶
The normal scaling method improves the internal consistency of the dataset by minimising
Sum( whl * ( Ihl - ghl * Ih )**2 )
See appendix 2 for more details
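The minimised quantity can be sketched for a single unique reflection h; this is an illustrative calculation (assuming whl = 1/var(Ihl)), not the Aimless code:

```python
def scaling_residual(obs, weights, scales):
    """Sum( w_hl * (I_hl - g_hl * I_h)**2 ) for one unique reflection h.

    For the model I_hl ~ g_hl * I_h, the weighted least-squares
    estimate of I_h is Sum(w g I) / Sum(w g**2).
    """
    num = sum(w * g * i for w, g, i in zip(weights, scales, obs))
    den = sum(w * g * g for w, g in zip(weights, scales))
    i_h = num / den
    return sum(w * (i - g * i_h) ** 2
               for w, g, i in zip(weights, scales, obs))
```

Perfectly consistent observations (each Ihl exactly ghl times a common Ih) give a residual of zero.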
Scaling to reference¶
An alternative method scales to an external previously-determined reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl = 1/(var(Ihl) + var(Ihref))
This option might be useful for example in scaling long-wavelength data with high absorption to a short-wavelength set from a similar crystal. It is specified with the command REFINE REFERENCE. Reference intensities are taken from an MTZ file of merged intensities specified as HKLREF (command or command line). If intensities are not available, amplitudes F are accepted, and will be squared to intensities, but note that Fs which come from the French & Wilson “truncate” procedure are seriously biased for small intensities, so Fs are deprecated. A coordinate reference XYZIN is not accepted, as that does not seem to work well (in some limited tests).
By default, the first intensity column (or amplitude column) in the file is used, or the column may be explicitly set using the LABREF command. Note that provided the columns are contiguous, only the first of the set need be specified or chosen automatically eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
Data from Denzo¶
Data integrated with Denzo may be scaled and merged with Aimless as an alternative to Scalepack, or unmerged output from scalepack may be used. Both have some limitations. See appendix 3 for more details.
Datasets¶
TBD
KEYWORDED INPUT - DESCRIPTION¶
In the definitions below “[ ]” encloses optional items and “|” delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after “!” or “#” is treated as comment. The available keywords are:
ANALYSIS, ANOMALOUS, BINS, DUMP, EXCLUDE, HKLIN, HKLOUT, HKLREF, INITIAL, INTENSITIES, KEEP, LABREF, LINK, NAME, ONLYMERGE, OUTPUT, PARTIALS, REFINE, REJECT, RESOLUTION, RESTORE, ROGUES, RUN, SCALES, SDCORRECTION, TIE, TITLE, UNLINK, USESDPARAMETER, XMLOUT, XYZIN, CELL, ICERING, BFACTOR
RUN <Nrun> BATCH <b1> to <b2>¶
Define a “run”: Nrun is the Run number, with an arbitrary integer label (i.e. not necessarily 1,2,3 etc). A “run” defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The definition of a run may use several RUN commands. If no RUN command is given then run assignment will be done automatically, with run breaks at discontinuities in dataset, batch number or Phi. If any RUN definitions are given, then all batches not explicitly specified will be excluded.
SCALES [<subkeys>]¶
Define layout of scales, ie the scaling model. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.
- Subkeys:
- RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs
- ROTATION <Nscales> SPACING <delta_rotation>
Define layout of scale factors along rotation axis (i.e. primary beam), either as number of scales or (if SPACING keyword present) as interval on rotation [default SPACING 5]
- BATCH
Set “Batch” mode, no interpolation along rotation (primary) axis. This option is compulsory if a ROT column is not present in the input file, but otherwise the ROTATION option is preferred. WARNING: this option is not optimised and may take a very long time if you have many batches
- BFACTOR ON | OFF
Switch Bfactors on or off. The default is ON.
- BROTATION <Ntime> SPACING <delta_time>
Define number of B-factors or (if SPACING keyword present) the interval on “time”: usually no time is defined in the input file, and the rotation angle is used as its proxy [default SPACING 20].
- SECONDARY [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the camera spindle frame. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6, default 4). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. Set Lmax = 0 to switch off
- ABSORPTION [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the crystal frame based on POLE (qv). The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6, default 4). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. This is not substantially different from SECONDARY in most cases, but may be preferred if data are collected from multiple settings of the same crystal, and you want to use the same absorption surface. This would only be strictly valid if the beam is larger than the crystal.
- POLE <hkl>
Define the polar axis for ABSORPTION or SURFACE as h, k or l (eg POLE L): the pole will default to either the closest axis to the spindle (if known), or l (k for monoclinic space-groups).
- CONSTANT
One scale for each run (equivalent to ROTATION 1)
- TILE <NtileX> <NtileY> [CCD]
Define a detector scale for each tile. Currently this implements a scale model for 3x3 tiled CCD detectors to correct for the underestimation of intensities in the corners of the tile, see Appendix 2. If the detector appears to be a 3x3 CCD (3072x3072 pixels) then this correction will be activated automatically unless the NOTILE keyword is given. The parameters are restrained using the TIE TILE parameters (qv)
- NOTILE
Switch off the automatic TILE 3 3 correction for CCD detectors
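The (Lmax + 1)**2 growth of the SECONDARY/ABSORPTION parameter count quoted above comes from summing the 2l + 1 spherical-harmonic terms for each order l; a small illustrative check:

```python
def harmonic_parameters(lmax):
    """Number of spherical-harmonic coefficients up to order lmax:
    each order l contributes (2l + 1) terms, totalling (lmax + 1)**2.
    """
    return sum(2 * l + 1 for l in range(lmax + 1))
```

So the default Lmax of 4 costs 25 parameters per surface, and Lmax 6 costs 49, which is why the minimum adequate order is recommended.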
SDCORRECTION [[NO]REFINE] [INDIVIDUAL | SAME [FIXSDB]]¶
[RUN <RunNumber>] [FULL | PARTIAL] <SdFac> [<SdB>] <SdAdd> [DAMP <dampfactor>]¶
[SIMILAR [<sd1> <sd2> <sd3>]]¶
[[NO]TIE SdFac SdB SdAdd <targetvalue> <SDtarget>]¶
[SAMPLESD]¶
Input or set options for the “corrections” to the input standard deviations: these are modified to
sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}
where Ihl is the intensity (SdB may be omitted in the input). The default is “SDCORRECTION REFINE INDIVIDUAL”. If explicit values are given, the default changes to NOREFINE.
The keyword REFINE controls refinement of the correction parameters, essentially trying to make the SD of the distribution of fractional deviations (Ihl - <I>)/sigma equal to 1.0 over all intensity ranges. The residual minimised is Sum( w * (1 - SD)^2 ) + Restraint Residual
SAMPLESD is intended for very high multiplicity data such as XFEL serial data. The final SDs are estimated from the weighted population variance, assuming that the input sigma(I)^2 values are proportional to the true errors. This probably gives a more realistic estimate of the error in <I>. In this case refinement of the corrections is switched off unless explicitly requested. Other subkeys control what values are determined and used for each run (if more than one). TIE and SIMILAR are mutually exclusive
SAME[default] same SD parameters for all runs, different for fulls and partials
INDIVIDUAL use different SD parameters for each run, fulls and partials
FIXSDB fixes the SdB parameter in the refinement (but it seems best to let it refine, even though it has no obvious physical meaning)
DAMP set dampfactor to damp shifts in the refinement [default 0.05]
SIMILAR restrain parameters to be the same for all runs, with SDs optionally given for SdFac (sd1), SdB (sd2), and SdAdd (sd3) [defaults 0.2, 3.0, 0.04]
TIE set restraints for named parameter, “SdFac”, “SdB”, or “SdAdd”. Each restraint is to a specified target value, with a weight = 1/(SDtarget^2). The default is to restrain SdB only, target value 0.0, SD 20. NOTIE removes all restraints, TIE without values sets the defaults.
RUN <run_number> Define run for which values are given the run must have been previously defined. If no run is defined, it applies to all runs. Different values may be specified for fully recorded reflections (FULL) and for partially recorded reflections (PARTIAL), or the same values may be used for both if one set is given, e.g.
sdcorrection full 1.4 0.11 part 1.4 0.05
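The correction formula above can be written directly (illustrative helper, names are not part of Aimless):

```python
import math

def sd_corrected(sd_i, i_hl, sdfac, sdadd, sdb=0.0):
    """sd(I)corrected = SdFac * sqrt( sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2 )

    sdb defaults to 0, mirroring the optional SdB input term.
    """
    return sdfac * math.sqrt(sd_i ** 2 + sdb * i_hl + (sdadd * i_hl) ** 2)
```

Note that for strong reflections the (SdAdd*Ihl)**2 term dominates, so sd(I) becomes roughly proportional to I.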
USESDPARAMETER [NO | DIAGONAL | COVARIANCE]¶
For the final estimation of intensity errors sd(I), incorporate the estimated error in the refined scale model parameters, as estimated from the inverse normal matrix in the scale refinement. The default is DIAGONAL if this keyword is omitted, or given with no sub-keyword. “NO” switches it off. The DIAGONAL option uses the separate parameter variances, ie the diagonal of the variance/covariance matrix. COVARIANCE uses the full matrix, which is slower but may be more accurate.
The variance/covariance matrix [V] = Sum(wD^2)/(m-n) [H]^-1, where [H] is the normal (Hessian) matrix, Sum(wD^2) is the minimised residual, m the number of observations, and n the number of parameters.
The scaled intensity I’hl = Ihl/ghl where ghl is its inverse scale factor
Var(I’)/I’^2 = Var(I)/I^2 + Var(g)/g^2 ie Var(I’) = (1/g^2) [ Var(I) + I’^2 Var(g) ]
Var(g) = [dg/dp]T [V] [dg/dp] (COVARIANCE option) where dg/dp is the vector of partial derivatives with respect to parameters p DIAGONAL approximation: Var(g) = Sum(i) { [dg/dp(i)]^2 V(i,i) } ie summed over parameters i
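The DIAGONAL error propagation described above can be sketched as follows (hypothetical argument names; a minimal illustration, not the Aimless code):

```python
def var_scaled_intensity(i_scaled, var_i, g, dg_dp, v_diag):
    """DIAGONAL approximation to the scale-parameter error propagation:

        Var(g)  = Sum_i [dg/dp(i)]**2 * V(i,i)
        Var(I') = (1/g**2) * ( Var(I) + I'**2 * Var(g) )

    dg_dp: partial derivatives of the scale g w.r.t. each parameter;
    v_diag: diagonal of the variance/covariance matrix [V].
    """
    var_g = sum(d * d * v for d, v in zip(dg_dp, v_diag))
    return (var_i + i_scaled ** 2 * var_g) / g ** 2
```

With zero parameter variances this reduces to Var(I') = Var(I)/g**2, i.e. the scale model adds nothing to the error.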
PARTIALS [[NO]CHECK] [TEST [<lower_limit> <upper_limit>]] [CORRECT <minimum_fraction>] [[NO]GAP [<maxgap>]]¶
Set criteria for accepting complete or incomplete partials. Default is CHECK TEST 0.95 1.05 CORRECT 0.95 NOGAP
After all parts have been assembled, the total observation is accepted if:-
the CHECK flag is set [default] and the MPART flags (if present) are all consistent (these flags indicate that a set of parts is eg 1 of 3, 2 of 3, 3 of 3)
if CHECK fails, then the total fraction is checked to lie between lower_limit & upper_limit [default 0.95, 1.05]
if this fails, then the incomplete partial is scaled up by the total fraction if it is > minimum_fraction [default 0.95] (NB Pointless has different default for a different purpose)
a reflection which has a gap in the middle may be accepted if GAP is set; maxgap is the maximum number of missing slots [not recommended: default 1 if GAP is set]
INITIAL UNITY | MEAN | MINIMUM_OVERLAP <minimum_overlap> | MAXIMUM_GAP <maximum_gap>¶
Set initial scale factors either based on mean intensities (MEAN, default) or all set to 1.0 (UNITY). If the fractional overlap between rotation ranges is less than minimum_overlap in too many rotation ranges, then scaling will be switched off (ie ONLYMERGE), as will SD correction refinement (SDCORRECTION NOREFINE). The default value is 0.05; set it to a value <= 0.0 to ignore this check. maximum_gap specifies the maximum number of contiguous rotation ranges which are allowed to fall below the minimum_overlap criterion [default 2]. Fractional overlap is (Number of observations with matching observations in a different rotation range)/(Total number of observations)
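The fractional-overlap definition above can be illustrated with sets of unique reflection indices (a sketch with hypothetical inputs, not the Aimless bookkeeping):

```python
def fractional_overlap(this_range, other_ranges):
    """Fractional overlap for one rotation range:

    (observations with a matching hkl in a different rotation range)
    / (total observations in this range)

    this_range: set of hkl tuples observed in this rotation range;
    other_ranges: list of such sets for the other ranges.
    """
    others = set().union(*other_ranges) if other_ranges else set()
    matched = sum(1 for hkl in this_range if hkl in others)
    return matched / len(this_range)
```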
INTENSITIES [SUMMATION | PROFILE | COMBINE [<Imid>] [POWER <Ipower>]]¶
Set which intensity to use, of the integrated intensity (column I) or profile-fitted (column IPR), if both are present. This applies to all stages of the program, scaling & averaging. Mosflm produces two different estimates of the intensity, from summation integration and from profile fitting. Generally the profile-fitted estimate is better, but for the strongest reflections the summation value is often better. The default is to use a weighted mean, depending on the “raw” intensity ie before LP correction (COMBINE option), and to optimise automatically the switch-over point Imid, to give the best overall Rmeas.
Subkeys:
- SUMMATION
use summation integrated intensity Isum
- PROFILE
use profile-fitted intensity Ipr
- COMBINE [<Imid>] [POWER <Ipower>]
Use weighted mean of profile-fitted & integrated intensity, profile-fitted for weak data, summation integration value for strong.
If no value is given for Imid, it will be automatically optimised - Ipower defaults to 3
I = w*Ipr + (1-w)*Isum
w = 1/(1 + (Iraw/Imid)^Ipower)
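The COMBINE weighting can be written directly from the two formulae above (illustrative helper):

```python
def combined_intensity(i_pr, i_sum, i_raw, imid, ipower=3):
    """COMBINE weighted mean of profile-fitted and summation intensities:

        w = 1 / (1 + (Iraw/Imid)**Ipower)
        I = w*Ipr + (1 - w)*Isum

    Weak data (Iraw << Imid) give w ~ 1, i.e. profile-fitted;
    strong data (Iraw >> Imid) give w ~ 0, i.e. summation.
    """
    w = 1.0 / (1.0 + (i_raw / imid) ** ipower)
    return w * i_pr + (1.0 - w) * i_sum
```

At Iraw = Imid the weight is exactly 0.5, the switch-over point that Aimless optimises by default.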
REJECT¶
[SCALE | MERGE] [COMBINE] [SEPARATE] <Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]
[EMAX <Emax>]
[BATCH <batchrejectfactor>]
[NONE]
Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging passes. If neither SCALE nor MERGE is specified, the same values are used for both stages. The default values are REJECT 6 ALL -8, ie test within I+ or I- sets on 6 sigma, between I+ & I- with a threshold adjusted upwards from 8 sigma according to the strength of the anomalous signal. The adjustment of the ALL test is not necessarily reliable.
If there are multiple datasets, by default, deviation calculations include data from all datasets [COMBINE]. The SEPARATE flag means that outlier rejections are done only between observations from the same dataset. The usual case of multiple datasets is MAD data.
If ANOMALOUS ON is set, then the main outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged “@” in the ROGUES file.
REJECT BATCH <batchrejectfactor> is intended for batch scaling of eg XFEL data. After the initial scales are calculated, very weak batches with scale factors < batchrejectfactor x median scale are rejected
REJECT NONE skips all outlier checking, REJECT EMAX 0.0 switches off Emax testing
- Subkeys:
- SEPARATE
rejection & deviation calculations only between observations from the same dataset
- COMBINE
rejection & deviation calculations are done with all datasets [default]
- SCALE
use these values for the scaling pass
- MERGE
use these values for the merging (FINAL) pass
- sdrej
sd multiplier for maximum deviation from weighted mean I [default 6.0]
- [sdrej2]
special value for reflections measured twice [default = sdrej]
- ALL
check outliers in merging step between as well as within I+ & I- sets (not relevant if ANOMALOUS OFF). A negative value [default -8] means adjust the value upwards according to the slope of the normal probability analysis of anomalous differences (AnomPlot)
- sdrej+-
sd multiplier for maximum deviation from weighted mean I including all I+ & I- observations (not relevant if ANOMALOUS OFF)
- [sdrej2+-]
special value for reflections measured twice [default = sdrej+-]
- KEEP
in merging, if two observations disagree, keep both of them [default]
- REJECT
in merging, if two observations disagree, reject both of them
- LARGER
in merging, if two observations disagree, reject the larger
- SMALLER
in merging, if two observations disagree, reject the smaller
- EMAX
maximum acceptable value for E = normalised F, <= 0.0 to switch off test [default = 10.0 for acentrics]. Observations are only rejected if E > EMAX and I/sd(I) > sdrej, to allow for inaccurate normalisation in very weak high resolution bins.
The test for outliers is described in Appendix 4
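A much simplified version of the sd-multiplier criterion can be sketched as follows (the actual Appendix 4 algorithm, which compares each observation against the weighted mean of the others, is more elaborate):

```python
def is_outlier(i_obs, sd_obs, i_mean, sdrej=6.0):
    """Minimal sketch of an sd-multiplier outlier test:
    flag the observation if |Ihl - <I>| exceeds sdrej * sd(Ihl).
    """
    return abs(i_obs - i_mean) > sdrej * sd_obs
```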
ICERING [R1 | R2 | NONE] [[NO]REJECT]¶
By default (NOREJECT), reflections lying in ice rings are omitted from the scaling and normalisation, but included in the final statistics and the output file. The REJECT option here will omit them from the final statistics and the output file. Two sets of ring definitions are available: R1 is a list of 10 rings from Mosflm, thanks to Harry Powell; R2 (the default) is a longer list of 61 narrower rings from Clemens Vonrhein, as used in autoPROC. NONE switches off all ice ring rejections.
ANOMALOUS [OFF] [ON]¶
- OFF [default]
no anomalous used, I+ & I- observations averaged together in merging
- ON
separate anomalous observations in the final output pass, for statistics & merging: this is also selected by the keyword ANOMALOUS on its own
RESOLUTION [RUN <RunNumber>] [[LOW] <Resmin>] [[HIGH] <Resmax>]¶
Set resolution limits in Angstrom, either order, optionally for individual datasets. The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If a RUN is specified this limit applies only to that run: this may override a previous general limit for all runs, and may be used with automatic run generation. [Default use all data]
TITLE <new title>¶
Set new title to replace the one taken from the input file. By default, the title is copied from hklin to hklout
ANALYSIS [CONE <angle>] [CCMINIMUM <MinimumHalfdatasetCC>] [CCANOMMINIMUM <MinimumHalfdatasetAnomCC>] [ISIGMINIMUM <MinimumIoverSigma>] [BATCHISIGMINIMUM <MinimumBatchIoverSigma>] [GROUPBATCH <BatchGroupRange>]¶
Specify analysis parameters:
- CONE specifies the half-angle (degrees) for cones around each
reciprocal axis, for anisotropy analysis [default 20].
- CCMINIMUM & ISIGMINIMUM specify thresholds for estimation of suitable
maximum resolution limits, both overall and along each reciprocal axis. These estimates are printed in the final Results summary, and give a guide to possible cut-offs. BATCHISIGMINIMUM gives the threshold for the analysis of maximum resolution by batch, on <I/sd> before averaging. CCANOMMINIMUM is the threshold for analysis of the resolution limit of strong anomalous differences, from CC(1/2)anom.
- Resolution estimates from CC(1/2) and CC(1/2)anom are done by fitting
a function (1/2)(1 - tanh(z)) where z = (s - d0)/r, s = 1/d^2, and d0 is the value of s for which the function = 0.5, and r controls the steepness of falloff. For very negative CCs (usually from CCanom), an additional offset parameter dcc is added, {(1/2)(1 - tanh(z)) * dcc - dcc + 1}. The fitted function is plotted along with the values. This curve-fitting was suggested by Ed Pozharski.
MinimumHalfdatasetCC minimum half-dataset CC(1/2) [default 0.3]
MinimumIoverSigma minimum <<I>/sd(<I>)> (=~ signal/noise) [default 1.5]
MinimumBatchIoverSigma minimum <I/sd(I)> (=~ signal/noise) [default 1.0, a smaller value as I/sd is before averaging]
MinimumHalfdatasetAnomCC minimum half-dataset CCanom [default 0.15]
GROUPBATCH BatchGroupRange: in the analyses against Batch, the batches (images) are grouped to reduce the number of ranges, with a group size of BatchGroupRange degrees [default 1.0 degrees]
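The fitted CC(1/2) fall-off curve described above can be evaluated directly (a sketch of the model function only, without the offset-parameter variant or the fitting itself):

```python
import math

def cc_half_model(d, d0, r):
    """Fitted CC(1/2) fall-off: (1/2)(1 - tanh(z)), z = (s - d0)/r,
    with s = 1/d**2 (d in Angstrom); d0 is the value of s at which
    the curve passes through 0.5, and r sets the steepness.
    """
    s = 1.0 / d ** 2
    return 0.5 * (1.0 - math.tanh((s - d0) / r))
```

The resolution-limit estimate is then the d at which this curve drops below the chosen threshold (e.g. CCMINIMUM 0.3).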
ONLYMERGE¶
Only do the merge step, no initial analysis, no scaling. Now, as of version 0.7.7, SDCORRECTION and outlier rejection will be switched off, unless they are explicitly given. This is equivalent to
SDCORRECTION NOREFINE 1.0 0.0 0.0 # No SD correction
REJECT NONE # no outlier rejection
DUMP [<Scale_file_name>]¶
Dump all scale factors to a file after the main scaling. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line.
RESTORE [<Scale_file_name>]¶
Read scales and SDcorrection parameters from a SCALES file from a previous run of Aimless (see DUMP).
REFINE [CYCLES <Ncycle>] [BFGS | FH | REFERENCE] [SELECT <IovSDmin> <E2min> [<E2max>]] [PARALLEL [AUTO] <Nprocessors> <Fractionprocessors>]¶
Define number of refinement cycles Ncycle and method for scale refinement.
BFGS use BFGS optimisation (usual method)
FH use Fox-Holmes least-squares algorithm (not recommended)
REFERENCE scale to an external reference dataset, specified as a merged MTZ file with the HKLREF command. This should contain intensities (either IMEAN or I+/I-), or amplitudes F which will be squared to intensities: intensities are strongly preferred, as squared Fs which have been “truncated” are significantly biased. The LABREF command may be used to specify the column label, otherwise the first intensity (or F) will be used. If I+ and I- are given, Imean for scale refinement is calculated as the unweighted mean. sigma(I) (if present) is assumed to be in the column following the intensity.
SELECT define selection limits for the two rounds of scaling. If unset, suitable values will be chosen automatically
IovSDmin <I>/sd’(I) limit for selection of reflections for 1st round scaling (< 0 for automatic selection)
E2min minimum E2 for selection of reflections for main scaling [default 0.8]
E2max maximum E2 for selection of reflections for main scaling [default 5.0]
PARALLEL use multiple processors for the scale refinement steps, if available. This produces some speed-up for very large jobs.
For this option to be available, the program must be compiled and linked with the “-fopenmp” option, and the environment variable OMP_NUM_THREADS must be set to the maximum number of threads allowed by the system
<Nprocessors> number of processors to use (this will be forced to be < OMP_NUM_THREADS)
<Fractionprocessors> (< 1.0) fraction of OMP_NUM_THREADS to use
AUTO [default if no argument to PARALLEL] determine the number of processors to use from the number of observations in the file, currently 1 processor / 200 000 observations, up to the maximum allowed (the optimum settings for this have yet to be determined)
EXCLUDEBATCH <batch range> | <batch list>¶
BATCH <b1> <b2> <b3> … | <b1> TO <b2>
Define a list of batches, or a range of batches, to be excluded altogether.
TIE [SURFACE <Sd_srf>] [BFACTOR <Sd_bfac>] [ZEROB <Sd_zerob>] [ROTATION <Sd_z>] [TILE <Sd1-5>] [TARGETTILE <r0> <w0>]¶
Apply or remove restraints to parameters. These can be pairs of neighbouring scale factors on rotation axis (ROTATION = primary beam) to have the same value, or neighbouring Bfactors, or surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.001. A negative SD value indicates no tie.
SURFACE: tie surface parameters to spherical surface [default is TIE SURFACE 0.001]
BFACTOR: tie Bfactors along rotation
ZEROB: tie all B-factors to zero
ROTATION: tie parameters along rotation axis (mainly useful with BATCH mode)
TILE: tie the CCD tile parameters. 5 SDs for radius r, width w, amplitude A, centre x0,y0, and Fourier coefficients
TARGETTILE: target values for tile parameters r and w
OUTPUT [MTZ] [NO]MERGED [UNMERGED [SPLIT | TOGETHER]] [SCALEPACK [MERGED | UNMERGED]] [ORIGINAL | REDUCED]¶
Control what goes in the output file. Two types of output files may be produced, either in MTZ format or in Scalepack format: (a) MERGED (or AVERAGE), average intensity for each hkl (I+ & I-) (b) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected. Up to four types of files may be created at the same time: UNMERGED filenames are created from the HKLOUT filename (with dataset appended if there are multiple datasets) with the string “_unmerged” appended, unless an explicit filename is given for unmerged MTZ on the UNMERGEDOUT command. If there are multiple datasets, by default MTZ files, merged or unmerged, are split into separate files (SPLIT). Unmerged MTZ files may optionally include all datasets if the keyword TOGETHER qualifies UNMERGED.
Output unmerged MTZ files by default have hkl indices reduced to the asymmetric unit (REDUCED), with the symmetry number needed to regenerate the original measured hkl stored as ISYM ((ISYM-1)/2 is the index into the symmetry operators in the order stored in the MTZ file if ISYM is odd; an even value of ISYM indicates a Friedel-related hkl, ie -h,-k,-l). Alternatively the file may be written with the original hkl (option ORIGINAL), in which case ISYM = 1 always.
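The ISYM convention described above can be decoded as follows (a sketch assuming 0-based indexing into the operator list, not library code):

```python
def decode_isym(isym):
    """Decode an MTZ ISYM value:
    returns (0-based symmetry-operator index, friedel_flag),
    where friedel_flag is True for an even ISYM, i.e. an
    observation related by -h,-k,-l.
    """
    friedel = (isym % 2 == 0)
    op_index = (isym - 1) // 2
    return op_index, friedel
```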
The default is to create a merged MTZ file for each dataset.
- File format options:
- NONE
no output file written
- MERGED or AVERAGE
[default] output averaged intensities, <I+> & <I-> for each hkl
- UNMERGED
apply scales, sum or scale partials, reject outliers, but do not average observations. hkl indices are either REDUCED (default) or ORIGINAL
- SCALEPACK or POLISH
Write reflections to a formatted file in a format as written by “scalepack” (or my best approximation to it). If the UNMERGED option is also selected, then the output matches the scalepack “output nomerge original index”, otherwise it is the “normal” scalepack output, with either I, sigI or I+ sigI+, I-, sigI-, depending on the “anomalous” flag.
UNMERGEDOUT <output unmerged MTZ file name>¶
Give explicit filename for output unmerged MTZ file, if selected with the OUTPUT command
KEEP [OVERLOADS | BGRATIO <bgratio_max> | PKRATIO <pkratio_max> | GRADIENT <bg_gradient_max> | EDGE | MISFIT]¶
Set options to accept observations flagged as rejected by the FLAG column from Mosflm. By default, any observation with FLAG .ne. 0 is rejected. Flagged reflections which are accepted may be marked in the ROGUES file.
- Subkeys:
- OVERLOADS
Accept profile-fitted overloads
- BGRATIO
Observations are flagged in Mosflm if the ratio of rms background deviation relative to its expected value from counting statistics is too large. This option accepts observations if bgratio < bgratio_max [default in Mosflm 3.0]
- PKRATIO
Accept observations with peak fitting rms/sd ratio pkratio < pkratio_max [default maximum in Mosflm 3.5]. Only set for fully recorded observations
- GRADIENT
Accept observations with background gradient < bg_gradient_max [default in Mosflm 0.03].
- EDGE
Accept profile-fitted observations on edge of active area of detector
- MISFIT
Accept reflections flagged as MISFIT by XDS (in XDS_ASCII.HKL file), ie flagged as outliers in the CORRECT step
LINK [SURFACE] ALL | <run_2> TO <run_1>¶
run_2 will use the same SURFACE (SECONDARY or ABSORPTION) parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilise the parameters. The keyword ALL will be assumed if omitted.
For SECONDARY or ABSORPTION parameters, the default is to link runs which come from the same crystal as long as they have similar wavelengths. They should be UNLINKed if they are different.
UNLINK [SURFACE] ALL | <run_2> TO <run_1>¶
Remove links set by LINK command (or by default). The keyword ALL will be assumed if omitted
BINS [RESOLUTION] <Nsbins> INTENSITY <Nibins>¶
Define number of resolution and intensity bins for analysis [default 10]
SMOOTHING <subkeyword> <value> **NOT YET DONE**¶
Set smoothing factors (“variances” of weights). A larger “variance” leads to greater smoothing
- Subkeys:
- TIME <Vt>
smoothing of B-factors [default 0.5]
- ROTATION <Vz>
smoothing of scale along rotation [default 1.0]
- PROB_LIMIT <DelMax_t> <DelMax_z> <DelMax_xy>
maximum values of normalized squared deviation (del**2/V) to include a scale [default set automatically, typically 3]
NAME PROJECT <project_name> CRYSTAL <crystal_name> DATASET <dataset_name>¶
Assign or reassign project/crystal/dataset names, for the output file. The names given here supersede those in the input file and redefine the single output dataset. Note that these names apply to all data: if multiple datasets are required, these must be specified in Pointless. DATASET must be present, and may optionally be given in the syntax crystal_name/dataset_name
- BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name> NOT YET DONE
If there are multiple datasets in the input file, define the “base” dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i’th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). The CRYSTAL keyword may usually be omitted.
HKLIN <input file name>¶
Filename for the main input file, as an alternative to specifying it on the command line.
HKLOUT <output file name>¶
Filename for the output file, as an alternative to specifying it on the command line.
XMLOUT <output XML file name>¶
Filename for the XML output file, as an alternative to specifying it on the command line.
HKLREF <reference file name>¶
Filename for a reference reflection MTZ file, as an alternative to specifying it on the command line. This file is used to provide a “best” estimate of intensity, possibly for the option to refine against a reference set (see above). This reference set is also used to compare to the scaled observed data, analysing it for its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. Column labels may be specified with the LABREF command.
For refinement against reference data, this file should be merged measured intensities from a reference crystal, or possibly amplitudes (deprecated due to bias from the “truncate” procedure).
For analysis, this reference data could, for example, be calculated from the best current model, eg the FC_ALL_LS column from Refmac. Amplitudes are squared to intensities, and intensities are scaled to the merged observations with a scale and an anisotropic temperature factor. This is an alternative to giving a coordinate file XYZIN from which structure factors will be calculated.
LABREF [F|I]=<columnlabel>¶
For an HKLREF file, this defines the column label for intensity or amplitude (which will be squared to an intensity). If this command is omitted, the first intensity column (or, if there are no intensities, the first amplitude) will be used. The next column is assumed to contain the corresponding sigma. Note that provided the columns are contiguous, only the first of the set need be specified (or chosen automatically), eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
XYZIN <reference coordinate file name>¶
The filename for a reference coordinate set, for analysis, but not for refinement. Structure factors will be calculated to use as a reference, in the same way as HKLREF. This provides a current “best” estimate of intensity, and the observed data is analysed for its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. The file should contain a valid space group name (full name with spaces, eg “P 21 21 21”, “P 1 21 1” etc) and unit cell parameters (ie a CRYST1 line in PDB format).
ROGUES <rogues file name>¶
File name for the rogues file; the default is ROGUES unless assigned on the command line
PLOT [NOXMGR | XMGR]¶
By default, various 2D plots (ROGUEPLOT, NORMPLOT, ANOMPLOT, CORRELPLOT) are written to separate files in a format for the plotting program xmgr (aka [xm]grace), which can also be read by loggraph. The command PLOT NOXMGR will suppress these output files, since for example in the context of ccp4i2 they are not needed, as the plot information is written to the XMLOUT file.
CELL a b c alpha beta gamma¶
Set unit cell to override all cell dimensions read from the input files
BFACTOR FIRST [RUN <run number>]¶
After each refinement cycle, the relative B-factors are adjusted (‘normalised’) to make the ‘best’ (largest) B-factor = 0. This is usually the right thing to do, but may be overridden with this command, to use the first rotation range as the reference to make B = 0, either in the 1st run or in the run specified here. Not generally recommended
INPUT AND OUTPUT FILES¶
Input¶
HKLIN The input file must be sorted on H K L M/ISYM BATCH (eg output from POINTLESS). Compulsory columns:
H K L indices
M/ISYM partial flag, symmetry number
BATCH batch number
I intensity (integrated intensity)
SIGI sd(intensity) (integrated intensity)
Optional columns:
- XDET YDET position on detector of this reflection:
these may be in any units (e.g. mm or pixels), but the range of values must be specified in the orientation data block for each batch.
- ROT rotation angle of this reflection (“Phi”). If
this column is absent, only SCALES BATCH is valid.
IPR intensity (profile-fitted intensity)
SIGIPR sd(intensity) (profile-fitted intensity)
- SCALE previously calculated scale factor (e.g. from
previous run of Scala). This will be applied on input
SIGSCALE sd(SCALE)
- TIME time for B-factor variation (if this is
missing, ROT is used instead)
MPART partial flag from Mosflm
FRACTIONCALC calculated fraction, required to SCALE PARTIALS
LP Lorentz/polarization correction (already applied)
- FLAG error flag (packed bits) from Mosflm (v6.2.3
or later). By default, if this column is present, observations with a non-zero FLAG will be omitted. They may be conditionally accepted using the KEEP command (qv)
- Bit flags:
1 - BGRATIO too large
2 - PKRATIO too large
4 - Negative > 5*sigma
8 - BG Gradient too high
16 - Profile fitted overload
32 - Profile fitted “edge” reflection
- BGPKRATIOS packed background & peak ratios, & background
gradient, from Mosflm, to go with FLAG
LATTNUM lattice number for multilattice data
Hn, Kn, Ln hkl indices for overlapped observations with multilattice data
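The packed FLAG bits listed above can be unpacked with simple bit tests; a minimal sketch (the descriptive strings are paraphrases of the table, not program output):

```python
# Decode the packed-bit FLAG column from Mosflm.
# The bit values come from the table above; the strings paraphrase it.
FLAG_BITS = {
    1: "BGRATIO too large",
    2: "PKRATIO too large",
    4: "negative > 5*sigma",
    8: "BG gradient too high",
    16: "profile-fitted overload",
    32: "profile-fitted edge reflection",
}

def decode_flag(flag):
    """Return the list of rejection reasons packed into a FLAG value."""
    return [reason for bit, reason in FLAG_BITS.items() if flag & bit]

# FLAG = 48 marks an observation as both an overload and an edge reflection:
print(decode_flag(48))
```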
HKLREF reference file for analysis of agreement by batch.
This may contain intensities or amplitudes (which will be squared), eg the FC_ALL_LS column from Refmac. The label is specified on the LABREF command.
XYZIN As an alternative to HKLREF, a coordinate file may be given, from which amplitudes and intensities will be calculated
Output¶
Reflection files output¶
In all cases, separate files are written for each dataset: files are named with the base HKLOUT name with the dataset name appended, as “_dataset”
HKLOUT: option OUTPUT [MTZ] MERGED
The output file contains columns
H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.
HKLOUTUNMERGED: option OUTPUT [MTZ] UNMERGED [REDUCED | ORIGINAL]
Unmerged data with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. Only a single scaled intensity value is written, chosen as summation, profile-fitted or combined as specified by the INTENSITIES command. Columns defining the diffraction geometry (e.g. FRACTIONCALC XDET YDET ROT TIME WIDTH LP) will be preserved in the output file. If HKLOUTUNMERGED or UNMERGEDOUT is not specified, then the filename for the unmerged file has “_unmerged” appended to HKLOUT.
Output columns:
H,K,L REDUCED or ORIGINAL indices (see OUTPUT options)
M/ISYM Symmetry number
BATCH batch number as for input
I, SIGI scaled intensity & sd(I)
SCALEUSED scale factor applied
SIGSCALEUSED sd(SCALE applied)
- NPART number of parts, = 1 for fulls, negated for scaled
partials, i.e. = -1 for scaled single part partial
FRACTIONCALC total fraction (if present in input file)
TIME copied from input if present
XDET,YDET copied from input if present
ROT copied from input if present (averaged for multi-part partials)
WIDTH copied from input if present
LP copied from input if present
SCALEPACK: option OUTPUT SCALEPACK MERGED
If a SCALEPACK filename is not specified then the filename will be taken from HKLOUT with the extension “.sca”
SCALEPACKUNMERGED: option OUTPUT SCALEPACK UNMERGED
If a SCALEPACKUNMERGED filename is not specified then the filename will be taken from SCALEPACK with “_unmerged” appended and the extension “.sca”
Other output files¶
- XMLOUT
XML output for plotting etc. It includes the NORMPLOT, ANOMPLOT, CORRELPLOT and ROGUEPLOT data, as well as the $TABLE graph data
- SCALES
scale factors from DUMP, used by RESTORE option
- ROGUES
list of bad agreements
- TILEIMAGE
a detector image representing the CCD TILE correction, if activated, in ADSC image format which may be viewed with adxv
The following 4 files are also represented in the XMLOUT file:
- NORMPLOT
normal probability plot from the merge stage ** this is at present written in a format for the plotting program xmgr (aka [xm]grace), but can also be read by loggraph **
- ANOMPLOT
normal probability plot of anomalous differences
(I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph **
- CORRELPLOT
scatter plot of pairs of anomalous differences (in multiples of RMS) from random half-datasets. One of these files is generated for each output dataset ** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph **
- ROGUEPLOT
a plot of the position on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the position of the principal ice rings shown ** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph **
REFERENCES¶
P.R. Evans and G.N. Murshudov, “How good are my data and what is the resolution?”, Acta Cryst. D69, 1204-1214 (2013)
P.R.Evans “An introduction to data reduction: space-group determination, scaling and intensity statistics”, Acta Cryst. D67, 282-292 (2011)
P.R.Evans, “Scaling and assessment of data quality”, Acta Cryst. D62, 72-82 (2006). Note that definitions of R meas and R pim in this paper are missing a square-root on the (1/n-1) factor
W. Kabsch, J.Appl.Cryst. 21, 916-924 (1988)
P.R.Evans, “Data reduction”, Proceedings of CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
P.R.Evans, “Scaling of MAD Data”, Proceedings of CCP4 Study Weekend, 1997, on Recent Advances in Phasing
R.Read, “Outlier rejection”, Proceedings of CCP4 Study Weekend, 1999, on Data Collection & Processing
Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965)
Blessing, R.H., Acta Cryst. A51, 33-38 (1995)
Kay Diederichs & P. Andrew Karplus, “Improved R-factors for diffraction data analysis in macromolecular crystallography”, Nature Structural Biology, 4, 269-275 (1997)
Manfred Weiss & Rolf Hilgenfeld, “On the use of the merging R factor as a quality indicator for X-ray data”, J.Appl.Cryst. 30, 203-205 (1997)
Manfred Weiss, “Global Indicators of X-ray data quality” J.Appl.Cryst. 34, 130-135 (2001)
Greta Assmann, Wolfgang Brehm, & Kay Diederichs, “Identification of rogue datasets in serial crystallography”, J.Appl.Cryst, 49, 1021-1028 (2016)
Appendix 1: Partially recorded reflections¶
In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 301, 302, 303), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.
Summed partials: All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The parameters for the checks are set by the PARTIALS command. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.
if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked (TEST). NOCHECK switches off this check.
if the TEST option is set (default), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.05]
if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. This correction relies on accurate FRACTIONCALC values, so beware.
if the GAP option is set (not recommended), partials with a gap in are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.
By setting the TEST & CORRECT limits, you can control the summation & scaling of partials, e.g.
TEST 1.2 1.2 CORRECT 0.5
will scale up all partials with a total fraction between 0.5 & 1.2
TEST 0.95 1.05
will accept summed partials 0.95->1.05, no scaling
TEST 0.95 1.05 CORRECT 0.4
will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
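The TEST/CORRECT acceptance logic illustrated by these examples can be sketched as follows (a simplified illustration that omits the CHECK/GAP logic and the per-part scale factors; the observation values are invented):

```python
def sum_partials(parts, test_limits=(0.95, 1.05), correct_min=None):
    """Sum partial observations given as (intensity, fraction) pairs.

    TEST: accept the sum if the total fraction lies within test_limits.
    CORRECT: if correct_min is set, scale the sum by the inverse total
    fraction when the total fraction lies between correct_min and the
    lower TEST limit. Returns the summed intensity, or None if rejected.
    """
    total_i = sum(i for i, f in parts)
    total_f = sum(f for i, f in parts)
    lo, hi = test_limits
    if lo <= total_f <= hi:
        return total_i                    # accepted as a summed partial
    if correct_min is not None and correct_min <= total_f < lo:
        return total_i / total_f          # CORRECT: scale by inverse fraction
    return None                           # rejected

# Like "TEST 0.95 1.05 CORRECT 0.4", for a partial with total fraction 0.8:
print(sum_partials([(40.0, 0.5), (24.0, 0.3)], correct_min=0.4))
```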
Appendix 2: Scaling algorithm¶
For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing
Sum( whl * ( Ihl - ghl * Ih )**2 )    (Ref Hamilton, Rollett & Sparks (1965))
where Ih is the current best estimate of the “true” intensity
Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
An alternative method scales to an external previously-determined reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl = 1/(var(Ihl) + var(Ihref))
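As a toy illustration of the Hamilton, Rollett & Sparks formulation above (not the full parameterised minimisation), the best estimate Ih and the least-squares residual for one reflection can be computed like this, with whl = 1/sd(Ihl)**2 (the observation values are invented):

```python
def best_intensity(obs):
    """Ih = Sum(whl*ghl*Ihl) / Sum(whl*ghl**2) for observations (I, sd, g)."""
    num = sum((1.0 / sd**2) * g * i for i, sd, g in obs)
    den = sum((1.0 / sd**2) * g * g for i, sd, g in obs)
    return num / den

def residual(obs):
    """Sum(whl * (Ihl - ghl*Ih)**2) over the observations of one reflection."""
    ih = best_intensity(obs)
    return sum((1.0 / sd**2) * (i - g * ih) ** 2 for i, sd, g in obs)

# Two observations of one reflection on inverse scales 1.0 and 2.0:
obs = [(100.0, 10.0, 1.0), (210.0, 10.0, 2.0)]
print(best_intensity(obs))
```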
Each observation is assigned to a “run”, which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
The inverse scale factor ghl is derived as follows:
ghl = Thl * Chl * Shl
where Thl is an optional relative B-factor contribution, Chl is a scale factor, and Shl is an anisotropic correction expressed as spherical harmonics (ie the SECONDARY, ABSORPTION options).
a) B-factor (optional)
For each run, a relative B-factor (Bi) is determined at intervals in “time” (“time” is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, . . tn). Then for an observation measured at time tl
B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))
where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp ( - (delt)**2 / Vt )
Vt is "variance" of weight, & controls the smoothness
of interpolation
Thl = exp ( + 2 s B )
s = (sin theta / lambda)**2
b) Scale factors
For each run, scale factors Cz are determined at intervals along the rotation angle z. Then for an observation at position z0,
Chl(z0) = Sum(z)[p(delz)*Cz] / Sum(z)[p(delz)]
where delz = z - z0
p(delz) = exp(-delz**2/Vz)
Vz is the "variance" of the weight & controls the smoothness of interpolation
For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor for each batch.
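Both the B-factor interpolation in (a) and the rotation-scale interpolation in (b) use the same Gaussian-weighted smoothing; a minimal sketch (the positions, values and variance are illustrative):

```python
import math

def smoothed_value(x0, positions, values, variance):
    """Weighted mean of values at positions, using Gaussian weights
    p(del) = exp(-del**2 / V); the 'variance' V controls the smoothness
    of the interpolation (larger V -> more smoothing)."""
    weights = [math.exp(-(x - x0) ** 2 / variance) for x in positions]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Scale factors at rotation positions 0, 10, 20 degrees; with Vz = 1.0
# an observation at z0 = 10 is dominated by the nearest scale node:
print(smoothed_value(10.0, [0.0, 10.0, 20.0], [1.00, 1.05, 1.10], 1.0))
```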
c) Anisotropy factor
The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of (1) the secondary beam (SECONDARY correction) in the camera spindle frame, (2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis
SECONDARY beam direction (camera frame)
s = [Phi] [UB] h s2 = s - s0 s2' = [-Phi] s2 Polar coordinates: s2' = (x y z) PolarTheta = arctan(sqrt(x**2 + y**2)/z) PolarPhi = arctan(y/x) where [Phi] is the spindle rotation matrix [-Phi] is its inverse [UB] is the setting matrix h = (h k l)
ABSORPTION: Secondary beam direction (permuted crystal frame)
s = [Phi] [UB] h s2 = s - s0 s2c' = [-Q] [-U] [-Phi] s2 Polar coordinates: s2' = (x y z) PolarTheta = arctan(sqrt(x**2 + y**2)/z) PolarPhi = arctan(y/x) where [Phi] is the spindle rotation matrix [-Phi] is its inverse [Q] is a permutation matrix to put h, k, or l along z (see POLE option) [U] is the orientation matrix [B] is the orthogonalization matrix h = (h k l)
then
Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm Ylm(PolarTheta,PolarPhi)
where Ylm is the spherical harmonic function for
the direction given by the polar angles
Clm are the coefficients determined by
the program
Notes:
The initial term “1” is essentially the l = 0 term, but with a fixed coefficient.
The number of terms = (lmax + 1)**2 - 1
Even terms (ie l even) are centrosymmetric, odd terms antisymmetric
Restraining all terms to zero (with the TIE SURFACE command) reduces the anisotropic correction. This should always be done
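As an illustrative sketch (not the program's implementation), the correction for lmax = 1 can be written with the three real l = 1 spherical harmonics spelled out, together with the term count from the note above; the coefficient values are invented:

```python
import math

def n_terms(lmax):
    """Number of refined coefficients: (lmax + 1)**2 - 1."""
    return (lmax + 1) ** 2 - 1

def surface_l1(c_m1, c_0, c_p1, theta, phi):
    """Shl = 1 + sum of the three real l = 1 spherical harmonic terms,
    evaluated at polar angles (theta, phi)."""
    k = math.sqrt(3.0 / (4.0 * math.pi))
    y_m1 = k * math.sin(theta) * math.sin(phi)   # real Y(1,-1)
    y_0 = k * math.cos(theta)                    # real Y(1,0)
    y_p1 = k * math.sin(theta) * math.cos(phi)   # real Y(1,+1)
    return 1.0 + c_m1 * y_m1 + c_0 * y_0 + c_p1 * y_p1

# lmax = 2 gives (2+1)**2 - 1 = 8 terms:
print(n_terms(2))
```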
Detector correction (TILES)
A correction for tiled CCD detectors has been implemented to attempt to correct for the underestimation of spots falling in the corner of the detector. The present model expresses a correction factor in terms of an erfc function of the distance from the tile centre, such that the correction = 1 in the centre of the tile and falls off at the edge and corners
For a spot at position (x,y) relative to the tile centre, normalised by the tile width in pixels so that x & y run from -1 to +1, the distance from the centre (x0,y0) is
d = sqrt[(x-x0)**2 + (y-y0)**2]
and the correction factor is
g = A f(z) + 1 - A
where A is the amplitude of the correction near the edge and f(z) is a radial function of the modified “radius” z = (2/w)(d - r - w). r defines the point at which the scale starts to decline from 1.0, and w the “width” of the fall-off. Currently f(z) = 0.5 erfc(z), though other expressions have been tried
The amplitude A varies azimuthally with the angle phi = arctan(y/x) as a Fourier series
A = A0{a cos(phi) + b sin(phi) + c cos(2phi) + d sin(2phi)}
The refined parameters for each tile are r, w, A0, x0, y0, and the four Fourier terms for A: a, b, c, d. By default, parameters are restrained (TIE) as follows (see TIE TILE)
A0, a, b, c, d and x0, y0 are tied to 0.0 with their SDs
r, w are tied to target values with their SDs [default 0.70, 0.40]
r, w, and A0 are tied to be similar over all tiles
Five SD values control the strength of the restraints, respectively for r, w, A0, x0y0, and abcd; SD = 0 switches off the restraint
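A minimal sketch of the radial part of this correction, omitting the azimuthal Fourier modulation of A (r and w take the default target values quoted above; the amplitude A0 is invented):

```python
import math

def tile_correction(x, y, x0=0.0, y0=0.0, r=0.70, w=0.40, a0=0.1):
    """Correction factor g = A*f(z) + 1 - A with f(z) = 0.5*erfc(z),
    z = (2/w)*(d - r - w); the azimuthal variation of A is omitted."""
    d = math.hypot(x - x0, y - y0)
    z = (2.0 / w) * (d - r - w)
    return a0 * 0.5 * math.erfc(z) + 1.0 - a0

# Near the tile centre erfc(z) -> 2, so g -> 1; far into the corner
# erfc(z) -> 0, so g falls off towards 1 - A:
print(round(tile_correction(0.0, 0.0), 6))
```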
Appendix 3: Data from Denzo¶
DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from scalepack as input to Aimless. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see COMBAT documentation) or POINTLESS (for Scalepack output only).
Both of these options have some problems
If you take the output from Denzo into Scala, there may be problems with partially recorded reflections: it is difficult for Scala to determine reliably that it has all parts of a partial to sum together.
If you take unmerged output from scalepack into Aimless, most of the geometrical information about how the observations were collected is lost, so many of the scaling options in Aimless are not available. Only batch scaling can be used, but simultaneous scaling of several wavelengths or derivatives may still be useful.
Appendix 4: Outlier algorithm¶
The test for outliers is as follows:
if there are 2 observations (left), then
a. for each observation Ihl, test the deviation
Delta(hl) = (Ihl - ghl*Iother) / sqrt[sd(Ihl)**2 + (ghl*sd(Iother))**2]
against sdrej2, where Iother is the other observation
b. if either |Delta(hl)| > sdrej2, then
in scaling, reject reflection. Or:
in merging,
keep both (default or if KEEP subkey given) or
reject both (subkey REJECT) or
reject larger (subkey LARGER) or
reject smaller (subkey SMALLER).
if there are 3 or more observations left, then
for each observation Ihl,
calculate the weighted mean of all the other observations, <I>n-1, & its sd(<I>n-1)
calculate the deviation Delta(hl) = (Ihl - ghl*<I>n-1) / sqrt[sd(Ihl)**2 + (ghl*sd(<I>n-1))**2]
find the largest deviation, max Delta(hl)
count number of observations for which Delta(hl) .ge. 0 (ngt), & for which Delta(hl) .lt. 0 (nlt)
if max Delta(hl) > sdrej, then reject one observation, but which one?
if ngt == 1 .or. nlt == 1, then one observation is a long way from the others, and this one is rejected
else reject the one with the worst deviation max Delta(hl)
iterate from beginning
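The three-or-more-observation branch above can be sketched as follows (a simplified illustration: the weighted mean and its sd are computed as in Appendix 2, only a single rejection step is shown rather than the iteration, and the ngt/nlt tie-breaking is omitted; the observation values are invented):

```python
import math

def worst_outlier(obs, sdrej):
    """For observations (I, sd, g) of one reflection, compute each
    observation's deviation from the weighted mean of all the others,
    and return the index of the worst one if it exceeds sdrej, else None."""
    deviations = []
    for k, (i, sd, g) in enumerate(obs):
        others = [o for j, o in enumerate(obs) if j != k]
        num = sum((1.0 / s**2) * gg * ii for ii, s, gg in others)
        den = sum((1.0 / s**2) * gg * gg for ii, s, gg in others)
        mean = num / den                       # <I>n-1
        sd_mean = math.sqrt(1.0 / den)         # sd(<I>n-1)
        delta = (i - g * mean) / math.sqrt(sd**2 + (g * sd_mean) ** 2)
        deviations.append(abs(delta))
    worst = max(range(len(obs)), key=lambda k: deviations[k])
    return worst if deviations[worst] > sdrej else None

# Three consistent observations plus one rogue (index 3):
obs = [(100.0, 5.0, 1.0), (102.0, 5.0, 1.0), (98.0, 5.0, 1.0), (200.0, 5.0, 1.0)]
print(worst_outlier(obs, 6.0))
```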
RELEASE NOTES¶
0.8.2 Made ‘normalisation’ of relative Bfactors more robust by excluding phi ranges with relatively few observations from the choice of the ‘best’ range (ie the one with the largest B, as Bs are mostly negative). The simple choice may be bad if there is a large gap in the run data (only with explicit run definition)
0.8.1 Bug fix: normalisation was wrong for low resolution data, > 3.0A, failed on Linux, apparently harmless on Mac. Ice ring options: alternative ring list, options to reject from final output, just for normalisation, or no ice ring test
0.7.15 corrected reflection counts for case of run resolution limits without overall limit
0.7.14 analysis of multiple datasets against reference, if given
0.7.13 cosmetic change to cross-dataset graphs
0.7.12 anisotropic normalisation for Emax test (by default), revised (simplified) Emax test, improvements for previously merged data, correlation of intensities between datasets
0.7.11 allow change of weighting in outlier rejection (REJECT WEIGHT) and echo to logfile and XML
0.7.9 make sure that outer XML block is closed by destructor of Output object, even after a crash. Change some operator definitions in scala_util from returning a reference
0.7.8 improved treatment of (slightly) different cells for different crystals or datasets. Resolution cutoffs by run use the appropriate cell, but overall cutoff uses average cell. Optional unmerged output files preserve the distinct unit cells. Added CELL command to override all cells
0.7.7 ONLYMERGE now switches off SDcorrection and outlier rejection by default, but honours explicit settings.
0.7.6 Added PLOT NOXMGR to suppress xmgr files. Fix for occasional ISYM error in unmerged output. Unmerged OUTPUT ORIGINAL option for original hkl, which can now be read back in (also to Pointless). UNMERGEDOUT command to give name for unmerged output. Record output options and filenames to XML.
0.7.5 small fixes to bring in line with Pointless
0.7.4 More robust normalisation for Emax test. Keep Emax outliers if most observations of a reflection are large. Normalisation still needs more work.
0.7.3 Make it work when excluding 1st of multiple datasets. Make initial scaling more robust by using weighted mean(I). Do Emax test before outlier test. More robust normalisation; Emax rejection also tests I/sd(I) to avoid rejecting weak data in shells with small <I>.
0.7.2 small fix to SD analysis table
0.7.1 allow gaps in data if explicit runs are given (unless you give
“initial minimum_overlap > 0.0”). Improved analysis of secondary corrections
0.7.0 REFINE REFERENCE option, for scaling to external reference intensities
0.6.4 monitor deviant but kept observations in ROGUES file
0.6.3 trap RESTORE with one parameter, turn on ONLYMERGE
0.6.2 Bug fix for only one batch (eg from merged data)
0.6.1 Group batches for analysis, see ANALYSIS GROUPBATCH. Improve SDCORRECTION SAMPLE, sample variances: compare individual propagated SDs with sample SDs. Limit Emax test to “reliable” resolution range (not very weak high resolution ranges, needs further improvement). Many variables converted from float to double. Chi^2 statistic against intensity, resolution and batch. Cumulative CC(1/2) vs. batch. Do the analysis even if scaling can’t be done due to insufficient information in input reflection file (usually data that is already scaled eg in XSCALE)
0.5.29 trap case of insufficient data with one rotation range, make dump/restore work for that case
0.5.28 small format change in RunPairs to avoid column coalescence
0.5.27 More accurate averaging of wavelengths
0.5.26 REJECT EMAX 0.0 switches off Emax test. Big speed-up in SF calculation (cf Pointless)
0.5.25 fix bug introduced in 0.5.24 which removed from the merged file I+ & I- if either were negative
0.5.24 fix bug if different resolutions for different runs. Small bug fix in secondary beam scaling. Switch off secondary scaling in first pass, stabilises the scaling in some cases. Trap negative scales (shouldn’t happen). Fixes to sample SDs (for high multiplicity), compare sample SD to propagated SD. Output secondary beam corrections to logfile and XML. Reset SdFac after first round scaling. Default SDCORRECTION SAME (instead of INDIVIDUAL)
0.5.22, 23 Improved (ie corrected from version 0.5.18) detection of parts of data which do not have any scaling overlaps with other rotation ranges, see INITIAL MINIMUM_OVERLAP & MAXIMUM_GAP. Some changes to resolution limit determination
0.5.21 if XMLOUT is assigned on the command line, open it early so that syntax errors get added
0.5.19 add resolution limit at I/sd > 2 for Frank von Delft
0.5.18 Some changes to improve robustness for low multiplicity, mainly for small molecule data. No scaling if multiplicity is too low. Added INITIAL MINIMUM_MULTIPLICITY option. Changes to choice of observations for scaling and SD optimisation. Trap negative secondary scales. Correct derivatives in TIEs (doesn’t make much difference)
0.5.17 Bug fix to error and warning printing
0.5.16 improved the robustness of SDcorrection refinement with a few fulls. REJECT NONE option. Option to accept XDS “misfits” (outliers) (KEEP MISFIT)
0.5.15 bug fix in run pair correlations. Fix to XML for multiple datasets
0.5.14 bug fix in Rmerge(batch), was double counting. Change to expect stdin input unless “--no-input” given on command line. Added “descriptions” to graphs, for ccp4i2.
0.5.13 fix memory leak in resolution calculation
0.5.10,11,12 bug fix in setting spherical harmonic orders. Keep empty batches in unmerged file output. Bug fix in scaling one lattice from multilattice data
0.5.9 bug fix to allow ABSORPTION <lmax> to work. Fix for restore problem with variances & tiles
0.5.8 bug fix for case SCALES CONSTANT BROTATION with one run (not sensible anyway). Also fixed bug when there are different resolution limits for different datasets
0.5.7 minor bug fix to resolution tables
0.5.6 bug fix for SCALE CONSTANT with more than one run
0.5.5 bug fix in radiation damage analysis. Rescale Scalepack output if intensities are small
0.5.4 bug fix for Sca output with one of I+ or I- missing
0.5.3 fix save/restore bug for BFACTOR OFF
0.5.2 bug fix for already merged data. Fix long-standing rare bug in hash table
0.5.1 “improved” SD correction refinement. Added [UN]LINK commands, improved default linking
0.4.10 add SD analysis graph to XML
0.4.8,9 improved robustness of maximum resolution curve fit
0.4.7 better trap for no data in SDCORRECTION refinement
0.4.5,6 fill in missing IPR columns from I, shouldn’t normally happen
0.4.2,3,4 Bug fixes. Unmerged SCA files written with corrected symmetry translations
0.4.1 Inflate sd(I) using estimated parameter errors from inverse normal matrix (see USESDPARAMETER). Fixed bug in TILE correction. Added curve fit for maximum resolution estimation
0.3.11 Fixed nasty bug from XDS->Pointless giving Assertion failed: (sd > 0.0), function Average
0.3.10 Bug fixes. Also restrict run-run correlations to < 200 runs.
0.3.9 Added matrix of run-run cross-correlations
0.3.8 bug fixes to make EXCLUDE BATCH <range> option work
0.3.7 Bug fixes for unusual case of runs with all fulls and no fulls. Options for XFEL data: SDCORRECTION SAMPLESD; Rsplit. Bug fixes for Batch scaling with rejected batches. REJECT BATCH option for batch scaling
0.3.6 Fix bug with explicit RUN definitions.
0.3.4,5 remove debug print for self-overlaps. Pick up number of parts for previously summed partials (MPART column from Feckless), for partial bias analysis
0.3.3 fixed save/restore for TILE correction. Fixed reading of SDcorrection parameters
0.3.2 more corrections to multilattice handling (mapping lattice number to run number for scaling)
0.3.1 optional reference data for analysis of agreement by batch, either as structure factors (or intensities) HKLREF,LABREF, or coordinates (XYZIN)
0.2.20 fix bug for single B-factor/run
0.2.18,19 updates from Pointless for multiple lattices. Corrected calculation of anomalous multiplicity
0.2.17 fix bug in setting same resolution bin widths for multiple datasets when NBINS is set
0.2.16 message for std::bad_alloc, running out of memory
0.2.15 fix to XML graphs (for ccp4i2)
0.2.14 fix to correctly append to MTZ history
0.2.13 activate writing spacegroup confidence. Reflection status flags cleared before outlier checks
0.2.12 fix bug in reading multilattice files
0.2.10 small bug fix in radiation damage analysis
0.2.9 fix for Batch scaling if no phi range information
0.2.8 XML changes for I2 report. Change automatic anomalous thresholds, always output anom statistics
0.2.7 Bug fix in XML if no orientation data (ROGUEPLOT)
0.2.6 Fix to output multilattice overlaps. Added radiation damage analysis as in CHEF, for Graeme Winter
0.2.5 Fix so that BINS RESOLUTION works
0.2.4 Bug for XDS data, was omitting reflections with FRACTIONCALC (derived from IPEAK) < 0.95, leading to incompleteness
0.2.3 Now does reject and record Emax outliers properly (though work is continuing on improving this). Fixed small bug in analyseoverlaps.
0.2.2 fixed bug in Bdecay plot when batches omitted. Explicit Xrange for XML batch plots. No ROGUEPLOT if no orientation data. List overlaps in ROGUES file
0.2.1 some major reorganisations. Added XML output. SCALES TILE option. Handling of multilattice data. SDCORRECTION SIMILAR
0.1.30 allow TIE with negative sd to turn off tie, as documented. Also fixed bug in ABSORPTION
0.1.29 small change to Result table to work with Baubles arcane (and undocumented) rules for Magic Tables
0.1.28 bug fix in “sdcorrection same”
0.1.27 bug fix in minimizer which sometimes affected the case with just 2 parameters
0.1.26 Default to “scales secondary”
0.1.25 omit sigI<=0, process REJECT command properly, small bug fix in smoothed Bfactors
0.1.24 small bug fix in printing batch tables with multiple datasets
0.1.22,23 INITIAL UNITY option. In tables, print batches with no observations but not rejected batches. Put title into output file. Fix initial scale bug with 3 scales
0.1.21 corrections to ROGUEPLOT, ice rings were in wrong place (by a factor of wavelength)
0.1.20 made sdcorrection refinement more robust to low multiplicity. If anomalous off (or no anomalous detected), statistics are now printed over all I+ I- together. Reject large negative observations (default E < -5)
0.1.19 preliminary addition of spg_confidence status. Bug fix from valgrind (from Marcin)
0.1.18 changed tablegraph to fix compilation problem (va_start)
0.1.17 bug fix in outlier rejection, problem with large variances leading to inconsistencies in Rogues file and some over-rejection
0.1.16 made SDcorrection refinement more robust
0.1.14,15 various bug fixes (including memory leaks), fixed autorun generation, improved SD correction for large anomalous, constrain cell to lattice group, etc
0.1.12 Half-dataset CC labelled as “CC(1/2)”
0.1.11 Small bug fixes
0.1.9 autodetect anomalous. Plot Rmeas for each run
0.1.7 fix for SCALES CONSTANT from XSCALE
0.1.6 anisotropy analysis against planes in trigonal, hexagonal and tetragonal systems (including rhombohedral axes), principal anisotropic axes in monoclinic and triclinic, cone analyses weighted according to cos(AngleFromPrincipalDirection). Fixed cases where multiple datasets have different resolution limits
0.1.4,5 more fixes for multiple datasets, dump/restore. OUTPUT UNMERGED SPLIT is default
0.1.3 More “resolution run” bug fixes
0.1.2 REFINE PARALLEL option (thanks to Ronan Keegan). Fixed bug in “resolution run” options
0.1.1 fixed bugs in writing ROGUES file; introduced HKLOUTUNMERGED etc filename specifiers; cleaned up Unmerged output; added Rfull to tables
0.1.0 fixed some bugs found by cppcheck and valgrind
0.0.16 fixed small bug in INTENSITIES COMBINE optimisation
0.0.15 if run definitions are given explicitly, then unspecified batches are excluded
0.0.14 Added optimisation for INTENSITIES COMBINE, for Mosflm data. This is now the default