dm HKLIN foo.mtz HKLOUT bar.mtz [ SOLIN
foo.msk ] [ SOLOUT bar.msk ] [ NCSIN1 foo1.msk
[ NCSIN2 ... ] ] [ NCSOUT foobar.msk ] [ VUOUT foobar.vu ]
`dm' is a package which applies real space constraints based on known features of a protein electron density map in order to improve the approximate phasing obtained from experimental sources. Various information can be applied, including such diverse elements as the following (see the MODE keyword):
The program has many phase extension schemes and phase weighting/combination modes, which are selected by the appropriate choice of keywords. The combination mode is determined by the COMBINE keyword; if this keyword is omitted, the program runs in perturbation-gamma mode.
Calculation of the scale and B-factor for the data is automatic. This is performed by comparison with an empirically derived database of map variance at different resolutions, and is more reliable than the conventional Wilson plot.
Non-crystallographic symmetry averaging can be performed for both proper and improper symmetries, and different NCS averaging operations can be applied to different parts of the protein. (Thanks to Dave Schuller for his help with this). Input masks may be on any grid and axis order. In the case of a single averaging domain, if no averaging mask is input then a mask can be generated automatically.
Skeletonisation is by the core-tracing algorithm of Swanson (reference ). This is faster than Greer's algorithm and allows adjustment of the skeletonisation parameters without recalculating the skeleton. As a result the skeletonisation calculation is rendered largely automatic.
As a starting point, use the following recipes. If averaging is available, use it and run for at least 10 cycles:
SOLC <solc>
MODE SOLV HIST MULT AVER
COMBINE PERT
NCYCLE 20
AVER REFI
...
LABIN ...
LABOUT ...
For averaging calculations where there is a great deal of phase extension to be performed (e.g. from 6.0Å to 2.5Å), use more cycles and specify a phase extension scheme:
SOLC <solc>
MODE SOLV HIST AVER
COMBINE PERT
SCHEME RES FROM 6.0
NCYCLE 200
AVER REFI
...
LABIN ...
LABOUT ...
If averaging is not available, you may want to use NCYCLE AUTO to prevent phase bias and overweighting. Alternatively, you can run the calculation for more cycles, but be aware that the FOMs will be badly overestimated:
SOLC <solc>
MODE SOLV HIST MULT
COMBINE PERT
NCYCLE AUTO
LABIN ...
LABOUT ...
HISTogram matching should ALWAYS be used. MULTi-resolution modification is new, but also worth using.
There are two free indicators that `dm' can use. The first is the density modification Free-R (defined in the same way as the refinement Free-R). This is calculated in the Free-Sim and Omit modes. Unfortunately, while the Free-R is effective for refinement, it is a poor indicator of the progress of density modification; it can, however, be used in many cases to identify the correct enantiomorph. A better indicator (due to J. P. Abrahams) is the real-space free residual. This is calculated by omitting two small spheres, one of protein and one of solvent, from the density modification. The flatness of the solvent sphere and the histogram fit in the protein sphere provide a better indication of progress.
Input mtz file - This should contain the conventional (CCP4) asymmetric unit of data (see CAD).
Output mtz file.
Input solvent mask - This overrides the automatic Wang mask determination. The input mask can have any grid and axis ordering, and may have any extent from the protein region of a single asymmetric unit to the whole cell.
Alternatively, a map may be input on the SOLIN channel. In this map any grid points set to 1.0 are considered protein, grid points set to 0.0 are solvent, and grid points set to -1.0 are considered to be neither. By constructing an appropriate input mask it is possible to perform solvent flattening and histogram matching without suppressing any heavy atom density.
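The three-valued convention described above can be sketched in a few lines of Python. This is only an illustration and not part of `dm': the function names, the flat-list stand-in for a map, and the tolerance are assumptions, and reading or writing a real CCP4 map file is not shown.

```python
# Values used by a map supplied on the SOLIN channel (from the text above).
PROTEIN, SOLVENT, NEITHER = 1.0, 0.0, -1.0

def classify(value, tol=1e-6):
    """Name the region that a SOLIN map grid value denotes."""
    for label, ref in (("protein", PROTEIN),
                       ("solvent", SOLVENT),
                       ("neither", NEITHER)):
        if abs(value - ref) <= tol:
            return label
    raise ValueError(f"unexpected mask value: {value}")

def exclude_points(mask, indices):
    """Return a copy of a flat mask with the chosen grid points set to
    'neither', e.g. points around heavy atom sites (hypothetical use)."""
    out = list(mask)
    for i in indices:
        out[i] = NEITHER
    return out

# Toy five-point "map": grid point 1 is taken out of both regions.
mask = [1.0, 1.0, 0.0, 0.0, 1.0]
edited = exclude_points(mask, [1])
print([classify(v) for v in edited])
```

Points marked as "neither" in this way would be left alone by both solvent flattening and histogram matching, which is how heavy atom density can be preserved.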
Output solvent mask - This will be on the program grid with default axis order, and will cover the whole unit cell.
Input NCS averaging masks - These are used with the AVER option. The input masks can have any grid or axis ordering, and may cover a single monomer or the whole multimer.
If an NCS averaging mask is not input, the program will compute it with an automatic procedure, when there is only one domain involved. Auto-NCS masking depends on knowing how many monomers form a closed symmetry group. This can be specified with the NCSMASK NMER keywords, or the program will attempt to estimate it for simple cases. If you do not supply a value, check the value the program estimates carefully.
If the averaging mask is calculated automatically, or is being refined (NCSMASK UPDATE keyword), it may be output on the NCSOUT channel.
When Non-Crystallographic Symmetry is present, its symmetry elements, i.e. axes and points, can be visualised using XtalView or O. If the keyword VUOUT is followed by a ``.vu'' file, the program writes out a file that can be used in XtalView to view the NCS elements. If the keyword is followed by a ``.o'' file, the output can be visualised using the program O. The default is to write .vu files.
SHARP is probably the best source of phasing for density modification. However, if you wish to run dm after SHARP you should first turn off the Solomon option.
Solomon produces excellent maps, so having run SHARP and Solomon you will often not want to use dm at all. However, Solomon produces badly overestimated FOMs (typically 0.9 - 0.95) which, while they do not damage the maps, effectively cripple any further density modification (or, for that matter, any phased maximum-likelihood refinement calculation).
(Since the FOM is based on an estimate of the error in the phases, a high FOM implies that the phases are correct and therefore should not be modified by any subsequent procedure. Thus further density modification will hardly change the maps, and ML-refinement will be badly biased by the errors in the starting phases.)
Input is keyworded. Available keywords are: AVERAGE, COMBINE, GRID, LABIN, LABOUT, MODE, NCSMASK, NCYCLE, REALFREE, RESOLUTION, SCALE, SCHEME, SKEL, SOLC, SOLMASK.
In addition, the following optional keywords control the data harvesting functionality: PNAME, DNAME, PRIVATE, USECWD, RSIZE, NOHARVEST
(SOLC and MODE are compulsory)
Select the calculation to be performed:
Number of cycles of phase extension to perform.
(default: ALL, or AUTO for COMBINE FREE)
LABIN FP=.. SIGFP=.. [PHIO=.. FOMO=..] [HLA=.. HLB=.. HLC=.. HLD=..] [FDM=..] [PHIDM=..] [FOMDM=..] [FREE=..]
Normally just the first four columns (FP, SIGFP, PHIO, FOMO) are input. However, if you have Hendrickson-Lattman coefficients you may want to input these to the program as well (the difference is marginal except for SIR data). If you want to start from the end of a previous density modification calculation, then the PHIDM and FOMDM columns are used.
Three columns are output by default, a magnitude, phase and figure-of-merit. Normally a map would be calculated using FDM and PHIDM (do not include the weight FOMDM). Alternatively a weighted map with FP, PHIDM, FOMDM should give the same result, except that restored magnitudes will be missing.
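The two map choices above differ only in which amplitude and weight go into each Fourier coefficient. The Python fragment below is a rough sketch of that difference for a single reflection. It assumes phases are stored in degrees, and the function name and the numbers are invented for illustration; a real map calculation would use a CCP4 FFT program rather than code like this.

```python
import cmath
import math

def map_coefficient(amplitude, phase_deg, weight=1.0):
    """Complex Fourier coefficient: weight * amplitude * exp(i * phase).
    The phase is assumed to be in degrees."""
    return weight * amplitude * cmath.exp(1j * math.radians(phase_deg))

# Illustrative values for one reflection (not taken from real data).
fp, fdm, phidm, fomdm = 120.0, 95.0, 45.0, 0.8

# Usual choice: FDM with PHIDM, without the FOMDM weight.
unweighted = map_coefficient(fdm, phidm)

# Alternative: FP weighted by FOMDM, with the same phase.
weighted = map_coefficient(fp, phidm, weight=fomdm)

print(abs(unweighted), abs(weighted))
```

A reflection with no measured FP has no coefficient in the second form, which is why restored magnitudes are missing from the weighted map.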
Set an NCS averaging operator. This card is followed by rotation/translation matrices on subsequent lines in either CCP4 or O/RAVE format.
These are the operations which map the density in the region covered by the input mask onto the other equivalent regions. The first operator must be the identity matrix. The mask is input in CCP4 mask format on the input file label NCSIN1. In the case of improper NCS, the mask must cover just a monomer; for proper NCS it may cover the monomer or the multimer. The mask grid need not agree with the program grid.
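Before running an averaging calculation it can be worth checking the operators by hand. The Python sketch below tests that the first operator is the identity and that a second operator is a proper rotation (orthonormal rows, determinant +1). It assumes an operator acts as x' = R x + t on orthogonal coordinates, with the matrix rows as typed on the ROTA MATRIX card; that convention, the function names and the tolerances are assumptions and should be checked against your own input. The numbers are those of the second domain in the multi-domain example of this document.

```python
def apply_operator(rot, trans, xyz):
    """Return R*x + t for a 3x3 matrix (list of rows) and a translation."""
    return [sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]
            for i in range(3)]

def determinant(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_identity(rot, trans, tol=1e-6):
    """True if the operator leaves every point where it is."""
    unit = all(abs(rot[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(3) for j in range(3))
    return unit and all(abs(t) <= tol for t in trans)

def is_proper_rotation(rot, tol=1e-3):
    """True if the rows are orthonormal and the determinant is +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(rot[i][k] * rot[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(determinant(rot) - 1.0) <= tol

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
second = [[0.75830859, 0.65183645, 0.00883542],
          [0.65189570, -0.75824565, -0.00975925],
          [0.00033828, 0.01316060, -0.99991322]]
second_trans = [17.30371666, -47.10081482, 68.99727631]

print(is_identity(identity, [0.0, 0.0, 0.0]))   # True
print(is_proper_rotation(second))
print(apply_operator(second, second_trans, [0.0, 0.0, 0.0]))
```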
If no input mask is assigned then `dm' will attempt to generate one automatically from the local density correlation, under the control of the NCSMASK keyword. Always check the mask afterwards in an appropriate graphics program (use the NCSOUT channel).
If you want to apply different NCS operations to different domains of the protein, give a set of AVER cards for each DOMAIN, with the DOMAIN number on each AVER card (or on the first for each domain). An input mask is also required for each domain. The AVER DOMAIN 1 cards correspond to the mask on NCSIN1, the AVER DOMAIN 2 cards to NCSIN2, etc. The masks should be defined in the same multimer in the unit cell, or at least in close proximity to one another.
The REF, STEP and EVERY cards enable refinement of the NCS rotation matrices between averaging cycles. The REF card enables the refinement of a particular set of NCS parameters. Note that the STEP card allows different refinement step sizes to be used for different domains; however, all but one EVERY card will be ignored. The refined matrices are written out at the end of the log file.
NCSMASK [OVERLAP] [INVERT] [NMER <nmer>] [UPDATE <cyc>] [STEP <step>] [ALIM <u1> <u2>] [BLIM <v1> <v2>] [CLIM <w1> <w2>] [SIZE <size>] [BFAC <bfac>]
Control ncs-matrix, masking, and auto-masking behaviour. The OVERLAP card forces overlap removal for all NCS-masks. This was the default mode of operation for old versions of `dm' which did not support multimer masks; it must not be used if the NCS-mask covers more than one monomer. Note that the ncs-correlation statistics may be less reliable when using a multimer mask.
The NMER, STEP, ALIM, BLIM, CLIM, SIZE and BFAC cards control ncs-auto-masking if no averaging mask is input.
Defaults: <nmer>=1, <step>=3, <size>=1, <bfac>=20.
SOLMASK [ UPDATE <cyc> ] [ FRAC <solvfrac> <protfrac> ] [ RADIUS <radius> <mode> ] [ LIMITS <rhomin> <rhomax> ]
Set parameters for calculation of the solvent mask.
Heavy atoms can bias the mask calculation procedure, resulting in a mask of spheres around the heavy atom sites. The LIMITS card can be used to set the values at which the electron density is truncated before smoothing. To truncate heavy atoms set <rhomax> to the maximum electron density due to non-heavy atoms at the appropriate resolution.
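The truncation controlled by the LIMITS card amounts to clipping the density to a window before the smoothing step. A minimal Python sketch of that idea follows. The function name, the flat-list stand-in for a map and the numbers are assumptions for illustration; how `dm' applies the limits internally is not shown here.

```python
def truncate_density(rho, rhomin, rhomax):
    """Clip each density value into the range [rhomin, rhomax]."""
    return [min(max(value, rhomin), rhomax) for value in rho]

# Toy density values: the 6.0 stands in for a heavy atom peak.
rho = [-0.5, 0.2, 0.9, 6.0]
print(truncate_density(rho, rhomin=-0.3, rhomax=1.5))
# [-0.3, 0.2, 0.9, 1.5]
```

With the peak clipped to <rhomax>, the heavy atom site no longer dominates the smoothed map from which the mask is derived.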
If a negative Wang radius is given, then the program will determine
a suitable radius from the data. This radius will decrease as the calculation
progresses.
Resolution range of reflections to include in the calculation. This
keyword can be used to exclude part of the input data by resolution
cutoffs. This is generally highly inadvisable.
Perform iterative skeletonisation on the map. Cycles of skeletonisation are interspersed with cycles of conventional density modification.
Set the grid for the calculation. You may want to do this if you want
to output a map or mask.
Override internal scaling and scale the input data by F'^2 = <scale> * exp (<bfac> * s / 2.0) * F^2, where F' is the rescaled magnitude. Scaling is critical to histogram mapping and Sayre's equation. In some cases you may want to override the B-factor, but run without this card first, and think long and hard before changing the scale.
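As a worked illustration of the scaling expression above, the Python fragment below evaluates it for one reflection. The text does not define s; the sketch assumes s = 1/d^2 with d the resolution in Angstrom, which should be treated as a guess. The function name and the numbers are also invented for illustration.

```python
import math

def rescale_intensity(f_squared, scale, bfac, resolution):
    """Apply <scale> * exp(<bfac> * s / 2.0) to F^2.
    ASSUMPTION: s = 1 / d^2, with d the resolution in Angstrom."""
    s = 1.0 / resolution ** 2
    return scale * math.exp(bfac * s / 2.0) * f_squared

# With <bfac> = 0 only the overall scale acts.
print(rescale_intensity(100.0, scale=2.0, bfac=0.0, resolution=3.0))  # 200.0

# With <bfac> = 20 the factor grows towards high resolution (small d).
for d in (6.0, 3.0, 2.0):
    print(d, rescale_intensity(1.0, scale=1.0, bfac=20.0, resolution=d))
```

Under this assumption the exponential term changes the high-resolution data far more than the low-resolution data, which is one reason to be cautious about overriding the automatic values.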
Enable the real-free residual (implied by NCYCLE AUTO). Optionally
set the coordinates and radii (in Angstrom) of the spherical patches
of density where the density modification constraints will be omitted
in order to provide a real-space free indicator of progress. If
<sr> or <pr> is negative the Solvent or Protein free
indicator will be omitted.
Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to
The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory. When running the program through the CCP4 interface, the $HARVESTHOME variable defaults to the 'PROJECT' directory.
Project Name. In most cases, this will be inherited from the MTZ file.
Dataset Name. In most cases, this will be inherited from the MTZ file.
Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').
Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.
Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.
Do not write out a deposit file; default is to do so provided Project and Dataset names are available.
Two free indicators may be generated. The density modification Free-R is calculated in COMBINE OMIT mode. This is a weak indicator (in no way comparable to the refinement free-R) which can be helpful in identifying the correct enantiomorph, but is inadequate for choosing density modification calculations.
The NCYCLE AUTO or REALFREE keywords enable calculation of the real-space free residual, which provides some information when used in conjunction with SCHEME ALL.
The LogGraph output, as well as showing the free-R factor, gives some information on the quality and completeness of the input data, and also a plot of the data fit against a standard protein data set.
For NCS-averaging calculations, correlations are calculated between related areas of density. These are summarised at the end of the log file, and error or warning messages will be generated if the initial values are too low: this is a good indication of errors in the input matrices or mask.
Also check the statistics of the averaging mask. If using a monomer mask, the masked fraction of the cell multiplied by the number of monomers/ASU multiplied by the order of crystallographic symmetry should give the protein fraction. In the case of a multimer mask, this is reduced by the size of the multimer.
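The mask check above is simple arithmetic and can be sketched in Python as follows. The function name and all of the numbers are assumptions for illustration. In particular, the comparison target assumes that the protein fraction is 1 minus the SOLC value, and a space group with four symmetry operators is invented for the example.

```python
import math

def implied_protein_fraction(mask_fraction, monomers_per_asu,
                             n_symops, monomers_in_mask=1):
    """Protein fraction of the cell implied by an averaging mask.
    For a multimer mask the product is divided by the number of
    monomers that the mask covers."""
    return mask_fraction * monomers_per_asu * n_symops / monomers_in_mask

solc = 0.52                       # solvent content given on the SOLC card
expected = 1.0 - solc             # ASSUMPTION: protein fraction = 1 - SOLC

# Monomer mask covering 4% of the cell, 3 monomers/ASU, 4 symmetry operators.
monomer = implied_protein_fraction(0.04, 3, 4)

# Trimer mask covering 12% of the cell, same crystal.
trimer = implied_protein_fraction(0.12, 3, 4, monomers_in_mask=3)

print(math.isclose(monomer, expected), math.isclose(trimer, expected))
```

A large disagreement between the implied and expected fractions suggests that the mask is too large, too small, or covers a different number of monomers than intended.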
Kevin D. Cowtan, Department of Chemistry, University of York
cad, lsqkab, xloggraph, dm_skeletonisation, dm_ncs_averaging
#
#[ a simple solvent/histogram calculation ]
#
dm hklin gmto.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE SOLV HIST
COMBINE PERT
NCYCLE 10
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
LABOUT PHIDM=PHI1 FOMDM=W1
my-data
#
#[ a better solvent/histogram/multires calculation,]
#[ uses NCYCLE AUTO to terminate before phase bias ]
#[ sets in, bias reduction using perturbation-gamma]
#
dm hklin gmto.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE SOLV HIST MULT
COMBINE PERT
NCYCLE AUTO
SCHEME ALL
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD
LABOUT PHIDM=PHI1 FOMDM=W1
my-data
#
#[ a molecular replacement type calculation]
#[ sigmaa is used to generate FOMs for the ]
#[ Fcalc, PHIcalc                          ]
#
sigmaa hklin gmto_sfall.mtz hklout gmto_sigmaa.mtz << eof
partial
labin FP=FP SIGFP=SIGFP FC=FCmolr PHIC=PHICmolr
eof
dm hklin gmto_sigmaa.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE SOLV HIST
COMBINE PERT
NCYCLE 10
LABIN FP=FP SIGFP=SIGFP PHIO=PHICmolr FOMO=WCMB
my-data
#
# NON-CRYSTALLOGRAPHIC SYMMETRY AVERAGING
#[ A three fold averaging calculation    ]
#[ This could also be done in reflection ]
#[ omit mode if you have enough time     ]
#
dm hklin chmimir.mtz hklout dmchm.mtz ncsin1 chmi.msk << MY-DATA
SOLC 0.52
NCYC 10
MODE SOLV HIST AVER
COMBINE PERT
AVER REFI
ROTA POLAR 0.0 0.0 0.0
TRANS 0.0 0.0 0.0
AVER REFI
ROTA POLAR 113.28130 103.41944 120.33858
TRANS 43.635 38.059 62.726
AVER REFI
ROTA POLAR 66.58067 -76.78019 119.69176
TRANS 82.989 15.401 -8.928
LABI FP=F SIGFP=SIGF PHIO=PHIB FOMO=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD
LABO PHIDM=PHIDM FOMDM=FOMDM
END
MY-DATA
#
# NON-CRYSTALLOGRAPHIC SYMMETRY AVERAGING
#[ A three fold averaging calculation with ]
#[ extreme phase extension e.g. 8.0 - 3.0Å ]
#[ must be done more carefully, hence      ]
#[ NCYC and SCHEME cards                   ]
#
dm hklin chmimir.mtz hklout dmchm.mtz ncsin1 chmi.msk << MY-DATA
SOLC 0.52
MODE SOLV HIST AVER
COMBINE PERT
SCHEME RES FROM 8.0
NCYC 1000
AVER REFI
ROTA POLAR 0.0 0.0 0.0
TRANS 0.0 0.0 0.0
AVER REFI
ROTA POLAR 113.28130 103.41944 120.33858
TRANS 43.635 38.059 62.726
AVER REFI
ROTA POLAR 66.58067 -76.78019 119.69176
TRANS 82.989 15.401 -8.928
LABI FP=F SIGFP=SIGF PHIO=PHIB FOMO=FOM
LABO PHIDM=PHIDM FOMDM=FOMDM
END
MY-DATA
#
# NON-CRYSTALLOGRAPHIC SYMMETRY AVERAGING WITH AUTOMASK
#[ A 3-fold averaging calculation. No input averaging ]
#[ mask is required, the NCSMASK NMER keyword is      ]
#[ added. Mask is updated occasionally.               ]
#
dm hklin chmimir.mtz hklout dmchm.mtz << MY-DATA
SOLC 0.52
NCSMASK NMER 3 UPDATE 4
NCYC 10
MODE SOLV HIST AVER
COMBINE PERT
AVER REFI
ROTA POLAR 0.0 0.0 0.0
TRANS 0.0 0.0 0.0
AVER REFI
ROTA POLAR 113.28130 103.41944 120.33858
TRANS 43.635 38.059 62.726
AVER REFI
ROTA POLAR 66.58067 -76.78019 119.69176
TRANS 82.989 15.401 -8.928
LABI FP=F SIGFP=SIGF PHIO=PHIB FOMO=FOM
LABO PHIDM=PHIDM FOMDM=FOMDM
END
MY-DATA
#
# MULTI-DOMAIN AVERAGING
#[ a two fold averaging calculation with ]
#[ two domains and refinement of the 2nd ]
#[ set of averaging matrices.            ]
#[ WARNING: IF YOU DONT KNOW WHAT MULTI- ]
#[ DOMAIN AVERAGING IS, YOU DONT NEED IT ]
#
dm hklin hpattj.mtz hklout dm1.mtz ncsin1 cwnads.mask ncsin2 cwglobs.mask << EOF-dm
SOLC 0.57
MODE SOLV HIST AVER
NCYCLE 40
AVERAGE DOMAIN 1
OMAT
  1.0 0.0 0.0
  0.0 1.0 0.0
  0.0 0.0 1.0
  0.0 0.0 0.0
AVERAGE DOMAIN 1
OMAT
  -0.71389002 -0.69492584 0.08611962
  -0.69635397 0.69129372 -0.19136506
  0.07357326 -0.19652288 -0.97735721
  115.37364197 54.98566055 67.00005341
AVERAGE DOMAIN 2 REFINE
ROTA MATRIX 1.0 0.0 0.0 -
            0.0 1.0 0.0 -
            0.0 0.0 1.0
TRANS 0.0 0.0 0.0
AVERAGE DOMAIN 2 REFINE
ROTA MATRIX 0.75830859 0.65183645 0.00883542 -
            0.65189570 -0.75824565 -0.00975925 -
            0.00033828 0.01316060 -0.99991322
TRANS 17.30371666 -47.10081482 68.99727631
LABIN FP=FP SIGFP=SIGFP PHIO=PHIml FOMO=FOMml HLA=HLA HLB=HLB HLC=HLC HLD=HLD
LABOUT PHIDM=PHIDM FOMDM=FOMDM
EOF-dm
#
# NOTE: If you don't know what multi-domain averaging is,
# you don't need it. Use the ncs averaging example, not
# the multi-domain example.
#