SFCHECK (CCP4: Supported Program)


sfcheck - A program for assessing the agreement between the atomic model and X-ray data.


sfcheck [HKLIN in.mtz] [XYZIN in.pdb] [HKLOUT out.mtz] [MAPOUT map.ccp4]
[PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]

Version 7.0.4 (05.12.2004) - Features

A program for assessing the agreement between the atomic model and X-ray data or an EM map (new). The program requires one or two input files: the coordinates of the model (in PDB, CIF or BLANC format) and the structure factors (in CIF, PDB, BLANC or MTZ format) or an EM map (CCP4 format). It runs completely automatically and reports the R-factor, correlation coefficient, Luzzati plot, Wilson plot, Boverall, pseudo-translation, twinning test and other statistics, together with local error estimates for each residue. Sfcheck can compute omit phases and use these instead of the model phases. As output, Sfcheck generates a PostScript file. Sfcheck can also create an output CIF file with omit phases or with detwinned data. When the input file is an EM map, the program can create a new mirrored and/or scaled map.



   Authors:      A.A.Vagin, J.Richelle, S.J.Wodak. 
                email: alexei@ysbl.york.ac.uk
    A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of 
    procedures for evaluating the quality of macromolecular structure-factor
    data and their agreement with the atomic model.
    Acta Cryst. (1999). D55, 191-205


     Copy file sfcheck.tar.gz and uncompress it (`gunzip sfcheck.tar.gz').

After untarring `sfcheck.tar' (command: tar xvf sfcheck.tar) you will get a sfcheck directory with src, doc and bin subdirectories. To build the executable, go to src, where you have the following options:

The executable (sfcheck) will end up in the bin directory; by giving the full pathname (.../sfcheck/bin/sfcheck) you can run it from anywhere without having to define an environment variable. The CCP4 version (which can read MTZ files) will be built if CCP4 is installed.
sfcheck.setup linux
Linux and Mac distribution
Or if you like:
1. set MR_LIBRARY = '/ccp4-5.2/lib/libccp4f.a /ccp4-5.2/lib/libccp4c.a'
define libraries (without ccp4: MR_LIBRARY = 'sfch_dummy.o')
2. set MR_FORT = ( f77 -O2 )
define compiler with options
sfcheck for Fortran 90 with memory allocation:
in main_sfcheck_ccp4.f:
comment line:
uncomment lines:
C VERS = 'M'
For the CCP4 version, CCP4 must be built with the same compiler.

You can also download binaries (executable files):

( all with memory allocation option)




New style to use

You can use this version in the same way as the previous one:

1. by command (batch) file
2. interactively
3. by ccp4i

New style to use:

 You can run the program from the command line with options (without any keywords):

  sfcheck -f file_sf_mtz_or_cif_or_map  -m model_pdb_or_cif
          -out out    -nomit Nomit
          -mem Nm     -na Na
          -scl map_scale_factor  -map  -invert
          -h             -r
          -po path_out -ps path_scratch
          -lf label_F  -lsf label_sigF
          -li label_I  -lsi label_sigI
          -lfree label_free_flag

     h      = help and information about mtz labels
     r      = keep some special files (.dst, ...)
     out    = y - see nomit option
              a - program creates a CIFile (sfcheck.hkl)
                  with anisothermal corrected Fobs
              u - CIFile with detwinned data
     map    = a map of the density extracted around the model is created
              (sfcheck_ext.map), or a new map if the input was a map
              (sfcheck.map). Useful for preparing a mirrored and/or scaled map
     invert = the mirror map will be used 
     nomit  = number of cycles of the omit procedure.
              2 is a good choice. It takes time.
              If OUT = Y, the program creates a CIFile (sfcheck.hkl)
              with omit phases
     Nm     = memory request in Mb (for f90 only)
     Na     = maximal number of atoms in the model
     label_* = labels for mtz_file

     For example:

        sfcheck -f file.mtz


        sfcheck -m file.pdb


        sfcheck -f file.mtz  -m file.pdb


        sfcheck -f file.mtz  -m file.pdb -nomit 2 -map  -out y

        sfcheck -f file.mtz -lf FP -lsf SIGFP


        sfcheck -f file.mtz -h
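
The command-line examples above can also be assembled from a script. The following is a minimal sketch in Python, not part of SFCHECK itself: the function name build_sfcheck_command and its parameter names are hypothetical, and only options listed in the usage summary above are emitted. It builds the argument list and prints it without running the program.

```python
import shlex

def build_sfcheck_command(sf_file=None, model_file=None, nomit=None,
                          out=None, make_map=False, labels=None):
    """Assemble an sfcheck argument list from the options documented above.

    Hypothetical helper: the parameter names are illustrative only.
    """
    if sf_file is None and model_file is None:
        raise ValueError("at least one of sf_file or model_file is required")
    cmd = ["sfcheck"]
    if sf_file is not None:
        cmd += ["-f", sf_file]          # structure factors or map
    if model_file is not None:
        cmd += ["-m", model_file]       # model coordinates
    if nomit is not None:
        cmd += ["-nomit", str(nomit)]   # cycles of the omit procedure
    if make_map:
        cmd.append("-map")
    if out is not None:
        cmd += ["-out", out]            # y, a or u
    # labels maps label options (lf, lsf, li, lsi, lfree) to MTZ column names
    for option, column in (labels or {}).items():
        cmd += ["-" + option, column]
    return cmd

cmd = build_sfcheck_command("file.mtz", "file.pdb", nomit=2, make_map=True, out="y")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if the sfcheck executable is on the search path.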

Output information produced by SFCHECK

        cell parameters and space group
        number of atoms
        number of water molecules
        solvent content  
        <B> for model
        Matthews coefficient and corresponding solvent %
        reported resolution 
        reported R-factor
        refinement program
        resolution range for refinement
        reported sigma cut-off for refinement
        reported R-factor
        reported Rfree
    4.Structure factors:
        number of reflections
        number of reflections with I > sigma
        number of reflections with I > 3sigma
        resolution range
        R-standard (sum(sigma)/sum(F))
        Wilson plot (amplitudes vs. resolution)
        overall B-factor by Patterson origin peak and by Wilson plot
        optical resolution 
        expected minimal error in coordinates
        Anisotropic distribution of Structure Factors -ratio of Eigenvalues 
    5.Model vs. structure factors:
        Correlation coefficient
        R-factor for reported resolution range and sigma cut-off
        Luzzati plot (R-factor vs. resolution)
        coordinate error from Luzzati plot
        expected maximal error in coordinates
        Patterson scaling   - scale, Badd
        Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
        Solvent correction - Ks,Bs
    Optical resolution
      Optical resolution is defined as an expected minimum distance
      between two resolved peaks in the electron density map.

      With a single-Gaussian approximation of the shape of the atomic peak,
      the minimum distance between two resolved peaks is twice the standard 
      deviation "sigma", i.e. the width of the atomic peak W (W = 2 sigma).
      The expected width of the atomic peak W is computed as

       W = sqrt ( 2 (sigma_patt^2 + sigma_res^2) )

       where  sigma_patt - standard deviation of the Gaussian corresponding 
                         to the Patterson origin peak. 

            sigma_res  - standard deviation of the Gaussian corresponding
                         to the origin peak of the spherical interference
                         function, which is the Fourier transform of a sphere
                         in reciprocal space with radius 1/d_min.

                         sigma_res = 0.356 d_min.         
                         d_min is the minimum d-spacing, the "nominal resolution".

      The "expected optical resolution for complete data set" is  
      calculated as above but using all reflections, with values for
      missing reflections being the average value in the corresponding
      resolution shell.
      The plot of optical resolution for an atom with B=0 shows the
      part of the optical resolution that is due to series termination.

      (for the proof see Appendix)
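
As an illustration of the formula for W, here is a small Python sketch. The function names and the sample values (sigma_patt = 1.0 A, d_min = 2.0 A) are assumptions chosen for the example; only the two formulas come from the text above.

```python
import math

def sigma_res(d_min):
    """Standard deviation of the series-termination Gaussian, 0.356 * d_min."""
    return 0.356 * d_min

def expected_peak_width(sigma_patt, d_min):
    """Expected atomic peak width W = sqrt(2 (sigma_patt^2 + sigma_res^2))."""
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res(d_min) ** 2))

# Illustrative input values, not taken from any real data set.
w = expected_peak_width(sigma_patt=1.0, d_min=2.0)
print(f"optical resolution W = {w:.2f} A")
```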
    Patterson scaling
      Scaling in SFCHECK is based on the Patterson origin peak which is
      approximated as a gaussian. Compared to the conventional scaling 
      by the Wilson plot, this method is particularly advantageous when
      only low resolution data are available.
      The program gives overall B-factors estimated by both methods.
    Low resolution cut-off
      Disordered solvent contributes to diffraction at low resolution.
      However, removing low resolution data from the calculations results
      in a series termination effect which is noticeable in the electron
      density at the surface of the molecule. To reduce the influence of
      low resolution terms, SFCHECK applies a "soft" low resolution 
      cut-off to the structure factors according to the formula:
        Fnew = Fold (1-exp(-Boff*s^2)) , where Boff = 2 dmax^2
      The program uses Boff = 256.
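
The effect of the soft cut-off can be seen by tabulating the weight 1-exp(-Boff*s^2) at a few resolutions. This Python sketch assumes that s = 1/d (the text does not define s explicitly); the function name is hypothetical and Boff = 256 is the value quoted above.

```python
import math

B_OFF = 256.0  # value quoted in the text

def soft_low_res_weight(d, b_off=B_OFF):
    """Weight applied to a structure factor at d-spacing d (in Angstrom).

    Assumption: s = 1/d, which is not stated explicitly in the text.
    """
    s2 = 1.0 / (d * d)
    return 1.0 - math.exp(-b_off * s2)

for d in (40.0, 16.0, 8.0, 4.0, 2.0):
    print(f"d = {d:5.1f} A   weight = {soft_low_res_weight(d):.3f}")
```

Under this assumption the weight rises smoothly towards 1 as the resolution increases, so low resolution terms are damped rather than removed.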
      The program scales Fobs and Fcalc by the Patterson origin peak using
      all data, applying Boff.
      First, it computes the overall B-factors (Boverall) for the observed
      and calculated amplitudes.
      Second, it makes the width of the calculated peak equal to the 
      observed, i.e. computes an additional thermal factor Badd:
            Badd = Boverall_obs - Boverall_calc
      Third, computes the scale factor for Fcalc:
            scale = sqrt ( --------------------------------------------- )
      Finally we have:
            Fcalc_scaled = Fcalc * scale * exp(-Badd*s^2)   
      The program computes R-factor and Correlation coefficient for all
      data applying the soft low resolution cut-off as described above. 
      The program also computes R-factor and Correlation coefficient for
      the reported resolution range and reported sigma cut-off without
      applying Boff. If the Fobs file contains reflections marked with
      the Rfree flag, the program computes Rfree.
      Missing data are restored by using the average values of 
      intensities for the corresponding resolution shell.
      The program produces a plot of completeness vs. resolution and
      a plot of the average radial completeness in polar coordinates
      theta and phi.
    Expected minimal error  
      The minimal coordinate error is estimated using the experimental 
      sigma(F). The standard deviation of atomic coordinates is 
         sig_min(r) = sqrt(3)*sigma(slope)/curvature
              where  sigma(slope) is the standard deviation of the slope of
                                  the electron density in the x direction
                                  (along A).
                     curvature is the average curvature of the electron 
                                  density at the atomic peak center.
      They are computed as:
       sigma(slope) = (2pi*sqrt(sum(h^2*(sigF)^2)))/(VOL*A)
                     VOL - volume of cell
                     A   - cell parameter
                     h   - Miller index        
                     summation over all reflections
                    ( Cruickshank,D.W.J. (1949) Acta Cryst. 2, 65.) 
       curvature  = (2pi^2*sum(h^2*F))/(VOL*A^2)
                    ( Murshudov et al., (1997) Acta Cryst. D53, 240.) 
      If there is no experimental sigma for the observed data, the program
      uses  sigma = Fobs * 0.04 for all reflections.
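
The two sums above translate directly into code. The following Python sketch is an illustration under stated assumptions, not the program's implementation: the function names, the (h, F, delta) tuple layout and the toy numbers are all hypothetical. Passing delta = sigma(F) gives the minimal error.

```python
import math

def coordinate_error(reflections, volume, a):
    """sqrt(3) * sigma(slope) / curvature for reflections given as (h, F, delta).

    delta is the per-reflection uncertainty, e.g. sigma(F).
    """
    slope_sum = sum(h * h * delta * delta for h, _, delta in reflections)
    curv_sum = sum(h * h * f for h, f, _ in reflections)
    sigma_slope = 2.0 * math.pi * math.sqrt(slope_sum) / (volume * a)
    curvature = 2.0 * math.pi ** 2 * curv_sum / (volume * a * a)
    return math.sqrt(3.0) * sigma_slope / curvature

def with_default_sigmas(h_f_pairs):
    """Attach sigma = 0.04 * Fobs when no experimental sigma is available."""
    return [(h, f, 0.04 * f) for h, f in h_f_pairs]

# Toy data: two reflections in a 50 A cubic cell (illustrative values only).
toy = [(1, 100.0, 4.0), (2, 50.0, 2.0)]
print(f"sig_min(r) = {coordinate_error(toy, volume=125000.0, a=50.0):.3f} A")
```

Note that the cell volume cancels in the ratio, so the result depends only on the cell parameter A and the two sums.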
    Expected maximal error
      The expected maximal error in coordinates is estimated from the
      difference between |Fobs| and |Fcalc|:
       sig_max(r) = sqrt(3)*sigma(slope)/curvature
       sigma(slope) = (2pi*sqrt(sum(h^2*(Fobs-Fcalc)^2)))/(VOL*A)
       curvature  = (2pi^2*sum(h^2*F))/(VOL*A^2)
      For missing reflections the program uses the average value of 
      sigma(Fobs) for the corresponding resolution shell instead 
      of (Fobs-Fcalc).
    DPI - diffraction-data precision indicator
      Cruickshank's method of estimating the coordinate error.
                   ( the Refinement of Macromolecular structure Proceeding
                     of CCP4 Study weekend. pp11-22 1996)
        sig(x) = sqrt(Natoms/(Nobs-4Natoms)) C^(-1/3) dmin Rfact
                where  C     - fractional completeness
                       Rfact - conventional crystallographic R-factor
                       Nobs  - number of reflections 
                       dmin  - maximal resolution
       If Rfree flags are specified, the program uses Murshudov's approach 
       to calculate DPI: 
                   (Newsletter on protein crystallography., Daresbury
                    Laboratory, (1997) 33, pp 25-30.)
        sig(x) = sqrt(Natoms/Nobs) C^(-1/3) dmin Rfree
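
Both DPI expressions are simple enough to evaluate by hand or in a few lines of code. The Python sketch below uses hypothetical function names and made-up input values; the formulas are the two given above.

```python
import math

def dpi_cruickshank(n_atoms, n_obs, completeness, d_min, r_factor):
    """sig(x) = sqrt(Natoms/(Nobs-4Natoms)) * C^(-1/3) * dmin * Rfact."""
    dof = n_obs - 4 * n_atoms
    if dof <= 0:
        raise ValueError("Nobs must exceed 4*Natoms for this formula")
    return math.sqrt(n_atoms / dof) * completeness ** (-1.0 / 3.0) * d_min * r_factor

def dpi_free(n_atoms, n_obs, completeness, d_min, r_free):
    """sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree."""
    return math.sqrt(n_atoms / n_obs) * completeness ** (-1.0 / 3.0) * d_min * r_free

# Illustrative numbers only: 1000 atoms, complete data to 2.0 A.
print(dpi_cruickshank(1000, 20000, 1.0, 2.0, 0.20))
print(dpi_free(1000, 16000, 1.0, 2.0, 0.25))
```

The first formula is undefined when Nobs does not exceed 4*Natoms, which is why the sketch raises an error in that case.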
    Luzzati plot (R-factor vs. resolution)
       The program estimates the average radial error <delta> in coordinates 
       from the Luzzati plot.
                          <delta(r)> = 1.6 sig(x)
    Solvent content  
       Solvent content is the fraction of the unit cell volume not occupied
       by the model. The model consists of ALL atoms present in the coordinate
       file.
    Residual factor Rmerge 
                            sum_i (sum_j |Ij - <I>|)
                Rmerge(I) = --------------------------
                                 sum_i (sum_j (<I>))
                Ij  = the intensity of the jth observation of reflection i
                <I> = the mean of the intensities of all observations of
                       reflection i
                sum_i is taken over all reflections
                sum_j is taken over all observations of each reflection
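
A direct transcription of the Rmerge definition is sketched below in Python. The function name and the input layout, a list of ((h, k, l), I) pairs, are assumptions made for the example; symmetry-equivalent observations are assumed to have already been reduced to the same index.

```python
from collections import defaultdict

def r_merge(observations):
    """Rmerge(I) for observations given as ((h, k, l), intensity) pairs."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)                  # <I> for this reflection
        numerator += sum(abs(v - mean) for v in values)   # sum_j |Ij - <I>|
        denominator += mean * len(values)                 # sum_j <I>
    return numerator / denominator

# Toy data: one reflection measured as 90 and 110, another as 50 and 50.
obs = [((1, 0, 0), 90.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(f"Rmerge = {r_merge(obs):.4f}")
```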

Local error estimation

    Local error estimation (plotted for each residue, for the backbone
    and for the side chain):
       1. Amplitude of displacement of atoms from electron density
       2. Density correlation coefficient
       3. Density index 
       4. B-factor
       5. Index of connectivity
      Displacement of atoms from electron density is estimated from the
      difference (Fobs - Fcal) map. The displacement vector is the ratio of
      the gradient of difference density to the curvature. The amplitude of
      the displacement vector is an indicator of the positional error.
    Correlation coefficient
      The density correlation coefficient is calculated for each residue
      from atomic densities of (2Fobs-Fcalc) map - "Robs" and the model
      map (Fcalc) - "Rcalc" :
      D_corr =  <Robs><Rcalc>/sqrt(<Robs^2><Rcalc^2>)
          where <Robs> is the mean of the "observed" densities of the atoms of
                the residue (backbone or side chain).
                <Rcalc> is the mean of the "calculated" densities of the atoms
                of the residue (backbone or side chain).
          The density value for an atom from the map R(x) is:
                   sum_i ( R(xi) * Ratom(xi - xa) )
          Dens =  ---------------------------------- 
                       sum_i ( Ratom(xi - xa) ) 
            where  Ratom(x) is the atomic electron density at grid point x.
                   xa - vector of the centre of atom.
                   xi - vector of the i-th point of grid.
                   The sum is taken over all grid points whose distance
                   from the centre of the atom is less than Radius_limit.
                   For all atoms Radius_limit = 2.5 A.
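
The expression for Dens is a weighted average of map values around the atom. The Python sketch below illustrates it under clear assumptions: the grid is given as a list of (position, density) pairs in Cartesian Angstrom coordinates, and the atomic shape Ratom is modelled by a hypothetical Gaussian with an arbitrary width, which is not necessarily what the program uses.

```python
import math

RADIUS_LIMIT = 2.5  # Angstrom, as stated in the text

def gaussian_atom(distance, sigma=0.7):
    """Hypothetical atomic shape Ratom; the width 0.7 A is illustrative."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))

def atom_density(grid_points, centre, atom_shape=gaussian_atom):
    """Weighted mean of map values within RADIUS_LIMIT of the atom centre."""
    numerator = 0.0
    denominator = 0.0
    for position, rho in grid_points:
        distance = math.dist(position, centre)
        if distance < RADIUS_LIMIT:
            weight = atom_shape(distance)
            numerator += rho * weight
            denominator += weight
    return numerator / denominator

# A flat map of value 2.0 near the atom, plus one far point that is ignored.
grid = [((x * 0.5, 0.0, 0.0), 2.0) for x in range(-4, 5)]
grid.append(((3.0, 0.0, 0.0), 100.0))
print(atom_density(grid, (0.0, 0.0, 0.0)))
```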
    Index of density and index of connectivity
      The index of connectivity is the geometric mean of the (2Fobs-Fcalc)
      electron density values for the backbone atoms N, CA and C, i.e. the
      cube root of their product. Low values of this index indicate breaks 
      in the backbone electron density which may be due to flexibility of 
      the chain or incorrect tracing.  The index of density is a similar 
      indicator which is calculated for all atoms of a given residue.
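
Reading the index of connectivity as a geometric mean, a minimal Python sketch could look as follows. The function name is hypothetical, and returning 0.0 when any density is not positive is an assumption of this sketch; the text does not say how the program treats such cases.

```python
def connectivity_index(rho_n, rho_ca, rho_c):
    """Geometric mean of the density values at the backbone atoms N, CA and C.

    Assumption: a non-positive density at any atom gives an index of 0.0.
    """
    if min(rho_n, rho_ca, rho_c) <= 0.0:
        return 0.0
    return (rho_n * rho_ca * rho_c) ** (1.0 / 3.0)

# One weak atom pulls the index down more than an arithmetic mean would.
print(connectivity_index(1.0, 1.0, 1.0))
print(connectivity_index(1.0, 0.1, 1.0))
```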

Omit procedure

      An omit map is a way to reduce the model bias in the electron
      density calculated with model phases. SFCHECK produces the so
      called total omit map by an automatic procedure. First, the
      initial (Fobs, PHImodel) map is divided into N boxes. For each
      box, the electron density in it is set to zero and new phases are
      calculated from this modified map. A new map is calculated using
      these phases and Fobs. This map contains the omit map for the
      given box which is stored until the procedure is repeated for
      all boxes. At the end, all the boxes with omit maps compose
      the total omit map. Phases calculated from the total omit map
      are combined with the initial phases. The whole procedure may
      be repeated (keyword NOMIT). Note: it is time consuming!
      The program can create an output file with omit phases (see keyword OUT).

Partial information

      The program can be run with only one input file, either coordinates
      or structure factors. In this case it reports the information derived
      from that file, without local error estimation.

Twinning test

     The program checks for merohedral twinning.

     Perfect twinning test: <I^2>/<I>^2 

     Also, if possible, the program computes the partial twinning test:

          H = |I(h1)-I(h2)|/(I(h1)+I(h2))
     Alpha (twinning fraction) = 1/2 - <H>

     If 0.05 < Alpha < 0.45 the program can create an output file
     with detwinned data (see keyword OUT)
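
The two twinning statistics can be illustrated with a short Python sketch. The function names and the toy intensities are assumptions for the example; pairing reflections h1 and h2 by the twin operator is taken as already done, and pairs with a non-positive intensity sum are skipped, which is a choice of this sketch.

```python
def perfect_twin_ratio(intensities):
    """Second-moment ratio <I^2>/<I>^2."""
    n = len(intensities)
    mean_sq = sum(i * i for i in intensities) / n
    mean = sum(intensities) / n
    return mean_sq / (mean * mean)

def partial_twin_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I1 - I2| / (I1 + I2) per twin-related pair."""
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0.0]
    return 0.5 - sum(h_values) / len(h_values)

def detwinning_possible(alpha):
    """Range quoted in the text for writing detwinned data."""
    return 0.05 < alpha < 0.45

alpha = partial_twin_fraction([(3.0, 1.0), (1.0, 1.0)])
print(perfect_twin_ratio([1.0, 3.0]), alpha, detwinning_possible(alpha))
```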


SFCHECK is easy to use interactively, but it can also be run in batch mode. The best and easiest way to prepare a command file is to run SFCHECK once in dialogue mode. If a sfcheck.log file was requested (first question), the program creates a command (batch) file (sfcheck.bat) automatically.

See some command (batch) file examples.

All keywords must be preceded by an underscore (e.g. _DOC). The available keywords are:

First keyword always must be defined:


One or both of these keywords must be defined:


Other keywords


To get started with SFCHECK interactively, you first have to answer this question:

  1. Do you want to have FILE-DOCUMENT sfcheck.log? < N | Y >


    Default: <N>

    N - do not produce DOC-file
    Y - produce DOC-file

    The DOC-file contains the protocol of the running of the program. With the DOC-file, the program creates a command (batch) file: sfcheck.bat.

    You can also use the keyword DOC to redirect output files to a special directory ( _DOC Y>path or _DOC >path). Examples:

      _DOC  Y>/y/people/alexei/
      _DOC   >/y/people/alexei/

FILE_C <file>

Default: < >

the name of the input file with the model coordinates (allowed formats: PDB, CIF or BLANC).

FILE_F <file>

Default: < >

the name of the input file with Fobs (allowed formats: ASCII (CIF or PDB) or BLANC or MTZ) or EM map (ccp4 format).

When using an MTZ file, MTZ keywords must be used (or program will use default values).

Other keywords

NOMIT <nomit>

Default: <0>

<nomit> is the number of cycles of the omit procedure. 2 is a good choice.

OUT <N | Y | U | A>

Default: <N>

A - create new CIFile (sfcheck.hkl) with anisothermal corrected Fobs
Y - create new CIFile with Fobs and omit phases
U - create new CIFile with detwinned data

MAP <N | Y>

Default: <N>

Y - a map of the density extracted around the model is created (sfcheck_ext.map), or a new map if the input was a map (sfcheck.map)

PATH_SCR <path>

Default: < >

path to scratch file directory

SCL <scale>

Default: <1 >

map scale factor


Default: <N>

Y - mirror map will be used

TEST <N | Y>

Default: <N>

Y - do not delete some special files

Output files

       The output information is represented in the PostScript file:
           sfcheck_map.ps (if input was map)
       A simple ASCII version of this file is in:
       Also the program can create:
           a new formatted CIFile of Fobs: sfcheck.hkl (keyword OUT)
           a file of density around model:   sfcheck_ext.map (keyword MAP) 
           /CCP4 format for CCP4 distribution or BLANC format/
           a  new map if input was map:    sfcheck.map
       Some other files will not be deleted if keyword TEST = Y.
       These files have internal format of the BLANC program 
       suite (see file README by ftp from anonymous @ftp.ysbl.york.ac.uk) 
       and can be used by programs of this suite.
           sfcheck_fob.dat     - BLANC_Fobserved_file 
           sfcheck_ph.dat      - BLANC_phases of the model
           sfcheck_omit_ph.dat - BLANC_omit_phases
           sfcheck_detwin.dat  - BLANC_detwinned_Fobs

How to redirect output and scratch files

You can use the keyword PATH_SCR to redirect all scratch files to a special directory. Example:

  _PATH_SCR  /y/people/alexei/

You can use keyword DOC to redirect output files:


 and also (if keyword TEST = Y )


to a special directory. Examples:

_DOC  Y>path
_DOC   >path

SFCHECK version for CCP4

     A CCP4 version of SFCHECK is available which can read an MTZ file
     or an EM map (CCP4 format) and create a file with the density extracted 
     around the model, or a new mirrored and/or scaled map (CCP4 format). 
  1. This option uses the CCP4 libraries. 
     CCP4 must be set up beforehand.

  2. Keywords for reading MTZ file.

           The following keywords are needed only for an MTZ file
        F               - label of F or F(+)
        SIGF            - label of sigma F or sigma F(+)
        F-              - label of F(-)
        SIGF-           - label of sigma F(-)
        FREE            - label of Free_flag
        I               - label of I or I(+)
        SIGI            - label of sigma I or sigma I(+)
        I-              - label of I(-)
        SIGI-           - label of sigma I(-)

Command (batch) file examples

# --------------------------------
sfcheck <<stop
# --------------------------------
_FILE_C  model.pdb
_FILE_F fobs.cif
stop

BATCH example for omit procedure :

  In this case all output files will be in directory: /y/people/alexei/
  and all scratch files will be created in directory: /y/people/alexei/work/

# --------------------------------
sfcheck <<stop
# --------------------------------
_DOC  >/y/people/alexei/ 
_FILE_C  model.pdb
_FILE_F fobs.cif
_OUT   Y
_path_scr /y/people/alexei/work/
stop

BATCH example with MTZ file:

  In this case coordinate file isn't used.
# --------------------------------
sfcheck <<stop
# --------------------------------
_FILE_F p1.mtz
_F     FO
stop

File formats

     1. Input PDB_file of coordinates

        Input PDB_file of coordinates must contain the CRYST1 card with
        the unit cell and the space group name.
        Program can use the information from HEADER,SCALE,MTRIX,REMARK cards.

     2. Input formatted file of structure factors

        This file of structure factors must be in PDB format or a CIFile
        which contains indices and structure factors or intensities
        (a simple formatted file with "h,k,l,|F|,sig(F)" or "h,k,l,|F|" 
        and without titles is also acceptable).
        The best is a CIFile.

        A. Example of a CIfile of structure factor amplitudes:

          _entry.id  9ins  
          _struct.title ' insuline 9ins' 
          _cell.length_a      100.000
          _cell.length_b      100.000
          _cell.length_c      100.000
          _cell.angle_alpha    90.000
          _cell.angle_beta     90.000
          _cell.angle_gamma    90.000
          _symmetry.space_group_name_H-M  'P 1 21 1'
             2  3   4    12.3   1.2
            -2 -3  -4    11.4   1.1
           . . . . . . . . . . . . .
                or just:

             2  3   4    12.3   1.2
            -2 -3  -4    11.4   1.1
            . . . . . . . . . . . . .

           For intensities use:


        B. Example of a PDB file of structure factor amplitudes:

        HEADER   R2SARSF   15-JAN-91
        CRYST1  64.900   78.320   38.790  90.00  90.00  90.00 P 21 21 21    8
        FORMAT   (2(I3,2I4,2F7.0,F6.0,9X))
        COORDS   2SAR
        REMARK  2 DMIN=1.85, DMAX=16.28
        CHKSUM  1 MIN H=0,MAX H=34,MIN K=0,MAX K=41,MIN L=0,MAX L=20
        CHKSUM  4 SUM OF FOBS=0.235499E+07
          0   0   3     60      9    16           0   0   4    106    307    25
          0   0   5    166     23    20           0   0   6    239    657    52
          0   0   7    326      0    38           0   0   8    425    511    40
        . . . . . . . . . . . . . . . . . . . . . .

        C. Example of a simple formatted file of structure factor amplitudes 
           which is assumed to contain H,K,L,F,sig(F):
             2  3   4    12.3   1.2
            -2 -3  -4    11.4   1.1
            . . . . . . . . . . . . .

            or without sig(F):

             2  3   4    12.3  
            -2 -3  -4    11.4  
            . . . . . . . . . 

       The length of file records must not exceed 80 characters.
       The format of the records is free, i.e. data must be separated by
       blanks (be careful - some PDB files do not satisfy this rule).
       The program uses the information about cell parameters and space
       group from the coordinate file and ignores such information in
       the structure factor file.
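
Because the simple formatted file is just blank-separated columns, it is easy to read or check with a script. The Python sketch below is an illustration only: the function name is hypothetical, and silently skipping lines that do not parse as "h k l F [sig(F)]" is a choice of this sketch, not documented program behaviour.

```python
def parse_simple_reflections(text):
    """Parse blank-separated 'h k l F [sig(F)]' records into tuples.

    Returns (h, k, l, F, sigF) with sigF set to None when it is absent.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (4, 5):
            continue
        try:
            h, k, l = (int(value) for value in fields[:3])
            f = float(fields[3])
            sig = float(fields[4]) if len(fields) == 5 else None
        except ValueError:
            continue  # not a reflection record
        rows.append((h, k, l, f, sig))
    return rows

sample = """
     2  3   4    12.3   1.2
    -2 -3  -4    11.4
"""
for record in parse_simple_reflections(sample):
    print(record)
```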

Memory control

      Memory control parameters ( in main_sfcheck_ccp4.f ):
          MEMORY - memory for densities, gradients, coordinates, ...
          PARAMETER ( MEMORY=5000000)
          REAL      POOL(MEMORY)
          NCRDMAX - maximal number of coordinates
          PARAMETER ( NCRDMAX=200000)
          IPRSYM - maximal number of symmetry operators
          PARAMETER ( IPRSYM=96    )
          INTEGER*2 ISYM(5,3,IPRSYM)
          ISYM    - integer*2 array for cryst.symmetry operators 
          IPRSYM  - dimension of integer*2 array ISYM(5,3,IPRSYM)
                    maximal number of cryst.symmetry operators.
          MEMORY  - dimension array POOL. 
                 MEMORY = MAPMAX + (NCRDMAX/2)*5 , where MAPMAX - maximal
                                                   size of XY-section (NX*NY)
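
The relation between MEMORY, MAPMAX and NCRDMAX can be checked before recompiling. The Python sketch below assumes that NCRDMAX/2 is an integer division, as it would be for Fortran integer parameters; the function names and the 2000 x 2000 section size are illustrative.

```python
MEMORY = 5000000    # default from main_sfcheck_ccp4.f, as listed above
NCRDMAX = 200000    # default from main_sfcheck_ccp4.f, as listed above

def required_memory(nx, ny, ncrdmax=NCRDMAX):
    """MEMORY needed for an XY-section of nx * ny grid points."""
    return nx * ny + (ncrdmax // 2) * 5

def max_section_size(memory=MEMORY, ncrdmax=NCRDMAX):
    """Largest MAPMAX (NX*NY) that fits into the given MEMORY."""
    return memory - (ncrdmax // 2) * 5

print(required_memory(2000, 2000), max_section_size())
```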


  Estimation of the width of atomic peak by the Patterson origin peak.

  The Fourier transform of the atomic Gaussian:

                    1
         -------------------------   exp( -r^2/(2 sigma_four^2) )
         (2pi sigma_four^2)^(3/2)
             where sigma_four is the standard deviation of the Gaussian,

 is also a Gaussian:

                 B s^2
          exp( - ------- )     where B = 8pi^2 sigma_four^2  
                    4

The Patterson function, which is calculated as the Fourier transform of the
square of the reciprocal space Gaussian:

                 2 B s^2
          exp( - --------- )     
                     4

 is also a Gaussian with standard deviation (for an infinite Fourier series) 

                             2 B
          sigma_patt_0^2 = ------- = 2 sigma_four^2
                            8pi^2

 The effect of series termination of the Fourier transform can be considered
 as the product, in reciprocal space, of the infinite set of 
 Fourier coefficients and a sphere with radius 1/d_min, where
 d_min is the minimum d-spacing. The product in reciprocal space 
 corresponds to a convolution in Patterson space.
 The Fourier image of the sphere is the spherical interference 
 function T(r) (Int.Tables,1993,vol B,p247):
           3 ( sin(x) - x cos(x) )
   T(r) = -------------------------  where x = 2pi r (1/d_min)
                     x^3
 Using Taylor's expansion, the origin peak of the function T(r)
 can be approximated by a Gaussian:

                      r^2
          exp( - --------------- )    
                  2 sigma_res^2

                    where sigma_res is the standard deviation of the Gaussian.   

                          sigma_res = ( d_min *sqrt(5) )/ 2pi = 0.356 * d_min  
 This result is identical to the optical definition of resolution
 (Blundell,1976), (James,1948)
 as twice the distance from maximum to the first zero of image 
 of a point source. In 3-dimensional case the coordinate of the first
 zero is 0.715 d_min ~ 2 sigma_res.

 The standard deviation 'sigma' of a Gaussian which is the convolution of two
 Gaussians with standard deviations sigma_1 and sigma_2 is

    sigma^2 = sigma_1^2 + sigma_2^2

Therefore the standard deviation of the Patterson origin peak with a finite
Fourier series is

    sigma_patt^2 = sigma_patt_0^2 + sigma_res^2

The standard deviation of the expected atomic peak for a finite Fourier series is

    sigma_four^2 = sigma_patt_0^2/2 + sigma_res^2 =

                 = sigma_patt^2/2  + sigma_res^2/2
 Finally, the expected width of the atomic peak is:

  W = 2 sigma_four = sqrt ( 2 ( sigma_patt^2 + sigma_res^2) )
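
The numerical constants quoted in this derivation can be verified with a few lines of Python. The sketch below is only a check of the arithmetic, using the standard closed form of the spherical interference function, 3(sin x - x cos x)/x^3; the function names are hypothetical. In terms of x = 2pi r/d_min, the Gaussian with sigma_res = d_min*sqrt(5)/2pi becomes exp(-x^2/10).

```python
import math

def interference(x):
    """Spherical interference function 3 (sin x - x cos x) / x^3."""
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def first_zero(tol=1e-12):
    """First positive zero of sin x - x cos x, found by bisection."""
    lo, hi = math.pi, 1.5 * math.pi   # numerator is positive at lo, negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.sin(mid) - mid * math.cos(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

SIGMA_RES_COEFF = math.sqrt(5.0) / (2.0 * math.pi)   # sigma_res / d_min
zero_over_dmin = first_zero() / (2.0 * math.pi)      # first zero of T(r) / d_min

print(f"sigma_res / d_min  = {SIGMA_RES_COEFF:.3f}")
print(f"first zero / d_min = {zero_over_dmin:.3f}")
print(f"T at x=0.5: {interference(0.5):.4f}, Gaussian: {math.exp(-0.025):.4f}")
```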