PDBSET (CCP4: Supported Program)


pdbset - various useful manipulations on coordinate files


pdbset XYZIN foo_in.pdb XYZOUT foo_out.pdb
[Keyworded input]

Note that PDBSET should work with mmCIF files as well as PDB files.


The available keywords are described below.


In the description below, optional items are in [], alternatives are separated by |, keywords are in uppercase, parameters (i.e. numbers) are in lowercase. The input itself is case-insensitive for keywords (but parameters e.g. chain IDs must of course be the correct case). In the output file, the chain ID is always uppercase.

A residue ID that begins with a non-digit (as in output from O) is divided into chain ID + residue number. This is ALWAYS done, so the output file always has a valid numerical residue number.

CELL a b c [alpha beta gamma]

Read cell dimensions and make CRYST1 & SCALE header records. These will replace any CRYST1 & SCALE lines already present in file. The CRYST1 line should have the spacegroup in it, so a SPACEGROUP command is recommended. Note that if the TRANSFORM or SHIFT cards are present and the input PDB file contains CRYST1 and SCALE cards, the transformation operation will take place using the original cell dimensions. If the user wishes to perform the transformation operation using the new cell dimensions then two separate runs of the program are required.

ORTHOGONALIZATION (or NCODE) orthogonalization_code

Define code to generate orthogonalization matrix from input cell. This is not normally required, and only has an effect if a CELL command is also given.

     Code :-  
        = 1  axes along a, c* x a, c*  (Brookhaven standard, default)
        = 2  axes along b, a* x b, a*
        = 3  axes along c, b* x c, b*
        = 4  axes along a+b, c* x (a+b), c*
        = 5  axes along a*, c x a*, c       (Rollett)
        = 6  axes along a, b*, a x b*
        = 7  axes along a*, b, a* x b   (TNT convention;
                                         probably not very useful here
                                         since TNT has its own converter)
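
As an illustration of what code 1 means, the following Python sketch builds the fractional-to-orthogonal matrix with x along a, y along c* x a and z along c*. It uses the standard textbook formula, not PDBSET's own source, and the function names are hypothetical.

```python
import math

def orthogonalization_matrix(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Fractional -> orthogonal matrix for x along a, z along c* (code 1)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]

def frac_to_orth(matrix, frac):
    """Multiply a 3x3 matrix by a fractional coordinate triple."""
    return [sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3)]

# Cell taken from the examples in this document
m = orthogonalization_matrix(132.02, 115.21, 96.20, 90.0, 90.0, 90.0)
print(frac_to_orth(m, [0.5, 0.5, 0.5]))
```

The inverse of this matrix corresponds to what the SCALE records hold; the other codes differ only in which cell axes are aligned with the orthogonal axes.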

SPACEGROUP spacegroup_name

Read spacegroup name (not essential, but it is put into the CRYST1 line on output).

SYMGEN Spacegroup_name | Spacegroup_number | Symmetry_operation | NCS

Generate chains with these symmetry operations applied. If the operations are given explicitly, several SYMGEN commands may be given. The identity operation must be specified explicitly if required. Use the CHAIN command to rename the generated chains. Note that, except for NCS, these symmetry operations apply to fractional coordinates, so the orthogonalization operation must be known to the program, either from CRYST1 and/or SCALE lines in the input coordinate file, or from a CELL command. If the keyword NCS is given, then a series of TRANSFORM commands should be given to define the non-crystallographic symmetry operations to be used.

NB: if supplying individual symmetry operations, these must be in the form found in the file symop.lib, e.g.

     SYMGEN 1/2+X,1/2+Y,Z

Elements within each operation are separated by commas. To supply multiple operations on a single line, separate each pair of operations by an asterisk, e.g.

     SYMGEN -X,Y,-Z * 1/2+X,1/2+Y,Z
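
To make the operator notation concrete, here is a minimal Python sketch that parses operator strings of the kind shown above and applies them to fractional coordinates. It is an illustration only: the parser handles just the simple forms used in these examples (signed X, Y, Z terms and fractions such as 1/2), and the function names are hypothetical, not part of PDBSET.

```python
import re
from fractions import Fraction

# One term: optional sign followed by a number (optionally a/b) or an axis letter
TOKEN = re.compile(r"([+-]?)(\d+(?:/\d+)?|[XYZ])")

def parse_symop(text):
    """Parse e.g. '1/2+X,1/2+Y,Z' into three (coefficients, shift) rows."""
    rows = []
    for part in text.upper().replace(" ", "").split(","):
        coeffs = [Fraction(0)] * 3
        shift = Fraction(0)
        for sign, body in TOKEN.findall(part):
            s = -1 if sign == "-" else 1
            if body in "XYZ":
                coeffs["XYZ".index(body)] += s
            else:
                shift += s * Fraction(body)
        rows.append((coeffs, shift))
    if len(rows) != 3:
        raise ValueError("a symmetry operation needs three elements")
    return rows

def parse_symgen(argument):
    """Split several operations separated by '*' and parse each one."""
    return [parse_symop(op) for op in argument.split("*")]

def apply_symop(op, frac):
    """Apply one parsed operation to a fractional (x, y, z) triple."""
    return tuple(
        float(shift) + sum(float(c) * v for c, v in zip(coeffs, frac))
        for coeffs, shift in op
    )

for op in parse_symgen("-X,Y,-Z * 1/2+X,1/2+Y,Z"):
    print(apply_symop(op, (0.1, 0.2, 0.3)))
```

Because the operators act on fractional coordinates, orthogonal coordinates have to be converted to fractional first and back afterwards, which is why the cell must be known.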

RENUMBER [INCREMENT] start|increment [residue range] [CHAIN old_chain [TO new_chain]]

Renumber or add constant to residue numbers in given range. The residue range is given as 1st_residue_number [TO] last_residue_number. If the CHAIN keyword is present, the renumbering applies only to this chain. The option TO new_chain causes the chain identifier to be changed. Note that renumbering is done after chain renaming specified by the CHAIN command, so the chain specified here (old_chain) is the chain ID after any renaming. N.B. there is NO check that different RENUMBER commands are mutually exclusive. To avoid problems with recursive renumbering, if more than one RENUMBER command would apply to a residue, only the first will be done.
(Defaults all residues, all chains).

     e.g. RENUMBER 35                ! renumber all residues, starting from 35
          RENUMBER INCREMENT -5  102 TO 110 CHAIN C  ! subtract 5 from
                                     ! residues 102 to 110 in chain C
          RENUMBER 101 1 TO 78 CHAIN A TO B  
               ! renumber residues 1 to 78 in chain A from 101 (to 178),
               ! changing the chain identifier to B
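
The rule that only the first applicable RENUMBER command is used can be sketched in Python as follows. This is a hedged illustration of the behaviour described above, not PDBSET's implementation: the class and function names are hypothetical, and the sketch covers only commands with an explicit residue range (so the bare form RENUMBER 35 is not modelled).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RenumberRule:
    value: int                       # start number, or the increment
    first: int                       # first residue of the range
    last: int                        # last residue of the range
    is_increment: bool = False       # True for RENUMBER INCREMENT
    chain: Optional[str] = None      # None means all chains
    new_chain: Optional[str] = None  # set by CHAIN old TO new

    def matches(self, chain, resnum):
        if self.chain is not None and chain != self.chain:
            return False
        return self.first <= resnum <= self.last

def renumber(chain, resnum, rules):
    """Apply only the first matching rule, as the text above describes."""
    for rule in rules:
        if rule.matches(chain, resnum):
            if rule.is_increment:
                new_num = resnum + rule.value
            else:
                # Range start maps to the given start number
                new_num = rule.value + (resnum - rule.first)
            return (rule.new_chain or chain, new_num)
    return (chain, resnum)

rules = [
    RenumberRule(-5, 102, 110, is_increment=True, chain="C"),
    RenumberRule(101, 1, 78, chain="A", new_chain="B"),
]
print(renumber("C", 105, rules))  # residue in chain C, shifted by -5
print(renumber("A", 78, rules))   # last residue of the A range, moved to B
```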

CHAIN [SYMMETRY Nsym] [old_chain] new_chain

Change chain ID to given value. If only one value given, change all chains to this value. If SYMMETRY keyword given, this applies to this symmetry operation only. A series of CHAIN commands may be given.

    e.g. CHAIN Q                ! change all chains to Q
         CHAIN SYMMETRY 2 A B   ! change chain generated from chain A
                                !  by symmetry operation 2 to B

BFACTOR [subkey] B_reset (B_reset2)

Set B-factor (default 20.0).

ALWAYS (default)
Reset all B-factors to B_reset
Reset B-factor to B_reset only if B-factor= 0.0
Reset B-factor to B_reset only if B-factor is less than B_reset
Reset B-factor to B_reset only if B-factor is greater than B_reset
Truncate B-factors to the given range. If B-factor is less than B_reset, B-factor = B_reset; if B-factor is greater than B_reset2, B-factor = B_reset2.
Average B-factors from the main chain (N CA C O atoms) and side chain of a residue and reset B-factor to B_average-mainchain or B_average-sidechain as appropriate.

OCCUPANCY [subkey] Occ_reset (Occ_reset2)

Set occupancy (default 1.0).

ALWAYS (default)
Reset all occupancies to Occ_reset
Reset ZERO occupancies to Occ_reset
Reset occupancy to Occ_reset if occupancy less than Occ_reset.
Reset occupancy to 0 if occupancy is less than Occ_reset, and to 1.0 if occupancy is greater than Occ_reset2.

SELECT [subkeys]


Select only specified chain(s).
e.g. SELECT CHAIN C ! select only chain C
OCCUPANCY [<minimum_occupancy>]
Select only atoms with occupancy greater than <minimum_occupancy> [default = 0.0]. This can be used to strip out dummy atoms with zero occupancy.
BFACTOR [<maximum_B>]
Select only atoms with Bfactor less than <maximum_B> [default = 99.0]


ROTATE [INVERT] [MATRIX] r11 r12 r13 r21 r22 r23 r31 r32 r33
ROTATE [INVERT] EULER alpha beta gamma
ROTATE [INVERT] POLAR omega phi kappa

Define rotational transformation, either as MATRIX (this keyword may be omitted) followed by 9 numbers (r11 r12 r13 r21 r22 r23 r31 r32 r33), by keyword EULER followed by Eulerian angles alpha, beta, gamma (as in ALMN), or by keyword POLAR followed by polar angles omega, phi, kappa (as in POLARRFN). This transformation will be applied to all atoms. The SHIFT command may be used to define a translation in addition. The transformation defined by ROTATE & SHIFT, or by TRANSFORM, is applied after any SYMGEN operation. Multiple definitions of ROTATE or TRANSFORM, or of SHIFT, will NOT be concatenated: only the last will be effective.

The subkey INVERT causes the inverse transformation to be applied. Note that an INVERT instruction if present will apply to both ROTATE & SHIFT.
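
For readers who want to see how polar angles map onto the nine matrix elements, the following Python sketch builds a rotation matrix from omega, phi, kappa using the axis-angle (Rodrigues) formula. The angle convention is an assumption here (omega measured from the z axis, phi from the x axis in the xy plane, kappa the rotation about the resulting axis) and should be checked against the POLARRFN documentation; the function name is hypothetical.

```python
import math

def polar_to_matrix(omega, phi, kappa):
    """Rotation by kappa about the axis given by (omega, phi), in degrees."""
    w, p, k = (math.radians(x) for x in (omega, phi, kappa))
    # Direction cosines of the rotation axis (assumed convention)
    l = math.sin(w) * math.cos(p)
    m = math.sin(w) * math.sin(p)
    n = math.cos(w)
    c, s = math.cos(k), math.sin(k)
    t = 1.0 - c
    return [
        [c + l * l * t,     l * m * t - n * s, l * n * t + m * s],
        [m * l * t + n * s, c + m * m * t,     m * n * t - l * s],
        [n * l * t - m * s, n * m * t + l * s, c + n * n * t],
    ]

# Rows in the order r11 r12 r13 / r21 r22 r23 / r31 r32 r33
for row in polar_to_matrix(0.0, 0.0, 90.0):
    print(" ".join(f"{v:8.5f}" for v in row))
```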


SHIFT [INVERT] [FRACTIONAL] tx ty tz

Define translation transformation (added AFTER rotation). If the keyword FRACTIONAL is present, the translation is assumed to be in fractional coordinates, otherwise orthogonal Angstroms. The subkey INVERT causes the inverse transformation to be applied. Note that an INVERT instruction if present will apply to both ROTATE & SHIFT.

TRANSFORM [INVERT] [FRACTIONAL] r11 r12 r13 r21 r22 r23 r31 r32 r33 tx ty tz
TRANSFORM [INVERT] ODB [O_database_filename]

Define transformation, equivalent to ROTATE MATRIX + SHIFT. If the keyword FRACTIONAL is present, the translation is assumed to be in fractional coordinates, otherwise orthogonal Angstroms. The subkey ODB causes the transformation to be read from a file in the format of an O datablock transformation. The subkey INVERT causes the inverse transformation to be applied.

If a SYMGEN NCS command is given before TRANSFORM commands, these are collected together to generate multiple NCS-symmetry related chains.
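
The relationship between a transformation and its inverse can be sketched in Python as below. This assumes the forward operation is x' = R x + t with R a pure rotation, so that the inverse is x = R^T x' - R^T t; whether PDBSET computes INVERT exactly this way is not stated here. The function names are hypothetical, and the numbers are those of the TRANSFORM example at the end of this document.

```python
def apply_transform(rot, shift, xyz):
    """Return R*xyz + t for a 3x3 matrix and two 3-vectors."""
    return [sum(rot[i][j] * xyz[j] for j in range(3)) + shift[i] for i in range(3)]

def invert_transform(rot, shift):
    """Inverse of a rigid-body transformation (rotation assumed orthogonal)."""
    rot_inv = [[rot[j][i] for j in range(3)] for i in range(3)]  # transpose
    shift_inv = [-sum(rot_inv[i][j] * shift[j] for j in range(3)) for i in range(3)]
    return rot_inv, shift_inv

rot = [
    [0.87831, 0.47808, 0.0],
    [0.0, 0.0, -1.0],
    [-0.47808, 0.87831, 0.0],
]
shift = [0.0, -2.713, 0.0]

moved = apply_transform(rot, shift, [1.0, 2.0, 3.0])
rot_inv, shift_inv = invert_transform(rot, shift)
print(moved)
print(apply_transform(rot_inv, shift_inv, moved))  # close to the starting point
```

The round trip is only as exact as the matrix is orthogonal; with five-decimal input the recovered coordinates agree to a few parts in 10^5.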

REMARK anything

Just gets echoed to output coordinate file.

XPLOR [subkeys]

The input file is assumed to come from Xplor; the following operations are then done:-

  1. All hydrogens are removed, unless subkeyword HYDROGEN is present.
    N.B.: it is possible that not all sidechain hydrogens will be removed under this option. To avoid the problem, use the X-plor option select=(not hydrogen) at the end of whatever X-plor job you run (thanks Salam Al-Karadaghi).
  2. Dummy atoms (X .gt. 9000) are removed.
  3. The segment identifier (columns 73-76) is used as the CHAIN name for any chain renaming (etc) commands: thus in this case references to chains in other commands may have up to 4 characters and are case-sensitive. Unless renamed, the first character of the segment identifier is put in the chain ID and made uppercase.
  4. The residue number is read correctly for numbers .ge. 1000.

PICK atom1 atom2 . . .

Define atom names to be included: all other atoms will be omitted - e.g. PICK CA to choose C-alpha only. Note that the atomname is case-sensitive.

SEQUENCE [PDB|SINGLE] [sequence file name]

Write out sequence to a file (default file name SEQUENCE). This can be edited to give a sequence for Xplor or O, etc. If the keyword PDB is present, the sequence is written in PDB SEQRES format, split by chains. If SINGLE is given, the sequence is written in single-letter code.

This function also writes out the estimated molecular weight based on the sequence. Note that this may differ from the value obtained by summing the weights of all the atoms in the input PDB file.


Set output options. The default is to output a file (XYZOUT) in the same format as the input (XYZIN).

Output a PDB file.
Output an mmCIF file.
Duplicate the chain ID as an Xplor segid, to make the file suitable for direct input into Xplor.


Convert Us on input file to B (B = 8 pi**2 u**2).
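
As a worked instance of this conversion, a short Python sketch follows. The function name is hypothetical, and the argument is taken to be the isotropic mean-square displacement (u**2 in the formula above) in square Angstroms.

```python
import math

def u_to_b(u_squared):
    """Convert a mean-square displacement (A^2) to a B-factor (A^2)."""
    return 8.0 * math.pi ** 2 * u_squared

print(round(u_to_b(0.25), 2))
```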

ELEMENT <E1> <E2> . . .

Define list of 2-character element names to be left-justified in atomnames, e.g. MG, FE, ZN. Note that the element name is case-sensitive. The PDB convention defines the first 2 characters of the atomname as the element name, but Xplor & O put them in the wrong place. CA is NOT accepted, as this conflicts with Calpha: you will have to decide what to do with these yourself.

REORTHOGONALIZE [[FROM] <ncode_in>] [TO] <ncode_out>

Change orthogonalization convention for coordinates by converting to fractional in the input convention (FROM) and reorthogonalizing in the output convention (TO). If the FROM Ncode is omitted, the orthogonalization will be taken from the input (PDB) file as SCALEn lines, or the default of Ncode = 1 will be used. If the cell is not present in the input file, a CELL command must be given here. <ncode_out> is compulsory. See above for Ncodes.

REPLACE RESIDUE <old_residue_type> BY <new_residue_type>

Globally replace residue type, e.g. REPLACE RESIDUE CYS BY CYH. Useful for renaming according to dictionary conventions of different programs. The residue names will be right-justified before use to allow for single character names.
e.g. replace residue C by CYT.

REPLACE ATOM <atom_name> BY <new_atom_name> [IN <residue_type>]

Replace atom name by new one, optionally only in specified residue name. Note that replace tests are done in the order given, so an IN <residue_type> command must allow for previous REPLACE RESIDUE commands. Note also that leading spaces must be given in atom names e.g.


EXCLUDE [subkeys]

Exclude some things, depending on subkey:

Exclude all non protein and side chain atoms past CB i.e. create a POLYALA model. N.B. the residue names are NOT changed.
WATer or HOH
Exclude residues labelled WAT or HOH.
Exclude hydrogen atoms (as for the XPLOR option)
Exclude all lines except ATOM & HETATM lines. The default is to copy them from the input file.


Will calculate the centre of mass and maximum distance from it of the coordinates output. This may be useful for determining the rotation function integration radius (not done by default since it requires an intermediate file).
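
The quantity described above can be sketched in Python as follows. This is an illustration rather than PDBSET's own calculation: the function name is hypothetical, and when no weights are supplied all atoms are given equal weight, which yields the centroid rather than a true centre of mass.

```python
import math

def centre_and_radius(coords, weights=None):
    """Return the weighted centre and the largest distance of any atom from it."""
    if weights is None:
        weights = [1.0] * len(coords)  # assumption: equal weights
    total = sum(weights)
    centre = [
        sum(w * xyz[i] for w, xyz in zip(weights, coords)) / total
        for i in range(3)
    ]
    radius = max(math.dist(centre, xyz) for xyz in coords)
    return centre, radius

coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
print(centre_and_radius(coords))
```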

NOISE [maximum_shift] [subkeys]

Introduce random shifts into atom positions in orthogonal coordinates.
     maximum_shift           maximum shift (Angs); defaults to 0.2 Angs,
                             fails if greater than 0.5 Angs
     CHAIN                   act on only specified chain(s)
                             e.g. NOISE 0.1 CHAIN C   ! act on chain C only
     BFACTOR [<minimum_B>]   act on only atoms with B-factor greater than <minimum_B>
     PICK                    act on only specified atom names
                             e.g. NOISE 0.1 PICK CA   ! act on C-alpha atoms only
                             Note that the atomname is case-sensitive.
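
A minimal Python sketch of bounded random shifts is given below. The distribution PDBSET uses is not specified in this document; the sketch assumes an independent uniform shift along each orthogonal axis, and the function name and seed parameter are hypothetical.

```python
import random

def add_noise(coords, maximum_shift=0.2, seed=None):
    """Shift each coordinate by a random amount within +/- maximum_shift (Angs)."""
    if maximum_shift > 0.5:
        # Mirrors the documented limit on maximum_shift
        raise ValueError("maximum_shift must not exceed 0.5 Angs")
    rng = random.Random(seed)
    return [
        tuple(v + rng.uniform(-maximum_shift, maximum_shift) for v in xyz)
        for xyz in coords
    ]

atoms = [(11.0, 12.0, 13.0), (14.5, 12.2, 13.9)]
print(add_noise(atoms, 0.1, seed=1))
```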


ATomRENUMBERing: discards the atom numbers from the input file and writes out new sequential atom numbers. This can be used to renumber atoms in PDB files where atom records have been removed without "correcting" the atom numbers.
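
The effect on a PDB file can be sketched in Python as follows, assuming the standard PDB layout in which the atom serial number occupies columns 7 to 11 of ATOM and HETATM records. The function name is hypothetical and the sample records are made up for illustration.

```python
def renumber_atoms(lines, start=1):
    """Rewrite the serial number field of ATOM/HETATM records sequentially."""
    out = []
    serial = start
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # Columns 7-11 (indices 6..10) hold the right-justified serial number
            line = f"{line[:6]}{serial:5d}{line[11:]}"
            serial += 1
        out.append(line)
    return out

pdb_lines = [
    "REMARK atoms 1-16 were deleted by hand",
    "ATOM     17  CA  ALA A   2      11.000  12.000  13.000  1.00 20.00",
    "ATOM     23  CA  GLY A   3      14.500  12.200  13.900  1.00 20.00",
]
for line in renumber_atoms(pdb_lines):
    print(line)
```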


Phil Evans, MRC LMB, Cambridge, September 1992


########################  Convert PDB file to mmCIF format
#!/bin/csh -f
pdbset xyzin toxd.pdb xyzout toxd.cif << eof-1
output cif
eof-1

########################  Take output from O into a form suitable for refinement
#!/bin/csh -f
pdbset xyzin bst_113m.pdb xyzout temp1.pdb << eof-1
cell    132.02  115.21   96.20   90.00   90.00   90.00
spacegroup P212121
eof-1

###################  Take output from Xplor into a form suitable for refinement
#!/bin/csh -f
pdbset xyzin bst_113m.pdb xyzout temp1.pdb << eof-1
xplor
cell    132.02  115.21   96.20   90.00   90.00   90.00
spacegroup P212121
eof-1

######################## Expand dimer to tetramer, rename chains, transform
#!/bin/csh -f
#  Make tetramer from dimer
pdbset xyzin ecrproducts268.pdb xyzout ecrprodpqrtet.pdb <<eof-1
remark  Tetramer generated from AB dimer
remark   rotated to pqr frame
! Generate other dimer by z-dyad in P21212
symgen  x,y,z
symgen -x,-y,z
! Rename chains in second dimer: V & W are water chains
chain symmetry 2   A C
chain symmetry 2   B D
chain symmetry 2   V X
chain symmetry 2   W Y
! transform to molecular frame
transform -
  0.87831   0.47808   0  -
    0         0     -1.  -
 -0.47808   0.87831   0  -
 0.0  -2.713  0.0
eof-1