NEWS FROM THE UPPSALA SOFTWARE FACTORY - 9
Déjà-vu all over again
Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden
While building a protein model into electron density, one often comes
across features of the model that make one wonder: "(where) have I seen this
before?". At the level of the overall fold, there is plenty of software
that can help answer this question (DALI, DEJAVU, TOP, etc.). But when it
comes to recognising smaller "motifs" (e.g., a set of residues involved in
binding a ligand or metal ion, or with side chain-side chain interactions),
answering the question "has this been observed in any other protein
structure?" is not as simple.
At the 1995 CCP4 meeting, Peter Artymiuk described a program called ASSAM
that could recognise spatial arrangements of side chains by comparing
them to a database
of protein structures. This provided the inspiration for the SPASM
package, which contains programs for the recognition of arbitrary patterns or
motifs in protein structures, interfaced with O and other programs.
SPASM is a program that can be used to recognise user-defined motifs in a
database of protein structures (derived from the PDB). The user merely has to
carve out those residues that (s)he is interested in (e.g., catalytic
residues, a strange loop, ligand-binding residues, a weird Met-Trp
interaction, a helix-turn-helix motif, etc.; whatever is selected will be
referred to as a "motif" from now on) and put them into a small PDB file. The
program will read this file as well as its database, will prompt for values
for a few parameters (the default values will do in most cases), and will
subsequently find all instances of the motif in the proteins that are in the
database. (The nitty-gritty and some of the bells and whistles are described
in the SPASM manuals.)
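As an illustration of the "carving out" step, the following Python sketch
copies the atom records of a few selected residues from a PDB file into a
small motif file. It is not part of the SPASM package: the function names
(pdb_atom_line, carve_motif) and the example coordinates are made up for this
illustration, and the only thing assumed is the standard fixed-column layout
of PDB ATOM/HETATM records.

```python
def pdb_atom_line(serial, atom, resname, chain, resseq, x, y, z):
    """Format one ATOM record in fixed PDB columns (hypothetical helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, atom, resname, chain, resseq, x, y, z)


def carve_motif(pdb_lines, residues):
    """Keep the ATOM/HETATM records of the selected residues.

    `residues` is a collection of (chain identifier, residue number) pairs.
    """
    wanted = set(residues)
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 26:
            continue
        chain = line[21]               # column 22: chain identifier
        try:
            resseq = int(line[22:26])  # columns 23-26: residue number
        except ValueError:
            continue                   # skip records with a malformed number
        if (chain, resseq) in wanted:
            kept.append(line.rstrip("\n"))
    return kept


# Made-up example model: one CA atom per residue, two chains.
model = [
    pdb_atom_line(1, " CA", "HIS", "A", 94, 10.0, 11.0, 12.0),
    pdb_atom_line(2, " CA", "GLY", "A", 95, 13.5, 11.2, 12.9),
    pdb_atom_line(3, " CA", "GLU", "A", 96, 15.1, 14.4, 13.3),
    pdb_atom_line(4, " CA", "HIS", "B", 94, 40.0, 41.0, 42.0),
]

# The "motif": residues 94 and 96 of chain A.
motif = carve_motif(model, [("A", 94), ("A", 96)])
print("\n".join(motif + ["END"]))
```

Writing the returned lines to a file gives a small PDB file of the kind
described above.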
Besides simply listing the "hits", SPASM can also generate a macro file
for use with
O which, when executed, will automatically read the hits, apply the
operator that superimposes the hits with the user's motif, and draw the
hits. Thus, within five to ten minutes one obtains a visual answer to
the original question:
"(where) has this motif been observed previously?"
If you find hits that display similarities to your own protein that extend
beyond the matched motif (e.g., similar fold or domain), global
superpositioning of the hit and your own model can be carried out by LSQMAN.
An input file for LSQMAN that does this can be generated by SPASM as well,
making this a very rapid process. An interface also exists to the SBIN
package of programs, which can be used to analyse superimposed structures to
find similarities in their sequences. These, in turn, can be used to attempt
"database mining" in databases such as SWISS-PROT, in the hope of
identifying other proteins that might have the same fold, or share a common
domain.
RIGOR is another program in the SPASM package that does in essence the
opposite of SPASM. Where SPASM compares a user-defined motif to a database of
protein structures, RIGOR looks for instances of a large number of predefined
motifs in the user's structure. Of course, the utility of this approach
depends critically on the quality of the motif database. At present, it
contains a few hand-crafted motifs, but the majority have been generated
automatically. These automatically generated motifs were extracted from
proteins in the SPASM database, and consist mostly of sets of residues whose
side chains cluster in space, or are all in close proximity to a
Just like SPASM, RIGOR is interfaced to O, allowing for rapid visualisation
of the results. Users are welcome to submit additional motifs for inclusion
in future versions of the RIGOR database. Eventually, I hope to develop
software that takes a more intelligent approach to detecting motifs that
recur in several or many proteins.
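The idea of scanning one structure against a library of predefined motifs can
be sketched in a few lines of Python. This is a toy illustration and not the
algorithm used by SPASM or RIGOR, which is not described here. It assumes that
every residue is reduced to a single point (e.g., its CA atom), that residue
types must match exactly, and that a hit is any set of residues whose mutual
distances agree with those in the motif to within a tolerance. All names
(find_motif, scan_library, tol) and all coordinates are made up.

```python
import math


def find_motif(motif, structure, tol=1.0):
    """Find occurrences of `motif` in `structure`.

    Both arguments are lists of (residue name, (x, y, z)) tuples. A hit is a
    tuple of indices into `structure`, one per motif residue, such that the
    residue names match and every pairwise distance differs from the
    corresponding motif distance by at most `tol`.
    """
    hits = []

    def extend(partial):
        k = len(partial)
        if k == len(motif):
            hits.append(tuple(partial))
            return
        name_k, xyz_k = motif[k]
        for j, (name_j, xyz_j) in enumerate(structure):
            if j in partial or name_j != name_k:
                continue
            # Compare distances to all motif residues matched so far.
            if all(
                abs(math.dist(xyz_k, motif[i][1])
                    - math.dist(xyz_j, structure[partial[i]][1])) <= tol
                for i in range(k)
            ):
                extend(partial + [j])

    extend([])
    return hits


def scan_library(library, structure, tol=1.0):
    """Look for every motif of `library` (name -> motif) in one structure."""
    found = {}
    for name, motif in library.items():
        hits = find_motif(motif, structure, tol)
        if hits:
            found[name] = hits
    return found


# Made-up data: a three-residue motif and a six-residue "structure".
triad = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
pair = [("GLY", (0, 0, 0)), ("ALA", (1, 0, 0))]
structure = [
    ("GLY", (10, 10, 10)), ("SER", (20, 0, 0)), ("HIS", (20, 3, 0)),
    ("ALA", (25, 5, 5)), ("ASP", (24, 3, 0)), ("HIS", (40, 40, 40)),
]

print(find_motif(triad, structure, tol=0.5))
print(scan_library({"triad-like": triad, "pair": pair}, structure, tol=0.5))
```

Comparing distances only cannot tell a motif from its mirror image, and a
motif with several residues of the same type will be reported once per
equivalent permutation; a real program would have to deal with both.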
Obviously, the SPASM package can be tremendously useful in the analysis of
newly determined protein structures. The programs help crystallographers to
make the most of their models, prior to publication and deposition. After
all, nobody likes to see papers in which professional database scrutinisers
(for want of a better word) announce that they have found an unexpected
similarity between one's own protein (the determination of which may have
taken years) and some other protein that has been in the database for years.
In addition, SPASM can be used in comparative structural analysis, where
one would typically be interested in finding all proteins that contain a
certain arrangement of helices, strands, turns, and loops, or in all proteins
that contain a particular constellation of residues or side chains. Other
potential applications lie in the areas of protein design and engineering,
and prediction of structure and function.
The SPASM package contains the programs SPASM and RIGOR, as well as two
programs to generate private databases for use with these programs (e.g.,
with in-house structures that have not yet been released by the PDB). SPASM
and friends (including databases and manuals) are available free of charge to
academic users. Commercial users may contact GJK for more information. For
more information about O, contact Alwyn Jones. Both O and the Uppsala
Software Factory have sites on the WWW.
Artymiuk, P.J., Poirrette, A.R., Grindley, H.M., Rice, D.W. and Willett,
P. (1994). A graph-theoretic approach to the identification of
three-dimensional patterns of amino acid side-chains in protein structures.
J. Mol. Biol.
Artymiuk, P.J., Poirrette, A.R., Rice, D.W. and Willett, P. (1995).
protein folds and sidechain clusters using algorithms from graph theory.
"From First Map to Final Model" (Bailey, S., Hubbard, R. and Waller,
pp. 71-81, SERC Daresbury Laboratory, Daresbury, U.K.
Kleywegt, G.J. and Jones, T.A. (1998). Databases in protein
crystallography. Acta Cryst.
, in press. (A preprint of this paper is available at URL:
Kleywegt, G.J. (1998). Recognition of spatial motifs in protein
The manuals for the SPASM programs are available at URL:
Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved
methods for building protein models in electron density maps and the location
of errors in these models. Acta Crystallogr.
The manuals for the SBIN programs are available at URL:
Bairoch, A. and Apweiler, R. (1997). The SWISS-PROT protein sequence
data bank and
its supplement TrEMBL. Nucl. Acids Res.