Study Weekend Program


New CCP4 Coordinate Library and applications based on it.

Eugene Krissinel
European Bioinformatics Institute, Genome Campus, Hinxton,
Cambridge CB10 1SD, United Kingdom

The new CCP4 Coordinate Library aims to provide a common layer of coordinate-related functionality for existing applications in the CCP4 suite, together with a variety of new tools that simplify the design of new applications dealing with atomic coordinates. The Library comprises a wide spectrum of functions, ranging from parsing coordinate formats and elementary editing operations on the coordinate hierarchy of biomolecules to high-level functionality such as calculation of secondary structure, interatomic bonds, atomic contacts, symmetry transformations, structure matching and many others. Most of the functions are available through a C++ object interface; a FORTRAN interface is also provided for compatibility with older CCP4 applications.

With the Library, the design of many applications similar to those existing in the CCP4 suite reduces largely to wrapping Library functions. This is shown by several demos and full-scale applications, two of which (the new Contact and a remote analogue of PDBSet) are included, together with the Library, in CCP4 release 5.0.
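To illustrate how a contact-finding tool can reduce to a thin wrapper over two reusable functions (parsing and a distance search), here is a minimal Python sketch. It does not use the Library or reproduce its API: the names `Atom`, `parse_atoms`, `find_contacts` and the 4.0 Å cutoff are hypothetical, and the parser assumes only the standard fixed-column layout of PDB ATOM/HETATM records.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def find_contacts(atoms, cutoff=4.0):
    """Return (atom_a, atom_b, distance) for atoms of different residues
    lying within `cutoff` angstroms. Brute-force O(n^2) search."""
    contacts = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if (a.chain, a.res_seq) == (b.chain, b.res_seq):
                continue  # skip pairs inside the same residue
            d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            if d <= cutoff:
                contacts.append((a, b, d))
    return contacts


# Hypothetical three-atom input, written in PDB column layout.
SAMPLE = "\n".join([
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  GLY A   2       3.800   0.000   0.000",
    "ATOM      3  CA  SER A   3      10.000   0.000   0.000",
])

for a, b, d in find_contacts(parse_atoms(SAMPLE)):
    print(f"{a.res_name}{a.res_seq} {a.name} - {b.res_name}{b.res_seq} {b.name}: {d:.2f}")
```

The application logic here is the final loop only; everything else is the kind of functionality a coordinate library would supply. A production tool would replace the quadratic search with a spatial index.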

The Library also provides a basis for SSM (Secondary Structure Matching), a new tool for protein structure comparison and recognition in 3D, developed jointly by EBI-MSD and CCP4. The tool is available at http://www.ebi.ac.uk/msd-srv/ssm . The structure comparison is based on matching graphs that represent the secondary structure elements (SSEs) of the proteins, followed by 3D alignment of the protein backbone Cα atoms. The alignment algorithm maximises an empirical structural similarity function Q, which controls the balance between the alignment length Na and the RMSD achieved.

Comparison of SSM with similar resources available on the Web (DALI, VAST, CE) shows that SSM agrees with them about as well as they agree with one another. While the differences in Na and RMSD often increase significantly with decreasing structural similarity, the quality function Q shows remarkable agreement. These results imply that simple measures such as alignment length and RMSD do not, on their own, give a sufficiently good indication of structural similarity.

SSM was developed to run on UNIX platforms, either as a CGI or as a standalone application, optionally using multi-CPU clusters for parallel processing. A typical query (alignment of a 200-300 residue protein against the whole PDB or SCOP archive) normally takes less than a minute. This performance was found to be sufficient for serving structural queries in real time. Therefore SSM, unlike other similar systems, does not maintain a database of pre-aligned structures, does not employ reduced sets of representative structures and does not pre-screen the database for sequence similarity.
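The abstract does not state the form of Q. As an illustration of how a single score can balance alignment length against RMSD, the sketch below assumes the form given in the later SSM publication (Krissinel and Henrick, 2004), Q = Na² / ((1 + (RMSD/R0)²) · N1 · N2), with N1 and N2 the numbers of residues in the two structures and R0 = 3 Å. The function name `q_score` and its argument names are hypothetical.

```python
def q_score(n_aligned, rmsd, n1, n2, r0=3.0):
    """Assumed Q-score form: Na^2 / ((1 + (RMSD/R0)^2) * N1 * N2).

    n_aligned -- number of aligned residue pairs (Na)
    rmsd      -- RMSD of the aligned C-alpha atoms, in angstroms
    n1, n2    -- total residues in each structure
    r0        -- empirical scale, assumed to be 3 angstroms
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("structures must contain at least one residue")
    if not 0 <= n_aligned <= min(n1, n2):
        raise ValueError("alignment length must lie between 0 and min(n1, n2)")
    if rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    return n_aligned ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


# Two hypothetical alignments of a pair of 200-residue proteins:
long_loose = q_score(150, 3.0, 200, 200)   # more residues, higher RMSD
short_tight = q_score(120, 1.0, 200, 200)  # fewer residues, lower RMSD
print(long_loose, short_tight)
```

Under this assumed form Q equals 1 only for identical structures (all residues aligned, zero RMSD) and falls as either the aligned fraction shrinks or the RMSD grows. In the example, the shorter but tighter alignment scores higher, which neither Na nor RMSD alone would indicate.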

SSM is available for download and in-house installation under both academic (free) and commercial licenses.