Thoughts on use of XML
For convenience, I am going to call the putative CCP4 XML "XtalML",
though in practice this may consist of several independent DTDs/schemas.
Where could we use XML?
A new binary format is being implemented for coordinate data,
with the ability to dump to PDB and mmCIF. The content of the
binary format will be based on the mmCIF dictionary. Perhaps
we can also dump to an XML file. This will be trivial if the
DTD is based on the mmCIF dictionary, but will require complicated
MTZ and map files
I think there is no question of replacing MTZ and map files with
an XML format. It might be feasible to replace just the
header sections with an XML-style section.
However, since users do not normally view MTZ and map files directly,
there is little point in this. Rather, they use the mtzlib routine
LHPRT to print a text layout of the header (into the log file
or via mtzdump). This routine could be adapted to produce header
information in XML.
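As a rough illustration of what XML header output might look like, here is a minimal Python sketch. It does not reproduce LHPRT or any mtzlib call. The element names (mtz_header, cell, columns, column), the dictionary layout and the example values are all assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET


def header_to_xml(header):
    """Build an XML element from a dict of header values (hypothetical layout)."""
    root = ET.Element("mtz_header")  # hypothetical element name
    ET.SubElement(root, "title").text = header["title"]

    # Store the six cell parameters as attributes of one element.
    cell = ET.SubElement(root, "cell")
    names = ("a", "b", "c", "alpha", "beta", "gamma")
    for name, value in zip(names, header["cell"]):
        cell.set(name, f"{value:.3f}")

    ET.SubElement(root, "spacegroup").text = header["spacegroup"]

    # One <column> element per column label and type.
    columns = ET.SubElement(root, "columns")
    for label, ctype in header["columns"]:
        ET.SubElement(columns, "column", label=label, type=ctype)
    return root


# Made-up example values, for illustration only.
example_header = {
    "title": "Example dataset",
    "cell": (78.1, 78.1, 37.2, 90.0, 90.0, 90.0),
    "spacegroup": "P43212",
    "columns": [("H", "H"), ("K", "H"), ("L", "H"), ("FP", "F"), ("SIGFP", "Q")],
}

header_xml = ET.tostring(header_to_xml(example_header), encoding="unicode")
print(header_xml)
```

A program reading this output would not need to know the column positions of a printed text layout, only the element names.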
Automation relies on being able to track the progress of structure
solution, and hence on the computer being able to read the log file.
Therefore, there would be big advantages in replacing or supplementing
the ad-hoc text log file with an XML document.
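To make the point about machine-readable logs concrete, the sketch below shows how a script might track progress from an XML log using a generic parser. The log fragment, its element names (log, cycle, r_factor, r_free, status) and the numbers in it are invented for this example; no existing CCP4 program writes this format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML log fragment; names and values are invented.
LOG_XML = """
<log program="refinement_program">
  <cycle number="1"><r_factor>0.312</r_factor><r_free>0.345</r_free></cycle>
  <cycle number="2"><r_factor>0.287</r_factor><r_free>0.331</r_free></cycle>
  <cycle number="3"><r_factor>0.271</r_factor><r_free>0.326</r_free></cycle>
  <status>finished</status>
</log>
"""


def read_progress(xml_text):
    """Return (status, [(cycle number, R factor, free R), ...]) from the log."""
    root = ET.fromstring(xml_text)
    cycles = [
        (
            int(cycle.get("number")),
            float(cycle.findtext("r_factor")),
            float(cycle.findtext("r_free")),
        )
        for cycle in root.findall("cycle")
    ]
    return root.findtext("status"), cycles


status, cycles = read_progress(LOG_XML)

# A simple automated check: did the free R factor fall at every cycle?
improving = all(
    later[2] < earlier[2] for earlier, later in zip(cycles, cycles[1:])
)
print(status, improving)
```

The equivalent check against an ad-hoc text log would need a program-specific pattern match that breaks whenever the layout changes.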
There are some parallels here with data harvesting. However, data
harvesting files are designed to be read by the deposition centre only,
not by the user's software. Any changes would need to be agreed with EBI.
The documentation is currently in HTML. Upgrading to XML implies treating the
documentation as data rather than as a simple document, and would
involve substantial rewriting. I am not sure that there is a good reason
to do this.
Alun suggests that the ccp4i database (file CCP4_DATABASE/database.def and
related .def files) be converted to XML, and also that program input be
in XML. The latter only makes sense if it is machine-written, e.g.
XML vs. mmCIF
There are obvious similarities between XML and mmCIF. Both aim
at self-describing data. The XML DTD plays the role of the mmCIF
dictionary. What are the differences?
Advantages of XML
- Can be written/read by generic tools.
Advantages of mmCIF
- Already established in crystallography.
- Includes data typing.
- Dictionary includes semantics (_item_description and _item_examples).
XML schemas address some of these differences.
The DTD for XtalML
We can't do anything without having a DTD or schema. This should be well thought
out, as it is likely to get locked in. Much of it could be based
on the existing mmCIF dictionary; is there an easy way to convert one
to the other? (I have a note which says that Peter Murray-Rust's JUMBO
can do this.)
But we are likely to want to add extra tags, e.g. specific
to log file processing. We should also check existing DTDs for relevance.
See list at:
under "science". Includes CML, MathML and a few biological ones.
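As a sketch of how an mmCIF-based vocabulary might map onto XML, the Python below turns mmCIF item names of the form _category.item into nested elements. This is one possible mapping chosen for illustration, not the output of JUMBO or of any existing converter. The root tag "xtalml" is hypothetical, the values are made up, and the sketch handles only single-valued items, not looped categories.

```python
import xml.etree.ElementTree as ET


def mmcif_items_to_xml(items, root_tag="xtalml"):
    """Map {'_category.item': value} pairs to <category><item>value</item>."""
    root = ET.Element(root_tag)  # hypothetical root element name
    categories = {}
    for name, value in items.items():
        # '_cell.length_a' -> category 'cell', item 'length_a'
        category, _, item = name.lstrip("_").partition(".")
        if category not in categories:
            categories[category] = ET.SubElement(root, category)
        ET.SubElement(categories[category], item).text = str(value)
    return root


# Item names come from the mmCIF dictionary; the values are made up.
items = {
    "_cell.length_a": "78.1",
    "_cell.length_b": "78.1",
    "_cell.angle_alpha": "90.0",
    "_symmetry.space_group_name_H-M": "P 43 21 2",
}

root = mmcif_items_to_xml(items)
print(ET.tostring(root, encoding="unicode"))
```

Extra tags, such as ones specific to log file processing, would sit alongside these category elements rather than come from the dictionary.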
Implementing data files
There is the question of whether we embed XML in other files, or
whether we use XML-only files. While it would be nice to have
true XML files, putting up such a barrier may in practice stop
us doing anything! At the very least we would need an "everything else"
tag, such as the <pre> tag used in HTML log files. Alternatively, we
could put XML and non-XML content in different files.
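The "everything else" tag idea could look something like the following Python sketch, in which structured values get their own elements and the remaining free text is wrapped in a single catch-all element. The tag names (log, program, status, text) are assumptions for this example. The serializer escapes characters such as < and & in the free text, so the result is still well-formed XML.

```python
import xml.etree.ElementTree as ET


def wrap_log(structured, free_text):
    """Return an XML document holding structured items plus raw text."""
    root = ET.Element("log")  # hypothetical element name
    for tag, value in structured.items():
        ET.SubElement(root, tag).text = str(value)
    # Catch-all element playing the role of <pre> in the HTML log files.
    ET.SubElement(root, "text").text = free_text
    return ET.tostring(root, encoding="unicode")


# Free text containing characters that are special in XML.
raw = "Resolution range < 2.0 A & completeness > 95%\n"
doc = wrap_log({"program": "example_program", "status": "finished"}, raw)
print(doc)

# Reading the document back recovers the free text unchanged.
recovered = ET.fromstring(doc).findtext("text")
```

This keeps existing text output usable while the structured part grows over time.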
There is also the problem of keeping a continuously updated file (e.g. the log file) as XML.
We need style! XSL can also do processing, such as selecting the parts
of a file relevant to a particular context (cf. summary logfiles).
What do we do now?
We have been thinking about this for some time, see
We can't make anything public yet (waiting on Netscape), but it is probably
time to do something internally.
- Decide on area of interest (first section above).
- Write draft DTD/schema.
- Decide on best way to read and write data files.
Last modified: Mon Sep 18 10:17:48 BST 2000