CCP4 HAPPy Design Notes
These documents describe our functional goals and some implementation features.
Tasks are assigned importance from 1 to 10, where 10 is Vital.
CCP4 HAPPy is a transliteration and re-implementation of Chart in an object-oriented form. HAPPy should address SAD, MAD, and MIRAS phasing scenarios (with priority in that order). HAPPy will go well beyond Chart, particularly in that it will use PHASER, PIRATE, and model-building, rather than the older programs that Chart used.
The starting point is post-TRUNCATE MTZ file(s). We presume that the data sets are either untwinned or have been detwinned, and that we know the Laue group.
The intermediate data and data passing will be formalized - particularly using classes and XML-formatted data. Some Fortran programs will have to be amended to provide data in XML.
High priority is to interface to SHELXD for heavy atom location, but we must also allow use of RANTAN and ACORN, or some other program (actually substituting something else is only Priority 3, but we must bear other programs in mind when designing the interface).
We will work in the following order:
Fencepost 1 - Data preparation.
- Going from data spec to CADed data.
- Scaling and analysis. This is something that PE should deal with. "Turn-key" scaling should be provided initially, with fuller data analysis to follow later.
- Determine resolution limits.
Fencepost 2 - Heavy atom location.
- SHELXD preparation.
- Analysis of output.
Fencepost 3 - SAD phasing.
- Inclusion of PHASER in SAD mode (inclusion of further modes MAD/MIR/MIRAS later).
Fencepost 4 - Interpretable density map.
- Interpretable map (post-PIRATE).
- Analysis of map.
Fencepost 5 - Further phasing modes.
- PHASER for MAD/MIR.
Fencepost 6 - Model building.
- Include a model-building program (and refmac cycling?). This will require some work from KDC and PE.
We should keep in mind that the output will be used for harvesting and certain elements of it should go into the CCP4 database.
We want to be able to assess the quality of the data, and to reject and flag useless observations as early as possible in the process.
We will probably need to incorporate some database functionality later. Maybe use something light like Berkeley DB. We should see what others use to avoid duplicating effort or adding yet another way of doing things.
In future HAPPy could have its own GUI and progress bar and dynamic graphs. But this is not important now [Priority: 2].
Initially we will launch from the command line using an XML data description file, either written programmatically or hand typed. This is described in the input file document.
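As a rough illustration of reading such a data description, here is a minimal sketch. The element names ("dataset", "mtz", "wavelength") are invented for this example; the real schema is defined in the input file document.

```python
# Sketch: read an XML data description file into a list of dataset dicts.
# Tag names below are illustrative only, not the agreed HAPPy schema.
import xml.etree.ElementTree as ET

def read_data_description(xml_text):
    """Return one dict per <dataset> element, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "").strip() for child in node}
            for node in root.iter("dataset")]

example = """<project>
  <dataset><mtz>peak.mtz</mtz><wavelength>0.9793</wavelength></dataset>
  <dataset><mtz>remote.mtz</mtz><wavelength>0.9184</wavelength></dataset>
</project>"""

datasets = read_data_description(example)
```

The same function would read a file by passing in its contents; a programmatically written description would simply be serialized to the same schema.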
Queueing and Parallelization
We need to think about what is the target platform.
Will it have a batch queueing system? Will it be a cluster?
Shall we run heavy atom search jobs sequentially (relatively easy to code), or have a SHELXD job running in the background and check its output occasionally (much better for 2-CPU machines, but more difficult to code)?
Multithreading in Python is a possibility for exploiting forthcoming dual-core CPU machines. This will be investigated [DJR].
As a start we will run things sequentially, but plan to multithread/parallelize this later.
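The background-job-plus-polling pattern discussed above might look like the following sketch. The command and log-file name are placeholders; SHELXD's actual invocation and output files are not shown here.

```python
# Sketch: launch a long-running job in the background and poll it.
# A real version would parse the partial log between polls and report
# progress (e.g. best correlation coefficient so far) to the user.
import subprocess
import time

def start_background(cmd, logname):
    """Launch cmd with stdout redirected to a log file; return the Popen."""
    log = open(logname, "w")
    return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)

def poll_until_done(proc, logname, interval=1.0):
    """Check the job now and then; return the log contents on completion."""
    while proc.poll() is None:
        time.sleep(interval)
        # Here a real implementation would inspect the growing log file.
    return open(logname).read()
```

The sequential fallback is simply a blocking `subprocess.call`, so the two modes can share the same command-construction code.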
Although the task is serial and suitable for functional programming, I found it convenient to have a class that provided derivative info (column labels for various programs, resolutions, scattering factors, etc). I suggest that we have something like that in HAPPy.
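Something like the following could serve as that derivative-info class. All attribute and method names here are illustrative guesses, not an agreed interface.

```python
# Sketch: one place that answers derived questions about a dataset
# (column labels formatted for a given program, resolution range,
# anomalous scatterer). Names are hypothetical.
class DatasetInfo:
    def __init__(self, name, f_label, sigf_label, dmin, dmax, scatterer):
        self.name = name
        self.f_label = f_label        # amplitude column label
        self.sigf_label = sigf_label  # sigma column label
        self.dmin = dmin              # high-resolution limit (angstrom)
        self.dmax = dmax              # low-resolution limit (angstrom)
        self.scatterer = scatterer    # e.g. "Se"

    def labin(self):
        """Column assignment line in the style many CCP4 programs expect."""
        return "LABIN F=%s SIGF=%s" % (self.f_label, self.sigf_label)

    def resolution_keyword(self):
        return "RESOLUTION %.2f %.2f" % (self.dmax, self.dmin)
```

Each program interface would then ask this object for its keywords rather than re-deriving them from the raw MTZ headers.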
We should ensure that HAPPy integrates well with CCP4/CCP4i, employing existing packages/libraries where appropriate.
Input files
Crystallographic information
This is the first nitty-gritty job to be done - and as such may well get reworked from the initial implementation. Cadding (combining the datasets) is conceptually straightforward - but will involve some potentially tricky renaming of column labels.
Scaling and analysis
Heavy atom location
Determination of hand
Communication with other programs
HAPPy is to be (more or less) a shell script. We would like to provide feedback to the user - for example, a solution progress bar, or whether an encouraging heavy atom set has been found.
I can't see that the sort of graph that Geoff showed is appropriate for us. We have several different types of data to represent as the structure solution progresses. These can initially be thrown up trivially as background (concurrent) loggraph processes.
However, we'd like a list of interesting graphs - that we can click on [which will then typically fire up loggraph]. This will involve a custom GUI [Priority: 4].
We'd also like to know what PyCHART is running "right now". Chart communicates with the (server) process from which it was started via a socket. I suggest that we do the same with HAPPy. So HAPPy will need to implement --host and --port command-line arguments.
Also, we'd like HAPPy to drive Coot. Similarly, then, HAPPy will open a server port and start Coot with --host and --port arguments. Coot will listen on that port and be driven by commands it sees there. [Priority: 5].
PE will implement these communication/server issues.
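As a rough sketch of the socket reporting described above: HAPPy connects back to the host/port given on its command line and writes one-line status messages. The message format and the very simple argument handling below are invented for illustration.

```python
# Sketch: parse --host/--port from argv and send status lines back to
# the listening GUI/server process over a socket.
import socket

def parse_host_port(argv):
    """Pull --host and --port values out of an argument list.
    Deliberately simplistic: assumes flag/value pairs, no validation."""
    opts = dict(zip(argv, argv[1:]))
    return opts.get("--host", "localhost"), int(opts.get("--port", "0"))

def send_status(host, port, message):
    """Send one newline-terminated status line to the server."""
    s = socket.create_connection((host, port))
    try:
        s.sendall((message + "\n").encode())
    finally:
        s.close()
```

Driving Coot would be the mirror image: HAPPy opens the listening socket and Coot connects back with the same --host/--port convention.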
Heavy atom file spec
It may be necessary to design/use an XML description of heavy atoms. CRANK already has such a format which DJR will look at.
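Pending that look at CRANK, one possible shape for such a record is sketched below. The tag names ("atom", "x", "y", "z", "occupancy", "b") are guesses, not the CRANK format.

```python
# Sketch: serialize a heavy-atom list to a hypothetical XML layout.
import xml.etree.ElementTree as ET

def atoms_to_xml(atoms):
    """atoms: list of (element, x, y, z, occ, b) tuples -> XML string."""
    root = ET.Element("heavy_atoms")
    for element, x, y, z, occ, b in atoms:
        node = ET.SubElement(root, "atom", type=element)
        for tag, value in zip(("x", "y", "z", "occupancy", "b"),
                              (x, y, z, occ, b)):
            ET.SubElement(node, tag).text = "%.4f" % value
    return ET.tostring(root, encoding="unicode")
```

Whatever format is chosen, a round-trip parser of the same shape would let SHELXD output, PHASER input, and hand-flipping all share one in-memory representation.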
MT will provide a set of datasets, complete with an XML HAPPy input file. We will begin with around 20 datasets consisting of MTZs and HAPPy data description files.
We should set up a system where these jobs are run (overnight). This will be set up on DJR's machine.
It is useful (at least initially) to have the svn/cvs server send us a mail for every commit - so we can review the work.
- Setup HAPPy mailing list. Done.
- Setup SVN repository. Done.
- The project file (input data description) has been formalized and a parser written which fills the project class.
- PDB file handling. A simple PDB parser for standard and some non-standard PDB files has been written.
- MTZ handling. A Python MTZ container and parser has been written.
- TRUNCATE log file handling.
- Look at use of Berkeley DB XML for XML storage. Pickle Python objects with Gnosis?
- We need to use Emma.
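If Berkeley DB XML or Gnosis turns out to be overkill, plain pickling of the project object is a zero-dependency fallback; a minimal sketch:

```python
# Sketch: persist and restore a project object with the standard
# library pickle module (no external database needed).
import pickle

def save_project(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_project(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

Pickle ties the stored data to our class definitions, whereas an XML store would keep it program-neutral; that trade-off is worth deciding early.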