Programme for CCP4 Study Weekend 2003

Experimental Phasing CCP4 Study Weekend, 3-4 January 2003, University of York
Scientific Organisers: Airlie McCoy, Neil McDonald

Friday 3 January
Session 1: Introduction / Overview
11:00 Welcome
11:10 Garry Taylor
St Andrews
The phase problem
Given recent advances in phasing, those new to protein crystallography may be forgiven for asking “What problem?” As many of those attending the CCP4 meeting come from a biological background, struggling with expression and crystallisation, this introductory lecture aims to introduce some of the basics that will hopefully make the subsequent lectures penetrable. What is the “phase” in crystallography? What is “the problem”? How can we overcome the problem? The lecture will emphasise that we can only discover the phase values through some prior knowledge of the structure. The lecture will canter through direct methods, isomorphous replacement, anomalous scattering and molecular replacement. As phasing is the most acronymic realm of crystallography, MR, SIR, SIRAS, MIR, MIRAS, MAD, BAD and SAD will be expanded and explained in part. Along the way we will meet some of the heroes of protein crystallography such as Perutz, Kendrew, Crick, Rossmann and Blow, who established many of the phasing methods in the UK. It is inevitable that we meet some basic mathematics, but this will be done as gently as possible.
11:50 Randy Read
New ways of looking at experimental phasing
In the original work by Blow and Crick, experimental phasing was formulated as a least-squares problem. For good data on good derivatives, this approach works reasonably well, but we now attempt to extract more information from poorer data than in the past. As in many other crystallographic problems, the assumptions underlying the use of least squares for phasing are not satisfied, particularly for poor derivatives. The introduction of maximum likelihood (and more powerful computers) has led to substantial improvements. For computational convenience these new methods still make many assumptions about the independence of different measurements and sources of error. We have been looking at a more general formulation for the probability distributions underlying likelihood-based methods for both experimental phasing and molecular replacement phasing. In the new formulation, all the structure factors associated with a particular hkl are considered to be related by a complex multivariate normal distribution. When the appropriate assumptions are introduced (i.e. measurement errors and lack-of-isomorphism errors of different derivatives are independent, derivatives contain no common heavy-atom sites, isomorphous addition rather than isomorphous replacement), the general formulation reduces to current likelihood targets. But the new formulation makes the necessary assumptions more explicit, and points the way to improving phasing using both isomorphous and anomalous differences.
Session 2: Making Derivatives
14:00 Elspeth Garman
Heavy atom preparation (and introduction to the heavy atom databank)
Several of the standard methods of solving macromolecular structures involve making a protein crystal that is derivatised by an anomalous scatterer or heavy atom (MIR, SIRAS, MAD, SAD...). The theoretical methodology which underpins the extraction of phase information from such derivatives is widely available in the literature. In addition there are comprehensive sources of information on the chemistry of heavy atom compounds and the ligands with which they are known to interact.
Thus this contribution to the Workshop will aim to provide some information on the less well documented practical problems of first deciding on an overall strategy and secondly performing the physical manipulations involved in producing and then cryo-cooling heavy atom derivatives from native protein crystals. Ways to optimise the chances of isomorphous unit cells will be suggested. Methods of determining whether or not the heavy atom is bound will be discussed, including the powerful technique of PIXE (Particle-Induced X-Ray Emission).
An introduction will be given to the Heavy Atom Databank. This is a database of the heavy atom compounds and conditions used to derivatise known protein structures which has been assembled from the literature.
The various considerations when making heavy atom derivatives will be illustrated with examples from our own laboratory.
14:30 Richard Kahn
A new class of lanthanide complexes to obtain high phasing-power heavy-atom derivatives for macromolecular crystallography
The current emphasis on high-throughput crystallography calls for heavy-atom preparation methods that are more reliable and less disruptive than traditional heavy-atom soaking. Seven gadolinium complexes have been tested and found to be excellent candidates for obtaining heavy-atom derivatives in macromolecular crystallography.
These highly soluble lanthanide complexes can be easily introduced at high concentration (100 mM or higher) in protein crystals either by soaking or by co-crystallisation, without significantly changing the crystallisation conditions, as was already demonstrated for Gd-HPDO3A derivative crystals of hen egg-white lysozyme [Girard et al., Acta Cryst. (2002), D58, 1-9]. These complexes, combined with the Single-wavelength Anomalous Dispersion (SAD) method, are of special interest for high-throughput macromolecular crystallography.
Using this new class of heavy-atom derivative crystals, de novo phasing by the SAD method has been carried out on several proteins of known structure as well as of unknown structure. Diffraction data have been collected either with a laboratory source, making use of the high anomalous signal (f" = 12 e-) of gadolinium for CuKα radiation, or using synchrotron radiation at the white line in the gadolinium LIII absorption edge (λ = 1.711 Å, f" = 28 e-).
Using Gd-HPDO3A, one of these gadolinium complexes, we have determined the structure of a chimeric form of OTCase from P. aeruginosa and E. coli, which is a dodecameric protein of 450 kDa.
14:50 Gwyndaf Evans
Global Phasing
Tri-iodide derivatization of macromolecules
Methods for producing macromolecular derivatives using cryo-soak techniques with triiodide solutions are described. The methods have been tested on six different protein types. SAD/SIRAS phasing has been attempted for each protein with data measured using conventional Cu-Kα X-ray equipment and long-wavelength synchrotron radiation. The results show varying degrees of success. Refinement of all six derivative structures has shown that iodine is able to bind as I- (as observed in standard halide soaks) and also as the polyiodide anions I3- and I5-. The various species are able to bind through hydrogen bond interactions and to more hydrophobic regions of the protein at surface pockets and in inter- and intra-molecular cavities. On the whole the derivative displays a promiscuous behaviour in terms of its binding to proteins and is capable of generating sufficient phasing power from in-house Cu-Kα data to permit structure solution by SAD. The results of the phasing experiments and structure refinements will be presented.
15:10 Michael Quillin
Generation of noble gas binding sites for phasing using mutagenesis
The utility of noble gases for phase determination has been limited by the lack of naturally occurring binding sites in proteins. Wild-type T4 lysozyme contains one such binding site. By mutating large hydrophobic residues to alanine, additional noble gas binding sites were successfully introduced into this protein. Using data from xenon derivatives of wild-type, two single mutants, and the corresponding double mutant, experimental phases for T4 lysozyme were determined using standard MIR techniques. These phases, which were obtained from room-temperature data collected on a rotating-anode source, are comparable in quality to phases calculated using selenomethionine MAD on frozen crystals at a synchrotron. In addition, this method of introducing noble-gas binding sites near specific residues should provide useful information for determining the register of amino acids within electron-density maps.
Session 3: Data Collection
16:00 Ana González
Optimizing data collection for structure determination
The final purpose of diffraction data collection is to produce a data set which provides enough structural information about the molecule of interest. This usually entails collecting a complete and accurate set of reflections to as high a resolution as possible. In practice, the characteristics of the crystal and the properties of the X-ray source can limit the data set quality that can be achieved, and a reasonable strategy has to be used to extract the maximum amount of information from the data within the experimental constraints.
In the particular case of data intended for phasing using anomalous dispersion, the synchrotron beamline properties are relevant in determining how many wavelengths (one or more) should be used and what the wavelength values should be. In general, Multiwavelength Anomalous Dispersion (MAD) experiments produce very accurate phases, but are very demanding in terms of beamline spectral range, easy tunability, stability and reproducibility. When these conditions cannot be met, single-wavelength experiments may be a better option.
In addition, understanding the effect of crystal characteristics, diffraction quality and anomalous scattering properties on the phasing is critical to balance the benefits of increased phase accuracy resulting from long exposures and redundant measurements against the increased risk of radiation damage to the crystal during the experiment.
16:40 Joe Ferrara
Extension of Home Laboratory Phasing Capabilities Using Chromium Radiation
A home laboratory high-intensity chromium X-ray source appears to be ideally suited for use in enhancing the weak anomalous signals from sulfur, selenium, calcium and other atoms found in protein crystals. Specifically, the f" for sulfur is 1.14 electrons at CrKα which is similar to the f" of calcium and selenium collected at CuKα. Since calcium anomalous scattering has been used to phase trypsin [1] from CuKα diffraction data, we expect a high quality CrKα data set to provide even more phasing power and allow for routine phasing of macromolecular diffraction data without the need for synchrotron data or isomorphous replacement.
In order to test this hypothesis we have commissioned Osmic, Inc. to design and manufacture a Confocal Max-Flux™ optic optimized specifically for Cr radiation. We have performed experiments with this optic using an RU-300 generator and R-AXIS IV image plate in order to determine how to maximize the anomalous signal from sulfur and other light elements within different proteins, with the goal of solving the phase problem more readily. Special attention will be given to the details required to offset the strong absorption of Cr radiation by the experimental setup typically used for diffraction studies, and the decay caused by Cr radiation. We will also discuss the solution of protein structures using only the Cr radiation enhanced anomalous signal from sulfur.
17:00 Eleanor Dodson
Jolly SAD
If two measurements of an X-ray amplitude are available, with a description of their vector difference, then it is possible to calculate an estimate of the phase of that amplitude, along with its reliability. This is the case for the methods known as "Single Isomorphous Replacement" (SIR), where the vector difference is due to heavy atoms added to the crystal, and for "Single Anomalous Dispersion" (SAD), where the vector difference is due to the anomalous scattering of a few atoms in the lattice, e.g. Se in selenomethionine or a bound metal. The phase estimate can be refined by imposing prior knowledge of the appearance of a map for a macromolecule; e.g. a considerable fraction of the asymmetric unit will be filled with disordered solvent, the map should show continuous features, etc.
These techniques work very well provided the initial measurements are reliable enough first to position the anomalous scatterers and then to give realistic figures of merit. In some cases the relative signal from the anomalous scatterers has been as low as 1%.
I will examine several test cases which illustrate both success and failure to pinpoint the factors which govern the outcome.
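The two-fold phase ambiguity behind a single derivative can be illustrated with a minimal sketch. It assumes error-free amplitudes and a known heavy-atom structure factor, and uses the standard SIR triangle relation |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H|cos(φ_P - φ_H). The function name `sir_phase_candidates` and all numerical values are hypothetical, chosen only for illustration; the analogous SAD construction is not shown.

```python
import cmath
import math


def sir_phase_candidates(f_p, f_ph, f_h, phi_h):
    """Return the two native phases (radians) consistent with one derivative.

    f_p, f_ph, f_h are amplitudes of native, derivative and heavy-atom
    structure factors; phi_h is the heavy-atom phase in radians.
    """
    cos_arg = (f_ph**2 - f_p**2 - f_h**2) / (2.0 * f_p * f_h)
    # Measurement error can push the ratio slightly outside [-1, 1]; clamp it.
    cos_arg = max(-1.0, min(1.0, cos_arg))
    delta = math.acos(cos_arg)
    return phi_h + delta, phi_h - delta


# Hypothetical, error-free structure factors for a single reflection.
F_P = cmath.rect(100.0, math.radians(40.0))   # native, true phase 40 degrees
F_H = cmath.rect(15.0, math.radians(110.0))   # heavy-atom contribution
F_PH = F_P + F_H                              # derivative

candidates = sir_phase_candidates(abs(F_P), abs(F_PH), abs(F_H), cmath.phase(F_H))
for phi in candidates:
    print(f"candidate native phase: {math.degrees(phi):7.2f} degrees")
```

One of the two candidates coincides with the true native phase; choosing between them requires further information, such as the prior knowledge about the map mentioned above.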
17:20 BC Wang
Practical Aspects of Sulfur ISAS Phasing
Sulfur, which exists in almost all proteins, has been investigated as an anomalous phasing probe for protein structure determination since the early 1980s. However, during the past two decades only a few de novo structures have been determined using this method. This is due to the fact that sulfur's anomalous scattering signal is weak, with ΔF" ranging from 0.124 to 1.42 e- over the tunable range of most synchrotron X-ray sources. Using third generation synchrotrons, improved CCD detector technology and special attention to data collection we have shown that the weak sulfur anomalous scattering signal can be recorded with the accuracy needed for successful de novo structure determination. Based on these successes we have developed data collection, data processing and phasing procedures for protein structure determination using sulfur single-wavelength anomalous scattering (SAS) data. Our results show that sulfur phasing can be applied to many crystal structure determinations if the correct experimental procedure is pursued. The practical aspects of this approach will be presented.
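To give a feel for how weak the sulfur signal is, the sketch below uses the widely quoted estimate of the expected Bijvoet ratio, ⟨|ΔF|⟩/⟨|F|⟩ ≈ sqrt(2·N_A/N_P) · f"/Z_eff, with Z_eff ≈ 6.7 electrons for an average protein atom. This is an order-of-magnitude estimate, not the procedure described in the talk; the function name `bijvoet_ratio` and the protein size are hypothetical, while the two f" values are the range limits quoted above.

```python
import math

Z_EFF = 6.7  # assumed effective number of electrons per non-hydrogen protein atom


def bijvoet_ratio(n_anom, n_protein_atoms, f_double_prime, z_eff=Z_EFF):
    """Estimate the expected Bijvoet ratio <|dF|>/<|F|> at low resolution."""
    return math.sqrt(2.0 * n_anom / n_protein_atoms) * f_double_prime / z_eff


# Hypothetical protein: 1000 non-hydrogen atoms containing 8 sulfur atoms.
for f_dp in (0.124, 1.42):
    ratio = bijvoet_ratio(8, 1000, f_dp)
    print(f"f'' = {f_dp:5.3f} e-  ->  Bijvoet ratio ~ {100.0 * ratio:.2f} %")
```

Because the ratio scales linearly with f", moving to a wavelength where f" is larger raises the expected signal in direct proportion, all else being equal.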
17:40 Raimond Ravelli
Radiation-Damage Induced Phasing
Our brightest synchrotron sources have had an extraordinary impact in structural biology during the past few years. However, they are also creating havoc with crystalline biological samples by processes referred to as radiation damage. In the course of the data collection, the diffraction power of the crystal is reduced, the mosaicity and overall B-factor go up, and eventually one will lose all higher order reflections. In addition to these general effects, some highly specific changes might occur, such as breakage of disulphide bonds and loss of definition of carboxyl groups. During data collection, the diffraction intensities change and these changes can be measured accurately. Part of these changes can be assigned to a generally large, but limited, number of highly susceptible sites. In this paper it is shown how to extract reliable intensity differences of the X-ray susceptible part of the structure. These can eventually be used to obtain phase information by a method that we have named Radiation-damage Induced Phasing (RIP).
Saturday 4 January
Session 4: Finding the Sites
09:00 Ralf Grosse-Kunstleve
Heavy atom searches and their symmetries
This presentation gives an overview of the various heavy atom search procedures that are available to the macromolecular crystallographer, including Patterson methods and direct methods. To help in understanding potential pitfalls, special emphasis is put on elucidating the symmetries of the search spaces.
09:30 Frank von Delft
A Very Large Substructure: The 160 Seleniums of KPHMT
We present the largest successful application of selenomethionine MAD reported to date: the crystal structure of the decameric E. coli enzyme ketopantoate hydroxymethyltransferase (KPHMT), with 160 ordered selenium atoms and 560 kDa of protein in the asymmetric unit. Despite small (< 150 µm), irregular, weakly diffracting (< 3.2 Å) crystals, the substructure was solved by SAD combined with Direct Methods, using a 20-fold redundant “peak” dataset. SnB produced the first correct solution after 2600 computing hours, and phases from SHARP and Solomon produced traceable maps, even before 20-fold NCS averaging. Subsequent analysis revealed that while data redundancy was critical for success, careful selection of data was even more so; on the other hand, speed and success rate vary considerably between programs. Apart from a favourable ratio of selenium to scattering matter, the procedure was quite general, suggesting that this is still a long way from the practical upper limit of applicability, if that exists.
09:50 Gabby Rudenko
Structure determination of the extracellular domain of the LDL Receptor: a non-trivial case of MAD phasing
We have solved the structure of the extracellular domain of the LDL receptor at 3.7 Å resolution. A MAD experiment was carried out on our crystals at the tungsten edge. The asymmetric unit of our crystals contained one protein molecule and 31 tungsten atoms arranged in 2 1/2 clusters. The MAD phases led to an interpretable electron density map in which known fragments could easily be placed. In diffraction experiments using energies at the tungsten edge and above, the presence of so many anomalous scatterers together with only 85 kDa of protein generated a tremendous anomalous signal. While in principle useful for phasing, such a large anomalous signal proved in practice problematic for data processing and structure determination. In addition, the poor quality of the crystals and their radiation sensitivity hindered progress. We will describe the methods that ultimately led to structure determination of the extracellular domain of the LDL receptor.
10:10 Thomas Schneider
Determination of accurate substructures with SHELXD
Using the signal of naturally built-in or artificially introduced anomalous scatterers to derive a starting phase set in a macromolecular crystal structure determination has become routine in recent years. In particular in the context of high-throughput crystallography, MAD and SAD (multiple and single wavelength anomalous dispersion) methods are central tools. For both techniques, a crucial step is the determination of the substructure of anomalous scatterers.
Due to the molecules investigated becoming larger and the use of soaking techniques that may populate many sites, the size of typical substructures is increasing and classical direct methods and Patterson techniques have hit their limit. Although originally designed for the ab initio phasing of entire macromolecular structures, real/reciprocal Fourier recycling methods combined with Patterson-based seeding as implemented in SHELXD[1] prove very effective in finding substructure sites.
Choosing the right subset of the diffraction data for the substructure determination can make the difference between success and failure. For example, the inclusion of noisy high resolution data will generally do more harm than good. Furthermore, subsequent phasing procedures such as SHELXE[2] will profit from starting with a substructure model that is as accurate as possible.
Using a computer program that allows the quantitative comparison of different substructure models (SITCOM[3]), we investigated how the most accurate substructure can be obtained under different circumstances. The results of this study for different scenarios in MAD and SAD phasing will be discussed.
Session 5: Twinning
11:00 Simon Parsons
The Derivation of Non-Merohedral Twin Laws During Refinement by Analysis of Poorly-Fitting Intensity Data
Data sets from non-merohedral twins contain large numbers of reflections that are unaffected by twinning, and it is our experience that their structures can be solved without difficulty. Problems such as large, inexplicable difference peaks and a high R-factor may indicate that twinning is a problem during refinement. Careful analysis of poorly fitting data reveals that they belong predominantly to certain distinct zones in which |Fobs|² is systematically larger than |Fcalc|². If twinning is not taken into account it is likely that these zones are being poorly modelled, and that their indices may provide a clue as to a possible twin law. We have written a computer program, called ROTAX, which makes use of this idea to identify possible twin laws. A set of data with the largest values of [Fo² - Fc²]/σ(Fo²) is identified and the indices transformed by two-fold rotations about possible direct and reciprocal lattice directions. Matrices which transform the indices of the poorly fitting data to integers are identified as possible twin laws. The user then has a set of potential matrices which might explain the source of the refinement problems described above. If area detector intensity frames are available, then the current orientation matrix may be transformed in an attempt to index previously unindexed spots. Alternatively the twin laws can be used to split affected zones of reflection data and a check made to see if this improves the refinement statistics.
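The index-transformation idea can be sketched in a few lines. This is an illustration only and not the ROTAX code: the function names, the tolerance, the unit cell and the reflection lists are all hypothetical, and only rotations about direct-lattice directions are covered. For a two-fold rotation about the direct-lattice direction u = [uvw], the indices transform as h' = 2(Gu)(u·h)/(uᵀGu) - h, where G is the metric tensor of the cell.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Direct-space metric tensor G from cell lengths and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]


def twofold_about_direct_axis(G, u):
    """Matrix M with h' = M h for a 180-degree rotation about direction u."""
    Gu = [sum(G[i][j] * u[j] for j in range(3)) for i in range(3)]
    uGu = sum(u[i] * Gu[i] for i in range(3))
    return [[2.0 * Gu[i] * u[j] / uGu - (1.0 if i == j else 0.0)
             for j in range(3)] for i in range(3)]


def fraction_mapped_to_integers(M, hkls, tol=0.05):
    """Fraction of reflections whose transformed indices are all near-integer."""
    hits = 0
    for h in hkls:
        hp = [sum(M[i][j] * h[j] for j in range(3)) for i in range(3)]
        if all(abs(x - round(x)) <= tol for x in hp):
            hits += 1
    return hits / len(hkls)


# Hypothetical monoclinic cell chosen so that 2*a*cos(beta)/c = -1/2.
beta = math.degrees(math.acos(-0.25))
G = metric_tensor(10.0, 10.0, 10.0, 90.0, beta, 90.0)
M = twofold_about_direct_axis(G, [0, 0, 1])

poorly_fitting = [(1, 2, 4), (3, 0, 2), (-2, 1, -6)]  # hypothetical, all l even
well_fitting = [(1, 2, 3), (0, 1, 1)]                 # hypothetical, all l odd
print(fraction_mapped_to_integers(M, poorly_fitting))
print(fraction_mapped_to_integers(M, well_fitting))
```

In this constructed example only the zones with even l are mapped onto integer indices, which mirrors the observation that the affected reflections cluster in distinct zones. A candidate matrix that maps most of the poorly fitting data to integers would be worth testing as a twin law.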
11:40 Zbigniew Dauter
MAD phasing on merohedrally twinned crystals
Merohedrally or pseudomerohedrally twinned protein crystals may occur more often than usually acknowledged. Often such crystals are discarded without further analysis as “difficult without reason”. Several structures have been solved from twinned crystals by Molecular Replacement and MIR, but not so many by MAD. The crystals of gpd were twinned at a level of about 35%, but it was possible to solve the protein structure by the classic SeMet MAD approach. This case will be presented in detail, and various procedures for the evaluation of the twinning ratio and for detwinning will be discussed.
12:00 Anke Terwisscha van Scheltinga
MIR structure determination of Deacetoxycephalosporin C Synthase from twinned crystals
Crystals of deacetoxycephalosporin C synthase (DAOCS) were found to be twinned by merohedry by many diagnostic criteria, e.g. a distorted cumulative intensity distribution and additional symmetry in the self rotation function. We determined the structure of DAOCS from these twinned crystals, based on a combination of isomorphous replacement and the use of a multiple wavelength diffraction data set. To identify and use possible derivatives, we detwinned the data, applying twin fractions estimated from Britton plots. The detwinned data resulting from these estimations were accurate enough to interpret Patterson maps, to refine the sites and to obtain good MIR phases. The twin fractions were refined by using the phasing statistics as criteria. This accuracy did not result in a relevant improvement of the phases, showing that the values obtained from Britton plots were sufficient for structure determination. We found that merohedral twinning is not necessarily an obstacle for structure determination by MIR; even crystals with twin fractions as high as 0.45 can give phase information after detwinning. However, the use of crystals with lower twin fractions will result in smaller errors when detwinning, and will produce better experimental phases.
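The sensitivity of detwinning to the twin fraction can be seen from the standard relations for a hemihedral twin, I1(obs) = (1-α)·I1 + α·I2 and I2(obs) = α·I1 + (1-α)·I2, which invert with a factor 1/(1-2α). The sketch below is a generic illustration, not the procedure used in this work; the function name `detwin_pair`, the intensities and the sigmas are hypothetical, and the error propagation assumes independent errors on the two observations.

```python
import math


def detwin_pair(i1_obs, i2_obs, alpha, sig1=0.0, sig2=0.0):
    """Recover the true intensities of a twin-related pair for twin fraction alpha.

    Returns (i1, i2, sigma_i1, sigma_i2). Sigmas are propagated assuming the
    two observations have independent errors.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    d = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * i1_obs - alpha * i2_obs) / d
    i2 = ((1.0 - alpha) * i2_obs - alpha * i1_obs) / d
    s1 = math.sqrt(((1.0 - alpha) * sig1) ** 2 + (alpha * sig2) ** 2) / d
    s2 = math.sqrt(((1.0 - alpha) * sig2) ** 2 + (alpha * sig1) ** 2) / d
    return i1, i2, s1, s2


# Hypothetical pair with true intensities 100 and 20 and unit sigmas.
for alpha in (0.10, 0.45):
    obs1 = (1.0 - alpha) * 100.0 + alpha * 20.0
    obs2 = alpha * 100.0 + (1.0 - alpha) * 20.0
    i1, i2, s1, s2 = detwin_pair(obs1, obs2, alpha, sig1=1.0, sig2=1.0)
    print(f"alpha={alpha:.2f}: I1={i1:.1f}, I2={i2:.1f}, sigma(I1)={s1:.2f}")
```

The division by 1-2α is what inflates the errors as α approaches 0.5, consistent with the remark above that crystals with lower twin fractions give smaller detwinning errors.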
12:20 Dmitriy Alexeev
Twinning in presence of multiple NCS: metals in a bacterial transferrin
Merohedral twinning is usually detected and overcome by statistical methods, which rely on the assumption that the intensity probability distributions for the twin-related reflections are independent. They become correlated if a non-crystallographic rotational symmetry element coincides with the twinning symmetry. The presence of translational NCS seriously distorts the probability distribution itself and biases our twinning estimates. Both rotational and translational NCS complicated our twinned structure solution. Crystals of the Ferric Binding Protein (FBP) contain 9 protein molecules in the asymmetric unit of the space group P32. The self-Patterson map reveals two strong translational peaks. Six protein molecules are related by a non-crystallographic dyad axis parallel to the twinning axis, which lies perpendicular to the crystallographic 3-fold axis and generates the P3212 symmetry of the diffraction pattern. The structure was solved by the combination of molecular replacement and manual analysis of anomalous difference maps phased with the trial models from molecular replacement. The presence of heavy hafnium atoms bound to the protein provided a strong anomalous signal, which ensured the structure solution. Interestingly, oxidized Hf4+ ions form clusters (of 3 to 5 metal atoms) in the iron-binding pocket of FBP.
Session 6: Refinement and Phasing
14:00 Gérard Bricogne
Global Phasing
Generation and flow of experimental phase information in structure determination: recent enhancements in SHARP 2.0
The SHARP program for heavy-atom refinement and phasing, described in its initial form in 1997, has been completely rewritten. New numerical methods, code restructuring and the use of a powerful new optimiser have produced considerable gains in speed and accuracy. These improvements have been accompanied by the development of a new representation of phase information better suited to its transfer from and towards other steps of structure determination and refinement.
14:40 Dominika Borek
Errors and sigmas when crystals are changing
Diffraction intensity measurements are obtained with some uncertainty, described by sigma values. The sources for uncertainties in measurements can be divided into two categories: systematic phenomena and random noise. Systematic phenomena are mainly associated with the crystal and instrument quality. Some of them, for example non-uniform crystal rotation, can be accounted for by multiplicative scale factors that apply to all reflections; others require separate corrections for each unique reflection (for example radiation-induced localized changes in the structure). If not corrected for, systematic phenomena, together with random effects, will be reflected in the final sigmas. Optimal treatment of all errors might be crucial in phasing from anomalous signal, where the anomalous signal is of similar magnitude as the uncertainty in the measurements.
Currently, there is no satisfactory theory for the consistent treatment of uncertainties in crystallography. We will discuss theoretical and practical problems, such as the role of personal bias, approximations and assumptions of error correction procedures, propagation of errors between different crystallographic procedures, and correlations between parameters. In addition, the impact of some of the systematic effects will be discussed with examples of uncertainty treatment in popular crystallographic programs.
15:00 Phil Evans
Phasing the AP2 complex with Xe, Hg and Se
The AP2 clathrin adaptor complex is a heterotetramer involved in the formation of clathrin-coated vesicles. Crystals of the 200 kDa core complex contain the “trunk” domains of the two large subunits α and β2, the medium µ2 and small σ2 subunits. This complex crystallises together with the lipid-headgroup mimic IP6 in spacegroup P3121, unit cell a = b = 122 Å, c = 258 Å, γ = 120°, with one complex in the asymmetric unit. The crystals diffract at best to about 2.6 Å resolution, but the diffraction is weak beyond about 3.2 Å (Wilson plot B-factor about 80 Å²).
The structure was solved using a series of Xe and mercury (EMTS) derivatives. The most important phasing came from two Xe derivative datasets, which were collected at long wavelengths to enhance the anomalous signal (f" = 9.0 at λ = 1.74 Å, f" = 11.1 at λ = 2 Å). SeMet protein could only be prepared in the presence of a partially rich culture medium, so the incorporation of Se was less than 100%. Phases from two Se datasets (each at two wavelengths) were used together with the Xe and EMTS datasets, but the improvement to the phases was small. The main value of the Se data was in aiding the model-building.
Xe derivatives are valuable phasing tools even for large structures, particularly with data collected at long wavelengths.
15:20 Andy Stewart
Can three-beam interference be an alternative to anomalous dispersion?
The recently developed reference-beam diffraction data-collection technique makes it possible to collect large numbers of relative phases (triplet phases) of Bragg reflections. Here we will demonstrate the differences between the reference-beam diffraction method, the conventional three-beam interference techniques, and the standard oscillating-crystal method. With the reference-beam technique it is possible to collect hundreds of phase-sensitive three-beam interference profiles on a time scale comparable to that of a multiple-wavelength anomalous dispersion (MAD) experiment. Experimental results and analysis will be presented, showing how the triplet phases are obtained from measured reflections, and then how individual phases can be deduced to produce an electron density map based on the measured triplet phases.
15:40 Ditlev Brodersen
Phasing the 30S ribosomal subunit
During the last couple of years, high-resolution crystal structures of both subunits of the bacterial 70S ribosome have been determined. This means that we now have a complete structural scaffold that can be used to ask new biochemical and structural questions, to try to further understand the function of this important and complex enzyme.
In the case of the 30S ribosomal subunit from Thermus thermophilus, phasing went hand in hand with efforts on several fronts to push the resolution limit of the crystals to a level where individual protein side chains and RNA bases could be distinguished in the electron density. In spite of significant technical and methodological developments in the field of macromolecular crystallography over the last decade, the 30S case proved particularly difficult for a number of reasons including crystal variability and radiation damage. Crucial to the success of ab initio phasing was the use of single wavelength anomalous data collected at the LIII-absorption edges of relevant heavy atom compounds such as lanthanides and osmium complexes. Other important aspects of the data collection and phasing protocol included the use of crystal alignment, pre-screening of crystals, and the use of isostructural compounds to compensate for non-isomorphism between native and derivative crystals. Successful interpretation of the resulting electron density map was further aided by vast amounts of prior biochemical data on the composition and structure of the ribosome. A reexamination of the origin of the phasing signal has indicated that very weak anomalous signal is more important for successful phasing of very large complexes than one might initially expect.