# Tutorials

## ccp4i2 tutorials

ccp4i2 video tutorials

Kevin Cowtan’s i2 quick start tutorial

#### Introduction

For this practical you are presented with a set of images collected from a crystal of seleno-methionine containing CD44. These images constitute a dataset which ultimately allowed the structure of CD44 to be solved, despite showing some unwelcome pathologies. We will work through the process of indexing, integrating and scaling these data and by the end of the practical you should know how to spot some common problems with a dataset and how to extract an optimal dataset from a collection of images.

Native CD44 was previously observed to crystallize in space group P212121 with cell constants a=48.88 Å, b=77.27 Å, c=87.67 Å, α=β=γ=90°.

This dataset may be downloaded from Zenodo (DOI: 10.5281/zenodo.55993). You should unpack this archive into a directory where you have space (users at Diamond, beware the limited quota of your home directory!).

#### Getting Started with CCP4

1. Launch ccp4i2 by opening a terminal, typing “module load ccp4” and then typing “ccp4i2” at the command prompt
2. You will be presented with a “Welcome” screen. Click on the link to “Start a new crystallography project”
3. In the “Name of project/folder” field, enter cd44. Click the “Select Directory” button and browse to the directory where you unpacked the archive
4. A “Project Viewer” window will open for project cd44

#### Autoindexing and Integration of Diffraction Data

1. Our first task is to inspect our diffraction images, index them and integrate these indexed reflections. Before we begin, however, it is always a good idea to inspect the images visually to assess the quality of the diffraction and pick up any possible warning signs of problems with the data.
2. To do this we will use the program iMosflm (you could also use dedicated image viewers such as adxv or albula). In the Project Viewer window, click on the small + symbol to the left of “Integrate X-ray images”. You will now see links to open the xia2, iMosflm and DIALS applications. To begin with we will launch iMosflm.
3. Double-click on the launch iMosflm control. A description of the job to be run will appear. We do not need to enter anything here, so just click the “Run” button on the right-hand end of the main toolbar.
4. iMosflm will be launched for you. You need to load some diffraction images, so select “Session > Add images”
5. Browse to the images subdirectory and select the image file cd44_3_2_001.img from the list and click “Open”. iMosflm will automatically recognise this as a set of related images and load all of them into the program.
6. Image cd44_3_2_001.img will automatically be displayed in an image window.
7. You will see a pattern of strong dark spots on a pale grey background. This is an encouraging start!
8. It is possible to zoom in on areas of interest in the image. Click on the magnifying glass icon and drag a box around a region of the image containing spots. You will see that the spots are roughly circular, are well separated from each other and are discrete, single spots rather than being split. These are all very good signs.
9. Now look at an image from near the end of data collection. From the drop-down list in the top left corner of the image window select image cd44_3_2_140.img.
10. Do you think the diffraction here is as good as in the first image? Can you see any features that have either appeared or disappeared since the start of the data collection? What might be the causes of these problems?
11. This suggests to us that our crystal has probably degraded during data collection, so we should be prepared to be wary of the data from the latter regions of the dataset later in processing.
12. We could continue with indexing and processing in iMosflm, but instead we will return to ccp4i2 and make use of the automated xia2/DIALS pipeline.
13. Close iMosflm using “Session > Exit”

#### Processing with xia2/DIALS

1. In the Project Viewer, click on the “Task Menu” icon and then from the task menu double-click the xia2 icon to open a DIALS – xia2 job
2. You will need to fill in some information in this window to describe the job you want to run.
3. In the “Locate datasets” section, use the “Image file” option to browse to the images subdirectory and select “cd44_3_2_001.img” again to load the image set.
4. There are other things we can choose to enter, but if we do not xia2 will attempt to determine them for us automatically by assessing the data. For “Heavy atom name” give “Se” for Selenium. Resolution limits and Spacegroup are normally determined by examining the data, but it is possible to enter them here if you have prior knowledge.
5. Click “Run” to start the job. It will take some time to search all of your images for diffraction spots, index them and integrate spots from your series of images and the progress of the job will be displayed in a text summary whilst it is running.
6. Once the job has finished you will be presented with a summary of indexing and scaling data. We typically inspect this output and, if necessary, re-run the scaling and merging process applying different cutoffs for resolution, image ranges included and possibly other parameters in order to optimise the quality of the reduced data we will use to solve our structure. Under the “Key summary” section there is a brief description of the properties of the dataset and a list of colour-coded notifications about possible data problems. What space group and cell parameters have been determined during data processing?
7. Lower down the Results report you will find plots of properties of this dataset plotted against both resolution and batch (or image).
8. Look for the plot of CC(½) v resolution. This plots the correlation coefficient between randomly selected half sets of the intensities in our dataset. When this has fallen substantially below 1.0 it indicates that our data have become weak and internally inconsistent – it is a good idea not to include such data in our calculations. By default, this xia2 job has cut off our data at the resolution where CC(½) drops to 0.5. This is pretty reasonable (some people are less conservative and select a cutoff closer to 0.35). What has xia2 determined to be the high resolution limit of our dataset based on this criterion?
9. Completeness describes the percentage of the theoretically observable reflections (within given resolution limits) we have measured in our experiment. What completeness is reported for this dataset, both overall and in the highest resolution shell?
We would expect to aim for a completeness of >96% for a dataset overall – there can be serious consequences for electron density map connectivity if data are incomplete, especially at low resolution. In this case, the completeness of the data is compromised because the crystal was capable of diffracting to a higher resolution than could be recorded by the detector, as positioned in this experiment. Look for the plot of completeness against resolution (you will find it under the “Other Merging Graphs” tab). At what resolution does the completeness of this dataset drop below 96%?
Because the high resolution reflections are still useful in refinement, I would include all data from resolution shells that pass the CC(½) criterion mentioned above, but when reporting the structure I would comment on the resolution at which completeness drops below 96%. Please note that this is my opinion and you will no doubt encounter people who disagree…
10. Look for the plot of Rmerge v Batch for all images (you will find it in the section on Analysis as a function of batch). Rmerge offers an indication of how well multiple observations of equivalent reflections agree with each other, with a low value indicating better agreement. You will notice that this value is stable throughout much of the dataset, but starts to drift upwards towards the end of data collection, indicating there is likely to be radiation damage to the crystal. It is better to remove these compromised data from processing as long as we can leave ourselves with acceptable completeness. The plot of cumulative %completeness v batch can help guide you in selecting this cutoff. For this dataset I would suggest that cutting off after batch 120 is a reasonable compromise.
11. Before we move on to re-run scaling and merging, we will take a moment to look at some of the diagnostic tools available within the DIALS suite to check the quality of indexing and integration and to troubleshoot any problems with our data.
12. DIALS image viewer
First we will take a look at the images and how DIALS is interpreting them. Launch the DIALS image viewer from CCP4i2 and select the “indexing” option from the DIALS json file for the xia2 job run previously (numbered accordingly).
13. Resolution rings can be toggled on and off from the settings window. Based on this, can you see why the completeness reported by the scaling process falls off so sharply with resolution?
14. The image viewer will launch, showing the first image of the dataset. Overlaid on the image is quite a lot of information determined during indexing and integration, including the positions of predicted and observed reflections and the directions of the reciprocal lattice vectors. You can highlight interesting areas of the image by zooming in (with the mouse scroll wheel) and translating around the image (by clicking and dragging with the left mouse button).
15. Several interesting features can be toggled on and off from the Settings window. You will probably not be able to see much difference between them unless you zoom in quite a long way:
• The strongest pixel in each three-dimensional reflection is shown with a pink spot.
• The centre of mass of each spot is marked with a red cross. This is usually close to the peak pixel, but slightly offset as the centroid algorithm allows calculation of the spot centre at a better precision than the pixel size and image angular “width”.
• Strong pixels marked as part of a peak are shown with red dots.
• The reflection shoebox is shown with a blue border. This is the smallest three-dimensional box that can contain the continuous peak region.
If any of these markers disagree significantly with the diffraction pattern in the image, it is a likely indicator that there are problems with indexing.
16. It is also possible to use the image viewer to investigate what DIALS is treating as a spot during spot finding. This can be very useful when trying to index weak data, especially if it was recorded on an older detector such as a CCD.
17. Uncheck all of the spot markers from the Settings window. Zoom out so that you can see the whole of the image. For this exercise, it will be easier to view the image if you select “invert” from the color scheme menu.
18. At the bottom of the Settings window are a series of buttons allowing you to see the image as it appears following various types of image processing during the spot finding process. In particular, the “threshold” button will show (in white against black) what is being treated as a spot during spot finding. By adjusting the value of “Global Threshold” you can remove weak “noise” features from spot finding. Try setting this value to 100. What features disappear from the pattern? Do you think this would help with indexing?
19. Close the image viewer.
20. DIALS reciprocal lattice viewer
A second very useful tool for investigating problems with strong spot positions is the reciprocal lattice viewer – this displays the strong spots in 3D, after mapping them from their detector positions to reciprocal space.
21. Launch the reciprocal lattice viewer using the icon in ccp4i2. Select the same “indexing” json file option.
22. From the row of buttons marked all, indexed and unindexed select “indexed”
23. As you rotate the display around by dragging with the left mouse button, you will find orientations that show the periodicity of the reciprocal lattice.
24. It is possible to toggle on and off both the rotation axis of the experiment and the main beam vector.
25. Now let’s look at the unindexed spots too. From the same row of buttons, select “all”. Can you see any pattern to the strong spots that have not been indexed as part of the lattice?
26. Close the reciprocal lattice viewer.
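
If you would like to see what the Rmerge and CC(½) statistics from steps 8 and 10 actually measure, the following Python sketch computes both from a table of repeated intensity measurements. It is purely illustrative and is not how xia2, DIALS or AIMLESS implement them: the function names and the dictionary layout are my own assumptions, and real programs work on scaled intensities, in resolution shells, with more careful statistics.

```python
import random
from statistics import mean


def rmerge(observations):
    """Rmerge = sum |I_i - <I>| / sum I_i over all multiply-measured reflections.

    observations: dict mapping a unique reflection (h, k, l) to a list of
    measured intensities (hypothetical layout, chosen for this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # a single measurement says nothing about agreement
        average = mean(intensities)
        numerator += sum(abs(i - average) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cc_half(observations, seed=0):
    """Split each reflection's measurements randomly into two halves,
    average each half, and correlate the two sets of averages."""
    rng = random.Random(seed)
    half_one, half_two = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        middle = len(shuffled) // 2
        half_one.append(mean(shuffled[:middle]))
        half_two.append(mean(shuffled[middle:]))
    return pearson(half_one, half_two)
```

Calling `rmerge` and `cc_half` on the observations from a single resolution shell, or from a single range of batches, gives the kind of per-shell and per-batch values that are plotted in the report.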

#### Data reduction

Now it is time to re-process our integrated data.

1. From the Task Menu, open the “X-ray data reduction and analysis” section and launch the “Data reduction – AIMLESS” task.
2. Fill in the required fields for aimless input. You will need to:
• Give your data a Crystal name and a dataset name
• Omit the most radiation damaged images from processing. Exclude batch numbers 121-140 (as discussed above)
• Select a high resolution cutoff of 1.8 Å (as discussed above)
• Run the AIMLESS job.
3. The job will report scaling and merging statistics – how do these compare to the statistics reported at the end of the XIA2 job?
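
To make the two cutoffs above concrete, here is a minimal Python sketch of what "exclude batches 121-140" and "high resolution cutoff of 1.8 Å" mean for an individual observation. It is only an illustration, not what AIMLESS does internally: the function names are hypothetical, and I have assumed the orthorhombic native cell quoted in the introduction, whereas AIMLESS will use the cell determined for this crystal.

```python
import math

# Cell edges a, b, c in Å, taken from the native CD44 cell quoted in the
# introduction (an assumption: the cell for this crystal may differ slightly).
CELL = (48.88, 77.27, 87.67)


def d_spacing_orthorhombic(hkl, cell=CELL):
    """Resolution d of reflection (h, k, l) for an orthorhombic cell:
    1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2. Not defined for (0, 0, 0)."""
    h, k, l = hkl
    a, b, c = cell
    inverse_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inverse_d_squared)


def keep_observation(hkl, batch, d_min=1.8, last_batch=120):
    """True if the observation survives both the batch and resolution cutoffs."""
    return batch <= last_batch and d_spacing_orthorhombic(hkl) >= d_min
```

Note that a smaller d means higher resolution, so the resolution test keeps reflections whose d-spacing is at least 1.8 Å.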

#### Heavy Atom Substructure Determination using SHELXD

1. From the Task Menu in ccp4i2, open the “Experimental phasing” section and launch the “Automatic structure solution SHELXC/D/E phasing and building” task. We only want to do substructure detection, so set the option to “Start with ‘substructure detection’ and end with ‘substructure detection’”.
2. Complete the input fields for the SHELX heavy atom finding task as shown in the figure.
• Under “Use data from job” select the AIMLESS job you have just run. If you have followed the practical as written this will be job number 3.
• For the Crystal contents, use the browse option to find and select the target sequence file, “cd44.seq”. It should be in the folder above where the images are stored. In the dialogue box that opens, select 2 copies in the asymmetric unit.
•  We are looking for 4 Se atoms in each asymmetric unit, so complete those two fields accordingly (if not done automatically).
3. The output of the SHELXD job can give us a good idea of how likely it is that the job has been successful.
• SHELXD plots the CC between observed reflections and those calculated from the substructure found by a given try. SHELXD only uses strong reflections in its calculations, but this calculation is carried out both for all reflections (CCall) and only for weak reflections (CCweak). For a successful solution, both values should be relatively high – you will normally see two populations in the CCall vs CCweak plot for a successful job, with failed solutions having low values for both and successful solutions having high values. By default, the solution with the highest CCweak will be selected.
• SHELXD will often fit some atoms to noise during this initial search and it is a good idea to remove these potentially spurious sites at this stage as they can confuse our attempts to refine our substructure model.
4. From the task menu, open the “Coordinate data tools” section and launch the “Edit PDB/CIF files by hand” tool.
5. From the section “Use data from job” select the SHELXC/D job you have just run. If you have followed the practical as written this will be job number 4.
6. The section marked “Atomic model” should now be automatically populated with the SHELXD HA sites coordinate file.
7. Run the job – this will launch an editor window.
8. You will see that each atom in the file is parameterized here by: Atom number, atom name, residue name, sequence number, x,y & z coordinates, occupancy, B-value and element.
9. Look at the values in the occupancy column. It is typical that in a SHELXD heavy atom search the value of the occupancy will drop off sharply once the program has found all of the strong sites and has started fitting weak sites to noise (we would normally expect the first site found to be set to an occupancy of 1.0, so we should not necessarily over interpret the drop off between sites 1 and 2). In this case the occupancy drops off between sites 4 and 5, so I would suggest deleting sites 5 and 6.
10. You can select an atom for deletion by selecting the number to the left of the atom record, as shown below. You can then delete the atoms by clicking the “Cut” button (this may be a scissors icon depending on your computer)
11. From the “File” menu, select “Save all atoms to i2 database”
12. Exit this job.
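
The occupancy reasoning in step 9 can be written down as a simple rule of thumb. The Python sketch below looks for the largest relative drop in a list of occupancies sorted from strongest to weakest, ignoring the drop between sites 1 and 2 as suggested above. The function name and the example numbers in the usage note are made up for illustration and are not SHELXD output; inspecting the list by eye, as described in the steps above, remains the more reliable approach.

```python
def sites_to_keep(occupancies, skip_first=True):
    """Return how many leading sites to keep.

    occupancies: positive occupancies sorted from strongest to weakest site.
    The cut is placed at the largest relative drop between neighbouring
    sites. With skip_first=True the drop between sites 1 and 2 is ignored,
    because the first site is normally fixed at an occupancy of 1.0.
    If no drop is found, all sites are kept.
    """
    start = 1 if skip_first else 0
    keep = len(occupancies)
    smallest_ratio = 1.0
    for i in range(start, len(occupancies) - 1):
        ratio = occupancies[i + 1] / occupancies[i]
        if ratio < smallest_ratio:
            smallest_ratio = ratio
            keep = i + 1  # keep sites 1..i+1, delete everything after
    return keep
```

For a made-up list such as `[1.0, 0.85, 0.80, 0.76, 0.35, 0.30]` this returns 4, which corresponds to keeping the first four sites and deleting sites 5 and 6.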

1. We will now use the program PHASER in order to calculate an initial set of phases based on the heavy atom positions found by SHELXD. PHASER will calculate a set of phases for both possible enantiomers and using the initial electron density calculated will attempt to complete the substructure by finding additional weak anomalous scattering sites.
• In the section marked “Use data from job” select the “Edit PDB/CIF files by hand” job run in section 5. If you have followed the practical as written this will be job number 5.
• In the section marked “Reflections” select the SAD dataset available.
• In the section marked “Provide sites, partial model or phases” ensure that “Partial heavy atom substructure” is selected.
• In the section marked “Partial HA model” select the sites from your editing above. These will be labelled “Coordinate editor output file”.
• For estimating asymmetric unit contents “Provide a full specification of the ASU content by sequence”. This should automatically load the ASU contents specified in the SHELX step.
• In the section marked “Cycles of heavy atom completion” enter 5
• In the list of “Elements to search for” add first S and then Se
• Run the Job
3. Look at the “Results” tab for the PHASER job that has just run. In the “Output Data” section you will see “Atomic model” entries for both the original hand and the reversed hand. Right-clicking on the icon for either of these models will bring up an option menu. Select “View>View as text” from this menu to view the coordinate file. How many sites were found in the original hand? How many in the reversed hand?
4. At this point we cannot differentiate between the two possible choices of hand for our substructure. If we had data collected at additional wavelength or an additional heavy atom derivative we could break this ambiguity. Since we are working with SAD data we need to proceed to break the hand ambiguity using density modification.
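
As a reminder of what the "reversed hand" means in practice, the sketch below inverts a list of fractional heavy atom coordinates through the origin. This is only an illustration with a hypothetical function name. It is appropriate for a space group such as P212121, but it ignores two complications that PHASER deals with for you: in enantiomorphic space group pairs the space group itself must also be changed, and in a few space groups the inversion has to be carried out about a point other than the origin.

```python
def invert_hand(sites):
    """Invert fractional coordinates through the origin.

    sites: list of (x, y, z) fractional coordinates.
    Each coordinate is negated and wrapped back into the range [0, 1).
    """
    return [tuple((-coordinate) % 1.0 for coordinate in site) for site in sites]
```

Both the original and the inverted set of sites explain the measured anomalous differences equally well, which is why density modification is needed to tell them apart.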

#### Density Modification using PARROT

We expect there to be two protein molecules in the asymmetric unit so we will tell PARROT to try to find non-crystallographic symmetry (NCS) in our substructure model and apply any operators it finds to the density modification.

1. Select the “Results” tab of the PHASER job you ran earlier (This may already be open).
• From the bottom of this tab, select “Density modification – PARROT”
• Give the job a title indicating what hand you are working on. For example “PARROT – original hand”
• In the section marked “Use data from job” select the PHASER job you have just run. If you have followed the practical as written this will be job number 6.
• In the section marked “Reflections” check that the SAD dataset is selected.
• In the section marked “Phases” select the “Phase estimates – original hand” from the earlier PHASER job.
• In the section marked “Select NCS information” select “NCS from heavy atom model”
• In the “Atomic model” selection now made available select “Sites – original hand” from the PHASER job you ran before.
• Select the “Basic Options” tab and change the “Number of cycles” option to “Normal with NCS”.
• Run the Job
2. Repeat the section above, but substitute the reversed hand everywhere you used the original hand
3. Once the jobs have finished running, in the “Job list” tab you will see that CCP4 has reported a FOM value next to each of the Density modification jobs. This is a figure of merit and is an estimate of how much confidence you can have in the phases resulting from this job, and therefore of the likely map quality. The higher this value is the better on a scale from 0-1 – we would hope for a value greater than 0.7 if the phasing has been successful.
• What FOM is reported for the original hand and the reversed hand? Which solution do you think is correct?
• Open the “Results” tab from the job you selected. There is a summary of the results from the density modification job at the top of this page. Parrot will report here whether it managed to find NCS in the heavy atom sites and whether these NCS operators resulted in reasonable correlations between NCS volumes. Was NCS successfully identified in this case?
Parrot also reports in this section the FOM at the end of the job and attempts to advise on whether or not the map is likely to be of sufficient quality for model building. What does it advise in this case?
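
For those curious about where a figure of merit comes from, the following Python sketch computes it for a single reflection from a sampled phase probability distribution: it is the length of the probability-weighted average of the unit vectors exp(iφ). The function name and input layout are my own assumptions for illustration and this is not PARROT's code. As far as I understand it, the single value shown in the job list is an average of such values over many reflections.

```python
import cmath
import math


def figure_of_merit(phase_probabilities):
    """Figure of merit for one reflection.

    phase_probabilities: list of (phase_in_degrees, weight) pairs sampling
    the phase probability distribution (weights need not be normalised).
    Returns a value between 0 (phase unknown) and 1 (phase certain).
    """
    total_weight = sum(weight for _, weight in phase_probabilities)
    centroid = sum(
        weight * cmath.exp(1j * math.radians(phase))
        for phase, weight in phase_probabilities
    ) / total_weight
    return abs(centroid)
```

A sharply peaked distribution gives a value close to 1, whilst a flat distribution, or a bimodal one with two equal peaks 180° apart, gives a value close to 0.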

#### Map Inspection in COOT

1. It is nice to be advised that our map is good enough for model building, but we would prefer to look for ourselves.
2. Before we do this, though, we would like to calculate an additional anomalous Fourier map that will help us in visualizing our anomalous substructure.
• From the Task menu open the “Reflection data tools” section and launch the “Calculate unusual map coefficients” task.
• Give the job a title, such as “Anomalous difference map”
• In the section marked “Use data from job” select the successful density modification job for the correct hand as selected above.
• You only need to complete the fields in the “First dataset” section – there is no second dataset for this map.
• In the section marked “Reflections” ensure the SAD dataset is selected.
• In the section marked “Phases” ensure that the phases for the correct density modification job are selected – this should be populated automatically.
• Check the box marked “create an anomalous difference map”
• Run the job
3. Now we can launch coot and inspect our maps.
• Select the successful density modification job from the two hands and open the results tab.
• Click the button marked “Manual model building – COOT”
• Most of the fields in this task will be filled in automatically for you with correct values, but we want to add in our anomalous difference map.
• In the section marked “Electron density maps” click on the “Show list” button.
• Click the “+” button to add a new map.
• The “Map coefficients” selection tool will now be highlighted in red.
• From the selection drop-down menu, you will see that the newly created anomalous map is not present so click the “More data” option at the bottom of the list.
• From the list that appears, select the “Anomalous difference map coefficients” entry with a number corresponding to the map calculation job you carried out just now.
• Select this map by double-clicking on it.
• Run the job.
4. COOT will launch automatically at this point, with our heavy atom model and both of our maps all being displayed. We should spend a few moments organizing our display and our maps so that we can inspect them more easily.
• Expand the COOT window to a size that makes it easy for you to inspect the content of the graphics window – a window that fills about 2/3 of the screen normally works well for me whilst leaving enough room for any additional windows I might want to open.
• Open the “Display manager” from the toolbar at the top of the main window.
• The “Maps” section contains two maps. The first map is labelled “Map coefficients from density modification” and is our electron density map.
• Check the “Scroll” button for this map.
• You can adjust the contour level of this map using the scroll wheel of your mouse. As you adjust it you will see a value for the contour level displayed in the upper right corner of the display. Adjust this until the rmsd value is approx. 1.0.
• The second map is labelled “Anomalous difference map coefficients”.
• Check the “Scroll” button for this map.
• Adjust the contour level for this map until the rmsd value is approx. 4.0. Anomalous Fourier maps are typically very noisy, so we only want to interpret strong features.
• Click on the “Properties” button for this map.
• In the dialog that appears click on the “Colour” button and select a colour that provides a good contrast with your electron density map. I like to use a bright yellow, but this is in no way compulsory.
• Close the “Properties” dialog by clicking OK.
5. Scroll around the map by using Ctrl-Left Mouse Click and dragging the map around. You should be able to observe relatively continuous areas of high density separated by low density solvent regions.
6. Select “Calculate > Map Skeleton”. Make sure that the map labelled “Map coefficients from density modification” is selected and turn on skeletonisation. This will plot a skeleton through continuous regions of high density. Is it possible to plot continuous regions of protein backbone through this map? Do you think you would be able to build a protein model into it?
7. When building a protein model into an empty map it can also be useful to look at an anomalous Fourier map that will have peaks at the position of any anomalous scatterers – in this case our seleno-methionines. If you have followed the practical as written above, strong peaks in this map will be displayed in yellow.
8. Scroll around the map using Ctrl-Left Mouse Button. Can you see the peaks corresponding to Se-Met residues? Do you see any other, weaker features in the anomalous Fourier map?  If so, what might they correspond to?
9. Close COOT by clicking “File > Exit”

#### Automatic Model Building with Buccaneer

1. The maps you have generated are probably good enough that the program Buccaneer will be able to have a very good try at building a model of cd44. If there is sufficient time left in the tutorial session you can run this program now and inspect the resulting model.
2. In ccp4i2, select the successful density modification job in the correct hand and open the “Results” tab. From the bottom of this page, click on the “Autobuild – BUCCANEER” button.
3. You should need to change very little in the input for Buccaneer – the correct sets of reflections, phases and the sequence should all have been carried forward from the density modification job.
4. Select the “Options” tab. Since we are solving the structure of a Se-Met derivatised protein you should check the box marked “Build methionines as selenomethionine”
5. Run the job
6. Once the job has run, open the “Results” tab.
7. The top of this tab is a brief summary of the results of autobuilding. How many residues has Buccaneer built? How many of these were assigned to the sequence? What are the reported R-factor and free-R factor from refinement of the autobuilt model?
8. From the bottom of the “Results” tab click the “Manual model building – COOT” button in order to inspect the output from Buccaneer.
9. Add your anomalous difference map to the COOT job as described previously.
10. Run the job to launch COOT and set up the maps as described before
11. Navigate to residue A44 using “Draw > Go To Atom”. How well do this residue and its surroundings correspond to both the electron density and anomalous Fourier maps?
12. Navigate to residue A34 using “Draw > Go To Atom”. Does what you see agree with your conclusions in the previous section?
13. Although autobuilding seems to have worked well, it is always necessary to check such a model carefully and ensure that model building is carried out correctly and to completion. Take a quick look around the model by Ctrl-left mouse dragging. Can you see any features of the model that need to be corrected?

Molecular replacement

#### Introduction

This practical is closely based on the tutorials written by Airlie McCoy, the author of PHASER, slightly modified to make use of the ccp4i2 GUI.

During the practical you will learn 1) how to use ensembling to construct a search model, 2) how to solve a heterodimeric complex and 3) how to solve a homo-oligomer from a monomer.

The data used in this tutorial can be obtained here:

#### Getting Started with CCP4 GUI2

1. Launch ccp4i2 by double clicking on the icon on the desktop
2. You will be presented with a “Welcome” screen. Click on the link to “Start a new crystallography project”
3. In the “Name of project/folder” field, enter MR. Click the “Select Directory” button and browse to the directory where you unpacked the data
4. A “Project Viewer” window will open for project MR

#### MR using ensemble search models: TOXD

α-Dendrotoxin (TOXD, 7139 Da) is a small neurotoxin from green mamba venom. You have two models for the structure. One is in the file 1BIK.pdb, which contains the protein chain from PDB entry 1BIK, and the other is in the file 1D0D_B.pdb, which contains chain B from PDB entry 1D0D. 1BIK is the structure of Bikunin, a serine protease inhibitor from the human inter-α-inhibitor complex, with sequence identity 37.7% to TOXD. 1D0D is the complex between tick anticoagulant protein (chain A) and bovine pancreatic trypsin inhibitor (BPTI, chain B). BPTI has a sequence identity of 36.4% to TOXD. Note that models making up an ensemble must be superimposed on each other, which has not yet been done with these two structures.

1.1 Superimpose the two pdb files that will make up the ensemble

• Launch Coot from ccp4i2 tool bar or from Task Menu > “Model Building and Graphics” > “Manual model building – COOT”
• From the File menu, select “File > Open Coordinates” and browse to your project directory. Open both 1BIK.pdb and 1D0D_B.pdb
• Are the two structures superimposed? They will need to be in order for PHASER to construct an ensemble from them.
• We can use COOT to superimpose 1BIK on 1D0D.
• Select “Calculate > SSM Superpose”
• For the reference structure select 1D0D_B.pdb and for the moving structure select 1BIK.pdb
• Click “Apply” and you should see the two molecules superimposed on each other.
• Now save coordinates to ccp4i2 project MR
• In Coot menu bar select “File > Save mol to CCP4i2”; check that molecule “0 …” is selected; press OK
• In Coot menu bar select “File > Save mol to CCP4i2”; select molecule “1 …”; press OK
• In Coot menu bar select “File > Exit”
• You are now ready to carry out molecular replacement with an ensemble.

1.2 Run PHASER for Molecular Replacement

• From the “Task Menu” in ccp4i2, select the “Molecular Replacement” section and launch the “Expert Molecular Replacement – PHASER” task
• In the section marked “Reflections” you will need to import an mtz file since this project does not yet contain any reflections. Click the file browser icon and select the file toxd.mtz. You can accept the defaults you are offered when importing these data.
• In the section marked “Use Is or Fs” select F. It is often preferable to use Is, but this data file only contains Fs.
• We need to specify the Composition of the asymmetric unit of our crystal to the best of our understanding. We will do this by specifying a sequence file for TOXD.
• In the composition section, click the file browser icon and select the file toxd.seq. You can accept the defaults offered to you when importing the sequence.
• In the section marked “Search Model(s)” you will need to construct an ensemble from the two structures you superimposed in section 1.1
• Click on the button marked “Show list” to allow you to construct a search model from multiple coordinate files. Here you have the option to build one or more ensembles to use as search models, each of which will contain one or more atomic models.
• In the menu just beneath the “+” and “-” buttons, select “Coot output file 1”, type in sequence identity.
• Use “+” to “add structure in ensemble”. Select “Coot output file 2”, type in sequence identity.
• In the end, the section “Search model(s)” should look similar to the image below
• Run the job

1.3 Inspect the Output from the Molecular Replacement Job

• Open the “Results” tab for the job you have just run.
• There are a number of nested lists containing a great deal of information about the job that has just run. Use this information and your knowledge of molecular replacement to complete the following tasks:
• Write down the steps of structure solution in the order in which they were taken. The section “Search strategy employed by PHASER” should be helpful here.
• Look through the log file to find the pieces of information listed in Table 1 at the bottom of this tutorial.
• Has PHASER solved the structure?
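
When deciding whether PHASER has solved the structure, a commonly quoted rule of thumb from the Phaser documentation relates the translation function Z-score (TFZ) to the likelihood of a correct solution. The sketch below encodes that rule of thumb. The function name and the example scores are illustrative assumptions, and the thresholds are guidance only (they are usually applied together with the LLG and an inspection of the packing), not a definitive test.

```python
def interpret_tfz(tfz: float) -> str:
    """Return a rough verdict for a translation function Z-score.

    Thresholds follow the widely quoted Phaser rule of thumb; treat
    them as guidance rather than a strict criterion.
    """
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz < 8:
        return "probably"
    return "definitely"


# Illustrative scores only, not values taken from the TOXD job.
for score in (4.2, 6.5, 12.0):
    print(f"TFZ = {score:4.1f}: solved? {interpret_tfz(score)}")
```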

#### Solving a heterodimeric complex using MR: BETA/BLIP

β-Lactamase (BETA, 29kDa) is an enzyme produced by various bacteria, and is of interest because it is responsible for penicillin resistance, cleaving penicillin at the β-lactam ring. There are many small molecule inhibitors of BETA in clinical use, but bacteria can become resistant to these as well. Streptomyces clavuligerus produces beta-lactamase inhibitory protein (BLIP, 17.5kDa), which has been investigated as an alternative to small molecule inhibitors, as it appears more difficult for bacteria to become resistant to this form of BETA inhibition. The structures of BETA and BLIP were originally solved separately by experimental phasing methods. The crystal structure of the complex between BETA and BLIP has been a test case for molecular replacement because of the difficulty encountered in the original structure solution. BETA, which models 62% of the unit cell, is trivial to locate, but BLIP is more difficult to find. The BLIP component was originally found by testing a large number of potential orientations with a translation function search, until one solution stood out from the noise.
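
The 62% figure quoted above appears to be consistent with the fraction of the complex, by mass, that BETA accounts for. As a quick check using only the two molecular weights given in the paragraph (and assuming one copy of each component):

```python
# Molecular weights in kDa, as quoted in the text above.
masses_kda = {"BETA": 29.0, "BLIP": 17.5}

total = sum(masses_kda.values())
fractions = {name: mass / total for name, mass in masses_kda.items()}

for name, fraction in fractions.items():
    print(f"{name}: {100 * fraction:.0f}% of the complex by mass")
```

The smaller a component's share of the total scattering, the weaker its signal in the rotation and translation searches, which is one reason BLIP is harder to place than BETA.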

2.1 Consider the MR problem

• Import the reflection data for BETA/BLIP into your ccp4i2 project.
• In the “Task menu” open the “Import merged data, sequences, alignments or coordinates” section and launch the “Import merged reflection data” task.
• In the section marked “Reflection data” click on the file browser icon and select the file beta_blip_P3221.mtz
• Provide a crystal name and a dataset name. These can be anything, but giving them systematic names is usually a good idea. I would suggest crystal1 and betablip
• What space group is reported by the mtz file? Look in the input tab of the job you have just run.
• If this structure had not already been solved, would you know that this was the space-group? If not, what other space-group(s) must you consider? Consider handedness and possible enantiomorphs.
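
When considering handedness, it can help to have the enantiomorphic space group pairs to hand: diffraction intensities alone cannot distinguish the two members of a pair. The lookup below is a minimal sketch listing the eleven enantiomorphic pairs. The function name is an assumption for illustration, and space group symbols are written without spaces or subscripts.

```python
# The eleven enantiomorphic pairs of space groups.
ENANTIOMORPHIC_PAIRS = [
    ("P41", "P43"), ("P4122", "P4322"), ("P41212", "P43212"),
    ("P31", "P32"), ("P3121", "P3221"), ("P3112", "P3212"),
    ("P61", "P65"), ("P62", "P64"),
    ("P6122", "P6522"), ("P6222", "P6422"),
    ("P4132", "P4332"),
]

# Build a symmetric lookup so either member finds its partner.
PARTNER = {}
for first, second in ENANTIOMORPHIC_PAIRS:
    PARTNER[first] = second
    PARTNER[second] = first


def enantiomorph(space_group: str):
    """Return the enantiomorphic partner, or None if there is none."""
    return PARTNER.get(space_group.replace(" ", ""))


print(enantiomorph("P41212"))   # a space group with a partner
print(enantiomorph("P212121"))  # no enantiomorphic partner
```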

2.2 Run PHASER for Molecular Replacement

• From the “Task Menu” in ccp4i2, select the “Molecular Replacement” section and launch the “Molecular Replacement and refinement – PHASER” task
• In the “Use data from job” section, select the “Import merged reflection data” job that you ran in section 2.1
• In the section marked “Composition” you need to describe the likely contents of the asymmetric unit – in this case we expect to find one copy each of BETA and BLIP.
• In the “Composition” section click the “Show list” button.
• Click the file browser icon and select the sequence file beta.seq. Make sure that the “Number of copies in asymmetric unit” is set to 1 for this sequence.
• Click on the file browser icon and select the sequence file blip.seq. Make sure that the “Number of copies in asymmetric unit” is set to 1 for this sequence.
• You must now define the search model(s). This time we will be searching for one ensemble after another (each containing a single model) instead of using a single ensemble containing multiple models.
• In the section marked “Search model(s)” click on the “Show list” button. The first ensemble will be selected by default, but will be populated with an incorrect model (it will have assumed that we are still working with the same model as in section 1).
• Click on the file browser icon alongside the atomic model selection and select the file beta.pdb. In the sequence identity box enter 1.0
• Click on the “+” icon and select “Add ensemble” from the options presented.
• Select the model field for Ensemble 2. Click on the file browser icon alongside the atomic model selection and select the file blip.pdb. In the sequence identity box enter 1.0
• Select the first ensemble (the ensemble itself rather than the model it contains). Ensure that you will be searching for 1 copy of this ensemble and give the ensemble the name “BETA”.
• Repeat the last step for the second ensemble but give it the name “BLIP”

2.3 Inspect the Output from the Molecular Replacement Job

• Open the Results tab for the job you have just run. Use the information here and your knowledge of Molecular Replacement to complete the following tasks:
• Write down the steps in the structure solution in the order in which they were taken. How are these steps different from the TOXD example?
• Find the pieces of information listed in Table 1 at the bottom of this tutorial
• Which space group is the solution in? Which other space groups were tested (if any)? The first translation function search would be the best place to look for this.
• Why doesn’t Phaser perform the rotation function in the two enantiomorphic space groups?
• Which reflections in the data are particularly important for deciding the translational symmetry of the space-groups to search? Under what data collection conditions might you not have recorded these important reflections? Are there any other space-groups that you might want to consider when solving BETA/BLIP?
• How big is the anisotropic correction for the data? How does this compare to TOXD?
• Has PHASER solved the structure?

#### Solving a homo-oligomeric complex: HICA (If you have time)

Carbonic anhydrase is an enzyme that assists rapid inter-conversion of carbon dioxide and water into carbonic acid, protons and bicarbonate ions to aid removal of carbon dioxide from the blood in respiration. This ancient enzyme has three distinct classes: alpha, beta and gamma. Carbonic anhydrases from mammals belong to the alpha class, the plant enzymes belong to the beta class, while the enzyme from methane-producing thermophilic bacteria forms the gamma class. Members of these different classes share very little sequence or structural similarity. The alpha enzyme is a monomer and the gamma enzyme is trimeric. The beta enzyme can be a dimer, tetramer, hexamer or octamer. Haemophilus influenzae β-carbonic anhydrase (HICA, 2a8d) is an allosteric protein. The model you have for this structure is E. coli β-carbonic anhydrase, which has 61% sequence identity to HICA. NB. This is a computationally demanding task, so don’t worry if it fails or takes an unacceptably long time to run on your particular machine – try running the same task on a more powerful machine at a later time.

3.1 Consider the MR problem

• Import the reflection data for HICA into your ccp4i2 project.
• In the “Task menu” open the “Import merged data, sequences, alignments or coordinates” section and launch the “Import merged reflection data” task.
• In the section marked “Reflection data” click on the file browser icon and select the file fast_2a8d.mtz
• Provide a crystal name and a dataset name. These can be anything, but giving them systematic names is usually a good idea. I would suggest crystal1 and hica
• From the Task menu, select the “Reflection data tools” section and launch the “Estimate cell content” task.
• In the section marked “Cell parameters taken from reflection data” select the reflections from the import data job you just ran.
• In the section marked “Calculate molecular weight from” click on the file browser icon and select the sequence file fast_2a8d.seq
• The Matthews calculation will be carried out automatically.
• How many monomers of β-carbonic anhydrase can fit in the asymmetric unit? Which of these possibilities is most probable? Which of these are possible? What are the oligomeric associations that could correspond with the possible asymmetric unit contents?  Consider the application of crystal symmetry.
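
The Matthews calculation itself is simple arithmetic, and working through it by hand can make the task output easier to interpret. The sketch below computes the Matthews coefficient Vm (cell volume divided by the total protein mass in the cell) and the usual approximate solvent fraction, 1 − 1.23/Vm. The cell volume, number of symmetry operators and molecular weight used here are hypothetical round numbers chosen for illustration; they are not the HICA values, which you should take from the task.

```python
def matthews_coefficient(cell_volume_A3, n_symops, n_copies, mw_da):
    """Vm in cubic Angstroms per Dalton.

    n_symops is the number of asymmetric units in the unit cell,
    n_copies the number of molecules per asymmetric unit.
    """
    return cell_volume_A3 / (n_symops * n_copies * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction for a typical protein."""
    return 1.0 - 1.23 / vm


# Hypothetical example: NOT the HICA cell or molecular weight.
cell_volume = 400_000.0   # cubic Angstroms
n_symops = 4
mw = 25_000.0             # Daltons

for n in range(1, 5):
    vm = matthews_coefficient(cell_volume, n_symops, n, mw)
    print(f"{n} copies: Vm = {vm:.2f}, solvent = {solvent_fraction(vm):.2f}")
```

Most protein crystals have Vm values of roughly 1.7 to 3.5 Å³/Da, so a candidate copy number giving a Vm far outside this range (or a negative solvent fraction) can be ruled out.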

3.2 Run PHASER for Molecular Replacement

• From the “Task Menu” in ccp4i2, select the “Molecular Replacement” section and launch the “Molecular Replacement and refinement – PHASER” task
• In the “Use data from job” section, select the “Import merged reflection data” job that you ran in section 3.1
• In the “Composition” section, select the fast_2a8d sequence. In the field for the number of copies in the asymmetric unit, enter the number of copies you think should be present in the asymmetric unit based on section 3.1
• In the “Search model(s)” section, you will need to define a single search model but tell PHASER to search for multiple copies of it.
• Click on the file browser icon and select the file fast_1i6p.pdb
• In the “Sequence identity” field enter 0.61
• Click on “Show list” and select the ensemble rather than the atomic model that it contains
• Tell PHASER to find the same number of copies that you entered in the “Composition” field above.
• Run the task (this job will take longer to run – this is normal when there are many copies to find)
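
The sequence identity you enter is used by PHASER to estimate how far the search model is likely to deviate from the target. Older Phaser documentation describes a conversion based on the Chothia and Lesk relationship, with a floor of 0.8 Å. The sketch below assumes that formula; current Phaser versions may use a different estimate, so treat the numbers as indicative only.

```python
import math


def estimated_rms(identity: float) -> float:
    """Estimated model RMS error (Angstroms) from fractional identity.

    Assumes the Chothia & Lesk based conversion described in older
    Phaser documentation: max(0.8, 0.4 * exp(1.87 * (1 - identity))).
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    return max(0.8, 0.4 * math.exp(1.87 * (1.0 - identity)))


# 0.61 is the identity used for the HICA search model above.
for identity in (1.0, 0.61, 0.30):
    print(f"identity {identity:.2f} -> estimated RMS {estimated_rms(identity):.2f} A")
```

Note that the identity is entered as a fraction (0.61), not a percentage.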

3.3 Inspect the Output from the Molecular Replacement Job

• Write down the steps in the structure solution in the order in which they were taken.
• Find the pieces of information listed in Table 1
• Has Phaser solved the structure?
• How many molecules are there in the asymmetric unit?

Table 1

If you have time revisit the TOXD and BETA-BLIP examples and look at the following exercises. These should help you understand how the choices you made in the worked examples influence the outcome of Molecular Replacement.

4.1 TOXD

• Run PHASER again without using ensembling, using just 1BIK or 1D0D as a search model. What are the LLG values of the final solutions? What are the Z-scores of the translation functions? Was ensembling a good idea?
• Run PHASER again using the two pdb files before superposition as search models. What does PHASER report?

4.2 BETA-BLIP

• Run Phaser again with the anisotropy correction turned off. What effect does this have on the structure solution?

Model building and refinement

#### Introduction

In this practical we will continue working with the CD44 experimental phases we determined in the MAD/SAD phasing practical. We will begin where the previous practical finished, by inspecting a CD44 model which has been automatically built by the program Buccaneer.

The data for this tutorial may be found at cd44.tgz

#### 1. Getting Started with CCP4 GUI2

1. Launch ccp4i2 by double clicking on the icon on the desktop
2. You will be presented with a “Welcome” screen. Click on the link to “Start a new crystallography project”
3. In the “Name of project/folder” field, enter cd44. Click the “Select Directory” button and browse to the directory where you unpacked the archive.
4. A “Project Viewer” window will open for project cd44

#### 2. Inspect Output from Automatic Model Building for Errors and Make Corrections

1. We will use the ccp4i2 project database to organize our data – normally this would already be populated with data from data processing and structure solution, but since we are starting part way through the process for the purposes of the practical we will begin by importing data into ccp4i2.
• From the “Task Menu” select the “Import merged data, sequences, alignments or coordinates” and launch the “Import” task. Browse to the correct directory and select the file cd44_bucaneer.mtz. Run the task.
• From the “Task Menu” select the “Import merged data, sequences, alignments or coordinates” and launch the “Import a coordinate set” task. Browse to the correct directory and select the file cd44_bucaneer.pdb. Run the task.
• In the section marked “Output Data”, right-click on the atomic model icon and select “View > View in COOT”. If you are prompted about nomenclature errors, just click Yes.
• From “File > Auto Open MTZ…” open the file cd44_buccaneer.mtz
• Use the scroll wheel of the mouse to change the contour level of the electron density map. Scroll to a value near 1.0 rmsd (this value is displayed in the top right corner of the graphics window).
• Click on the “Map” button in the top right-hand corner of the graphics window and select the “FWT PHWT” map for use in refinement.
• Click on the “R/RC” button in the top right-hand corner of the graphics window to open the Refinement and Regularization control panel. Under “Weight Matrix”, set the Refinement Weight to 20.
2. Automatic model building will only rarely produce a model which is both complete and correct. It is helpful to compare the known sequence of CD44 with the model generated by Buccaneer.
3. Select “Validate > Alignment vs. PIR…” and choose cd44_buccaneer.pdb as the model. Then choose to link chain A and the file cd44.seq (you may need to browse to the project directory). A “Residue Mismatches” panel showing residues present in the sequence file but different or absent from the current model will be generated.
4. From the “Residue Mismatches” panel, select “Mutate A 2 UNK to Ala”
5. Inspect the map at this point. Since Buccaneer has built an Ala residue for any unknown residues not docked with the protein sequence during model building (and marked them UNK), there is no need to add any atoms at this point. We can use the “Simple Mutate” tool on the toolbar to tell Coot that this residue is in fact an Ala.
6. From the “Residue Mismatches” panel, select “Mutate A 21 UNK to Asn”
7. Inspect the map again at this point. It should be quite clear where the side chain of Asn 21 should be placed. Use the “Mutate & AutoFit” tool to mutate UNK 21 to Asn.
8. The small loop from A 22 to A 25 has proved more difficult to build automatically (it looks like auto-tracing has followed a side-chain rather than the main-chain at one point). Fortunately, the map in this region looks quite good so we will try to complete this region of the model using the loop fitting tools in coot.
9. Use the “Delete Item…” tool to remove the range of residues from A 22 to A 25. You can do this either by deleting one residue at a time using “Residue/Monomer” or by removing them all at once using “Delete Zone” and clicking on the first and last residues to be deleted.
10. Check the fit of residue Asn 21, paying special attention to the position of its carbonyl oxygen.
11. From the “Calculate” menu, select “Fit Loop… > Fit Loop by Rama Search…”. Make sure that molecule “cd44_buccaneer.pdb” and chain “A” are selected. Enter residue numbers 22 and 25 as the beginning and end of the region to be built. The sequence for the loop to be built is “GRYS”. Click the “Fit Loop” button and watch as coot builds your loop for you.
12. The model now looks like it fits the map a great deal better, but there is still some room for improvement so we will interactively refine the region we have just built. Use the “Real Space Refine Zone” tool and click on residues immediately before and after the loop we have just built – residues Asn 21 and Ile 26 would be good ones to select.
13. At this point a putative refined model will be displayed with carbon atoms shown in white and a panel of ‘traffic light’ indicators will appear indicating the quality of model geometry in the refined region. Are you happy with the position the putative refined residues have adopted? Do the traffic light indicators all show green, signifying good model geometry? If so, accept the refinement and continue. If not, try to improve the model, perhaps with help from a demonstrator.
14. From the “Residue Mismatches” panel, select “Mutate A 93 UNK to THR”
15. Inspect the map at this point. To me, it looks like residue A 93 has been somewhat misplaced but residues A 94-96 are placed well in the map. We can trust their placement in this density sufficiently to trust that there are no residues incorrectly missing or added and we can therefore extrapolate the sequence back from A 97 Asp.
16. Use the “Delete” tool to remove residue A 93.
17. Coot allows us to mutate a range of residues at once, and can attempt to fit the newly placed sidechains in density automatically. From the “Calculate” menu, select “Mutate Residue Range”.
18. Make sure that molecule “cd44_buccaneer.pdb” and chain “A” are selected. Enter 94 and 96 as the beginning and end of the range and the sequence “SQY”. Tick the box to Autofit mutated residues. Click the “Mutate” button. Once again, this could use a little improvement so use the “Real Space Refine Zone” tool on the range you have just mutated. It is usually a good idea to extend the refinement one or two residues beyond the region you have just built, so in this case residues 94 and 97 would be good beginning and ending points.
19. We are left with a model where residues A 92 and A 93 have not been built. Looking at the map in this region it is very difficult to see where the main chain should be traced, so we are better off leaving the residues absent. With luck, future refinement will improve the map sufficiently to allow these residues to be built.
20. From the “Residue Mismatches” panel, select “Insert A 152”
21. Inspect the map at this point. There does not seem to be any density to support adding any more residues to the C-terminus of the current model.
22. Use the “Go To Atom” tool to return to residue 2 of chain A.
23. You can now progress along the polypeptide chain by pressing “space” to move to the next residue (N to C) and “shift+space” to return to the previous residue (C to N).
24. Work your way along the polypeptide backbone inspecting the fit between the model and the electron density map. You may attempt to correct any errors in the model using the “Auto Fit Rotamer” and “Real Space Refine Zone” tools.
25. When you reach residue Asp 5 you will notice that the model fits the map very poorly. Use the “Auto Fit Rotamer” tool to correct the orientation of Asp 5.
26. When you reach residue Arg 11 you will notice that the model fits the map poorly. Try using the “Auto Fit Rotamer” and “Real Space Refine Zone” to fix this error. You will notice that the resultant model is still a rather poor fit with the electron density map.
27. Fortunately it is possible to intervene manually in cases such as this where the refinement has become trapped in a false minimum. Use the mouse to drag the refined model (the one with the carbon atoms displayed in white) into the electron density. You should find that the Arg sidechain will snap neatly into the map once you have dragged it in the right direction. As long as you are happy with the geometry of this refined model, accept the refinement. NB. It is also possible to drag individual atoms by dragging with Ctrl+left mouse button. In some cases this can be very helpful.
28. His 17 has been built without a side chain, but there is clear electron density present, so you can make use of the “Mutate & AutoFit” tool to place the sidechain.
29. Continue to work your way around chain A, fixing errors where you find them. You may find at least one error that the “Flip Peptide” tool will help you fix.
30. When you have reached the end of chain A (or when the demonstrators tell you that you have used enough time on this part of the practical) continue to the next point.
31. Buccaneer did not do quite so well building the second copy of cd44 and has split it into two separate chains (B and C). Chain B contains most of the model, consisting of residues 22-151. Although they may contain some small differences, at this early stage of refinement it is reasonable to assume that the two chains are at least similar to each other, so we can use coot to copy our edited and improved chain A to provide a good approximation of the second molecule.
32. First we want to remove chain C, since it will be in the way.
33. Open the “Display Manager” and change the display of your model from “Bonds (Colour by Atom)” to “C-alphas/Backbone”.
34. Zoom out by dragging upwards with the right mouse button until you can see all of chain C (it consists of two beta-strands joined by a hairpin). Shift-left clicking on a residue will identify it, so you can make sure that you have correctly identified chain C.
35. Open the “Delete Item…” tool, select “Delete Zone” and click on both ends of chain C.
36. Now you can copy chain A onto chain B. From the “Extensions” menu, select “NCS > Copy NCS Chain…”
37. Make sure that molecule “cd44_buccaneer.pdb” and chain “A” are selected. Click “OK”.
38. Select “File > Save to ccp4i2”. If you accept the default values here, you will save your model as “cd44_buccaneer-coot-0.pdb”
39. Select “File > Exit” to quit coot.
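
The loop fitting and range mutation steps above take one-letter sequences (“GRYS” for residues 22 to 25, “SQY” for residues 94 to 96). Before clicking “Fit Loop” or “Mutate”, it is worth checking that the sequence length matches the residue range and that you know which residue type lands on which residue number. The helper below is a small sketch for that check; the function name is an assumption for illustration and it does not call Coot.

```python
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def label_range(sequence: str, first: int, last: int):
    """Pair each residue number in first..last with its residue type.

    Raises ValueError if the sequence length does not match the range.
    """
    if len(sequence) != last - first + 1:
        raise ValueError(
            f"{len(sequence)} residues given for range {first}-{last}"
        )
    return [(first + i, ONE_TO_THREE[aa]) for i, aa in enumerate(sequence.upper())]


for number, name in label_range("GRYS", 22, 25):
    print(number, name)
for number, name in label_range("SQY", 94, 96):
    print(number, name)
```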

#### 3. Refinement in Refmac5

1. We will now use the program refmac5 from the ccp4 suite to refine our corrected model against our reflection data. We will make use of the new NCS tools in the latest version of refmac5.
3. In the section marked “Use data from job” check that the job marked “Manual model building – COOT” is selected.
4. In the “Reflections” field, select the reflections from cd44_buccaneer.
5. In the “Free R set” field, select the Free R set from imported merged data.
6. The fields “Phases”, “TLS coefficients” and “Reference model” may be left as “…is not used” although all can be useful in some circumstances. We will in fact use TLS refinement, but we will allow REFMAC5 to determine TLS groups automatically, which it will do by assigning one group per polypeptide chain.
7. Select the “Options” tab.
8. In the parameters section, tick the box to “Use TLS parameters”. The two new fields that appear can be left at their default values.
9. In the restraints section, tick the box to “Use non-crystallographic symmetry (NCS) restraints”. The two new fields that appear can be left at their default values.
10. Run the job.
11. Open the Results tab for the job you have just run. This will probably happen automatically.
12. At the top of this report is a table showing the statistics from the refinement job. The final stats are presented in a table and the change in these values is plotted by refinement cycle.
13. We would expect both R and Rfree to have fallen during a successful refinement. In addition, R and Rfree should not diverge from each other too greatly – this would be an indicator of over refinement. A difference of approximately 0.05 is a good rule of thumb, although this may vary with resolution.
14. It is also important to check that the model resulting from refinement conforms to expected protein geometry. The summary table at the end of the refmac5 log file lists final values for Bond Length and Bond Angle showing the rmsd from library values. The average rms for these values in the restraint library is listed earlier in the log file – it is 0.022 for bond lengths and 1.943 for bond angles. The values for your refined model should be lower than these library averages, ideally substantially lower. If this is not the case, you will need to re-run the refmac5 job. If the geometry is acceptable you can continue to section 4.
15. During a refinement job, refmac5 attempts to optimise the model against two separate targets – the experimentally measured structure factor amplitudes and prior knowledge of protein geometry. The weighting given to each of these targets during refinement is of critical importance to achieving a successful refinement, and the correct weighting can be very sensitive to data resolution. By default, refmac5 will attempt to automatically determine this weight but it will often require manual intervention for optimisation.
16. Under the Refinement Stats table in the Results tab you were inspecting before you will see reported the weight applied to the X-ray term during the refinement job you have just run.
17. In the Job List, right-click on the refinement job you have just run and select “Clone” from the menu.
18. In the “Options” tab, change the “Weight restraints” option from automatic to manual. Enter a value lower than that reported for the last job to tighten the restraints on geometry. I would suggest a possible value of 0.05, but the only way to arrive at a suitable value for a given dataset and model is to test possible values and inspect the output.
NB. This value will be very sensitive to both the resolution and quality of your reflection data. A lower value will lead to more tightly restrained geometry whilst a higher value will weight more heavily towards the experimental X-ray terms.
19. Once again, check the Results tab resulting from your refinement job. Do you think the statistics reported by refmac5 now indicate a more acceptable model?
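
The checks described in steps 13 and 14 can be written down as a short checklist. The sketch below uses only the thresholds quoted above (an R/Rfree gap of about 0.05, and library rms values of 0.022 for bond lengths and 1.943 for bond angles). The function name and the example statistics are hypothetical, and the thresholds are rules of thumb that vary with resolution.

```python
LIBRARY_RMS_BONDS = 0.022    # Angstroms, library average quoted above
LIBRARY_RMS_ANGLES = 1.943   # degrees, library average quoted above
MAX_R_GAP = 0.05             # rule-of-thumb limit for Rfree - R


def check_refinement(r_work, r_free, rms_bonds, rms_angles):
    """Return a list of warnings; an empty list means no flags raised."""
    warnings = []
    if r_free - r_work > MAX_R_GAP:
        warnings.append("R and Rfree diverge: possible over-refinement")
    if rms_bonds >= LIBRARY_RMS_BONDS:
        warnings.append("bond length rmsd not below library average")
    if rms_angles >= LIBRARY_RMS_ANGLES:
        warnings.append("bond angle rmsd not below library average")
    return warnings


# Hypothetical statistics for two refinement jobs.
print(check_refinement(0.22, 0.25, 0.012, 1.5))
print(check_refinement(0.20, 0.30, 0.030, 2.5))
```

If the geometry warnings appear, lowering the manual weight as in step 18 tightens the geometry restraints; the right value still has to be found by trial and inspection.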

#### 4. Validation

1. Inspect output from refinement and check quality of the resulting model using validation tools. When rebuilding and refining a protein model it is very easy to make small mistakes, particularly at low resolution. It is therefore very important to cross-check your protein model against the large body of prior knowledge regarding protein geometry. This is the process of validation.
2. From the bottom of this new Results tab, click on “Manual model building – COOT”. This job is automatically populated with the output from your refinement job, so you can simply Run the job.
3. Set up the maps and restraints as described in section 2.1.
4. Exactly what validation and editing are needed at this point will depend both on what editing you carried out in section 2 and exactly how your refinement job was run in section 3. Here are some suggestions – please note that you are unlikely to have time to fully rebuild and validate this model during the practical session, so try to fix no more than 5 problems using each validation tool in order to get a feel for the tools.
5. Open “Validate > Difference Map Peaks”. The correct map and model should be selected by default. The default sigma level (5.0) is also sensible, so click “Find Peaks”. A list of peaks will then be generated – work your way down them, correcting problems as you find them. Don’t worry about adding solvent molecules at this point – we’ll cover that in points 9-12 below.
6. Open “Validate > Ramachandran plot” and select the current model. An interactive Ramachandran plot will be displayed, with any outliers shown in red. Click on any outliers you find – are there any problems with the model that you can fix? A hint – the “Flip Peptide” tool may be useful.
7. Open “Validate > Geometry Analysis” and select the current model. A histogram plotting geometry for each chain in the model will be displayed, with small green bars indicating good geometry and large orange/red bars suggesting problems that need to be fixed. Are there any problems in need of attention?
8. Whilst examining the structure, you will have noticed that there are numerous peaks in the electron density map that would be well explained by ordered water molecules. Coot contains tools to help you add waters to the model.
9. Open “Calculate > Other Modelling Tools > Find Waters”
10. The default values in the find waters dialog are acceptable for our purposes, so click “Find Waters” to proceed.
11. The new waters have been added to chain D, and it is important to inspect them and make sure that they have all been added in appropriate places. Use the “Go To Atom” tool and select residue 1 in chain D.
12. Does this water molecule look correct? If so, hit the ‘space’ bar to move onto the next water. If not, either move or delete the water molecule as required.
13. Upon correcting as many errors with the model as possible, you would once again save your coordinate model and use it as input for a further round of refinement, repeating successive rounds of rebuilding and refinement until no further errors need to be corrected. If you have sufficient time, carry out another round of refinement as described in sections 2 and 3. Have the R and Rfree values improved relative to those observed in section 3 as a result of your editing?
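
The difference map peak search in step 5 amounts to keeping peaks whose height is at least the chosen sigma level and visiting the largest first. As a minimal sketch of that idea (the peak list here is entirely hypothetical, and it assumes that both positive and negative peaks are of interest):

```python
SIGMA_CUTOFF = 5.0  # default level mentioned in step 5

# Hypothetical peaks: (description, height in sigma).
peaks = [
    ("near A 11 side chain", 7.4),
    ("near A 93", -6.1),
    ("solvent region", 4.2),
    ("near B 40", 5.0),
]


def significant_peaks(peaks, cutoff=SIGMA_CUTOFF):
    """Keep peaks with |height| >= cutoff, largest magnitude first."""
    kept = [peak for peak in peaks if abs(peak[1]) >= cutoff]
    return sorted(kept, key=lambda peak: abs(peak[1]), reverse=True)


for description, height in significant_peaks(peaks):
    kind = "positive" if height > 0 else "negative"
    print(f"{height:+.1f} sigma ({kind}): {description}")
```

Positive peaks suggest atoms missing from the model, while negative peaks suggest atoms placed where the data do not support them.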

### External tutorials

More MX tutorials

APS-CCP4 school
http://legacy.ccp4.ac.uk/schools/APS-school/index.php

Phaser tutorials
http://www.phaser.cimr.cam.ac.uk/index.php/Tutorials

Tutorials on Helmholtz Zentrum Berlin website
http://www.helmholtz-berlin.de/forschung/oe/em/soft-matter/forschung/bessy-mx/tutorial/index_en.html

The iMosflm tutorial
HTML and PDF

Tutorials distributed with CCP4:
$CCP4/examples/tutorial/html/index.html
$CCP4/mr_tutorial_2006/mr_tutorial_first.html (example 1 in the Basic phasing collection)