AUTOSTRUCT - TEST DATA

At the Autostruct Coordination Meeting 4 in Amsterdam on 25-MAR-2002, the process of collecting test data was set in motion. The data are to be kept on the Autostruct website for everyone to use (Autostruct, but also the wider community, Phenix for instance). The test data will have various purposes:

We will endeavor to gather native as well as additional data (e.g. SAD, MAD, derivatives).

Test data already in use:

Columns: molecule name | details, (potential) use | space group | resolution (Å) | data | PDB code | PDB SF code
ModE
  • Hall et al.
  • SHELX test data; MAD
  • reflection file contains _refln.F_meas_au and _refln.F_meas_sigma_au at 4 wavelengths
  Space group P21212; resolution 1.75 Å; data: sfdata-mode.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-mode.tgz
  sfdata-mode.tgz contains mode-1b9m-std.mtz with columns FP SIGFP FC PHIC FWT PHWT DELFWT PHDELWT FOM
  PDB code 1B9M; PDB SF code r1b9msf

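Every data tarball on this page is unpacked with the same tar -xvzf command. For scripted test runs the same step can be done with Python's standard tarfile module. This is a minimal sketch, assuming (as the unpack instructions imply) that each archive stores its files under a top-level sfdata/ directory; the function name unpack_sfdata is hypothetical and not part of any Autostruct tool.

```python
import tarfile
from pathlib import Path


def unpack_sfdata(archive: str, dest: str = ".") -> list[str]:
    """Extract a .tgz archive into dest, like running "tar -xvzf archive" there.

    Assumes the archive stores its files under a top-level sfdata/ directory,
    so they end up in dest/sfdata/. Returns the member names, mirroring the
    listing that tar's -v flag prints.
    """
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse any member whose path would escape dest (e.g. "../x" or "/x").
        for member in members:
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: also apply the standard "data" extraction filter.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return [member.name for member in members]
```

For example, unpack_sfdata("sfdata-mode.tgz") run in the download directory should leave mode-1b9m-std.mtz in ./sfdata/. The explicit path check is kept for older Python versions whose tarfile module lacks the "data" extraction filter.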
JIA (Acyl-CoA Thioesterase II)
  • Li et al.
  • SHELX test data; MAD
  • reflection file at PDB contains _refln.intensity_meas and _refln.intensity_sigma
  Space group C2221; resolution 1.90 Å; data: sfdata-jia.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-jia.tgz
  sfdata-jia.tgz contains:
    • jia_readme.txt with some data collection information
    • jia.hkl with native S-Met data
    • jia_hrem.sca, jia_peak.sca, jia_infl.sca, jia_lrem.sca with Se-Met MAD data
    • jia_1c8u.pdb with S-Met model at PDB
    • jia-1c8u-std.mtz with columns FP SIGFP FC PHIC FWT PHWT DELFWT PHDELWT FOM
  PDB code 1C8U; PDB SF code r1c8usf

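The entries note whether the PDB reflection file holds amplitudes (_refln.F_meas_au, _refln.F_meas) or intensities (_refln.intensity_meas, _refln.F_squared_meas). A quick way to check a downloaded SF file is to list the item names in its _refln loop. This is a minimal sketch, assuming a simple mmCIF layout (loop_ headers with one tag per line, no quoted values or multi-line text fields); a real pipeline would use a full CIF parser. The helper names refln_tags and data_kind are hypothetical, and files with several data blocks (for example one per MAD wavelength) are only inspected up to the first _refln loop.

```python
def refln_tags(cif_text: str) -> list[str]:
    """List the _refln.* item names declared in the first _refln loop.

    Minimal sketch: assumes loop_ headers with one tag per line and does not
    handle quoted values, save frames or multi-line text fields.
    """
    tags: list[str] = []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.lower() == "loop_":
            if tags:  # a later loop begins: the first _refln loop is done
                break
            in_loop = True
        elif in_loop and line.startswith("_refln."):
            tags.append(line.split()[0])
        elif in_loop and tags:
            break  # first data row (or comment) after the _refln header
        elif in_loop and line.startswith("_"):
            in_loop = False  # this loop belongs to another category
    return tags


def data_kind(tags: list[str]) -> str:
    """Classify the observed data as intensities or amplitudes from the item names."""
    if "_refln.intensity_meas" in tags or "_refln.F_squared_meas" in tags:
        return "intensities"
    if any(t.startswith("_refln.F_meas") for t in tags):
        return "amplitudes"
    return "unknown"
```

Run on the JIA file (r1c8usf) this should report intensities, and on the ModE file (r1b9msf) amplitudes, matching the notes in the entries.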
RRF
  • Selmer et al.
  • SHELX test data
  • reflection file contains _refln.F_meas_au and _refln.F_meas_sigma_au
  Space group P43212; resolution 2.55 Å; data: sfdata-rrf.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-rrf.tgz
  sfdata-rrf.tgz contains rrf-1dd5-std.mtz with columns FP SIGFP FC PHIC FWT PHWT DELFWT PHDELWT FOM
  PDB code 1DD5; PDB SF code r1dd5sf

TransH
  Space group P21; resolution 2.00 Å; PDB code 1F8G

Cyanase
  • Walsh et al.
  • SHELX test data; MAD
  • 30 Se
  • reflection file at PDB contains _refln.F_meas and _refln.F_meas_sigma
  Space group P1; resolution 1.65 Å; data: sfdata-cynsemet.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-cynsemet.tgz
  sfdata-cynsemet.tgz contains:
    • cynsemet.mtz with native data to 1.65 Å (collected at 1.0332 Å) and 4-wavelength MAD data to 2.4 Å (wavelengths: 1 = low-energy remote = 1.0781 Å, 2 = inflection point = 0.9795 Å, 3 = peak = 0.9793 Å, 4 = high-energy remote = 0.9465 Å) with columns DANO and F+/F-
    • cynsemet-1dw9-std.mtz with columns FP SIGFP FC PHIC FWT PHWT DELFWT PHDELWT FOM
  PDB code 1DW9; PDB SF code r1dw9sf

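MAD wavelengths are quoted in Å on this page, while absorption edges are usually tabulated in keV. Converting with E[keV] = hc/λ[Å], where hc ≈ 12.39842 keV·Å, makes the ordering of the Cyanase wavelengths explicit: the low-energy remote has the longest wavelength, the inflection point lies just below the peak in energy, and the peak sits close to the selenium K absorption edge (about 12.66 keV). The helper names below are hypothetical; this is only a unit-conversion sketch.

```python
# hc in keV·Å (CODATA value, rounded): E[keV] = 12.39842 / wavelength[Å]
HC_KEV_ANGSTROM = 12.39842


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Convert an X-ray wavelength in Å to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths of the Cyanase MAD data in cynsemet.mtz, as listed above.
CYANASE_MAD = {
    "low-energy remote": 1.0781,
    "inflection point": 0.9795,
    "peak": 0.9793,
    "high-energy remote": 0.9465,
}


def energies_kev(wavelengths: dict[str, float]) -> dict[str, float]:
    """Map each labelled wavelength to its energy, rounded to 1 eV."""
    return {label: round(wavelength_to_kev(w), 3) for label, w in wavelengths.items()}
```

energies_kev(CYANASE_MAD) gives roughly 11.500, 12.658, 12.660 and 13.099 keV for the four wavelengths, so the peak and inflection points are only about 2 eV apart.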
Thaumatin
  • Debreczeni et al.
  • SHELX test data; SAD, SIRAS
  • reflection file at PDB contains _refln.F_meas_au and _refln.F_meas_sigma_au; data collected in 1994 to 1.75 Å resolution
  Space group P41212; resolution 1.55 Å; data: sfdata-thau.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-thau.tgz
  sfdata-thau.tgz contains:
    • thau-nat.hkl with native data (1.55 Å resolution, collected at 1.54 Å)
    • thau-iod.hkl with data for the iodide derivative (2.0 Å resolution, collected at 1.54 Å)
    • thau-sad - SHELX command script for SAD
    • thau-siras - SHELX command script for SIRAS
  PDB code 1THW; PDB SF code r1thwsf

Cubic insulin
  • Sheldrick
  • SHELX test data, synchrotron test data
  • very fast data collection to high resolution is possible
  • the 6 sulfurs can be found easily by SAD phasing
  Space group I213; resolution 1.35 Å; data: sfdata-cubins.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-cubins.tgz
  sfdata-cubins.tgz contains cubins.sca with columns I and SIGI for centric reflections, and I(+)/SIGI(+) and I(-)/SIGI(-) for acentric reflections
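Several .sca files on this page (cubins.sca, elastase.sca, elas-nai.sca) give I/SIGI for centric reflections and separate I(+)/SIGI(+) and I(-)/SIGI(-) for acentric ones. For a single acentric reflection, the sketch below forms the mean intensity and the Bijvoet difference, propagating errors as if the two measurements were independent. It is a plain unweighted combination (scaling programs may weight the two measurements differently), parsing of the .sca file itself is not shown, and the name merge_bijvoet is hypothetical. The DANO columns in the MTZ files above hold the analogous amplitude difference F(+) - F(-).

```python
import math


def merge_bijvoet(i_plus: float, sig_plus: float,
                  i_minus: float, sig_minus: float) -> tuple[float, float, float, float]:
    """Return (mean I, sigma(mean I), delta I, sigma(delta I)) for one acentric reflection.

    Unweighted combination, assuming the errors of I(+) and I(-) are independent:
      mean  = (I+ + I-) / 2,  sigma = sqrt(s+^2 + s-^2) / 2
      delta =  I+ - I-,       sigma = sqrt(s+^2 + s-^2)
    """
    combined_sigma = math.hypot(sig_plus, sig_minus)
    mean_i = 0.5 * (i_plus + i_minus)
    delta_i = i_plus - i_minus
    return mean_i, 0.5 * combined_sigma, delta_i, combined_sigma
```

For example, merge_bijvoet(110.0, 3.0, 90.0, 4.0) gives a mean of 100.0 ± 2.5 and a Bijvoet difference of 20.0 ± 5.0, i.e. an anomalous signal at about 4 sigma for that reflection.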
       
Various other test data and tutorial material for SHELX are available on the SHELX Program Page, especially from the 'List of files on ftp site & CD'.
RNAse Sa
  • Sevcik et al.
  • CCP4 test data; MIR, refinement
  • reflection file at PDB contains _refln.intensity_meas and _refln.intensity_sigma; coordinate file at PDB contains ANISOU records
  Space group P212121; resolution 2.5, 1.8 and 1.15 Å; data: sfdata-rnase.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-rnase.tgz
  sfdata-rnase.tgz contains:
    • rnase25.mtz - 2.5 Å data for HG, PT and I derivatives (including DANO); fairly poor (old) derivative data, useful for an MIR test/tutorial
    • rnase18.mtz - 1.8 Å data for guanosine phosphate complexes; fairly poor (old) data, used for the refinement tutorial for liganded proteins; columns FNAT SIGFNAT FreeR_flag F3GP SIGF3GP F2GP SIGF2GP
    • rnase115.mtz - 1.15 Å data for refinement; columns FP SIGFP FC PHIC
  PDB code 1LNI; PDB SF code r1lnisf

GerE
  • Ducros et al.
  • CCP4 test data; MAD
  • no SF data available at PDB
  • 2.75 Å MAD data set with associated native data to 2.15 Å
  Space group C2; resolution 2.15 Å; data: sfdata-gere.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-gere.tgz
  sfdata-gere.tgz contains:
    • gere_nat.sca with native data (collected at 0.870 Å)
    • gere_hrm.mtz, gere_peak.mtz, gere_infl.mtz, gere_lrm.mtz with anomalous data (DANO and F+/F-) for 4 wavelengths (0.8865, 0.9793, 0.9795 and 0.9797 Å respectively)
  PDB code 1FSE

Toxd
  • Skarzynski
  • CCP4 test data
  • small data set
  • reflection file at PDB contains _refln.F_meas_au and _refln.F_meas_sigma_au
  • other data in the CCP4 release:
    • $CEXAM/toxd/toxd.mtz with columns FTOXD3 SIGFTOXD3 ANAU20 SIGANAU20 FAU20 SIGFAU20 FMM11 SIGFMM11 FI100 SIGFI100 FreeR_flag
    • $CEXAM/tutorial/data/toxd.hkl with columns FOBS SIGMA
  Space group P212121; resolution 2.2 Å; PDB code 1DTX; PDB SF code r1dtxsf

Any data set from Jolly SAD; see especially Tables 1 and 2, and the separate document on those.

Calmodulin
  • Wilson and Brunger
  • CCP4 test data; high resolution
  • high-resolution data, including PDB files with ANISOU records (still quite rare in the PDB)
  • reflection file contains _refln.intensity_meas and _refln.intensity_sigma
  Space group P1; resolution 1.0 Å; PDB code 1EXR; PDB SF code r1exrsf

Crambin
  • Jelsch et al.
  • CCP4 test data; high resolution
  • high-resolution data, including PDB files with ANISOU records (still quite rare in the PDB)
  • reflection file contains _refln.F_meas_au and _refln.F_meas_sigma_au
  Space group P21; resolution 0.54 Å; PDB code 1EJG; PDB SF code r1ejgsf

GAPDH
  • Isupov et al.
  • CCP4 test data; TLS
  • asymmetric unit contains 2 monomers, each of 2 domains
  • reflection file contains _refln.F_meas_au and _refln.F_meas_sigma_au
  Space group P41212; resolution 2.0 Å; data: 1b7g.tar.gz (example TLS file)
  PDB code 1B7G; PDB SF code r1b7gsf

Mannitol dehydrogenase
  • Hoerer et al.
  • CCP4 test data; TLS
  • asymmetric unit contains 3 tetramers
  • reflection file contains _refln.F_meas and _refln.F_meas_sigma
  Space group C2; resolution 1.5 Å; data: 1h5q.tar.gz (example TLS file)
  PDB code 1H5Q; PDB SF code r1h5qsf

HypF domain
  • Rosano et al.
  • Hydrogenase maturation protein HypF N-terminal (acylphosphatase-like) domain (HypF-ACP)
  • Images were collected in Hamburg on beamline BW7A with a MAR345 detector. The incident wavelength was 0.9997 Å, the crystal-to-detector distance 250 mm, starting phi 77.0° and delta phi 1.0°. X beam and Y beam were (in DENZO format ...) 120.010 and 119.601 respectively. The crystals belong to space group H32 with unit cell parameters a = b = 58.5 Å, c = 156.1 Å.
  • Solved by SIRAS with SHARP/SOLOMON and autobuilt. However, can the Hg derivative data be used as a SAD test case? This is a nice case because there are only 84 images, with a total data set size of 130 Mbytes.
  • reflection file at PDB contains _refln.F_meas and _refln.F_meas_sigma
  Space group H32; resolution 1.27 Å; data: hg_images.tar (127 Mb!), data for the Hg derivative
  PDB code 1GXT; PDB SF code r1gxtsf

Thermolysin
  • Weiss et al.
  • thermolysin (P6122) data collected at 1.9 Å wavelength to 1.83 Å resolution, 360 degrees each
  • SAD case based on 1 Zn, 5 Ca and 3 S (see Structure 9, 771-777, 2001); 315 amino acids. Autobuilt.
  • to be included later
  Space group P6122; resolution 1.83 Å
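Bragg's law, λ = 2d sin θ, links the wavelength and the resolution limit quoted for Thermolysin. Longer wavelengths push a given resolution limit to higher scattering angles: with λ = 1.9 Å, reaching d = 1.83 Å needs reflections out to 2θ of about 62.5°. The sketch below only evaluates Bragg's law; it ignores detector size and geometry, and the name two_theta_deg is hypothetical.

```python
import math


def two_theta_deg(wavelength_angstrom: float, d_min_angstrom: float) -> float:
    """Scattering angle 2θ (degrees) needed to record data to resolution d_min.

    Bragg's law: λ = 2 d sin θ  ->  θ = asin(λ / (2 d)).
    """
    s = wavelength_angstrom / (2.0 * d_min_angstrom)
    if s > 1.0:
        # sin θ cannot exceed 1: d_min below λ/2 is physically unreachable.
        raise ValueError("d_min < wavelength/2 is unreachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))
```

two_theta_deg(1.9, 1.83) evaluates to roughly 62.5, and any d_min below λ/2 (0.95 Å at this wavelength) raises an error.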
Xenon SAD
  • Phil Evans
  • lab data (CuKα), single xenon site, solved by SAD... I'm still working on this.
  • to be included later

2-wavelength MAD
  • Ashley Deacon (SSRL) is putting this together
  • to be included later

beta:BLIP
  • Strynadka et al.
  • beta-lactamase plus Beta-Lactamase Inhibitory Protein
  • anisotropy in the data complicates molecular replacement
  • 1jtg - BLIP from Streptomyces clavuligerus; 1jtd - BLIP-II from Streptomyces exfoliatus, both in space group P212121
  Space group P3221; resolution 3.0 Å; data: sfdata-betablip.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-betablip.tgz
  sfdata-betablip.tgz contains:
    • betablip_beta.pdb - beta-lactamase molecular replacement model, sequence as in 1jtg and 1jtd
    • betablip_blip.pdb - Beta-Lactamase Inhibitory Protein model, BLIP as in 1jtg
    • beta_blip.mtz with columns Fobs Sigma
  PDB codes 1JTG, 1JTD; PDB SF codes r1jtgsf, r1jtdsf

Elastase (Porcine Pancreatic)
  • Panjikar and Tucker
  • Br- SAD
  • data set Br-G1-Xe-W0.80: PPE crystals dipped in a 1:1 mixture of 1 M NaBr + 100% glycerol, then pressurized under Xe; data collected at 0.8020 Å
  • reflection file at PDB contains _refln.F_meas_au and _refln.F_meas_sigma_au
  • to be included later
  • Other PP elastases:
    • From the same paper: Br-oil-Xe at wavelength 0.80 Å: 1l0z, r1l0zsf - _refln.F_meas_au and _refln.F_meas_sigma_au
    • I SIRAS - G Sheldrick: see sfdata-elastase.tgz
    • I SIRAS - G Evans and G Bricogne, Acta D58 (2002) 976. Solved with autoSHARP. 1gwa, r1gwasf - _refln.F_meas_au and _refln.F_meas_sigma_au.
      From the same paper: 1c1m (Prange et al. (1998) Proteins SFG 30, 61-73), which is native and Xenon/Krypton. Data: r1c1msf - _refln.F_squared_meas, _refln.F_squared_sigma, _refln.F_calc and _refln.phase_calc
    • CaCl2 and Na-citrate: 1lka, r1lkasf - _refln.F_meas_au, _refln.F_meas_sigma_au, _refln.ndb_anomalous_diff and _refln.ndb_anomalous_diff_sigma
      Na2SO4: 1lkb, r1lkbsf - _refln.F_meas_au, _refln.F_meas_sigma_au, _refln.ndb_anomalous_diff and _refln.ndb_anomalous_diff_sigma
      from Weiss et al., Acta D58 (2002) 1407. Based on 1qnj (atomic resolution, Würtele et al., Acta D56 (2000) 520; data: r1qnjsf - _refln.intensity_meas and _refln.intensity_sigma)
    • I SAD - G Evans, M Polentarutti, K Djinovic Carugo, G Bricogne, Acta D59 (2003) 1429-1434, also using radiation damage for phasing. No refined model, so not at PDB.
    • Xe MAD - C Mueller-Dieckmann et al., Acta D60 (2004) 28; collected at wavelengths 0.80-2.65 Å, to a maximum resolution of 1.65 Å, in both air and helium atmospheres. 1uo6, r1uo6sf - _refln.F_meas_au, _refln.F_meas_sigma_au, _refln.F_calc, _refln.phase_calc
  Space group P212121; resolution 1.20 Å; data: sfdata-elastase.tgz
  Unpack into directory sfdata/ with: tar -xvzf ./sfdata-elastase.tgz
  sfdata-elastase.tgz contains:
    • elastase.sca - native data, I/SIGI for centric and I(+)/SIGI(+)/I(-)/SIGI(-) for acentric reflections; 12 sulfurs can be found by SAD phasing from these data alone
    • elas-nai.sca - sodium iodide soak, I/SIGI for centric and I(+)/SIGI(+)/I(-)/SIGI(-) for acentric reflections; 18 to 20 iodides can be found by SIRAS phasing using both the native and the iodide-soak data sets; occupancy refinement is a useful tool
  PDB code 1l1g; PDB SF code r1l1gsf

Thanks a lot to everybody involved for releasing these data to the community, specifically the Autostruct partners; Zbyszek Dauter and friends for JIA and 'the Jolly SAD data'; Michael James, Natalie Strynadka and Randy Read for beta:BLIP.

Test data suggested:

Columns: molecule name | details, (potential) use | remarks

HEW lysozyme
  • sulphurs, high resolution, good symmetry, limited number of data
  • remarks: JollySAD has one lysozyme: 1lz8, r1lz8sf, sfdata-lyso.tgz (P43212, collected at 1.54 Å wavelength to 1.53 Å resolution; anomalous signal of sulfurs and chlorines)
A good DNA set
  • would be really useful, but it would have to use standard PDB notation, so check it against the REFMAC libraries and make a really good example for everyone to use
  • remarks: JollySAD has one DNA: 1ICK, r1icksf, sfdata-dorota.tgz (P212121, collected at 0.98 Å wavelength to 0.95 Å resolution; phased on phosphorus)
A protein-RNA complex
A good sugar example
Different formats (mmCIF, SHELX, CSD, etc.)
  • to test conversion programs, though the problem here is keeping up to date with changing formats
We don't have one specific area of interest; rather, we need a range of examples covering different space groups (especially less well tested ones such as H3), different levels of NCS, different resolutions, etc.
Small protein solvable by direct methods
  • Bence-Jones protein Rhe: 2rhe, no SF data at PDB

Maria Turkenburg, mgwt@ysbl.york.ac.uk