Practical "Model Validation" - EMBO Bioinformatics Course


Now it's your turn to do some model validation! Below are a number of exercises (some requiring knowledge of protein crystallography) from which you can choose a couple. My recommendation would be to do exercises 1, 3 (or 21), 8 (if you know how to use RasMol), 9 (or 10), 11, 14 (or 13 or 15), 19 (or 18 or 17), and 23 - but if your tutor tells you otherwise, listen to him/her :-)

Attention Copenhagen students (January, 2009)!

Do the following exercises: 1, 3, 14 and 23. Write down your answers to all the questions in these exercises. If you feel ambitious, you can also try your hand at exercises 17, 18 and 19. If you know how to use RasMol (and assuming it is installed on your machine), you could also have a go at exercise 8.

[1] DNA-binding or spell-binding?

Go to the RCSB page for entry 2GN5. Use the information there, together with anything you can find at PDBsum and PDBreport, to assess the quality of this model.

Q. 29. What is your opinion about the quality of this model?

[2] Old gold

Assess the quality of PDB entry 1SRX (note the deposition date!). Since only CA-coordinates are available, you will need to fetch the coordinates and use the STAN server to generate a "CA-Ramachandran plot" of this model.
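If you are curious what goes into such a plot: the sketch below is not the STAN server's code. It is a minimal Python illustration which assumes that a "CA-Ramachandran plot" is built from the CA-CA-CA pseudo-angles and the CA-CA-CA-CA pseudo-torsion angles of consecutive residues. The function names and the list-of-tuples input are made up for this example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def angle(p0, p1, p2):
    """Angle in degrees at p1, between the directions p1->p0 and p1->p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p0, p1, p2, p3):
    """Signed torsion in degrees about the p1-p2 axis (0 = cis, +/-180 = trans)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ca_pseudo_geometry(cas):
    """cas: list of (x, y, z) CA positions in chain order (hypothetical input format).

    Returns the pseudo-angles (one per CA triple) and the pseudo-torsions
    (one per CA quadruple).
    """
    angles = [angle(*cas[i:i + 3]) for i in range(len(cas) - 2)]
    torsions = [dihedral(*cas[i:i + 4]) for i in range(len(cas) - 3)]
    return angles, torsions
```

Plotting the torsions against the angles for a well-refined model and for 1SRX would give you a rough, home-made version of the comparison; chain breaks are ignored here and would need to be handled separately.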

Q. 30. What is your opinion about the quality of this model?

[3] Spitting image?

(Exercise borrowed from Roman Laskowski.)

In this exercise you will be given a list of protein models in the PDBsum database. One of these contains severe errors! Your task (as Detective of the Validation Police) is to spot which one it is ... As Roman describes it:

The errors in the structure are deliberate; the structure was obtained from genuine X-ray data but with the protein backbone wilfully and irresponsibly traced backwards through the electron density. The resultant backwards-traced model was refined using standard refinement techniques until a low crystallographic R-factor was obtained.
The person responsible for this abomination is Gerard Kleywegt who has said much and written much about protein structure refinement and how it should be done properly. The point of the exercise was to demonstrate how, even for an obviously and woefully wrong structure, it is possible to achieve a low R-factor (often taken as a measure of a structure's "accuracy" or "quality").
To obtain the line-up of candidate proteins, go to the PDBsum database. Enter the name "Kleywegt" into the string search box near the bottom of that page and press return. You will get a list of more than a dozen structures, but the entry we are after has a PDB ID code that begins with "1c"! Click on each entry in turn and see if you can find the one that's wrong from the information on its PDBsum page. You'll know it when you see it ...
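As a reminder of what the R-factor in Roman's description measures, here is a minimal Python sketch of the conventional crystallographic R-factor, R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|). It assumes that the observed and calculated structure-factor amplitudes are already on the same scale and listed in the same reflection order; the function name is made up for this example and no refinement program is being reproduced.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor from paired structure-factor amplitudes.

    Assumes f_calc has already been scaled to f_obs (an assumption of this
    sketch; real programs determine the scale factor themselves).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator
```

Note that nothing in this number looks at the stereochemistry of the model, which is one reason why it should not be used as the only quality indicator.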

Q. 31. What is the PDB code of the structure you think is the bad one? Give at least 4 reasons why you selected it!

[4] Rest in Peace

If you think that models as poor as the backwards-traced model used above don't occur in the PDB, you're in for a surprise. Roman Laskowski found this gem: 1RIP - may it Rest In Peace! Assess the quality of this model using the familiar tools (but be gentle!). Since this is an NMR model (as opposed to a crystal structure), this also serves to demonstrate what kind of validation tools can be used for models that are not based on crystallographic data.

Q. 32. What is your impression of the quality of 1RIP?

[5] Bloody hemoglobin!

PDB entries 2HHB, 3HHB and 4HHB are all crystal structures of human hemoglobin. In fact, all three structures were derived from the same crystallographic data. Nevertheless, the quality of the three entries differs rather dramatically. Compare and contrast the three entries and discuss their quality.

Q. 33. How would you rank the three models?

[6] Hemoglobin is thicker than water

The three entries 2HHB, 3HHB, and 4HHB replaced an earlier PDB entry, 1HHB. If you go to the PDB and look up 2HHB, for instance, there will be a link to 1HHB in an archive of "obsolete" structures. Retrieve the complete PDB file for entry 1HHB and save it in a file. Now go to the Biotech Validation Server (and follow their "Quick start" link), upload the PDB file of 1HHB and run PROCHECK and some (or all) of the WHAT IF checks. Compare and contrast the quality of 1HHB and 4HHB.

Q. 34. If you had to use either one of these models as the starting point for a homology modelling exercise, which one would you prefer?

[7] Beauty pageant?

(Exercise borrowed from Tom Taylor.)

In this exercise, we shall have a protein model "beauty pageant" (or perhaps "(political) correctness" is a better word than "beauty" ...): we will rank three models by their quality (or reliability). The structures were all solved in spacegroup P2₁ and refined to a resolution of 2.3 Å. They are:

1CVH, human carbonic anhydrase II, mutant H96C
1EMC, mutant of green fluorescent protein from Aequorea victoria
2RUS, Rhodospirillum rubrum rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase)

Check the Ramachandran plots of these models, and look at the R-values, packing quality, etc.

Q. 35. How would you rank these three models?

[8] Siblings?

In this exercise, we shall compare two different models of the same enzyme that are based on the same crystallographic data. The enzyme is called chloromuconate cycloisomerase, and its structure was originally solved in spacegroup I4 with two-fold non-crystallographic symmetry (i.e., there were two molecules present in the asymmetric unit of the crystal, whose structures were determined and refined independently). The model and the structure factor data were deposited in the PDB with code 1CHR.

Later, it was found (by other crystallographers) that the spacegroup was not I4, but rather I422 (and, hence, that there wasn't any non-crystallographic symmetry!). This means that the two molecules whose structures were determined and refined independently in the original study are in actual fact completely identical! The structure was re-refined and deposited in the PDB with code 2CHR.

Check the Ramachandran plots of both models, and look at the R-values, packing quality, etc.

Q. 36. How does the quality of the two models compare?

Now, let's have a more detailed look at the two models (this part only works if your browser is set up properly for handling RasMol scripts).

Go to the PDBsum page for 2CHR, and scroll until you see a link to "SAS - annotated FASTA alignment of related sequences in the PDB". Click on the coloured SAS logo and be patient until the next page loads. SAS will now do a FASTA alignment (i.e., a sequence-similarity search) of the sequence of 2CHR against all sequences in the PDB. The most similar PDB entries and their sequences will be listed.

When the SAS page has loaded, click on the link "View 3D structures" (next to the RasMol logo). A new window will open with a form for you to complete. At the top, click on the checkboxes for the top two entries ("2CHR" and "1CHR (A)", respectively). Next, scroll to the bottom of the form and change the "Colour structures by" option to "a different colour for each structure". If you now click on the "DISPLAY" button (and wait a few seconds), RasMol should automatically start up, load the two molecules (superimposed by sequence) and draw them. 2CHR will be shown as a purple CA trace, and chain A of 1CHR as a yellow CA trace.

The RasMol terminal window will show the RMSD between the two molecules.
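If you want to check such a number by hand, the RMSD over paired atoms is simply the square root of the mean squared distance. The Python sketch below assumes that the two CA traces have already been superimposed and that atom i in one list is paired with atom i in the other; how RasMol or SAS pair and superimpose the atoms is not reproduced here, and the function name is made up for this example.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) positions.

    Atom i of coords_a is compared with atom i of coords_b; no
    superposition is performed (assumption: that was done beforehand).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))
```

Keep in mind that the result depends entirely on which atoms are paired with which.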

Q. 37. What is their RMSD (on CA atoms)?

If your RasMol knowledge is rusty, the mouse controls are as follows:

Mouse action	Effect
Left mouse button	Rotate x-y
Right mouse button	Translate x-y
Shift-key + left mouse button	Zoom in/out
Shift-key + right mouse button	Rotate z
Control-key + left mouse button	Slab plane

At first glance, the two models are very similar (which seems strange given the rather high RMSD on CA atoms!). Now type the following into your RasMol terminal window: "restrict 14-42". Have a closer look at residues 15-17 of 2CHR and 1CHR. (Note: you may want to switch on the "Labels" option from RasMol's Options menu.)

Q. 38. Which residue in 1CHR corresponds (structurally) to residue 17 in 2CHR? Can you explain this?

It gets worse ...

Q. 39. Which residue in 1CHR corresponds (structurally) to residue 22 in 2CHR? Which residue in 1CHR corresponds (structurally) to residue 28 in 2CHR? What happens in the loop prior to residue 41 in 1CHR compared to 2CHR?

To illustrate just one effect of the errors in 1CHR, go to the PDBsum page for 1CHR and click on the button marked "RasMol" (at the top of the page). This should launch RasMol and show you all the atoms found in the PDB file. Type the following RasMol commands to display only atoms near the CZ atom of arginine 35 in chain A:

"centre arg35a.cz" (or "centre 35a" if that doesn't work), followed by:
"restrict within(8.0,arg35a.cz)" (or "restrict within(8.0,35a)" if that doesn't work).

Again, you may want to switch on "Labels" in the Options menu.
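If RasMol is not available, a similar selection can be made with a few lines of Python. The sketch below reads ATOM and HETATM records using the fixed column layout of the PDB format and keeps every atom within a given distance of one reference atom, which is what the "within" command above is used for here. The function names and the dictionary layout are made up for this example, and math.dist requires Python 3.8 or later.

```python
import math

def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),     # atom name, columns 13-16
                "resname": line[17:20].strip(),  # residue name, columns 18-20
                "chain": line[21],               # chain identifier, column 22
                "resseq": int(line[22:26]),      # residue number, columns 23-26
                "xyz": (float(line[30:38]),      # x, y, z in columns 31-54
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms

def within(atoms, radius, chain, resseq, name):
    """Return all atoms no further than radius (in Angstrom) from one reference atom."""
    refs = [a for a in atoms
            if (a["chain"], a["resseq"], a["name"]) == (chain, resseq, name)]
    if not refs:
        raise ValueError("reference atom not found")
    centre = refs[0]["xyz"]
    return [a for a in atoms if math.dist(a["xyz"], centre) <= radius]
```

With the text of the 1CHR file in a string called pdb_text, within(parse_atoms(pdb_text), 8.0, "A", 35, "CZ") would list the neighbours of the arginine's CZ atom. Alternate conformations and insertion codes are ignored in this sketch.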

Q. 40. Do you think this arginine is "happy" in the environment that it's in here? What is the problem with the strand that this arginine is located on?

Q. 41. Can you now explain why the RMSD between 1CHR and 2CHR is so high, even though the structures are fairly similar on the whole?

[9] Refolding?

For entry 1PTE only CA-atoms have been deposited. Years later, the entry was superseded by 3PTE. Compare 1PTE and 3PTE (in terms of resolution, CA-Ramachandran plot, etc.).

Q. 42. What is going on here? And what's the story about 2PTE?

[10] Metamorphosis ...

If you can't get enough, compare 1PHY and 2PHY.

Q. 43. What is going on here?

[11] BoTox or NoTox?

First read the following comment: B Rupp & B Segelke (2001). Questions about the structure of the botulinum neurotoxin B light chain in complex with a target peptide. Nature Structural Biology 8 (8), 663-664. Also read the reply by Stevens and Hanson on page 664. You may also want to read the original paper by Hanson and Stevens (PubMed).

Rupp and Segelke essentially contend that the alleged synaptobrevin-II peptide in the complex with BoTox is largely a figment of the imagination. They base this on an analysis of the geometry, temperature factors, and electron-density maps of the corresponding PDB entry 1F83. In this entry, chain "A" is the BoTox model, and chains "B" and "C" represent the cleaved synaptobrevin-II peptide.

Attention "O" users!

Download "All files (.tar.gz)" for this entry to a directory that you own! Go to that directory, unpack the downloaded file (tar xovpfz 1f83.tar.gz), go to the new subdirectory (cd 1f83), and start up O. (Note: if you have problems downloading the .tar.gz file, try this link instead.)

Q. 44. Assume that you had been asked by Nature Structural Biology to referee the letter by Rupp and Segelke. Using all the tools available (PDB, PDBsum, PDBreport, EDS - all of these were available at the time), assess whether or not there is plausible support for their allegations. Based on your findings, decide whether or not their letter is of interest and should be published. Write a motivated recommendation (i.e., a referee report) to the editor of Nature Structural Biology.

[12] Head (n)or tail?

In 1993, the 1.74 Å structure of a complex of a mutant of intestinal fatty-acid binding protein (IFABP) with oleic acid was reported (PubMed). The density for the carboxylate group was ambiguous and the model as deposited in the PDB (1ICN) contains three alternate conformations for this moiety.

In a later study, this structure was used by Klebe and co-workers (PubMed) to validate their docking program and scoring function. The docking calculations indicated that the "observed" binding mode of the oleic acid was not particularly favourable. Instead, their method suggested that a different orientation of the entire ligand (in essence, swapping the head and the tail) is much more favourable.

Attention "O" users!

Download "All files (.tar.gz)" for this entry to a directory that you own! Go to that directory, unpack the downloaded file (tar xovpfz 1icn.tar.gz), go to the new subdirectory (cd 1icn), and start up O. (Note: if you have problems downloading the .tar.gz file, try this link instead.)

Q. 45. Inspect the density for the oleic acid ligand in the structure of 1ICN. Is the model with three alternative conformations of the carboxylate group credible in terms of (a) density, and (b) stabilising interactions? Is there support in the density for the alternative orientation, with the oleate's head and tail reversed, and with hydrogen bonds between the carboxylate oxygen atoms and an amide group in the protein? What is your conclusion?

[13] The chemistry is wrong?

Macromolecular crystallographers are not necessarily good chemists. Since the refined geometry of small-molecule compounds that are bound to macromolecules is the product of both the experimental data and the expectations of the crystallographer (expressed as geometric restraints), such structures sometimes have odd features.

Q. 46. What is funny about the structure of 3-phenylpropylamine as found in PDB entry 1TNK? And which two features would a chemist complain about in the structure of SB-202190 in PDB entry 1PME? Hint: you may want to use the HIC-Up server to measure (improper) torsion angles etc. (you will find them in the HETZE log file when you have run your compound through the server).

[14] To be or not to be ...

... or the case of the inhibitor that went AWOL. In 2000, a complex of botulinum neurotoxin B protease and its inhibitor BABIM was published. Two years later, the authors withdrew the structure stating: "After a detailed analysis of the electron density maps for the structure of the inhibitor complex, we have concluded that the maps do not support the placement of the inhibitor as stated in the paper. Therefore, we are withdrawing the structural conclusions derived from PDB file 1FQH presented in the paper".

Attention "O" users!

Download "All files (.tar.gz)" for this entry to a directory that you own! Go to that directory, unpack the downloaded file (tar xovpfz 1fqh.tar.gz), go to the new subdirectory (cd 1fqh), and start up O. (Note: if you have problems downloading the .tar.gz file, try this link instead.)

Q. 47. Inspect the density for the BABIM ligand in the structure of 1FQH. What is your conclusion?

[15] All hetero-compounds are equal ...

... but some hetero-compounds are more equal than others!

PDB entries 1KEL (solved at 1.9 Å) and 1FL6 (solved at 2.8 Å) both contain a ligand with an excruciatingly long name that we shall refer to as simply AAH.

Attention "O" users!

If you have problems downloading the .tar.gz files, try the following links instead: 1KEL, 1FL6!

Q. 48. For both these structures, assess how much you trust the (a) presence, (b) orientation, (c) conformation, and (d) coordinate precision of the AAH ligand.

PDB entries 268D and 1D63 have both been determined at ~2 Å resolution, and both contain a ligand called berenil.

Attention "O" users!

If you have problems downloading the .tar.gz files, try the following links instead: 268D, 1D63!

Q. 49. For both these structures, assess how much you trust the (a) presence, (b) orientation, (c) conformation, and (d) coordinate precision of the berenil ligand.

[16] Mirror, mirror ...

A small number of proteins have been synthesised in their all-D form (i.e., consisting of only D-amino acids). The structure of a racemic mixture of D-monellin and L-monellin has been published (PubMed) and deposited in the PDB with identifier 1KRL. The paper describes how "the crystal structure consists of two D and two L-monellin molecules in the P1 unit cell with a pseudo-centrosymmetrical arrangement". It also argues that "small but significant structural differences between D and L-monellin in the same crystal" exist.

Q. 50. Inspect the PDB entry and its PDBREPORT page etc. and try to verify the two cited claims from the paper's abstract.

[17] Left or right?

Quite a few structures contain one or a few D-amino acids. These may be either genuine D-amino acids or artefacts due to model-building or refinement errors.
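Whether a residue is modelled as L or D can be read off from the coordinates alone. A minimal Python sketch: compute the chiral volume at the CA atom, (N - CA) . ((C - CA) x (CB - CA)), which is positive (roughly +2.5 cubic Angstrom for ideal geometry) for an L-amino acid and negative for its mirror image. The function names are made up for this example, glycine (no CB) cannot be classified this way, and the sign tells you only what was modelled, not whether the density supports it.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def chiral_volume(n, ca, c, cb):
    """Signed volume spanned by the CA->N, CA->C and CA->CB vectors."""
    return _dot(_sub(n, ca), _cross(_sub(c, ca), _sub(cb, ca)))

def ca_handedness(n, ca, c, cb):
    """Return "L" for a positive chiral volume at CA, "D" otherwise."""
    return "L" if chiral_volume(n, ca, c, cb) > 0.0 else "D"
```

A chiral volume close to zero means that the CA atom is nearly planar, which is in itself a sign that something is wrong with the residue.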

Attention "O" users!

If you have problems downloading the .tar.gz files, try the following links instead: 1A7S, 1AN1!

Q. 51. PDB entry 1A7S is a 1.1 Å structure, in which valine 50 is a D-amino acid. Is this a genuine D-amino acid or an artefact? And how about residue E115 in the 2 Å structure with PDB code 1AN1?

[18] Really water?

Download PDB entry 2CDS (structure of hen egg-white lysozyme as determined by someone you know) and submit it to the STAN (STructure ANalysis) server (see the link on the page with useful links).

Q. 52. Check the results of the program WASP on this entry. What do they tell you? Check this suggestion by inspecting the model and the density (obtained from EDS). Do you agree?

[19] Cis or trans?

Download PDB entry 1FIB and submit it to the STAN (STructure ANalysis) server (see the link on the page with useful links).
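For reference, a peptide bond is called cis or trans depending on its omega torsion angle, CA(i)-C(i)-N(i+1)-CA(i+1): omega is near 0 degrees for cis and near 180 degrees for trans. The Python sketch below computes omega from four atom positions and labels it. The 30-degree tolerance and the function names are assumptions made for this example; this is not the algorithm used by CISPEP.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def omega(ca_i, c_i, n_next, ca_next):
    """Signed torsion CA(i)-C(i)-N(i+1)-CA(i+1) in degrees."""
    b1, b2, b3 = _sub(c_i, ca_i), _sub(n_next, c_i), _sub(ca_next, n_next)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def classify_peptide(omega_deg, tolerance=30.0):
    """Label a peptide bond; the tolerance is an arbitrary choice for this sketch."""
    if abs(omega_deg) <= tolerance:
        return "cis"
    if abs(omega_deg) >= 180.0 - tolerance:
        return "trans"
    return "distorted"
```

A torsion angle alone cannot tell you whether a peptide was modelled correctly; for that you still need to look at the density.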

Q. 53. Check the results of the program CISPEP on this entry. What do they tell you? Check this suggestion by inspecting the model and the density (obtained from EDS). Do you agree?

[20] Nag, nag, nag

Q. 54. Look up the chemical formula for N-acetyl-D-glucosamine (called NAG in PDB files). Now look at the NAG-NAG ligand of PDB entry 1B3J (e.g., at PDBsum). Do you notice anything strange?

[21] Out with the old, in with the new

Q. 55. PDB entries 3XIA and 1XYA are two crystal structures of the same enzyme. How do they differ? Which of these two would you use as a molecular replacement search model?

[22] Superficial

Make a copy of the PDB file of entry 1CBS and remove the ligand and the water molecules from it with a text editor. Then submit this file to the following servers and let them calculate the solvent-accessible surface area (ASA) for the entire protein (using default parameters):

StrucTools (use the MSMS option)
POPS
DSSP
RPBS ASA
VADAR
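If you would rather not edit the PDB file by hand, the ligand and the water molecules can also be stripped with a few lines of Python. The sketch below assumes that both are stored as HETATM records (check this for your file) and simply drops those lines; any modified residues stored as HETATM would be dropped as well. The function name is made up for this example.

```python
def strip_hetero(pdb_text):
    """Return the PDB text without its HETATM records.

    Assumption of this sketch: the ligand and all waters are HETATM
    records. Other records that refer to them (e.g. CONECT) are left
    untouched.
    """
    kept = [line for line in pdb_text.splitlines()
            if not line.startswith("HETATM")]
    return "\n".join(kept) + "\n"
```

Read the original file into a string, pass it through strip_hetero, and write the result to a new file before submitting it to the servers.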

Q. 56. What values do you obtain for the total ASA? How would you report the ASA in your paper?

[23] Ligand? What ligand?

Read this short paper (2 pages) if you have access to it.

Q. 57. Describe in your own words what the authors are trying to say. Confirm your suspicions by inspecting the electron density in the binding site of PDB entry 2GWX and by comparing it to that in 2BAW.

Latest update on 26 January, 2009.