Now it's your turn to do some model validation! Below are a number of
exercises (some requiring knowledge of protein crystallography) from
which you can choose a couple. My recommendation would be to do exercises
1, 3 (or 21), 8 (if you know how to use RasMol), 9 (or 10), 11, 14 (or 13 or 15),
19 (or 18 or 17), and 23 - but if your tutor tells you otherwise, listen to
him/her :-)
Attention Copenhagen students (January, 2009)!
Do the following exercises: 1, 3, 14 and 23. Write down your answers
to all the questions in these exercises. If you feel ambitious, you
can also try your hand at exercises 17, 18 and 19. If you know how
to use RasMol (and assuming it is installed on your machine), you
could also have a go at exercise 8.
|
Go to the RCSB page for entry
2GN5. Use the information there, and any
information you can find at PDBsum and PDBreport to assess the quality
of this model.
Q. 29. What is your opinion about the
quality of this model?
|
Assess the quality of PDB entry 1SRX (note the deposition date!).
Since only CA-coordinates are available, you will need to fetch the
coordinates and use the
STAN server to generate a "CA-Ramachandran
plot" of this model.
Q. 30. What is your opinion about the
quality of this model?
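If you are curious what goes into such a plot, here is a minimal sketch. It assumes that a "CA-Ramachandran plot" is built from the pseudo-angle formed by three consecutive CA atoms and the pseudo-torsion formed by four consecutive CA atoms; the exact definition used by the STAN server is not given on this page, so treat the function names and the choice of window as illustrative. Chain breaks are not handled: the input is assumed to be an uninterrupted list of CA coordinates.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pseudo_angle(p1, p2, p3):
    """Angle (degrees) at p2 between the directions to p1 and p3."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to [-1, 1] to guard against rounding errors.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def pseudo_torsion(p1, p2, p3, p4):
    """Signed dihedral (degrees) about the p2-p3 axis."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ca_pseudo_geometry(ca):
    """For each window of four consecutive CA atoms, return (angle, torsion).

    Assumption: the angle is taken over the first three atoms of the window.
    """
    return [(pseudo_angle(ca[i], ca[i + 1], ca[i + 2]),
             pseudo_torsion(ca[i], ca[i + 1], ca[i + 2], ca[i + 3]))
            for i in range(len(ca) - 3)]
```

Feeding the resulting (angle, torsion) pairs into any scatter-plot tool gives a rough picture; judging which regions are "allowed" still requires a reference distribution such as the one the server provides.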
|
(Exercise borrowed from Roman Laskowski.)
In this exercise you will be given a list of protein models in the
PDBsum database. One of these contains severe errors! Your task (as
Detective of the Validation Police) is to spot which one it is ... As
Roman describes it:
The errors in the structure are deliberate; the structure was obtained
from genuine X-ray data but with the protein backbone wilfully and
irresponsibly traced backwards through the electron density. The
resultant backwards-traced model was refined using standard refinement
techniques until a low crystallographic R-factor was obtained.
The person responsible for this abomination is Gerard Kleywegt who has
said much and written much about protein structure refinement and how
it should be done properly. The point of the exercise was to
demonstrate how, even for an obviously and woefully wrong structure, it
is possible to achieve a low R-factor (often taken as a measure of a
structure's "accuracy" or "quality").
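As a reminder of what this number measures, the conventional crystallographic R-factor is the summed absolute difference between observed and calculated structure-factor amplitudes, divided by the summed observed amplitudes. The sketch below computes it for two plain lists of amplitudes. The function name and the `scale` parameter are illustrative assumptions; real refinement programs determine the scale factor (and often bulk-solvent corrections) themselves.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|), with k = scale.

    f_obs and f_calc are equally long sequences of amplitudes for the
    same reflections, in the same order (an assumption of this sketch).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - scale * abs(fc))
                    for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator
```

Note that nothing in this formula knows about chemistry or stereochemistry: it only measures how well the model reproduces the amplitudes it was refined against.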
To obtain the line-up of candidate proteins, go to the PDBsum database.
Enter the name "Kleywegt" into the string search box near the bottom of
that page and press return. You will get a list of more than a dozen
structures, but the entry we are after has a PDB ID code that begins
with "1c"! Click on each entry in turn and see if you can find the one
that's wrong from the information on its PDBsum page. You'll know it
when you see it ...
Q. 31. What is the PDB code of the structure
you think is the bad one? Give at least 4 reasons why you selected it!
|
If you think that models as poor as the backwards-traced model used
above don't occur in the PDB, you're in for a surprise. Roman Laskowski
found this gem: 1RIP - may it Rest In Peace! Assess the quality of
this model using the familiar tools (but be gentle!). Since this is an
NMR model (as opposed to a crystal structure), this also serves to
demonstrate what kind of validation tools can be used for models that
are not based on crystallographic data.
Q. 32. What is your impression of the
quality of 1RIP?
|
PDB entries 2HHB, 3HHB and 4HHB are all crystal structures of human
hemoglobin. In fact, all three structures were derived from the same
crystallographic data. Nevertheless, the quality of the three entries
differs rather dramatically. Compare and contrast the three entries and
discuss their quality.
Q. 33. How would you rank the three
models?
|
The three entries 2HHB, 3HHB, and 4HHB replaced an earlier PDB entry,
1HHB. If you go to the PDB and look up 2HHB, for instance, there will
be a link to 1HHB in an archive of "obsolete" structures. Retrieve the
complete PDB file for entry 1HHB and save it in a file. Now go to the
Biotech Validation Server (and follow their "Quick start" link), upload
the PDB file of 1HHB and run PROCHECK and some (or all) of the WHAT IF
checks. Compare and contrast the quality of 1HHB and 4HHB.
Q. 34. If you had to use either one of these
models as the starting point for a homology modelling exercise, which
one would you prefer?
|
(Exercise borrowed from Tom Taylor.)
In this exercise, we shall have a protein model "beauty pageant" (or
perhaps "(political) correctness" is a better word than "beauty" ...):
we will rank three models by their quality (or reliability). The
structures were all solved in spacegroup P2₁ and refined to
a resolution of 2.3 Å. They are:
- 1CVH, human carbonic anhydrase II, mutant H96C
- 1EMC, mutant of green fluorescent protein from Aequorea victoria
- 2RUS, Rhodospirillum rubrum rubisco (ribulose-1,5-bisphosphate
carboxylase/oxygenase)
Check the Ramachandran plots of these models, and look at the R-values,
packing quality, etc.
Q. 35. How would you rank these three models?
|
In this exercise, we shall compare two different models of the same
enzyme that are based on the same crystallographic data. The
enzyme is called chloromuconate cycloisomerase, and its structure was
originally solved in spacegroup I4 with two-fold non-crystallographic
symmetry (i.e., there were two molecules present in the
asymmetric unit of the crystal, whose structures were determined and
refined independently). The model and the structure factor data were
deposited in the PDB with code 1CHR.
Later, it was found (by other crystallographers) that the spacegroup
was not I4, but rather I422 (and, hence, that there wasn't any
non-crystallographic symmetry!). This means that the two molecules
whose structures were determined and refined independently in the
original study, are in actual fact completely identical! The
structure was re-refined and deposited in the PDB with code 2CHR.
Check the Ramachandran plots of both models, and look at the R-values,
packing quality, etc.
Q. 36. How does the quality of the two
models compare?
|
Now, let's have a more detailed look at the two models (this part only
works if your browser is set up properly for handling RasMol scripts).
Go to the PDBsum page for 2CHR, and scroll until you see a link to "SAS
- annotated FASTA alignment of related sequences in the PDB". Click on
the coloured SAS logo and be patient until the next page loads. SAS
will now do a FASTA alignment (i.e., a global sequence
alignment) of the sequence of 2CHR and all sequences in the PDB. The
most similar PDB entries and their sequences will be listed.
When the SAS page has loaded, click on the link "View 3D structures"
(next to the RasMol logo). A new window will open with a form for you
to complete. At the top, click on the checkboxes for the top two
entries ("2CHR" and "1CHR (A)", respectively). Next, scroll to the
bottom of the form and change the "Colour structures by" option to "a
different colour for each structure". If you now click on the "DISPLAY"
button (and wait a few seconds), RasMol should automatically start up,
load the two molecules (superimposed by sequence) and draw them. 2CHR
will be shown as a purple CA trace, and chain A of 1CHR as a yellow CA
trace.
The RasMol terminal window will show the RMSD between the two
molecules.
Q. 37. What is their RMSD (on CA atoms)?
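For reference, the RMSD reported here is simply the root-mean-square distance between paired atoms. The sketch below assumes the two models have already been superimposed and that the CA atoms have been paired up (RasMol does the pairing by sequence in this exercise); the function name is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of
    (x, y, z) tuples. No superposition is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Because the distances are squared before averaging, a modest number of badly matching atom pairs can dominate the result.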
|
If your RasMol knowledge is rusty, the mouse controls are as follows:
Mouse/key combination | Action
---|---
Left mouse button | Rotate x-y
Right mouse button | Translate x-y
Shift-key + left mouse button | Zoom in/out
Shift-key + right mouse button | Rotate z
Control-key + left mouse button | Slab plane
At first glance, the two models are very similar (which seems strange
given the rather high RMSD on CA atoms!). Now type the following into
your RasMol terminal window: "restrict 14-42". Have a closer look at
residues 15-17 of 2CHR and 1CHR. (Note: you may want to switch on the
"Labels" option from RasMol's Options menu.)
Q. 38. Which residue in 1CHR corresponds
(structurally) to residue 17 in 2CHR? Can you explain this?
|
It gets worse ...
Q. 39. Which residue in 1CHR corresponds
(structurally) to residue 22 in 2CHR? Which residue in 1CHR
corresponds (structurally) to residue 28 in 2CHR? What happens
in the loop prior to residue 41 in 1CHR compared to 2CHR?
|
To illustrate just one effect of the errors in 1CHR, go to the PDBsum
page for 1CHR and click on the button marked "RasMol" (at the top of
the page). This should launch RasMol and show you all the atoms found
in the PDB file. Type the following RasMol commands to display only
atoms near the CZ atom of arginine 35 in chain A:
- "centre arg35a.cz" (or "centre 35a" if that doesn't work),
followed by:
- "restrict within(8.0,arg35a.cz)" (or "restrict within(8.0,35a)"
if that doesn't work).
Again, you may want to switch on "Labels" in the Options menu.
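If RasMol is not available, the same kind of selection can be approximated with a few lines of Python. This is only a sketch: it assumes a standard fixed-column PDB file, ignores alternate conformations, and the function names are made up for this example.

```python
import math

def parse_atom(line):
    """Extract the fixed-column fields of a PDB ATOM/HETATM record."""
    return {
        "name": line[12:16].strip(),     # atom name, e.g. "CZ"
        "resname": line[17:20].strip(),  # residue name, e.g. "ARG"
        "chain": line[21],               # chain identifier
        "resseq": int(line[22:26]),      # residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def atoms_within(pdb_lines, radius, resseq, chain, atom_name):
    """Return all atoms within `radius` (in Å) of the named reference atom.

    Like RasMol's within(), the reference atom itself is included.
    """
    atoms = [parse_atom(l) for l in pdb_lines
             if l.startswith(("ATOM", "HETATM"))]
    centre = next(a for a in atoms
                  if (a["resseq"], a["chain"], a["name"])
                  == (resseq, chain, atom_name))
    return [a for a in atoms if math.dist(a["xyz"], centre["xyz"]) <= radius]
```

For the selection used above one would call `atoms_within(lines, 8.0, 35, "A", "CZ")` on the lines of the 1CHR file and then list the residue names of the atoms returned.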
Q. 40. Do you think this arginine is
"happy" in the environment that it's in here? What is the
problem with the strand that this arginine is located on?
|
Q. 41. Can you now explain why the RMSD
between 1CHR and 2CHR is so high, even though the structures are
fairly similar on the whole?
|
For entry 1PTE only CA-atoms have been deposited. Years later, the
entry was superseded by 3PTE. Compare 1PTE and 3PTE (in terms of
resolution, CA-Ramachandran plot, etc.).
Q. 42. What is going on here? And what's
the story about 2PTE?
|
If you can't get enough, compare 1PHY and 2PHY.
Q. 43. What is going on here?
|
First read the following comment: B Rupp & B Segelke (2001).
Questions about the structure of the botulinum neurotoxin B light chain
in complex with a target peptide. Nature Structural Biology
8 (8), 663-664. Also read the reply by Stevens and Hanson on
page 664. You may also want to read the original paper by Hanson and
Stevens
(PubMed).
Rupp and Segelke essentially contend that the alleged synaptobrevin-II
peptide in the complex with BoTox is largely a figment of the
imagination. They base this on an analysis of the geometry, temperature
factors, and electron-density maps of the corresponding PDB entry
1F83. In this entry, chain "A" is the BoTox model,
and chains "B" and "C" represent the cleaved synaptobrevin-II peptide.
Attention "O" users!
Download "All files (.tar.gz)" for this entry to a directory that
you own! Go to that directory, unpack the downloaded file
(tar xovpfz 1f83.tar.gz), go to the new subdirectory
(cd 1f83), and start up O.
(Note: if you have problems downloading the .tar.gz file, try
this link instead.)
|
Q. 44. Assume that you had been asked by
Nature Structural Biology to referee the letter by Rupp
and Segelke. Using all the tools available (PDB, PDBsum, PDBreport,
EDS - all of these were available at the time), assess whether or
not there is plausible support for their allegations. Based on your
findings, decide whether or not their letter is of interest and
should be published. Write a reasoned recommendation (i.e., a
referee report) to the editor of Nature Structural Biology.
|
In 1993, the 1.74 Å structure of a complex of a mutant of
intestinal fatty-acid binding protein (IFABP) with oleic acid was
reported
(PubMed). The density for the carboxylate group was
ambiguous and the model as deposited in the PDB
(1ICN) contains three alternate conformations for
this moiety.
In a later study, this structure was used by Klebe and co-workers
(PubMed) to validate their docking program and
scoring function. The docking calculations indicated that the
"observed" binding mode of the oleic acid was not particularly
favourable. Instead, their method suggests that a different orientation
of the entire ligand (in essence, swapping the head and the tail) is
much more favourable.
Attention "O" users!
Download "All files (.tar.gz)" for this entry to a directory that
you own! Go to that directory, unpack the downloaded file
(tar xovpfz 1icn.tar.gz), go to the new subdirectory
(cd 1icn), and start up O.
(Note: if you have problems downloading the .tar.gz file, try
this link instead.)
|
Q. 45. Inspect the density for the oleic
acid ligand in the structure of 1ICN. Is the model with three
alternative conformations of the carboxylate group credible
in terms of (a) density, and (b) stabilising interactions?
Is there support in the density for the alternative orientation,
with the oleate's head and tail reversed, and with hydrogen bonds
between the carboxylate oxygen atoms and an amide group in the
protein? What is your conclusion?
|
Macromolecular crystallographers are not necessarily good chemists.
Since the refined geometry of small-molecule compounds that are bound
to macromolecules is the product of both the experimental data
and the expectations of the crystallographer (expressed as
geometric restraints), such structures sometimes have odd features.
Q. 46. What is funny about the structure
of 3-phenylpropylamine as found in PDB
entry 1TNK? And which two features would a chemist complain about
in the structure of SB-202190 in PDB
entry 1PME? Hint: you may want to use the
HIC-Up server to measure (improper) torsion
angles etc. (you will find them in the HETZE log file when you
have run your compound through the server).
|
... or the case of the inhibitor that went AWOL. In 2000, a complex
of botulinum neurotoxin B protease and its inhibitor BABIM was
published. Two years later, the authors withdrew the structure
stating: "After a detailed analysis of
the electron density maps for the structure of the inhibitor complex,
we have concluded that the maps do not support the placement of the
inhibitor as stated in the paper. Therefore, we are withdrawing the
structural conclusions derived from PDB file 1FQH presented in the
paper".
Attention "O" users!
Download "All files (.tar.gz)" for this entry to a directory that
you own! Go to that directory, unpack the downloaded file
(tar xovpfz 1fqh.tar.gz), go to the new subdirectory
(cd 1fqh), and start up O.
(Note: if you have problems downloading the .tar.gz file, try
this link instead.)
|
Q. 47. Inspect the density for the BABIM
ligand in the structure of 1FQH. What is your conclusion?
|
... but some hetero-compounds are more equal than others!
PDB entries 1KEL (solved at 1.9 Å) and 1FL6 (solved at 2.8
Å) both contain a ligand with an excruciatingly long name that we
shall refer to as simply AAH.
Attention "O" users!
If you have problems downloading the .tar.gz files, try the
following links instead:
1KEL, 1FL6!
|
Q. 48. For both these structures, assess
how much you trust the (a) presence, (b) orientation, (c) conformation,
and (d) coordinate precision of the AAH ligand.
|
PDB entries 268D and 1D63 have both been determined at ~2 Å
resolution, and both contain a ligand called
berenil.
Attention "O" users!
If you have problems downloading the .tar.gz files, try the
following links instead:
268D, 1D63!
|
Q. 49. For both these structures, assess
how much you trust the (a) presence, (b) orientation, (c) conformation,
and (d) coordinate precision of the berenil ligand.
|
A small number of proteins have been synthesised in their all-D form
(i.e., consisting of only D-amino acids). The structure of a
racemic mixture of D-monellin and L-monellin has been published
(PubMed) and deposited in the PDB with identifier
1KRL.
The paper describes how "the crystal structure consists of two D and
two L-monellin molecules in the P1 unit cell with a
pseudo-centrosymmetrical arrangement". It also argues that "small but
significant structural differences between D and L-monellin in the same
crystal" exist.
Q. 50. Inspect the PDB entry and its
PDBREPORT page etc. and try to verify the two cited claims from
the paper's abstract.
|
Quite a few structures contain one or a few D-amino acids. These may be
either genuine D-amino acids or artefacts due to model-building or
refinement errors.
Attention "O" users!
If you have problems downloading the .tar.gz files, try the
following links instead:
1A7S, 1AN1!
|
Q. 51. PDB entry 1A7S is a 1.1 Å
structure, in which valine 50 is a D-amino acid. Is this a
genuine D-amino acid or an artefact? And how about residue E115
in the 2 Å structure with PDB code 1AN1?
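One simple way to put a number on the handedness of a residue is the chiral volume around its CA atom, i.e., the signed triple product of the vectors from CA to N, C and CB. With this atom order the value is positive (around +2.5 Å³ for ideal geometry) for an L-amino acid and negative for a D-amino acid. The sketch below is illustrative only: the function name is made up and the coordinates in the comment are an assumed example, not taken from 1A7S or 1AN1.

```python
def chiral_volume(n, ca, c, cb):
    """Signed volume (N-CA) . [(C-CA) x (CB-CA)] in cubic Ångström.

    Positive for L-amino acids, negative for D-amino acids
    (with the atom order N, C, CB used here).
    """
    a = tuple(p - q for p, q in zip(n, ca))
    b = tuple(p - q for p, q in zip(c, ca))
    d = tuple(p - q for p, q in zip(cb, ca))
    cross = (b[1] * d[2] - b[2] * d[1],
             b[2] * d[0] - b[0] * d[2],
             b[0] * d[1] - b[1] * d[0])
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]

# Assumed example coordinates for one L-residue (illustration only).
EXAMPLE_L = {
    "n": (17.047, 14.099, 3.625),
    "ca": (16.967, 12.784, 4.338),
    "c": (15.685, 12.755, 5.133),
    "cb": (18.170, 12.703, 5.337),
}
```

A negative value only tells you that the model contains a D-configuration at that residue; whether it is genuine still has to be judged from the density and the chemistry.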
|
Download PDB entry 2CDS (structure of hen egg-white lysozyme as
determined by someone you know) and submit it to the STAN (STructure
ANalysis) server (see the link on the page with useful links).
Q. 52. Check the results of the program
WASP on this entry. What do they tell you? Check this suggestion
by inspecting the model and the density (obtained from EDS).
Do you agree?
|
Download PDB entry 1FIB and submit it to the STAN (STructure
ANalysis) server (see the link on the page with useful links).
Q. 53. Check the results of the program
CISPEP on this entry. What do they tell you? Check this suggestion
by inspecting the model and the density (obtained from EDS).
Do you agree?
|
Q. 54. Look up the chemical formula for
N-acetyl-D-glucosamine (called NAG in PDB files). Now look at
the NAG-NAG ligand of PDB entry 1B3J (e.g., at PDBsum). Do you
notice anything strange?
|
Q. 55. PDB entries 3XIA and 1XYA are two crystal
structures of the same enzyme. How do they differ? Which of these two
would you use as a molecular replacement search model?
|
Make a copy of the PDB file of entry 1CBS and remove the ligand and the
water molecules from it with a text editor. Then submit this file to
the following servers and let them calculate the solvent-accessible
surface area (ASA) for the entire protein (using default parameters):
- StrucTools (use the MSMS option)
- POPS
- DSSP
- RPBS ASA
- VADAR
Q. 56. What values do you obtain for the
total ASA? How would you report the ASA in your paper?
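If you would rather not edit the file by hand, the stripping step can be sketched in Python. This assumes that the ligand and the water molecules are stored as HETATM records (the usual PDB convention) and that there are no modified residues you want to keep, since those are HETATM records too. The function names and file names are hypothetical.

```python
def strip_hetero(pdb_lines):
    """Drop HETATM records (ligands, waters) and the CONECT records
    that refer to them; all other lines are kept unchanged."""
    dropped = ("HETATM", "CONECT")
    return [line for line in pdb_lines if not line.startswith(dropped)]

def strip_hetero_file(src_path, dst_path):
    """Read a PDB file, strip hetero records, write the result."""
    with open(src_path) as src:
        kept = strip_hetero(src.readlines())
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
```

For example, `strip_hetero_file("1cbs.pdb", "1cbs_protein.pdb")` would write a protein-only copy (both file names are placeholders). It is worth checking the output in a text editor or viewer before uploading it.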
|
Read this short paper (2 pages) if you have
access to it.
Q. 57. Describe in your own words what
the authors are trying to say. Confirm your suspicions by
inspecting the electron density in the binding site of
PDB entry 2GWX and by comparing it to that in 2BAW.
|
Practical "Model Validation" -
EMBO Bioinformatics Course -
Uppsala 2001 - © 2001-2009
Gerard Kleywegt
Latest update on 26 January, 2009.