Attention Copenhagen students (January, 2009)!
This page is optional, provided you attended Gerard's lecture.
Types of errors
Despite all the care that (hopefully) has been taken during model
building and refinement, errors may persist in a model all the way to
the publication and deposition of the model. In more than a few cases
(in particular, but not exclusively, in the second half of the 1980s)
completely wrong models have ended up being published in sometimes very
prestigious journals ...
Such errors come in various classes, and, fortunately, the frequency of
each type of error is inversely proportional to its seriousness (you
will encounter examples of most of these categories in the exercises at
the end of this practical):
- In the worst case, a model (or a sub-unit) may, essentially, be
completely wrong.
- In other cases, secondary structure elements may have been
correctly identified for the most part, but incorrectly connected.
- A fairly common mistake during the initial tracing is to overlook
one residue, which leads to a register error (or frame shift). The
model is usually brought back into register with the density a bit
further down the sequence, where the opposite error is made (e.g.,
an extra residue is inserted into density for a turn). This is a
serious error, but it is usually possible to detect and correct it
in the course of the refinement and rebuilding process.
- Sometimes the primary sequence used by the crystallographer
contains one or more mistakes. These may be due to cloning artefacts,
to post-translational modifications, to sequencing errors, to the
absence of a published amino-acid sequence at the time of tracing,
or simply to trivial human mistakes.
- The most common type of model-building error is locally incorrect
main-chain and/or side-chain conformations. Such errors are easy to make
in low-resolution maps calculated with imperfect phases. Moreover,
multiple conformations are often unresolved even at moderately high
resolution (~2 Å), which further complicates the interpretation of
side-chain density. Nevertheless, many of them can be avoided.
- Various types of error (possibly, to some extent, compensating
ones) can be introduced during refinement. This can be "achieved"
by manipulating the data-to-parameter ratio, i.e. the
ratio between the amount of data (both experimental data and
other sources of information, such as "ideal" bond lengths etc.)
and the number of parameters in the model. This can be done, for
instance, by removing data (in particular if it doesn't agree
well with the model ...) or by adding hundreds of extra atoms (in
particular water molecules). At a resolution of 1 Å, the ratio
of experimental observations alone to model parameters may be as
high as 10 or 20, whereas at 2 Å this will be only ~1-3,
and at 3 Å it will usually be below one! This category
should perhaps not be classified as "errors" but rather as
poor or inappropriate refinement practice.
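The resolution-dependent ratios quoted above can be reproduced with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes a hypothetical 200-residue protein (~1600 non-hydrogen atoms, ~53,000 Å³ of cell volume per molecule from a typical Matthews coefficient), a P1 cell with Friedel pairs merged, and four parameters (x, y, z, B) per atom — real cases will differ.

```python
import math

def n_reflections(cell_volume_A3, d_min_A):
    """Rough count of unique reflections to resolution d_min.

    A reciprocal-space sphere of radius 1/d contains about
    (4*pi/3) * (1/d)^3 * V lattice points; dividing by two
    merges Friedel pairs (assumes P1 symmetry, complete data).
    """
    return (4.0 * math.pi / 3.0) * cell_volume_A3 / d_min_A**3 / 2.0

def n_parameters(n_atoms, per_atom=4):
    """x, y, z plus one isotropic temperature factor per atom."""
    return n_atoms * per_atom

# Hypothetical 200-residue protein (illustrative numbers only)
atoms, volume = 1600, 53_000
for d in (1.0, 2.0, 3.0):
    ratio = n_reflections(volume, d) / n_parameters(atoms)
    print(f"{d:.1f} A: experimental data / parameters ~ {ratio:.1f}")
```

Even this crude estimate recovers the trend in the text: a ratio well above ten at 1 Å, a few at 2 Å, and below one at 3 Å — which is precisely why restraints (the "other sources of information") are indispensable at typical resolutions.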
Why do models with serious flaws nevertheless end up in the literature
and databases? Possible causes may be:
- inexperienced, under-supervised people do the work (and have a
supervisor who may be in a hurry to publish)
- computer programs are used as black boxes
- methodological improvements are not adopted until the limitations of
older ones have been experienced first-hand
- intermediate models are not subjected to critical and systematic quality
analysis
- "quality indicators" are used that are strongly correlated with
parameters that are restrained during refinement (RMS deviation of
bond lengths and angles from ideal values, etc.)
What is a good model?
In brief: a good model makes sense in all respects. In other words, it
passes just about every test you can think of to assess its validity
(and even those that nobody has thought of yet ...). This includes:
- chemical sense: normal bond lengths and angles, correct chirality
(no D-amino acids unless you know from other sources that they
are present), flat aromatic rings, flat sp²-hybridised
carbons, etc.
- physical sense: no interpenetrating non-bonded atoms, favourable
packing of the molecules in the crystal, sensible patterns of
temperature factors, occupancies of alternative conformations add
up to one, etc.
- crystallographic sense: the model adequately explains the experimental
data
- statistical sense: the model is the best hypothesis to explain the
data, with minimal over-fitting (few assumptions)
- protein-structural sense: the model "looks like a protein": reasonable
Ramachandran plot, not too many unusual side-chain conformations,
not too many buried charges, residues are "happy" in their
environment
- biological sense: the model explains previous observations (although sometimes
it may disprove earlier experiments ...), e.g. with respect to activity,
specificity, effect of mutations or inhibitors; also, the model
should enable you to make predictions (e.g.: "if we mutate this
aspartate to asparagine, it should kill the enzyme") that can be
tested experimentally ("falsifiable hypotheses")
Some concepts
Accuracy, precision and significance are three concepts that many
crystallographers use in a hand-waving sort of way. Please remember:
- Accuracy is related to how far from "the true value" a measurement
is (or how far from the true structure a model is)
- Precision has to do with the level of detail (the decimal places,
if you like) and reproducibility
Knowing this, you will realise that 4.7352516121262 is a very precise,
but hopelessly inaccurate approximation of the value of pi. On
the other hand, 3.14 is a considerably more accurate, but not very precise
approximation.
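The pi example above can be made concrete in a few lines. Accuracy is the deviation from the true value; precision is merely the number of digits carried:

```python
import math

true_value = math.pi  # 3.14159265...

precise_but_inaccurate = 4.7352516121262  # 13 decimals, nowhere near pi
accurate_but_imprecise = 3.14             # 2 decimals, close to pi

# Accuracy: distance from the true value
print(abs(precise_but_inaccurate - true_value))  # ~1.59
print(abs(accurate_but_imprecise - true_value))  # ~0.0016
```

Thirteen decimal places buy no accuracy at all; the two-decimal value is a thousand times closer to the truth.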
To most crystallographers, "significant" means: "I think it's
large". You will often read things like "the loop is shifted
significantly" or "the orientation of the aspartate sidechain in the
two complexes differs significantly" in papers describing protein
crystal structures. This rarely (if ever) implies that any statistical
significance test has been carried out.
Another trap to be aware of (and one that countless
crystallographers have fallen into) is that of deriving
"high-resolution information" from low-resolution models. For
instance, in a typical 3 Å structure, the uncertainty in the
position of the individual atoms can easily be 0.5 Å or more.
Nevertheless, many such models have been described where hydrogen
bonding distances are listed with a precision (note: not accuracy!) of
0.01 Å (probably because the program that was used to calculate
these distances used that particular precision), and solvent-accessible
surface areas with a precision of 1 Å² ...
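A minimal sketch of how one might report a distance with a precision that matches the model, rather than the program's print format. The coordinates below are hypothetical, and the simple error propagation (distance error ≈ √2 times the positional error of one atom, for two equally uncertain, independent atoms) is a rough rule of thumb, not a rigorous estimate:

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (in A)."""
    return math.dist(a, b)

# Hypothetical donor/acceptor coordinates (A) from a ~3 A model
donor = (12.345, 8.921, 15.607)
acceptor = (14.102, 10.334, 16.988)

coord_error = 0.5  # typical positional uncertainty at ~3 A, per the text

d = distance(donor, acceptor)
print(f"raw:      {d}")            # many meaningless decimals
err = math.sqrt(2) * coord_error   # crude propagation to the distance
print(f"reported: {d:.1f} +/- {err:.1f} A")
```

Quoting "2.6 ± 0.7 Å" is honest; quoting "2.64 Å" from a 3 Å map is precision masquerading as accuracy.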
Validation criteria
If you look it up in a dictionary, "validation" is defined as:
- to declare or make legally valid
- to mark with an indication of official sanction
- to substantiate or verify
Many statistics, methods, and programs were developed in the 1990s to
help identify errors in protein models. These methods generally fall
into two classes: one in which only coordinates are considered (such
methods often entail comparison of a model to information derived from
structural databases), and another in which both the model and the
crystallographic data are taken into account. Alternatively, one can
distinguish between methods that essentially measure how well the
refinement program has succeeded in imposing restraints (e.g.,
deviations from ideal geometry, conventional R-value) and those that
assess aspects of the model that are "orthogonal" to the information
used in refinement (e.g., free R-value, patterns of non-bonded
interactions, conformational torsion-angle distributions). An
additional distinction can be made between methods that provide overall
(global) statistics for a model (such methods are suitable for
monitoring the progress of the refinement and rebuilding process) and
those that provide information at the level of residues or atoms (such
methods are more useful for detecting local problems in a model). It is
important to realise that almost all coordinate-based validation
methods detect outliers (i.e., atoms or residues with unusual
properties): to assess whether an outlier is an error in the
model or whether it is a genuine, but unusual, feature of the
structure, one must inspect the (preferably unbiased)
electron-density maps!
If you are interested in the overall quality of a model (e.g.,
to decide if it's good enough to use as a starting point for
comparative modelling), strong and global quality indicators are most
useful. Examples of such criteria are (don't worry - you'll learn what
they are soon):
- Free R-value
- Packing score
- Ramachandran plot
If, on the other hand, you are interested in finding the local errors
(to decide if the active site in a model is good enough to use it for
the design of ligands), strong and local methods are most suitable.
Examples of these are:
- Real-space fit
- Main-chain torsion-angle combinations
- Side-chain torsion-angle combinations
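To give a flavour of the first of these local criteria: a real-space fit is commonly expressed as the correlation between observed and model-calculated density over the grid points around one residue. The sketch below uses made-up density values and a plain Pearson correlation; actual programs differ in how they sample and weight the grid:

```python
def real_space_cc(rho_obs, rho_calc):
    """Pearson correlation between observed and model-calculated
    electron density, sampled at grid points around one residue.
    A value near 1 means the residue fits its density well."""
    n = len(rho_obs)
    mo = sum(rho_obs) / n
    mc = sum(rho_calc) / n
    cov = sum((o - mo) * (c - mc) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mo) ** 2 for o in rho_obs)
    var_c = sum((c - mc) ** 2 for c in rho_calc)
    return cov / (var_o * var_c) ** 0.5

# Made-up density samples for one well-fitting residue
rho_obs  = [0.12, 0.45, 0.80, 0.51, 0.10]
rho_calc = [0.10, 0.40, 0.85, 0.55, 0.08]
print(f"RSCC = {real_space_cc(rho_obs, rho_calc):.2f}")
```

Because it is computed per residue, this score points you straight at the stretches of chain that need rebuilding, which is exactly what a local criterion should do.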
Unfortunately, in many papers that describe macromolecular crystal
structures, "quality criteria" are quoted that do not necessarily
provide any indication whatsoever of the actual quality of the model.
Examples are:
- Conventional R-value
- RMS deviation of bond lengths and angles from "ideal" values
- Average temperature factor of the atoms in the model
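For reference, the conventional R-value in that list is simply the normalised discrepancy between observed and calculated structure-factor amplitudes. The free R-value uses the same formula, but only over a small test set of reflections excluded from refinement, which is what makes it resistant to over-fitting. The amplitudes below are toy numbers for illustration:

```python
def r_value(f_obs, f_calc):
    """Conventional R = sum(|Fobs - Fcalc|) / sum(Fobs),
    over structure-factor amplitudes. The free R-value applies
    the same formula to the set-aside test reflections only."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

# Toy amplitudes (purely illustrative)
f_obs  = [520.0, 310.0, 780.0, 150.0]
f_calc = [500.0, 330.0, 760.0, 165.0]
print(f"R = {r_value(f_obs, f_calc):.3f}")
```

Note that nothing stops a refinement from driving this number down by over-fitting (adding waters, deleting awkward data), which is why the conventional R-value on its own says so little about model quality.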
It is also important to realise that every quality check that a model
passes provides a necessary but insufficient indication
of the model's correctness. Remember that a good model makes sense in
just about every respect.
Practical "Model Validation" -
EMBO Bioinformatics Course -
Uppsala 2001 - © 2001-2009
Gerard Kleywegt
Latest update on 26 January, 2009.