Attention Copenhagen students (January, 2009)!
This page is optional, provided you attended Gerard's lecture.
Types of errors
Despite all the care that (hopefully) has been taken during model
building and refinement, errors may persist in a model all the way to
the publication and deposition of the model. In more than a few cases
(in particular, but not exclusively, in the second half of the 1980s)
completely wrong models have ended up being published in sometimes very
prestigious journals ...
Such errors come in various classes, and, fortunately, the frequency of
each type of error is inversely proportional to its seriousness (you
will encounter examples of most of these categories in the exercises at
the end of this practical):
- In the worst case, a model (or a sub-unit) may, essentially, be
completely wrong.
- In other cases, secondary structure elements may have been
correctly identified for the most part, but incorrectly connected.
- A fairly common mistake during the initial tracing is to overlook
one residue, which leads to a register error (or frame shift). The
model is usually brought back into register with the density a bit
further down the sequence, where the opposite error is made (e.g.,
an extra residue is inserted into density for a turn). This is a
serious error, but it is usually possible to detect and correct it
in the course of the refinement and rebuilding process.
- Sometimes the primary sequence used by the crystallographer
contains one or more mistakes. These may be due to cloning artefacts,
to post-translational modifications, to sequencing errors, to the
absence of a published amino-acid sequence at the time of tracing,
or simply to trivial human mistakes.
- The most common type of model-building error is locally incorrect
main-chain and/or side-chain conformations. Such errors are easy to make
in low-resolution maps calculated with imperfect phases. Moreover,
multiple conformations are often unresolved even at moderately high
resolution (~2 Å), which further complicates the interpretation of
side-chain density. Nevertheless, many of them can be avoided.
- Various types of error (possibly, to some extent, compensating
ones) can be introduced during refinement. This can be "achieved"
by manipulating the data-to-parameter ratio, i.e. the
ratio between the amount of data (both experimental data and
other sources of information, such as "ideal" bond lengths etc.)
and the number of parameters in the model. This can be done, for
instance, by removing data (in particular if it doesn't agree
well with the model ...) or by adding hundreds of extra atoms (in
particular water molecules). At a resolution of 1 Å, the ratio
of experimental observations alone to model parameters may be as
high as 10 or 20, whereas at 2 Å this will be only ~1-3,
and at 3 Å it will usually be below one! This category
should perhaps not be classified as "errors" but rather as
poor or inappropriate refinement practice.
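The resolution-dependent ratios quoted above can be reproduced with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes a hypothetical 200-residue protein (~1600 non-hydrogen atoms, ~53,000 Å³ of cell volume per molecule from a typical Matthews coefficient), a P1 cell with Friedel pairs merged, and four parameters (x, y, z, B) per atom — real cases will differ.

```python
import math

def n_reflections(cell_volume_A3, d_min_A):
    """Rough count of unique reflections to resolution d_min.

    A reciprocal-space sphere of radius 1/d contains about
    (4*pi/3) * (1/d)^3 * V lattice points; dividing by two
    merges Friedel pairs (assumes P1 symmetry, complete data).
    """
    return (4.0 * math.pi / 3.0) * cell_volume_A3 / d_min_A**3 / 2.0

def n_parameters(n_atoms, per_atom=4):
    """x, y, z plus one isotropic temperature factor per atom."""
    return n_atoms * per_atom

# Hypothetical 200-residue protein (illustrative numbers only)
atoms, volume = 1600, 53_000
for d in (1.0, 2.0, 3.0):
    ratio = n_reflections(volume, d) / n_parameters(atoms)
    print(f"{d:.1f} A: experimental data / parameters ~ {ratio:.1f}")
```

Even this crude estimate recovers the trend in the text: a ratio well above ten at 1 Å, a few at 2 Å, and below one at 3 Å — which is precisely why restraints (the "other sources of information") are indispensable at typical resolutions.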
Why do models with serious flaws nevertheless end up in the literature
and databases? Possible causes may be:
- inexperienced, under-supervised people do the work (and have a
supervisor who may be in a hurry to publish)
- computer programs are used as black boxes
- methodological improvements are not adopted until the limitations of
older ones have been experienced first-hand
- intermediate models are not subjected to critical and systematic quality
analysis
- "quality indicators" are used that are strongly correlated with
parameters that are restrained during refinement (RMS deviation of
bond lengths and angles from ideal values, etc.)
What is a good model?
In brief: a good model makes sense in all respects. In other words, it
passes just about every test you can think of to assess its validity
(and even those that nobody has thought of yet ...). This includes:
- chemical sense: normal bond lengths and angles, correct chirality
(no D-amino acids unless you know from other sources that they
are present), flat aromatic rings, flat sp²-hybridised
carbons, etc.
- physical sense: no interpenetrating non-bonded atoms, favourable
packing of the molecules in the crystal, sensible patterns of
temperature factors, occupancies of alternative conformations add
up to one, etc.
- crystallographic sense: the model adequately explains the experimental
data
- statistical sense: the model is the best hypothesis to explain the
data, with minimal over-fitting (few assumptions)
- protein-structural sense: the model "looks like a protein": reasonable
Ramachandran plot, not too many unusual side-chain conformations,
not too many buried charges, residues are "happy" in their
environment
- biological sense: the model explains previous observations (although sometimes
it may disprove earlier experiments ...), e.g. with respect to activity,
specificity, effect of mutations or inhibitors; also, the model
should enable you to make predictions (e.g.: "if we mutate this
aspartate to asparagine, it should kill the enzyme") that can be
tested experimentally ("falsifiable hypotheses")
Some concepts
Accuracy, precision and significance are three concepts that many
crystallographers use in a hand-waving sort of way. Please remember:
- Accuracy is related to how far from "the true value" a measurement
is (or how far from the true structure a model is)
- Precision has to do with the level of detail (the decimal places,
if you like) and reproducibility
Knowing this, you will realise that 4.7352516121262 is a very precise,
but hopelessly inaccurate approximation of the value of pi. On
the other hand, 3.14 is a considerably more accurate, but not very precise
approximation.
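The pi example above can be made concrete in a few lines. Accuracy is the deviation from the true value; precision is merely the number of digits carried:

```python
import math

true_value = math.pi  # 3.14159265...

precise_but_inaccurate = 4.7352516121262  # 13 decimals, nowhere near pi
accurate_but_imprecise = 3.14             # 2 decimals, close to pi

# Accuracy: distance from the true value
print(abs(precise_but_inaccurate - true_value))  # ~1.59
print(abs(accurate_but_imprecise - true_value))  # ~0.0016
```

Thirteen decimal places buy no accuracy at all; the two-decimal value is a thousand times closer to the truth.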
To most crystallographers, "significant" means: "I think it's
large". You will often read things like "the loop is shifted
significantly" or "the orientation of the aspartate sidechain in the
two complexes differs significantly" in papers describing protein
crystal structures. This rarely (if ever) implies that any statistical
significance test has been carried out.
Another trap to be aware of (and one that countless
crystallographers have fallen into) is that of deriving
"high-resolution information" from low-resolution models. For
instance, in a typical 3 Å structure, the uncertainty in the
position of the individual atoms can easily be 0.5 Å or more.
Nevertheless, many such models have been described where hydrogen
bonding distances are listed with a precision (note: not accuracy!) of
0.01 Å (probably because the program that was used to calculate
these distances used that particular precision), and solvent-accessible
surface areas with a precision of 1 Å² ...
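A minimal sketch of how one might report a distance with a precision that matches the model, rather than the program's print format. The coordinates below are hypothetical, and the simple error propagation (distance error ≈ √2 times the positional error of one atom, for two equally uncertain, independent atoms) is a rough rule of thumb, not a rigorous estimate:

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (in A)."""
    return math.dist(a, b)

# Hypothetical donor/acceptor coordinates (A) from a ~3 A model
donor = (12.345, 8.921, 15.607)
acceptor = (14.102, 10.334, 16.988)

coord_error = 0.5  # typical positional uncertainty at ~3 A, per the text

d = distance(donor, acceptor)
print(f"raw:      {d}")            # many meaningless decimals
err = math.sqrt(2) * coord_error   # crude propagation to the distance
print(f"reported: {d:.1f} +/- {err:.1f} A")
```

Quoting "2.6 ± 0.7 Å" is honest; quoting "2.64 Å" from a 3 Å map is precision masquerading as accuracy.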
Validation criteria
If you look it up in a dictionary, "validation" is defined as:
- to declare or make legally valid
- to mark with an indication of official sanction
- to substantiate or verify
Many statistics, methods, and programs were developed in the 1990s to
help identify errors in protein models. These methods generally fall
into two classes: one in which only coordinates are considered (such
methods often entail comparison of a model to information derived from
structural databases), and another in which both the model and the
crystallographic data are taken into account. Alternatively, one can
distinguish between methods that essentially measure how well the
refinement program has succeeded in imposing restraints (e.g.,
deviations from ideal geometry, conventional R-value) and those that
assess aspects of the model that are "orthogonal" to the information
used in refinement (e.g., free R-value, patterns of non-bonded
interactions, conformational torsion-angle distributions). An
additional distinction can be made between methods that provide overall
(global) statistics for a model (such methods are suitable for
monitoring the progress of the refinement and rebuilding process) and
those that provide information at the level of residues or atoms (such
methods are more useful for detecting local problems in a model). It is
important to realise that almost all coordinate-based validation
methods detect outliers (i.e., atoms or residues with unusual
properties): to assess whether an outlier is an error in the
model or whether it is a genuine, but unusual, feature of the
structure, one must inspect the (preferably unbiased)
electron-density maps!
If you are interested in the overall quality of a model (e.g.,
to decide if it's good enough to use as a starting point for
comparative modelling), strong and global quality indicators are most
useful. Examples of such criteria are (don't worry - you'll learn what
they are soon):
- Free R-value
- Packing score
- Ramachandran plot
If, on the other hand, you are interested in finding the local errors
(to decide if the active site in a model is good enough to use it for
the design of ligands), strong and local methods are most suitable.
Examples of these are:
- Real-space fit
- Main-chain torsion-angle combinations
- Side-chain torsion-angle combinations
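To give a flavour of the first of these local criteria: a real-space fit is commonly expressed as the correlation between observed and model-calculated density over the grid points around one residue. The sketch below uses made-up density values and a plain Pearson correlation; actual programs differ in how they sample and weight the grid:

```python
def real_space_cc(rho_obs, rho_calc):
    """Pearson correlation between observed and model-calculated
    electron density, sampled at grid points around one residue.
    A value near 1 means the residue fits its density well."""
    n = len(rho_obs)
    mo = sum(rho_obs) / n
    mc = sum(rho_calc) / n
    cov = sum((o - mo) * (c - mc) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mo) ** 2 for o in rho_obs)
    var_c = sum((c - mc) ** 2 for c in rho_calc)
    return cov / (var_o * var_c) ** 0.5

# Made-up density samples for one well-fitting residue
rho_obs  = [0.12, 0.45, 0.80, 0.51, 0.10]
rho_calc = [0.10, 0.40, 0.85, 0.55, 0.08]
print(f"RSCC = {real_space_cc(rho_obs, rho_calc):.2f}")
```

Because it is computed per residue, this score points you straight at the stretches of chain that need rebuilding, which is exactly what a local criterion should do.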
Unfortunately, in many papers that describe macromolecular crystal
structures, "quality criteria" are quoted that do not necessarily
provide any indication whatsoever of the actual quality of the model.
Examples are:
- Conventional R-value
- RMS deviation of bond lengths and angles from "ideal" values
- Average temperature factor of the atoms in the model
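For reference, the conventional R-value in that list is simply the normalised discrepancy between observed and calculated structure-factor amplitudes. The free R-value uses the same formula, but only over a small test set of reflections excluded from refinement, which is what makes it resistant to over-fitting. The amplitudes below are toy numbers for illustration:

```python
def r_value(f_obs, f_calc):
    """Conventional R = sum(|Fobs - Fcalc|) / sum(Fobs),
    over structure-factor amplitudes. The free R-value applies
    the same formula to the set-aside test reflections only."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

# Toy amplitudes (purely illustrative)
f_obs  = [520.0, 310.0, 780.0, 150.0]
f_calc = [500.0, 330.0, 760.0, 165.0]
print(f"R = {r_value(f_obs, f_calc):.3f}")
```

Note that nothing stops a refinement from driving this number down by over-fitting (adding waters, deleting awkward data), which is why the conventional R-value on its own says so little about model quality.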
It is also important to realise that every quality check that a model
passes provides a necessary but insufficient indication
of the model's correctness. Remember that a good model makes sense in
just about every respect.
Practical "Model Validation" -
EMBO Bioinformatics Course -
Uppsala 2001 - © 2001-2009
Gerard Kleywegt
Latest update on 26 January, 2009.