8 How to evaluate the quality of a model

8.1 General considerations.

A model is considered wrong if at least some of its structural features are misplaced relative to the rest of the model. Errors of that type can very easily slip into a model when erroneous sequence alignments are used during the building procedure. Such models can nevertheless have proper stereochemistry if great care is given to this aspect during the building procedure.
In absolute terms, a model can be declared inaccurate or imprecise if its atomic co-ordinates are not within 0.5 Å rmsd of an experimental control structure. This value comes from the structure/sequence similarity study of Chothia and Lesk [6], in which they demonstrate that different structures of the same protein can deviate by as much as 0.5 Å. This criterion can, however, only be assessed after the fact, once an experimental structure is available, and is therefore of no use while the model is being built. In relative terms, a model can be considered "accurate enough" or "as accurate as you can get" when its rmsd is within the spread of deviations observed for experimental structures that share a sequence identity level similar to that of the target and template sequences.
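The rmsd criterion can be illustrated with a minimal sketch. It assumes that the model and the control structure have already been superimposed and that their atoms are listed in the same order; the function name `rmsd` and the toy co-ordinates are illustrative only and do not come from any particular modelling package.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) co-ordinates, assumed to be superimposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("co-ordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy data (hypothetical): every control atom is shifted by 0.5 A along z.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
control_atoms = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (2.0, 0.0, 0.5), (3.0, 0.0, 0.5)]

deviation = rmsd(model_atoms, control_atoms)
# Absolute criterion from the text: within 0.5 A of the control structure.
meets_absolute_criterion = deviation <= 0.5
```

In practice the superposition step matters as much as the formula, so the value obtained depends on which atoms are fitted and compared.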
Another source of inaccuracy is the deviation from ideal stereochemical values for bond lengths and angles. Such inaccuracies can easily be detected with the program WhatCheck, developed by G. Vriend at the EMBL (it can be reached from the PDB Web site).
It is crucial to realise that proper stereochemistry, as assessed with WhatCheck, is not a criterion for model correctness. In other words, it is possible to build models which comply with such criteria and yet have strictly no biological meaning.
Empirical pair potentials allow, to some degree, the detection of errors of this kind, in which a model is stereochemically sound but incorrect. These algorithms are not sensitive enough to detect subtle differences in conformation, but are quite efficient at pointing out regions where sequence and structure do not fit.

8.2 What are the sources of errors and inaccuracies?

The quality of a model is determined by two criteria, its correctness and its accuracy, which define its applicability (see Part IV):

The correctness of a model is essentially dictated by the quality of the sequence alignment used to guide the modelling process. If the sequence alignment is wrong in some regions, then the spatial arrangement of the residues in this portion of the model will be incorrect.
The accuracy of a model is essentially limited by the deviation of the template structure(s) used from the experimental control structure (which may only become available later). This limitation is inherent to the methods used, since the models result from an extrapolation. As a consequence, the Cα atoms of protein models which share 35 to 50% sequence identity with their templates will generally deviate by 1.0 to 1.5 Å from their experimental counterparts, as do similarly related experimental structures [6]. Furthermore, structural differences between predicted and experimental structures have two sources:

The errors inherent to the modelling procedures.
The variations caused by the molecular environment and the data collection method, which are incorporated into the experimentally elucidated structures used as modelling templates. Indeed, crystallographic structures of identical proteins can vary not only because of experimental errors and differences in data collection conditions (illustrated in [32]) and refinement, but also because of different crystal lattice contacts and the presence or absence of ligands. One of the most interesting examples in which several structures of the same protein, determined by different methods, were compared involves interleukin-4 (IL-4) [33 and references therein]. This cytokine consists of a 130-residue four-helix bundle, and its structure was elucidated by X-ray crystallography as well as by NMR. The backbones of three IL-4 crystal structures (PDB entries 1RCB, 2INT and 1HIK) show an rmsd of 0.4 to 0.9 Å, while those of three IL-4 NMR forms (PDB entries 1ITM, 1CYL and 2CYK) give rmsd values of 1.2 to 2.6 Å. These values illustrate the structural differences due to experimental procedures and the molecular environment at the time of data collection. Therefore, "a protein model derived by comparative methods cannot be more accurate than the difference between the NMR and crystallographic structure of the same protein." [33].
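The dependence of the expected deviation on sequence identity can be sketched with the exponential relation commonly attributed to Chothia and Lesk [6], rmsd = 0.40 exp(1.87 H), where H is the fraction of non-identical residues. The coefficients 0.40 and 1.87 are quoted here as an assumption and should be checked against [6]; the relation is usually stated for main-chain atoms of the common core, and the function name is purely illustrative.

```python
import math

def expected_rmsd_from_identity(percent_identity, a=0.40, b=1.87):
    """Rough expected rmsd (in A) between two related structures.

    Assumes the relation rmsd = a * exp(b * H), with H the fraction of
    non-identical residues; a and b are assumed values to verify in [6].
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must lie between 0 and 100")
    fraction_different = 1.0 - percent_identity / 100.0
    return a * math.exp(b * fraction_different)

# The 35-50% identity range discussed in the text.
low_identity_estimate = expected_rmsd_from_identity(35.0)
high_identity_estimate = expected_rmsd_from_identity(50.0)
```

Under these assumed coefficients the estimates for 35% and 50% identity fall inside the 1.0 to 1.5 Å range given above, which is the kind of spread against which a model can be judged "accurate enough".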

8.3 Protein core and loops.

Almost every protein model contains non-conserved loops, which are expected to be the least reliable portions of the model. Indeed, these loops often deviate markedly from experimentally determined control structures. In many cases, however, these loops also correspond to the most flexible parts of the structure, as evidenced by their high crystallographic temperature factors (or multiple solutions in NMR experiments). On the other hand, the core residues - the least variable in any given protein family - are usually found in essentially the same orientation as in experimental control structures, while far larger deviations are observed for surface amino acids. This is expected, since the core residues are generally well conserved and the conformation of their side chains is constrained by neighbouring residues. In contrast, the more variable surface amino acids tend to show larger deviations, since few steric constraints are imposed upon them.
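The contrast between core and loop regions can be made visible by averaging per-residue deviations separately for each region. The sketch below assumes superimposed structures with one representative atom per residue and a user-supplied region label for each residue; the labels "core" and "loop", the function name and the toy data are all hypothetical.

```python
import math

def mean_deviation_by_region(model, control, regions):
    """Average per-residue distance (in A) for each region label.

    model, control: lists of (x, y, z), one atom per residue, superimposed.
    regions: one label per residue, e.g. "core" or "loop" (assumed labels).
    """
    if not (len(model) == len(control) == len(regions)):
        raise ValueError("all inputs must have the same length")
    sums, counts = {}, {}
    for atom_m, atom_c, label in zip(model, control, regions):
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(atom_m, atom_c)))
        sums[label] = sums.get(label, 0.0) + distance
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Toy data (hypothetical): the two core residues match, the two loop residues do not.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_control = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 2.0), (3.0, 2.0, 0.0)]
toy_regions = ["core", "core", "loop", "loop"]

by_region = mean_deviation_by_region(toy_model, toy_control, toy_regions)
```

Reporting the two averages side by side avoids a single global rmsd hiding a well-modelled core behind a few poorly placed loops.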

8.4 Detecting major errors using empirical pair potentials.

Some structural aspects of a protein model can be verified using methods based on the inverse folding approach. Two of them, namely the 3D-1D profile based verification method [15] and ProsaII, developed by M. Sippl [16], are widely used. The 3D-1D profile of a protein structure is calculated by adding the probability of occurrence of each residue in its 3D context [15]. Each of the twenty amino acids has a certain probability of being located in each of the environmental classes defined by Eisenberg and colleagues (using criteria such as solvent-accessible surface, buried polar area, exposed non-polar area and secondary structure). In contrast, ProsaII [16] relies on empirical energy potentials derived from the pairwise interactions observed in well-defined protein structures. These terms are summed over all residues in a model and result in a more or less favourable energy.
Both methods can detect a global sequence-to-structure incompatibility and errors corresponding to topological differences between template and target. They also allow the detection of more localised errors, such as β-strands that are "out of register" or buried charged residues. These methods are, however, unable to detect the more subtle structural inconsistencies often localised in non-conserved loops, and cannot provide an assessment of the correctness of their geometry.
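The idea behind a profile-based check can be sketched as follows: each residue receives a score for the environment it occupies, the scores are summed for a global value, and a sliding-window average points to regions where sequence and structure do not fit. This is only a toy illustration and not the published method of [15]; the two environment classes, the score table and the window size are all made up for the example.

```python
# Hypothetical scores for (environment class, amino acid); real tables
# cover twenty amino acids and many more environment classes.
TOY_TABLE = {
    ("buried", "L"): 1.0, ("buried", "K"): -1.0,
    ("exposed", "L"): -0.5, ("exposed", "K"): 0.8,
}

def residue_scores(sequence, environments, table):
    """Score of each residue in the environment it occupies in the model."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and environments must have the same length")
    return [table[(env, aa)] for aa, env in zip(sequence, environments)]

def windowed_average(scores, window=5):
    """Sliding-window mean; windows are truncated at the chain ends."""
    half = window // 2
    averages = []
    for i in range(len(scores)):
        segment = scores[max(0, i - half):min(len(scores), i + half + 1)]
        averages.append(sum(segment) / len(segment))
    return averages

toy_environments = ["buried", "buried", "exposed", "exposed"]
good_fit = residue_scores("LLKK", toy_environments, TOY_TABLE)
poor_fit = residue_scores("KKLL", toy_environments, TOY_TABLE)  # misthreaded

good_total = sum(good_fit)
poor_total = sum(poor_fit)
local_profile = windowed_average(poor_fit, window=3)
```

A low total flags a global incompatibility, while a dip in the windowed profile localises the problem, for example to a misaligned segment.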

8.5 Applicability of model structures.

Protein models obtained with comparative modelling methods can be classified into three broad categories:

Models which are based on incorrect alignments between target and template sequences. Such alignment errors, which generally reside in the inaccurate positioning of insertions and deletions, are caused by the weaknesses of the alignment algorithms and can generally not be resolved in the absence of an experimental control structure. It is, however, often possible to correct such errors by producing several models based on alignment variants and selecting the most "sensible" solution. Moreover, such models often turn out to be useful, as the errors frequently lie outside the area of interest, such as a well-conserved active site.

Models based on correct alignments are of course much better, but their accuracy can still be medium to low when the templates used during the modelling process have a medium to low sequence similarity with the target sequence. Such models, like the ones described above, are nevertheless very useful tools for the design of rational mutagenesis experiments. They are, however, of very limited assistance in detailed ligand binding studies.
The last category of models comprises all those which were built from templates that share a high degree of sequence identity (> 70%) with the target. Such models have proven useful during drug design projects and have allowed key decisions to be taken in compound optimisation and chemical synthesis. For instance, models of several species variants of a given enzyme can guide the design of more specific non-natural inhibitors.

However, nothing is absolute, and there are numerous occasions on which models falling into any of the above categories either could not be used at all or, in contrast, proved to be more useful and correct than estimated.