7 De novo modelling of G-protein coupled receptors

7.1 Introduction

G-protein coupled receptors (GPCRs) are seven-helix trans-membrane proteins that are essential to a great variety of physiological events requiring the conversion of an external signal into an intracellular one. Present in a broad range of organisms, they activate a guanine nucleotide-binding protein (G-protein) in response to stimuli as diverse as light, odorants, neurotransmitters, peptides and hormones.

Understanding the structure and mechanism of GPCRs is thus central to many aspects of cellular signalling and control. As a result, many multidisciplinary research projects, including those carried out by pharmaceutical companies to find new therapeutic molecules, are aimed at one or another member of this protein family, which has over 700 known members.

Since no experimentally elucidated 3D-structure is available for GPCRs, and given the high level of interest these proteins attract, it is not surprising that predictive methods to derive information about their 3D-structure are rapidly gaining interest. These efforts have resulted in several theoretical models of the trans-membrane regions that have been successfully used to rationalise site-directed mutagenesis experiments aimed at understanding the interactions between the proteins and their ligands.

7.2 Building GPCR templates

Template models for GPCRs can be built using a newly described rule-based technique (Herzyk and Hubbard, 1995). It is based on the assumption that models of the trans-membrane regions of GPCRs can be generated using (i) experimental data available for at least one member of a particular GPCR subfamily; (ii) the multiple sequence alignment of all known members of this subfamily; and (iii) the 2D projection map available for bovine rhodopsin (Schertler et al, 1993). The methodology also requires a clear definition of the sequence of the trans-membrane regions. This is normally achieved by combining the helix assignments determined by J. Baldwin (Baldwin, 1993) with those obtained from the multiple sequence alignments using programs such as TMAP (Persson and Argos, 1994) and TopPred (von Heijne, 1992).

Since no experimental structure information is available for the loops connecting the helices of GPCRs, and since reliable de novo modelling methods for long loops are presently lacking, the loops are not predicted and only the trans-membrane helices are modelled.

Template building is divided into two stages. Firstly, a simplified template is constructed on the basis of experimental and theoretical data (Herzyk and Hubbard, 1995). Secondly, this template is converted into all-atom representation (Peitsch, 1995; Peitsch, 1996).

7.2.1 Template building, Stage 1

Throughout the first stage, the seven trans-membrane helices are represented as rigid, idealised helices. Each residue belonging to these helices is represented by one Cα atom and one virtual side-chain atom whose size and position depend on the size and topology of the amino acid side-chain.
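To illustrate this reduced representation, the following is a minimal sketch that generates Cα and virtual side-chain positions for one idealised helix aligned with the Z axis. The geometry constants are standard textbook values for an α-helix rather than values taken from PANDA, and the fixed `SIDE_CHAIN_OFFSET` is a hypothetical simplification: in the method described above, the size and position of the virtual atom depend on the residue type.

```python
import math

# Standard alpha-helix geometry (textbook values, not taken from PANDA).
RISE_PER_RESIDUE = 1.5    # angstroms travelled along the helix axis per residue
TURN_PER_RESIDUE = 100.0  # degrees of rotation about the axis per residue
CA_RADIUS = 2.3           # angstroms from the helix axis to each C-alpha

# Hypothetical: one fixed radial offset for every virtual side-chain atom.
SIDE_CHAIN_OFFSET = 2.0


def ideal_helix(n_residues):
    """Return a list of (ca_xyz, side_chain_xyz) pairs for a helix along Z."""
    residues = []
    for i in range(n_residues):
        angle = math.radians(i * TURN_PER_RESIDUE)
        z = i * RISE_PER_RESIDUE
        ca = (CA_RADIUS * math.cos(angle), CA_RADIUS * math.sin(angle), z)
        # Place the virtual atom further out along the same radial direction.
        r = CA_RADIUS + SIDE_CHAIN_OFFSET
        side = (r * math.cos(angle), r * math.sin(angle), z)
        residues.append((ca, side))
    return residues
```

Because each helix is rigid, its placement in a template can then be described by a translation and a rotation applied to such a coordinate set.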

The simplified templates, which describe how the helices are fitted together, are generated by the global optimisation of a penalty function (Herzyk and Hubbard, 1995). This function measures the violations of structural restraints derived from experimental and theoretical data. The method currently considers restraints imposed on the positions of the helices with respect to the 2D projection map of bovine rhodopsin (XY dimension) and to the membrane plane (Z dimension), as well as orientational restraints which determine the lipid- and interior-facing portions of each helix. Additional distance restraints can be used to describe the relative position of selected residues and a ligand. The restraints on helix orientation may be derived from both mutagenesis experiments and multiple sequence alignments. The relative distances between specific side chains (themselves identified by site-directed mutagenesis) and a ligand are based on their molecular structure and possible non-bonded contact distances (Herzyk and Hubbard, 1995). Each restraint is satisfied if its value lies between pre-set limits; otherwise a contribution is added to the penalty function.
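The "satisfied between pre-set limits" behaviour corresponds to what is often called a flat-bottomed restraint. The sketch below shows one possible form; the squared-violation term, the per-restraint weight and the function names are assumptions made for illustration, and the functional form actually used by Herzyk and Hubbard may differ.

```python
def restraint_penalty(value, lower, upper):
    """Zero inside [lower, upper]; squared violation outside (assumed form)."""
    if value < lower:
        return (lower - value) ** 2
    if value > upper:
        return (value - upper) ** 2
    return 0.0


def total_penalty(restraints):
    """Sum weighted penalties over (value, lower, upper, weight) tuples."""
    return sum(w * restraint_penalty(v, lo, hi) for v, lo, hi, w in restraints)


# Hypothetical restraints: (measured value, lower limit, upper limit, weight).
example = [
    (3.5, 3.0, 4.5, 1.0),  # within limits, contributes nothing
    (6.0, 3.0, 4.5, 1.0),  # 1.5 above the upper limit
    (1.0, 2.0, 5.0, 2.0),  # 1.0 below the lower limit, weight 2
]
print(total_penalty(example))  # 4.25
```

A model that satisfies every restraint therefore has a penalty of zero, and the optimisation only has to remove violations rather than hit exact target values.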

Monte-Carlo Simulated Annealing (MCSA) (Kirkpatrick et al, 1983) is then used to globally optimise the penalty function and produce an optimal model. Each MCSA trajectory begins with a randomised configuration of the rigid-body elements, which are then moved sequentially, each by a random step along a randomly chosen co-ordinate. Each trajectory consists of 25 temperature runs of one thousand Monte-Carlo steps. Using a collection of different starting positions of the model, we compute 50 MCSA trajectories. The final configurations are then averaged and this mean structure is idealised to yield the final template. In order to validate the method we performed the calculations for bacteriorhodopsin using the experimental data available prior to the high resolution electron microscopy (EM) structure (Henderson et al, 1990). This produced a simplified template in which Cα atom positions differed from the experimental EM structure by a root mean square deviation of only 1.87 Å. All calculations are performed using a purpose-built program, "PANDA", developed at York (Herzyk and Hubbard, 1995).
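The annealing protocol can be sketched generically as follows. This is not the PANDA implementation: the state is a plain list of co-ordinates rather than a set of rigid bodies, and the starting temperature, geometric cooling factor, step size and `toy_penalty` are all hypothetical choices. Only the 25 temperature runs of 1000 steps and the averaging of final configurations follow the description above.

```python
import math
import random


def mcsa_trajectory(penalty, n_coords, rng, n_temps=25, steps_per_temp=1000,
                    t_start=10.0, cooling=0.7, max_step=1.0):
    """Run one annealing trajectory; return (final_state, final_penalty)."""
    # Begin from a randomised configuration.
    state = [rng.uniform(-10.0, 10.0) for _ in range(n_coords)]
    current = penalty(state)
    temperature = t_start
    for _ in range(n_temps):
        for _ in range(steps_per_temp):
            k = rng.randrange(n_coords)  # randomly chosen co-ordinate
            old = state[k]
            state[k] = old + rng.uniform(-max_step, max_step)  # random step
            trial = penalty(state)
            delta = trial - current
            # Metropolis criterion: always accept improvements, and accept
            # worse moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = trial
            else:
                state[k] = old  # reject the move and restore the co-ordinate
        temperature *= cooling  # assumed geometric cooling schedule
    return state, current


def mean_structure(final_states):
    """Average the final configurations co-ordinate by co-ordinate."""
    n = len(final_states)
    return [sum(column) / n for column in zip(*final_states)]


# Hypothetical stand-in for the restraint penalty: squared distance to a target.
TARGET = [1.0, -2.0, 3.0]


def toy_penalty(state):
    return sum((x - t) ** 2 for x, t in zip(state, TARGET))
```

Calling `mcsa_trajectory(toy_penalty, 3, random.Random(0))` fifty times and passing the resulting states to `mean_structure` mirrors the 50-trajectory averaging step. The subsequent idealisation of the mean structure is not shown.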

7.2.2 Template building, Stage 2

In the second stage, the initial model is converted to a full-atom representation using the backbone rebuilding and side-chain reconstitution routines of ProMod (Peitsch, 1995; Peitsch, 1996). The stereochemistry of this model is then further optimised by 200 cycles of conjugate gradient energy minimisation using a force field such as CHARMm (Brooks et al, 1983).

7.3 Using the GPCR templates

As for the template building stage outlined above, modelling other members of the GPCR family requires a clear definition of the sequence of the trans-membrane helices. This can be achieved by combining the helix assignments determined by J. Baldwin (Baldwin, 1993) with those obtained from the multiple sequence alignments using programs such as TMAP (Persson and Argos, 1994) and TopPred (von Heijne, 1992). As these predictions vary from one method to another and "manual" adjustments are generally necessary to overcome the limitations of the methods, the present version of the server does not provide an automated procedure for helix identification. In this way, the user may also test several helix sequence variations. The next step in obtaining a GPCR model consists essentially of a comparative protein modelling run using the modelling templates described above.
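Since helix identification is left to the user, one simple way to obtain a starting point for manual adjustment is a per-residue majority vote over the segments proposed by each method. The sketch below is an illustration only: the function name, the voting threshold and the residue ranges are hypothetical, and it is not a procedure provided by the server.

```python
def consensus_segments(predictions, min_votes=2):
    """Merge helix predictions by per-residue voting.

    predictions: one list per method of (start, end) inclusive residue ranges.
    Returns the ranges supported by at least min_votes methods.
    """
    votes = {}
    for segments in predictions:
        for start, end in segments:
            for pos in range(start, end + 1):
                votes[pos] = votes.get(pos, 0) + 1
    kept = sorted(pos for pos, n in votes.items() if n >= min_votes)
    # Join consecutive residue positions back into contiguous segments.
    merged = []
    for pos in kept:
        if merged and pos == merged[-1][1] + 1:
            merged[-1][1] = pos
        else:
            merged.append([pos, pos])
    return [tuple(segment) for segment in merged]


# Hypothetical assignments for two helices from three different methods.
predictions = [
    [(35, 60), (72, 97)],
    [(37, 62), (70, 95)],
    [(34, 58), (74, 99)],
]
print(consensus_segments(predictions))  # [(35, 60), (72, 97)]
```

The resulting boundaries would still need to be inspected and adjusted by hand, for the reasons given above.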

7.6 Specific references

Baldwin JM (1993) EMBO J 12 1693-1703

Henderson R Baldwin JM Ceska TA Zemlin F Beckmann E and Downing KH (1990) J Mol Biol 213 899-929

Herzyk P and Hubbard RE (1995) Biophys J 69 2419-2442

Kirkpatrick S Gelatt CD Jr and Vecchi MP (1983) Science 220 671-680

Persson B and Argos P (1994) J Mol Biol 237 182-192

Schertler GFX Villa C and Henderson R (1993) Nature 362 770-772

von Heijne G (1992) J Mol Biol 225 487-494