ADAN database

Overview

The structures and functions of most globular domains in the proteome are still unknown. To gain information about the biological role of these domains, we are modelling different globular domains, starting with SH3, SH2, WW, PDZ, PH, Methyl transferase, Acetyl transferase, WD40, VHS, Polo box, Protein tyrosine phosphatase, PTB, FHA, BRCT, and 14-3-3, in complex with several ligands, in order to predict their putative ligands from structure. Where necessary, the modelling takes into account the different ligand orientations as well as the orientation of the important residues involved in binding (e.g., type I and type II ligands, and the conserved Trp in the SH3 binding pocket). Briefly, internet resources allow the automated extraction of relevant information on the SH3, SH2, WW, PDZ and PH domains (sequences, structural coordinates, multiple sequence alignments, etc.).

Comparing sequences and structures allows the domains to be classified and the template structures suitable for homology modelling to be identified. Different templates covering a broad range of structural features are chosen and coupled to as many different ligands as possible, to obtain suitable templates for prediction. The complexes are used as templates for homology modelling with Swiss Pdb Modeller Server, Modeller, and WHATIF. Ligands for the modelled domain-ligand complexes are predicted with FOLD-X, a protein design algorithm that looks for the residues in the ligand that best fit the binding pocket of the domain, and constructs mutated structures using rotamer libraries of all natural amino acids. Each position in the ligand is explored individually in sequence space, and its neighbouring positions in the domain are allowed to find the best rotamer to accommodate the mutation in the ligand. Once all positions per ligand and per template have been scanned, the generated structures are evaluated by FOLD-X in terms of energy, allowing the selection of the best residue(s) per position. The results are tabulated in a scoring matrix reflecting the ability of each natural amino acid to fit in a given position of a ligand, in a given template. The sum of the energy values (stability and binding) is normalized with respect to the best residue, and all residues within a threshold (below +0.5 cal/mol) are selected for pattern characterization.
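The normalization and threshold step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build the database: it assumes that lower energies are more favourable, that "normalized with respect to the best residue" means subtracting the best (lowest) summed energy at that position, and that the energies have already been extracted from the FOLD-X output. The function names and the toy energy values are hypothetical.

```python
from typing import Dict, List

# Cutoff quoted in the text; it is applied in the same units as the energies.
THRESHOLD = 0.5


def normalize_position(energies: Dict[str, float]) -> Dict[str, float]:
    """Express each residue's energy relative to the best residue.

    Assumption: the best residue is the one with the lowest summed
    (stability + binding) energy, so it ends up at 0.0.
    """
    best = min(energies.values())
    return {aa: energy - best for aa, energy in energies.items()}


def select_residues(energies: Dict[str, float],
                    threshold: float = THRESHOLD) -> List[str]:
    """Return the residues whose relative energy lies below the threshold."""
    relative = normalize_position(energies)
    return sorted(aa for aa, delta in relative.items() if delta < threshold)


if __name__ == "__main__":
    # Hypothetical summed energies for one ligand position in one template.
    position = {"P": -3.2, "A": -2.9, "L": -2.0, "G": -1.0}
    print(select_residues(position))  # ['A', 'P']
```

Applying `select_residues` to every ligand position of a template gives, per position, the set of residues that would enter the pattern.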

The scoring matrix allows a protein or peptide to be scanned to determine its chance of acting as a putative ligand, simply by computing the contribution of the residues in windows of 4 to 10 residues that move along the sequence. Previously published data on the binding of peptides and proteins to the domains (e.g., SH2 and SH3 domains from Lck or C-Src-like kinases binding to peptides) are used to validate the predictions, as well as to determine the most important positions for binding.
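A minimal sketch of such a sliding-window scan is given below. It assumes that the scoring matrix is stored as one dictionary per ligand position (amino acid to relative energy, lower being better), that the window width equals the number of positions in the matrix, and that residues absent from a column receive a fixed penalty. The data layout, the `missing` penalty and the toy matrix are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[Dict[str, float]]  # one column of relative energies per position


def scan_sequence(sequence: str, matrix: Matrix,
                  missing: float = 10.0) -> List[Tuple[int, str, float]]:
    """Slide a window along the sequence and score every placement.

    Returns (start index, window, score) tuples sorted from the best
    (lowest) to the worst score. `missing` is a hypothetical penalty for
    residues that have no entry in a matrix column.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(aa, missing)
                    for column, aa in zip(matrix, window))
        hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Toy 4-position matrix; the values are invented for the example.
    toy_matrix = [
        {"P": 0.0, "A": 1.0},
        {"L": 0.0, "P": 0.5, "A": 1.0},
        {"P": 0.0, "A": 1.0},
        {"R": 0.0, "A": 1.0},
    ]
    best = scan_sequence("AAPLPRAA", toy_matrix)[0]
    print(best)  # (2, 'PLPR', 0.0)
```

Ranking all windows rather than returning only the best one makes it straightforward to compare the top-scoring segments against published binding peptides during validation.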

Once the methodology has been validated, the predictions can be used to search the genome. The scoring matrices are used to construct patterns with which to search for the putative partners of a domain in a genome. ScanProsite at http://www.expasy.ch allows the identification of putative partner proteins by searching the Swiss-Prot database with a Prosite entry. Most of the hits should be discarded because, for example, the recognized sequence does not lie in a disordered part of the protein, or the protein is confined to another cellular compartment. To explore protein motifs for globularity and disorder, we use GlobPlot (http://globplot.embl.de/); to explore the secondary structure, predictive methods are used (http://npsa-pbil.ibcp.fr/); finally, the subcellular localization of domains and partners is explored. The proteins that survive these filters are proposed to interact specifically with the globular domain under study. The development of this methodology for modelling, prediction and localization of putative partners, which is already working for SH3 domains, is of crucial importance, since it may be generally applicable to any domain involved in protein-protein interaction.
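As an illustration of the pattern-construction step, the sketch below turns the residues selected at each ligand position into a pattern written in PROSITE syntax (elements separated by "-", alternatives in square brackets, "x" for any residue), and converts it to a regular expression for a quick local check. The function names and the example input are hypothetical, the conversion covers only the subset of the syntax produced here, and the actual database searches are done with ScanProsite as described above.

```python
import re
from typing import List


def to_prosite(selected: List[List[str]]) -> str:
    """Build a PROSITE-style pattern from the residues allowed per position.

    Assumption: a position with no selected residues, or with all 20
    natural amino acids, is treated as unconstrained and written as 'x'.
    """
    elements = []
    for residues in selected:
        unique = sorted(set(residues))
        if len(unique) == 0 or len(unique) == 20:
            elements.append("x")
        elif len(unique) == 1:
            elements.append(unique[0])
        else:
            elements.append("[" + "".join(unique) + "]")
    return "-".join(elements)


def prosite_to_regex(pattern: str) -> str:
    """Convert the subset of PROSITE syntax produced above into a regex."""
    return pattern.replace("-", "").replace("x", ".")


if __name__ == "__main__":
    # Hypothetical residues selected for a five-position ligand.
    pattern = to_prosite([["P"], ["A", "L", "P"], [], ["P"], ["K", "R"]])
    print(pattern)  # P-[ALP]-x-P-[KR]
    match = re.search(prosite_to_regex(pattern), "MAPLGPRQ")
    print(match.start(), match.group())  # 2 PLGPR
```

Hits found this way would still need to pass the disorder, secondary-structure and localization filters before being proposed as partners.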