The structures and functions of
most globular domains in the proteome are still unknown. To gain insight
into the biological role of these domains, we
are modelling different globular domains, starting with SH3, SH2, WW, PDZ, PH, methyltransferase, acetyltransferase, WD40, VHS,
Polo box, protein tyrosine phosphatase, PTB, FHA, BRCT, and 14-3-3, in
complex with several ligands, in order to perform a structure-based
prediction of their putative ligands. The modelling of the domains takes
into account different ligand orientations where necessary, as well as
the orientation of the key residues involved in binding (e.g., type I
and type II ligands, and the conserved Trp in the SH3 binding pocket).
Briefly, internet resources allow the automated extraction of relevant
information on the SH3, SH2, WW, PDZ and PH domains (sequences,
structural coordinates, multiple sequence alignments, etc.).
The comparison of
sequences and structures allows the domains to be classified and the
template structures suitable for homology modelling to be identified.
Templates covering a broad range of structural features are chosen and
coupled to as many different ligands as possible, to obtain suitable
templates for prediction. The complexes are
used as templates for homology modelling using Swiss Pdb Modeller
Server, Modeller, and WHATIF. The modelled domain-ligand complexes are
then analysed with FOLD-X, a protein design algorithm that searches for the
ligand residues that best fit the binding pocket of the domain and
builds mutant structures using rotamer libraries of all natural amino
acids. Each ligand position is explored individually in sequence space,
and the neighbouring positions in the domain are allowed to adopt the best
rotamer to accommodate the mutation in the ligand. Once all positions per
ligand and per template
have been scanned, the generated structures are evaluated by FOLD-X in
terms of energy, which allows the best residue(s) per position to be
selected. The results are tabulated in a scoring matrix reflecting the
ability of each natural amino acid to fit at a given position of a ligand
in a given template. The summed energy (stability plus binding) is
normalized with respect to the best residue, and all residues within a
threshold (below +0.5 kcal/mol) are selected for pattern
characterization.
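The normalization and threshold step can be illustrated with a minimal Python sketch. It assumes that the FOLD-X energies for one template have already been collected into a table mapping each ligand position to the summed stability and binding energy of each tested amino acid; the table layout, the function names (`normalize`, `select_residues`, `to_prosite`) and the toy values are hypothetical. Only the normalization to the best residue and the +0.5 cut-off follow the text. As one possible way of "characterizing the pattern", the selected residues are written in PROSITE pattern syntax.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical layout: energies[position][residue] holds the summed stability and
# binding energy computed by FOLD-X for that residue at that ligand position,
# for a single template. Lower values mean a better fit.
EnergyTable = Dict[int, Dict[str, float]]


def normalize(energies: EnergyTable) -> EnergyTable:
    """Express each energy relative to the best (lowest) residue at its position."""
    normalized: EnergyTable = {}
    for pos, per_residue in energies.items():
        best = min(per_residue.values())
        normalized[pos] = {aa: e - best for aa, e in per_residue.items()}
    return normalized


def select_residues(normalized: EnergyTable, threshold: float = 0.5) -> Dict[int, List[str]]:
    """Keep, per position, every residue whose normalized energy is below the threshold."""
    return {
        pos: sorted(aa for aa, e in per_residue.items() if e < threshold)
        for pos, per_residue in normalized.items()
    }


def to_prosite(selected: Dict[int, List[str]]) -> str:
    """Write the selected residues as a PROSITE-style pattern such as 'P-x-[KR]'."""
    elements = []
    for pos in sorted(selected):
        residues = selected[pos]
        if set(residues) == set(AMINO_ACIDS):
            elements.append("x")  # every amino acid tolerated at this position
        elif len(residues) == 1:
            elements.append(residues[0])
        else:
            elements.append("[" + "".join(residues) + "]")
    return "-".join(elements)


if __name__ == "__main__":
    # Toy energies for a two-position ligand; the values are invented.
    toy = {
        1: {"P": -1.8, "A": -0.9, "L": -1.5},
        2: {"K": -2.0, "R": -1.7, "E": 0.4},
    }
    print(to_prosite(select_residues(normalize(toy))))  # prints [LP]-[KR]
```

With the invented values, L (0.3 above P) and R (0.3 above K) fall inside the cut-off while A and E do not, so each position keeps two residues. Keeping the numeric matrix alongside the pattern is deliberate: the matrix still carries the graded energies that a pattern discards.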
The use of the scoring
matrix allows a protein or peptide to be scanned to estimate its
likelihood of acting as a ligand, simply by computing the combined
contribution of its residues within windows of 4 to 10 residues that
slide along the sequence. Previously published data on the binding of
peptides and proteins to these domains (e.g., peptide binding by the SH2
and SH3 domains of Lck or c-Src-like kinases) are used to validate the
predictions and to determine the positions most important for binding.
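The window scan can be sketched as follows, assuming a per-template scoring matrix that gives, for each ligand position, a normalized energy per amino acid (0 for the best residue, larger for worse fits). The window width is taken to equal the number of ligand positions in the matrix (4 to 10 in the text), and the residue contributions are simply summed; the summation, the `MISSING_PENALTY` value for residues absent from the matrix, the function names and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical scoring matrix for one template: matrix[i][aa] is the normalized
# energy of amino acid `aa` at ligand position i (0 = best residue, larger = worse).
ScoringMatrix = List[Dict[str, float]]

# Assumed penalty for residues not listed at a position; not taken from the text.
MISSING_PENALTY = 5.0


def window_score(window: str, matrix: ScoringMatrix) -> float:
    """Sum the per-position contributions of the residues in one window."""
    return sum(column.get(aa, MISSING_PENALTY) for aa, column in zip(window, matrix))


def scan_sequence(sequence: str, matrix: ScoringMatrix) -> List[Tuple[int, str, float]]:
    """Slide a window as wide as the matrix along the sequence; best windows first."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start + 1, window, window_score(window, matrix)))  # 1-based start
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Invented four-position matrix and test sequence, for illustration only.
    matrix = [
        {"P": 0.0, "A": 0.4},
        {"P": 0.0, "L": 0.2},
        {"K": 0.0, "R": 0.1},
        {"P": 0.0},
    ]
    for start, window, score in scan_sequence("MAPPKPLE", matrix)[:3]:
        print(start, window, round(score, 2))
```

The lowest-scoring windows are the candidate binding segments. Ranking them against peptides already known to bind a domain is one straightforward way to carry out the validation described above and to see which positions drive the ranking.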
Once the methodology has been validated, the predictions can be used to
search the genome. The scoring matrices are used to construct patterns
for finding the putative partners of a domain in a genome. ScanProsite
(http://www.expasy.ch) identifies putative partner proteins by searching
the Swiss-Prot database with a PROSITE entry. Most of the hits must be
discarded, for example because the matched sequence does not lie in a
disordered part of the protein or because the protein is confined to
another cellular compartment. To assess globularity and disorder we use
GlobPlot (http://globplot.embl.de/); to explore secondary structure we
use the prediction methods available at http://npsa-pbil.ibcp.fr/;
finally, the subcellular localization of the domains and their partners
is examined. The proteins that pass these filters are proposed to
interact specifically with the globular domain under study. The
methodology for modelling, prediction and localization of putative
partners, which already works for SH3 domains, is of crucial importance
because it could be applied to any domain involved in protein-protein
interactions.
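The disorder and localization filters can be sketched in Python, assuming each ScanProsite hit has already been annotated with the matched residue range, the disordered regions predicted for the protein (for example, parsed from GlobPlot output) and a set of predicted compartments. The `Hit` record, its field names, and the rule that the whole match must lie inside a single disordered region are hypothetical choices, since the text does not say how partial overlaps are treated; the secondary-structure check is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Hit:
    """Hypothetical record for one ScanProsite match (1-based, inclusive)."""
    protein_id: str
    start: int
    end: int
    disordered_regions: List[Tuple[int, int]]  # e.g. parsed from GlobPlot output
    compartments: Set[str]                     # predicted subcellular localization


def in_disordered_region(hit: Hit) -> bool:
    """True if the whole matched segment lies inside one predicted disordered region."""
    return any(lo <= hit.start and hit.end <= hi for lo, hi in hit.disordered_regions)


def shares_compartment(hit: Hit, domain_compartments: Set[str]) -> bool:
    """True if the hit protein and the domain share at least one compartment."""
    return bool(hit.compartments & domain_compartments)


def filter_hits(hits: List[Hit], domain_compartments: Set[str]) -> List[Hit]:
    """Keep only hits that pass both the disorder and the localization filters."""
    return [
        hit for hit in hits
        if in_disordered_region(hit) and shares_compartment(hit, domain_compartments)
    ]


if __name__ == "__main__":
    # Invented hits: only P1 has its motif in a disordered region of a shared compartment.
    hits = [
        Hit("P1", 40, 47, [(30, 60)], {"cytoplasm"}),
        Hit("P2", 10, 17, [(30, 60)], {"cytoplasm"}),      # motif in a globular part
        Hit("P3", 40, 47, [(30, 60)], {"mitochondrion"}),  # different compartment
    ]
    kept = filter_hits(hits, {"cytoplasm", "nucleus"})
    print([hit.protein_id for hit in kept])  # prints ['P1']
```

Applying the filters as independent predicates keeps each one easy to swap or tighten, for instance to accept partial overlaps with a disordered region if that turns out to matter for a given domain.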