11 Autopsy of a PDB file
Atom coordinates of protein and nucleic acid structures are distributed as PDB files. These are plain-text files with fixed-format lines of 80 columns, which has the advantage of being platform independent.
The first six columns are reserved for a keyword describing the type of information that follows on the line (such as HEADER, JRNL, REMARK, ATOM, HETATM, and so on).
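As a minimal sketch of how this keyword can be used, the Python snippet below reads columns 1-6 of each line and tallies the record types. The helper name `record_type` and the shortened sample lines are illustrative assumptions, not part of any PDB library.

```python
from collections import Counter


def record_type(line: str) -> str:
    """Return the keyword stored in columns 1-6 of a PDB line."""
    return line[:6].strip()


# Shortened sample lines; only the first six columns matter here.
sample = [
    "HEADER    OXIDOREDUCTASE(NAD(A)-CHOH(D))",
    "REMARK   1",
    "REMARK   1 REFERENCE 1",
    "ATOM      5  CA  SER A   1      12.901   4.557  33.573  1.00 52.42",
    "HETATM 5202  S   SO4 B   2      44.842  24.424  31.662  1.00 72.77",
]

# Count how many lines of each record type are present.
counts = Counter(record_type(line) for line in sample)
for keyword, n in counts.items():
    print(keyword, n)
```

Slicing by column rather than splitting on whitespace matters because a six-letter keyword such as HETATM can be followed by a number in column 7 with no space in between.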
A typical PDB file starts with a header containing information about the entry and literature references, along with additional remarks that may describe how the protein was crystallised, the resolution, and so on.
A partial example of a PDB file is given below; removed parts are marked by (...).
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
--------------------------------------------------------------------------------
HEADER OXIDOREDUCTASE(NAD(A)-CHOH(D)) 12-APR-89 4MDH 4MDH 3
COMPND CYTOPLASMIC MALATE DEHYDROGENASE (E.C.1.1.1.37) 4MDH 4
SOURCE PORCINE (SUS $SCROFA) HEART 4MDH 5
AUTHOR J.J.BIRKTOFT,L.J.BANASZAK 4MDH 6
REVDAT 3 15-APR-92 4MDHB 3 ATOM 4MDHB 1
REVDAT 2 15-JAN-90 4MDHA 1 JRNL 4MDHA 1
REVDAT 1 19-APR-89 4MDH 0 4MDH 7
SPRSDE 19-APR-89 4MDH 2MDH 4MDH 8
JRNL AUTH J.J.BIRKTOFT,G.RHODES,L.J.BANASZAK 4MDH 9
JRNL TITL REFINED CRYSTAL STRUCTURE OF CYTOPLASMIC MALATE 4MDHA 2
JRNL TITL 2 DEHYDROGENASE AT 2.5-*ANGSTROMS RESOLUTION 4MDHA 3
JRNL REF BIOCHEMISTRY V. 28 6065 1989 4MDHA 4
JRNL REFN ASTM BICHAW US ISSN 0006-2960 033 4MDHA 5
REMARK 1 4MDH 14
REMARK 1 REFERENCE 1 4MDH 15
REMARK 1 AUTH J.J.BIRKTOFT,Z.FU,G.E.CARNAHAN,G.RHODES, 4MDH 16
REMARK 1 AUTH 2 S.L.RODERICK,L.J.BANASZAK 4MDH 17
REMARK 1 TITL COMPARISON OF THE MOLECULAR STRUCTURES OF 4MDH 18
REMARK 1 TITL 2 CYTOPLASMIC AND MITOCHONDRIAL MALATE DEHYDROGENASE 4MDH 19
REMARK 1 REF TO BE PUBLISHED 4MDH 20
REMARK 1 REFN 353 4MDH 21
(...)
The next section provides information on the amino-acid sequence of each chain. The current example contains two chains (A and B).
SEQRES 1 A 334 ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA 4MDH 163
SEQRES 2 A 334 GLY GLN ILE ALA TYR SER LEU LEU TYR SER ILE GLY ASN 4MDH 164
SEQRES 3 A 334 GLY SER VAL PHE GLY LYS ASP GLN PRO ILE ILE LEU VAL 4MDH 165
(...)
SEQRES 24 A 334 VAL GLU GLY LEU PRO ILE ASN ASP PHE SER ARG GLU LYS 4MDH 186
SEQRES 25 A 334 MET ASP LEU THR ALA LYS GLU LEU ALA GLU GLU LYS GLU 4MDH 187
SEQRES 26 A 334 THR ALA PHE GLU PHE LEU SER SER ALA 4MDH 188
SEQRES 1 B 334 ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA 4MDH 189
SEQRES 2 B 334 GLY GLN ILE ALA TYR SER LEU LEU TYR SER ILE GLY ASN 4MDH 190
SEQRES 3 B 334 GLY SER VAL PHE GLY LYS ASP GLN PRO ILE ILE LEU VAL 4MDH 191
(...)
SEQRES 24 B 334 VAL GLU GLY LEU PRO ILE ASN ASP PHE SER ARG GLU LYS 4MDH 212
SEQRES 25 B 334 MET ASP LEU THR ALA LYS GLU LEU ALA GLU GLU LYS GLU 4MDH 213
SEQRES 26 B 334 THR ALA PHE GLU PHE LEU SER SER ALA 4MDH 214
(...)
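The SEQRES layout above can be read back into one residue list per chain. The following is a sketch only: the function name `read_seqres` is hypothetical, the sample lines are a subset of the excerpt with columns 73-80 dropped, and the column positions (chain in column 12, residue count in columns 14-17, residue names from column 20) are assumed to follow the standard fixed-column layout.

```python
from collections import defaultdict

# Subset of the SEQRES lines of entry 4MDH (columns 73-80 omitted).
seqres_lines = [
    "SEQRES   1 A  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA",
    "SEQRES  26 A  334  THR ALA PHE GLU PHE LEU SER SER ALA",
    "SEQRES   1 B  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA",
]


def read_seqres(lines):
    """Collect residue names per chain and the declared chain lengths."""
    chains = defaultdict(list)
    declared = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]                     # column 12
        declared[chain_id] = int(line[13:17])   # columns 14-17
        chains[chain_id].extend(line[19:70].split())  # columns 20-70
    return dict(chains), declared


chains, declared = read_seqres(seqres_lines)
for chain_id, residues in chains.items():
    print(chain_id, declared[chain_id], len(residues), residues[:3])
```

On a complete file, comparing `len(residues)` with the declared count is a cheap check that no SEQRES line was lost; with the truncated sample above the two numbers naturally differ.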
The next section contains optional information about HET groups (see the HETATM section below for a more detailed description).
HET NAD A 1 44 NAD CO-ENZYME 4MDH 219
HET SUL A 2 5 SULFATE 4MDH 220
HET NAD B 1 44 NAD CO-ENZYME 4MDH 221
HET SUL B 2 5 SULFATE 4MDH 222
FORMUL 3 NAD 2(C21 H28 N7 O14 P2) 4MDH 223
FORMUL 4 SUL 2(O4 S1) 4MDH 224
FORMUL 5 HOH *471(H2 O1) 4MDH 225
(...)
The next section describes secondary structure elements (HELIX, SHEET and TURN) as provided by the crystallographer. These assignments can be subjective, as the definition of secondary structure elements is loose.
HELIX 1 1BA GLY A 13 LEU A 20 1 4MDH 226
HELIX 2 2BA LEU A 20 GLY A 26 1 4MDH 227
HELIX 3 CA MET A 45 ALA A 60 1 4MDH 228
(...)
SHEET 1 S1A 6 LEU A 63 THR A 70 0 4MDH 250
SHEET 2 S1A 6 PRO A 34 ASP A 41 1 4MDH 251
SHEET 3 S1A 6 ILE A 4 GLY A 10 1 4MDH 252
(...)
TURN 1 T1 VAL A 8 ALA A 11 4MDH 274
TURN 2 T2 GLY A 10 GLY A 13 4MDH 275
TURN 3 T3 GLY A 26 PHE A 29 4MDH 276
(...)
The next section describes crystallographic information (unit cell parameters and space group).
CRYST1 139.200 86.600 58.800 90.00 90.00 90.00 P 21 21 2 8 4MDH 328
ORIGX1 1.000000 0.000000 0.000000 0.00000 4MDH 329
ORIGX2 0.000000 1.000000 0.000000 0.00000 4MDH 330
ORIGX3 0.000000 0.000000 1.000000 0.00000 4MDH 331
SCALE1 0.007184 0.000000 0.000000 0.00000 4MDH 332
SCALE2 0.000000 0.011547 0.000000 0.00000 4MDH 333
SCALE3 0.000000 0.000000 0.017007 0.00000 4MDH 334
MTRIX1 1 -0.865540 0.467810 -0.178880 55.21400 1 4MDH 335
MTRIX2 1 0.499790 0.829880 -0.248020 -1.79900 1 4MDH 336
MTRIX3 1 0.032420 -0.304070 -0.952100 89.13300 1 4MDH 337
(...)
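As an illustration of how the CRYST1 and SCALE lines relate, the sketch below parses the unit cell from the CRYST1 line and compares the reciprocal cell lengths with the diagonal of the SCALE matrix. It assumes an orthogonal cell in the standard orientation (all angles are 90 degrees here and ORIGX is the identity), in which case the SCALE diagonal is simply 1/a, 1/b, 1/c. The function name `parse_cryst1` is hypothetical, and the column positions are those of the standard fixed-column layout.

```python
import math

cryst1 = "CRYST1  139.200   86.600   58.800  90.00  90.00  90.00 P 21 21 2     8"

# Diagonal of the SCALE matrix, copied from the SCALE1-3 lines above.
scale_diagonal = (0.007184, 0.011547, 0.017007)


def parse_cryst1(line):
    """Return cell lengths, cell angles and space group from a CRYST1 line."""
    a, b, c = (float(line[i:i + 9]) for i in (6, 15, 24))
    alpha, beta, gamma = (float(line[i:i + 7]) for i in (33, 40, 47))
    space_group = line[55:66].strip()
    return (a, b, c), (alpha, beta, gamma), space_group


lengths, angles, space_group = parse_cryst1(cryst1)

# For an orthogonal cell, each SCALE diagonal term is one over a cell length.
matches = [
    math.isclose(1.0 / length, s, abs_tol=5e-7)
    for length, s in zip(lengths, scale_diagonal)
]
print(lengths, angles, space_group, matches)
```

For non-orthogonal cells the SCALE matrix has off-diagonal terms and this simple reciprocal check no longer applies.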
Finally, the atom coordinates of the amino acids (or nucleotides) are provided. Each line starts with the ATOM keyword, followed by the atom number, atom name, amino-acid name, chain name, amino-acid number, X, Y and Z coordinates, occupancy, and B-factor. The B-factor can be viewed as an uncertainty factor, typically between 0 and 100: a low B-factor means that the position of the atom has been determined accurately. Typically, the B-factors of alpha carbons (CA) are lower than those of atoms located at side-chain extremities.
ATOM 1 C ACE A 0 11.590 2.938 35.017 1.00 45.90 4MDHB 5
ATOM 2 O ACE A 0 12.581 2.371 35.517 1.00 28.75 4MDHB 6
ATOM 3 CH3 ACE A 0 10.179 2.477 35.417 1.00 36.75 4MDHB 7
ATOM 4 N SER A 1 11.648 3.946 34.081 1.00 49.10 4MDH 341
ATOM 5 CA SER A 1 12.901 4.557 33.573 1.00 52.42 4MDH 342
ATOM 6 C SER A 1 12.733 5.624 32.482 1.00 48.48 4MDH 343
ATOM 7 O SER A 1 13.238 5.432 31.363 1.00 57.03 4MDH 344
ATOM 8 CB SER A 1 13.990 3.553 33.162 1.00 41.45 4MDH 345
ATOM 9 OG SER A 1 15.105 3.679 34.039 1.00 42.59 4MDH 346
ATOM 10 N GLU A 2 12.073 6.774 32.772 1.00 37.72 4MDH 347
ATOM 11 CA GLU A 2 11.948 7.788 31.721 1.00 20.88 4MDH 348
ATOM 12 C GLU A 2 12.042 9.235 32.169 1.00 28.31 4MDH 349
ATOM 13 O GLU A 2 11.285 9.654 33.030 1.00 14.56 4MDH 350
ATOM 14 CB GLU A 2 10.925 7.482 30.621 1.00 18.66 4MDH 351
ATOM 15 CG GLU A 2 10.188 8.729 30.102 1.00 39.41 4MDH 352
ATOM 16 CD GLU A 2 8.693 8.532 30.110 1.00 55.62 4MDH 353
ATOM 17 OE1 GLU A 2 7.885 9.153 29.379 1.00 55.67 4MDH 354
ATOM 18 OE2 GLU A 2 8.352 7.589 30.997 1.00 68.00 4MDH 355
(...)
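The ATOM fields listed above sit at fixed column positions, so they are best extracted by slicing rather than by splitting on whitespace. The sketch below shows one way to do this; the `Atom` tuple and `parse_atom` function are hypothetical names, and the column ranges are assumed to follow the standard layout (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, X 31-38, Y 39-46, Z 47-54, occupancy 55-60, B-factor 61-66).

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Slice one ATOM (or HETATM) line into typed fields."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# Two atoms of GLU A 2 from the excerpt (columns 73-80 omitted).
ca = parse_atom("ATOM     11  CA  GLU A   2      11.948   7.788  31.721  1.00 20.88")
oe2 = parse_atom("ATOM     18  OE2 GLU A   2       8.352   7.589  30.997  1.00 68.00")

# The side-chain tip has a higher B-factor than the alpha carbon here.
print(ca.name, ca.b_factor, oe2.name, oe2.b_factor)
```

In this residue the alpha carbon has a B-factor of 20.88 while the terminal OE2 oxygen reaches 68.00, in line with the trend described above, although individual residues can deviate from it.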
Many enzymes are crystallised in the presence of cofactors or substrate analogues, which also have to be described. As the number of possible compounds is too large to define each one individually, a generic record named HETATM groups all atoms belonging to compounds other than amino acids or nucleotides. In the following example, NAD (nicotinamide adenine dinucleotide) and SO4 (sulphate) are described as HETATM. Solvent molecules (H2O) that are visible in the electron density map also appear in this section.
HETATM 5158 AP NAD B 1 42.641 30.361 41.284 1.00 26.73 4MDH5495
HETATM 5159 AO1 NAD B 1 43.440 31.570 40.868 1.00 20.69 4MDH5496
HETATM 5160 AO2 NAD B 1 41.161 30.484 41.376 1.00 33.73 4MDH5497
HETATM 5161 AO5* NAD B 1 43.117 29.802 42.683 1.00 20.55 4MDH5498
HETATM 5162 AC5* NAD B 1 44.483 29.615 43.002 1.00 17.23 4MDH5499
(...)
HETATM 5202 S SO4 B 2 44.842 24.424 31.662 1.00 72.77 4MDH5539
HETATM 5203 O1 SO4 B 2 45.916 23.890 32.631 1.00 31.43 4MDH5540
HETATM 5204 O2 SO4 B 2 44.065 23.296 30.916 1.00 26.35 4MDH5541
HETATM 5205 O3 SO4 B 2 45.570 25.307 30.620 1.00 52.53 4MDH5542
HETATM 5206 O4 SO4 B 2 43.834 25.257 32.482 1.00 47.91 4MDH5543
HETATM 5207 O HOH 0 15.379 1.907 3.295 1.00 58.12 4MDH5544
HETATM 5208 O HOH 1 58.861 0.984 17.024 1.00 37.58 4MDH5545
HETATM 5209 O HOH 2 24.384 1.184 74.398 1.00 35.92 4MDH5546
(...)
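Because cofactors, ions and solvent all share the HETATM keyword, a program usually has to separate them by residue name. The sketch below groups HETATM lines by (residue name, chain, residue number) and sets water apart; the function name `het_groups` is hypothetical, and treating `HOH` as the only water name is an assumption that matches this entry.

```python
from collections import Counter

# A few HETATM lines from the excerpt (columns 73-80 omitted).
hetatm_lines = [
    "HETATM 5158 AP   NAD B   1      42.641  30.361  41.284  1.00 26.73",
    "HETATM 5159 AO1  NAD B   1      43.440  31.570  40.868  1.00 20.69",
    "HETATM 5202  S   SO4 B   2      44.842  24.424  31.662  1.00 72.77",
    "HETATM 5203  O1  SO4 B   2      45.916  23.890  32.631  1.00 31.43",
    "HETATM 5207  O   HOH     0      15.379   1.907   3.295  1.00 58.12",
    "HETATM 5208  O   HOH     1      58.861   0.984  17.024  1.00 37.58",
]


def het_groups(lines):
    """Count atoms per (residue name, chain, residue number) group."""
    groups = Counter()
    for line in lines:
        if line.startswith("HETATM"):
            key = (line[17:20], line[21].strip(), int(line[22:26]))
            groups[key] += 1
    return groups


groups = het_groups(hetatm_lines)
waters = [key for key in groups if key[0] == "HOH"]
ligands = [key for key in groups if key[0] != "HOH"]
print("ligands:", ligands)
print("waters:", waters)
```

Note that the water molecules in this entry carry no chain identifier, so the chain field comes back as an empty string.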
HETATM records describe only atom positions; since they concern non-standard groups, programs do not know which atoms are actually connected. This information is found in the CONECT records. In the example below, atom number 74 is connected to atoms 69 and 75. In the absence of CONECT information, atoms are usually connected if they are closer than 2 angstroms.
CONECT 74 69 75 4MDH6015
CONECT 77 76 4MDH6016
CONECT 92 90 93 4MDH6017
CONECT 99 98 4MDH6018
(...)
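Both ways of finding bonds can be sketched in a few lines of Python. The first function reads the atom numbers of a CONECT line, assuming the standard layout of five-column integer fields starting at column 7; the second applies the 2 angstrom distance rule mentioned above. The names `parse_conect` and `within_bond_distance` are hypothetical, and the coordinates are those of the N, CA and O atoms of SER A 1 in the excerpt.

```python
import math


def parse_conect(line: str):
    """Return (atom serial, list of bonded serials) from a CONECT line."""
    # Columns 7-11 hold the atom, columns 12-31 up to four bonded atoms.
    fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
    serials = [int(f) for f in fields if f]
    return serials[0], serials[1:]


def within_bond_distance(a, b, cutoff=2.0):
    """Distance-based fallback: True if two atoms are closer than cutoff."""
    return math.dist(a, b) < cutoff


atom, bonded = parse_conect("CONECT   74   69   75")
print(atom, bonded)

# Coordinates of three atoms of SER A 1 from the ATOM excerpt.
n_atom = (11.648, 3.946, 34.081)
ca_atom = (12.901, 4.557, 33.573)
o_atom = (13.238, 5.432, 31.363)

print(within_bond_distance(n_atom, ca_atom))  # N and CA are covalently bonded
print(within_bond_distance(ca_atom, o_atom))  # CA and O are not bonded directly
```

A single fixed cutoff is a simplification: it works for the light atoms shown here, but real programs may use element-dependent radii, which is beyond this sketch.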