11 Autopsy of a PDB file

Atom coordinates of protein and nucleic acid structures are distributed as PDB files. These are plain-text files with fixed-width lines of 80 columns, which has the advantage of being platform independent.

The first six columns are reserved for a keyword describing the type of information that follows on the line (such as HEADER, JRNL, REMARK, ATOM, HETATM, and so on).
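As a minimal sketch of how the keyword can be read, the Python snippet below slices the first six columns of each line and tallies the record types. The function names `record_name` and `count_records` are hypothetical helpers introduced for illustration, and the sample lines are shortened copies of records from the 4MDH entry used in this chapter.

```python
from collections import Counter

def record_name(line: str) -> str:
    """Return the keyword held in columns 1-6, with padding removed."""
    return line[:6].strip()

def count_records(lines):
    """Tally the record types found in an iterable of PDB lines."""
    return Counter(record_name(line) for line in lines if line.strip())

# Shortened sample lines (the trailing identifier columns are omitted).
sample = [
    "HEADER    OXIDOREDUCTASE(NAD(A)-CHOH(D))          12-APR-89   4MDH",
    "REMARK   1 REFERENCE 1",
    "REMARK   1  REF    TO BE PUBLISHED",
    "ATOM      5  CA  SER A   1      12.901   4.557  33.573  1.00 52.42",
    "HETATM 5207  O   HOH     0      15.379   1.907   3.295  1.00 58.12",
]

counts = count_records(sample)
for name, n in counts.items():
    print(f"{name:<6} {n}")
```

Slicing by column rather than splitting on whitespace matters here: short keywords such as ATOM are padded with spaces, while six-letter keywords such as HETATM can touch the next field.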

A typical PDB file begins with a header containing information about the entry and literature references, as well as additional remarks that may describe how the protein was crystallised, the resolution, and so on.

A partial example of a PDB file is given below; removed parts are indicated by (...).

 
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
--------------------------------------------------------------------------------
 
HEADER    OXIDOREDUCTASE(NAD(A)-CHOH(D))          12-APR-89   4MDH      4MDH   3
COMPND    CYTOPLASMIC MALATE DEHYDROGENASE (E.C.1.1.1.37)               4MDH   4
SOURCE    PORCINE (SUS $SCROFA) HEART                                   4MDH   5
AUTHOR    J.J.BIRKTOFT,L.J.BANASZAK                                     4MDH   6
REVDAT   3   15-APR-92 4MDHB   3       ATOM                             4MDHB  1
REVDAT   2   15-JAN-90 4MDHA   1       JRNL                             4MDHA  1
REVDAT   1   19-APR-89 4MDH    0                                        4MDH   7
SPRSDE     19-APR-89 4MDH      2MDH                                     4MDH   8
JRNL        AUTH   J.J.BIRKTOFT,G.RHODES,L.J.BANASZAK                   4MDH   9
JRNL        TITL   REFINED CRYSTAL STRUCTURE OF CYTOPLASMIC MALATE      4MDHA  2
JRNL        TITL 2 DEHYDROGENASE AT 2.5-*ANGSTROMS RESOLUTION           4MDHA  3
JRNL        REF    BIOCHEMISTRY                  V.  28  6065 1989      4MDHA  4
JRNL        REFN   ASTM BICHAW  US ISSN 0006-2960                  033  4MDHA  5
REMARK   1                                                              4MDH  14
REMARK   1 REFERENCE 1                                                  4MDH  15
REMARK   1  AUTH   J.J.BIRKTOFT,Z.FU,G.E.CARNAHAN,G.RHODES,             4MDH  16
REMARK   1  AUTH 2 S.L.RODERICK,L.J.BANASZAK                            4MDH  17
REMARK   1  TITL   COMPARISON OF THE MOLECULAR STRUCTURES OF            4MDH  18
REMARK   1  TITL 2 CYTOPLASMIC AND MITOCHONDRIAL MALATE DEHYDROGENASE   4MDH  19
REMARK   1  REF    TO BE PUBLISHED                                      4MDH  20
REMARK   1  REFN                                                   353  4MDH  21
                                   (...)

The next section provides information on the amino-acid sequence of each chain. The current example contains two chains (A and B).

SEQRES   1 A  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA  4MDH 163
SEQRES   2 A  334  GLY GLN ILE ALA TYR SER LEU LEU TYR SER ILE GLY ASN  4MDH 164
SEQRES   3 A  334  GLY SER VAL PHE GLY LYS ASP GLN PRO ILE ILE LEU VAL  4MDH 165
 
                                   (...)
 
SEQRES  24 A  334  VAL GLU GLY LEU PRO ILE ASN ASP PHE SER ARG GLU LYS  4MDH 186
SEQRES  25 A  334  MET ASP LEU THR ALA LYS GLU LEU ALA GLU GLU LYS GLU  4MDH 187
SEQRES  26 A  334  THR ALA PHE GLU PHE LEU SER SER ALA                  4MDH 188
SEQRES   1 B  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA  4MDH 189
SEQRES   2 B  334  GLY GLN ILE ALA TYR SER LEU LEU TYR SER ILE GLY ASN  4MDH 190
SEQRES   3 B  334  GLY SER VAL PHE GLY LYS ASP GLN PRO ILE ILE LEU VAL  4MDH 191
 
                                   (...)
 
SEQRES  24 B  334  VAL GLU GLY LEU PRO ILE ASN ASP PHE SER ARG GLU LYS  4MDH 212
SEQRES  25 B  334  MET ASP LEU THR ALA LYS GLU LEU ALA GLU GLU LYS GLU  4MDH 213
SEQRES  26 B  334  THR ALA PHE GLU PHE LEU SER SER ALA                  4MDH 214
 
                                   (...)
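The SEQRES records can be collected into one residue list per chain. The sketch below assumes the usual fixed-column layout (chain identifier in column 12, residue names in columns 20-70), which matches the lines shown above; `read_seqres` and `one_letter` are hypothetical helper names. Non-standard residues such as the acetyl group ACE are mapped to "X" here, which is a choice made for this example only.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def read_seqres(lines):
    """Return {chain_id: [residue names]} built from SEQRES records."""
    chains = {}
    for line in lines:
        if line[:6] != "SEQRES":
            continue
        chain_id = line[11]               # column 12
        residues = line[19:70].split()    # columns 20-70, up to 13 names
        chains.setdefault(chain_id, []).extend(residues)
    return chains

def one_letter(residues):
    """Convert three-letter names to one-letter codes ("X" if unknown)."""
    return "".join(THREE_TO_ONE.get(r, "X") for r in residues)

# Only the first lines of each chain; the real entry has 26 lines per chain.
sample = [
    "SEQRES   1 A  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA  4MDH 163",
    "SEQRES   2 A  334  GLY GLN ILE ALA TYR SER LEU LEU TYR SER ILE GLY ASN  4MDH 164",
    "SEQRES   1 B  334  ACE SER GLU PRO ILE ARG VAL LEU VAL THR GLY ALA ALA  4MDH 189",
]

chains = read_seqres(sample)
for chain_id, residues in chains.items():
    print(chain_id, len(residues), one_letter(residues))
```

Stopping the slice at column 70 keeps the entry identifier stored in columns 71-80 of these older files out of the residue list.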

The next section contains optional information about HET groups (see the HETATM section that will follow for a more detailed description).

 
HET    NAD  A   1      44     NAD CO-ENZYME                             4MDH 219
HET    SUL  A   2       5     SULFATE                                   4MDH 220
HET    NAD  B   1      44     NAD CO-ENZYME                             4MDH 221
HET    SUL  B   2       5     SULFATE                                   4MDH 222
FORMUL   3  NAD    2(C21 H28 N7 O14 P2)                                 4MDH 223
FORMUL   4  SUL    2(O4 S1)                                             4MDH 224
FORMUL   5  HOH   *471(H2 O1)                                           4MDH 225
 
                                   (...)

The next section describes secondary structure elements (HELIX, SHEET and TURN) as provided by the crystallographer. These assignments can be subjective, as the definitions of secondary structure elements are loose.

 
HELIX    1 1BA GLY A   13  LEU A   20  1                                4MDH 226
HELIX    2 2BA LEU A   20  GLY A   26  1                                4MDH 227
HELIX    3  CA MET A   45  ALA A   60  1                                4MDH 228
                                   (...)
SHEET    1 S1A 6 LEU A  63  THR A  70  0                                4MDH 250
SHEET    2 S1A 6 PRO A  34  ASP A  41  1                                4MDH 251
SHEET    3 S1A 6 ILE A   4  GLY A  10  1                                4MDH 252
                                   (...)
TURN     1  T1 VAL A   8  ALA A  11                                     4MDH 274
TURN     2  T2 GLY A  10  GLY A  13                                     4MDH 275
TURN     3  T3 GLY A  26  PHE A  29                                     4MDH 276
                                   (...)
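To illustrate how such a record can be used, the following sketch reads the start and end residues of a HELIX line and derives the helix length. The column positions (serial number in columns 8-10, helix identifier in 12-14, chain in 20, first residue number in 22-25, last residue number in 34-37) are an assumption based on the standard fixed-column layout; they agree with the lines shown above. The `Helix` class and `parse_helix` function are hypothetical names.

```python
from typing import NamedTuple

class Helix(NamedTuple):
    serial: int
    helix_id: str
    chain: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Number of residues spanned, both ends included."""
        return self.end - self.start + 1

def parse_helix(line: str) -> Helix:
    """Extract the residue range from one HELIX record."""
    return Helix(
        serial=int(line[7:10]),        # columns 8-10
        helix_id=line[11:14].strip(),  # columns 12-14
        chain=line[19],                # column 20
        start=int(line[21:25]),        # columns 22-25
        end=int(line[33:37]),          # columns 34-37
    )

helices = [
    parse_helix("HELIX    1 1BA GLY A   13  LEU A   20  1"),
    parse_helix("HELIX    3  CA MET A   45  ALA A   60  1"),
]
for h in helices:
    print(h.helix_id, h.chain, h.start, h.end, h.length)
```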

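The next section describes crystallographic information: the unit cell dimensions and space group (CRYST1), followed by coordinate transformation matrices (ORIGX, SCALE and MTRIX).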

 
CRYST1  139.200   86.600   58.800  90.00  90.00  90.00 P 21 21 2     8  4MDH 328
ORIGX1      1.000000  0.000000  0.000000        0.00000                 4MDH 329
ORIGX2      0.000000  1.000000  0.000000        0.00000                 4MDH 330
ORIGX3      0.000000  0.000000  1.000000        0.00000                 4MDH 331
SCALE1      0.007184  0.000000  0.000000        0.00000                 4MDH 332
SCALE2      0.000000  0.011547  0.000000        0.00000                 4MDH 333
SCALE3      0.000000  0.000000  0.017007        0.00000                 4MDH 334
MTRIX1   1 -0.865540  0.467810 -0.178880       55.21400    1            4MDH 335
MTRIX2   1  0.499790  0.829880 -0.248020       -1.79900    1            4MDH 336
MTRIX3   1  0.032420 -0.304070 -0.952100       89.13300    1            4MDH 337
 
                                   (...)
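The CRYST1 line can be decoded with a few column slices. The sketch below assumes the standard layout (cell lengths a, b, c in columns 7-33, angles in columns 34-54, space group in columns 56-66, number of molecules per cell in columns 67-70); `UnitCell` and `parse_cryst1` are hypothetical names. It also checks a relationship visible in the example: for this orthorhombic cell, the diagonal of the SCALE matrix is simply 1/a, 1/b and 1/c.

```python
import math
from dataclasses import dataclass

@dataclass
class UnitCell:
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float
    space_group: str
    z: int

    def volume(self) -> float:
        """Cell volume in cubic angstroms (general triclinic formula)."""
        ca, cb, cg = (math.cos(math.radians(x))
                      for x in (self.alpha, self.beta, self.gamma))
        return self.a * self.b * self.c * math.sqrt(
            1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

def parse_cryst1(line: str) -> UnitCell:
    """Decode one CRYST1 record using fixed column positions."""
    return UnitCell(
        a=float(line[6:15]), b=float(line[15:24]), c=float(line[24:33]),
        alpha=float(line[33:40]), beta=float(line[40:47]),
        gamma=float(line[47:54]),
        space_group=line[55:66].strip(),
        z=int(line[66:70]),
    )

cell = parse_cryst1(
    "CRYST1  139.200   86.600   58.800  90.00  90.00  90.00 P 21 21 2     8")
print(cell.space_group, cell.z, round(cell.volume(), 1))

# Compare with the SCALE1-3 diagonal shown in the example above.
for length in (cell.a, cell.b, cell.c):
    print(round(1 / length, 6))
```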

Finally, atom coordinates for amino acids (or nucleic acids) are provided. Each line starts with the ATOM keyword, followed by the atom number, atom name, residue name, chain identifier, residue number, X, Y and Z coordinates, occupancy, and B-factor (temperature factor). The B-factor can be viewed as a measure of positional uncertainty, typically in the range 0-100: a low B-factor means that the position of the atom has been determined accurately. B-factors of alpha carbons (CA) are typically lower than those of atoms located at side-chain extremities.

 
ATOM      1  C   ACE A   0      11.590   2.938  35.017  1.00 45.90      4MDHB  5
ATOM      2  O   ACE A   0      12.581   2.371  35.517  1.00 28.75      4MDHB  6
ATOM      3  CH3 ACE A   0      10.179   2.477  35.417  1.00 36.75      4MDHB  7
ATOM      4  N   SER A   1      11.648   3.946  34.081  1.00 49.10      4MDH 341
ATOM      5  CA  SER A   1      12.901   4.557  33.573  1.00 52.42      4MDH 342
ATOM      6  C   SER A   1      12.733   5.624  32.482  1.00 48.48      4MDH 343
ATOM      7  O   SER A   1      13.238   5.432  31.363  1.00 57.03      4MDH 344
ATOM      8  CB  SER A   1      13.990   3.553  33.162  1.00 41.45      4MDH 345
ATOM      9  OG  SER A   1      15.105   3.679  34.039  1.00 42.59      4MDH 346
ATOM     10  N   GLU A   2      12.073   6.774  32.772  1.00 37.72      4MDH 347
ATOM     11  CA  GLU A   2      11.948   7.788  31.721  1.00 20.88      4MDH 348
ATOM     12  C   GLU A   2      12.042   9.235  32.169  1.00 28.31      4MDH 349
ATOM     13  O   GLU A   2      11.285   9.654  33.030  1.00 14.56      4MDH 350
ATOM     14  CB  GLU A   2      10.925   7.482  30.621  1.00 18.66      4MDH 351
ATOM     15  CG  GLU A   2      10.188   8.729  30.102  1.00 39.41      4MDH 352
ATOM     16  CD  GLU A   2       8.693   8.532  30.110  1.00 55.62      4MDH 353
ATOM     17  OE1 GLU A   2       7.885   9.153  29.379  1.00 55.67      4MDH 354
ATOM     18  OE2 GLU A   2       8.352   7.589  30.997  1.00 68.00      4MDH 355
 
                                   (...)
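A coordinate line is best read by column position rather than by splitting on spaces, because neighbouring fields can touch. The sketch below assumes the standard layout (atom number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, X, Y, Z in 31-54, occupancy in 55-60, B-factor in 61-66), which agrees with the lines above. `Atom` and `parse_atom` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom(line: str) -> Atom:
    """Decode one ATOM (or HETATM) record using fixed column positions."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )

# Alpha carbon and a side-chain tip of GLU 2, taken from the excerpt above.
ca = parse_atom(
    "ATOM     11  CA  GLU A   2      11.948   7.788  31.721  1.00 20.88")
oe2 = parse_atom(
    "ATOM     18  OE2 GLU A   2       8.352   7.589  30.997  1.00 68.00")

print(ca)
print(oe2.name, oe2.b_factor)
print("CA better defined than OE2:", ca.b_factor < oe2.b_factor)
```

In this residue the alpha carbon has a B-factor of 20.88 and the terminal oxygen 68.00, in line with the trend described above.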

Many enzymes are crystallised in the presence of cofactors or substrate analogues, which also have to be described. Because the number of possible compounds is too large for each to have its own record type, a generic record named HETATM groups all atoms belonging to compounds other than standard amino acids or nucleotides. In the following example, NAD (nicotinamide adenine dinucleotide) and SO4 (sulphate) are described as HETATM records. Solvent molecules (H2O) that are visible in the electron density map also appear in this section.

 
HETATM 5158 AP   NAD B   1      42.641  30.361  41.284  1.00 26.73      4MDH5495
HETATM 5159 AO1  NAD B   1      43.440  31.570  40.868  1.00 20.69      4MDH5496
HETATM 5160 AO2  NAD B   1      41.161  30.484  41.376  1.00 33.73      4MDH5497
HETATM 5161 AO5* NAD B   1      43.117  29.802  42.683  1.00 20.55      4MDH5498
HETATM 5162 AC5* NAD B   1      44.483  29.615  43.002  1.00 17.23      4MDH5499
                                   (...)
HETATM 5202  S   SO4 B   2      44.842  24.424  31.662  1.00 72.77      4MDH5539
HETATM 5203  O1  SO4 B   2      45.916  23.890  32.631  1.00 31.43      4MDH5540
HETATM 5204  O2  SO4 B   2      44.065  23.296  30.916  1.00 26.35      4MDH5541
HETATM 5205  O3  SO4 B   2      45.570  25.307  30.620  1.00 52.53      4MDH5542
HETATM 5206  O4  SO4 B   2      43.834  25.257  32.482  1.00 47.91      4MDH5543
HETATM 5207  O   HOH     0      15.379   1.907   3.295  1.00 58.12      4MDH5544
HETATM 5208  O   HOH     1      58.861   0.984  17.024  1.00 37.58      4MDH5545
HETATM 5209  O   HOH     2      24.384   1.184  74.398  1.00 35.92      4MDH5546
 
                                   (...)
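Since cofactors, ions and solvent all share the HETATM keyword, a program usually has to sort them by residue name. The sketch below groups HETATM atoms by (residue name, chain, residue number) and then separates water from the other groups. It assumes the same column layout as ATOM records; `group_hetatm` is a hypothetical helper name, and treating "HOH" as the water residue name follows the example above.

```python
from collections import defaultdict

def group_hetatm(lines):
    """Return {(res_name, chain, res_seq): [atom names]} for HETATM records."""
    groups = defaultdict(list)
    for line in lines:
        if line[:6] != "HETATM":
            continue
        key = (line[17:20].strip(), line[21].strip(), int(line[22:26]))
        groups[key].append(line[12:16].strip())
    return dict(groups)

sample = [
    "ATOM      5  CA  SER A   1      12.901   4.557  33.573  1.00 52.42",
    "HETATM 5158 AP   NAD B   1      42.641  30.361  41.284  1.00 26.73",
    "HETATM 5159 AO1  NAD B   1      43.440  31.570  40.868  1.00 20.69",
    "HETATM 5202  S   SO4 B   2      44.842  24.424  31.662  1.00 72.77",
    "HETATM 5207  O   HOH     0      15.379   1.907   3.295  1.00 58.12",
]

groups = group_hetatm(sample)
ligands = {k: v for k, v in groups.items() if k[0] != "HOH"}
waters = [k for k in groups if k[0] == "HOH"]

print("ligands:", ligands)
print("waters:", waters)
```

Note that the water molecules in this entry carry no chain identifier, so the chain field comes back as an empty string.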

HETATM records describe only atom positions; because they concern non-standard groups, programs cannot know which atoms are actually bonded. This information is given in CONECT records. In the example below, atom number 74 is connected to atoms 69 and 75. In the absence of CONECT information, programs usually connect atoms that are closer than 2 angstroms.

 
CONECT   74   69   75                                                   4MDH6015
CONECT   77   76                                                        4MDH6016
CONECT   92   90   93                                                   4MDH6017
CONECT   99   98                                                        4MDH6018
 
                                   (...)
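Both ways of obtaining bonds can be sketched in a few lines of Python. `parse_conect` reads explicit CONECT records, assuming the standard layout of five-column integer fields starting at column 7 (the first is the atom itself, the others its partners). `bonds_by_distance` is the fallback described above, using the 2 angstrom cutoff from the text. Both function names are hypothetical; the sulphate coordinates are copied from the HETATM excerpt.

```python
import math
from itertools import combinations

def parse_conect(lines):
    """Map each atom serial (columns 7-11) to the serials it is bonded to."""
    bonds = {}
    for line in lines:
        if line[:6] != "CONECT":
            continue
        fields = [line[i:i + 5] for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f.strip()]
        if serials:
            bonds.setdefault(serials[0], set()).update(serials[1:])
    return bonds

def bonds_by_distance(coords, cutoff=2.0):
    """Fallback: pair up atoms whose separation is below `cutoff` angstroms."""
    pairs = []
    for (s1, p1), (s2, p2) in combinations(sorted(coords.items()), 2):
        if math.dist(p1, p2) < cutoff:
            pairs.append((s1, s2))
    return pairs

conect = parse_conect([
    "CONECT   74   69   75",
    "CONECT   77   76",
])
print(conect)

# Sulphate ion of chain B: serial number -> (x, y, z).
sulphate = {
    5202: (44.842, 24.424, 31.662),  # S
    5203: (45.916, 23.890, 32.631),  # O1
    5204: (44.065, 23.296, 30.916),  # O2
    5205: (45.570, 25.307, 30.620),  # O3
    5206: (43.834, 25.257, 32.482),  # O4
}
bonds = bonds_by_distance(sulphate)
print(bonds)
```

For the sulphate ion the distance rule recovers the four S-O bonds (about 1.5 angstroms each) and rejects the O-O pairs, which lie about 2.5 angstroms apart. A fixed cutoff is only a heuristic and can fail for distorted geometries or closely packed atoms.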