Experimental Determination and System-Level Analysis of Essential Genes in E. coli MG1655

S.Y. Gerdes1,*, M.D. Scholle1,*, J.W. Campbell1, G. Balazsi2, E. Ravasz3, M.D. Daugherty1, A.L. Somera2, N.C. Kyrpides1, I. Anderson1, M.S. Gelfand1, A. Bhattacharya1, V. Kapatral1, M. D'Souza1, M.V. Baev1, F. Mseeh1, M.Y. Fonstein1, R. Overbeek1, A.-L. Barabasi3, Z.N. Oltvai2 and A.L. Osterman1

I. Supplementary Table S1 (Excel format, PDF, TXT format).
II. Supplementary Table S2 (PDF, TXT format).
III. Results: additional illustrations and analysis (PDF).
IV. Supplementary Table S6 (PDF, TXT format).
V. Experimental and Analytical Procedures:
1. Genetic footprinting procedure (PDF).
2. Assessment of conditional gene essentiality based on genetic footprinting data (PDF).

Supplementary Table S1 (Excel format, PDF, TXT format)

Comments to Table S1: Essentiality assertions for all E. coli ORFs determined by genetic footprinting under conditions of aerobic logarithmic cell growth in rich medium.

This table lists all protein-coding E. coli genes in the order they occur along the chromosome. Genes are identified by their common names, unique identifiers in the ERGO database (Overbeek et al., 2003), b-numbers (Blattner et al., 1997), as well as Swiss-Prot IDs and functional annotations (as of July 2002). Positions of start codons are based on ORF calling within the ERGO database and may not correspond to those in other databases. Every ORF is assigned to one of twelve broad functional categories (AAM, Amino acids metabolism; CHM, Carbohydrate metabolism; NCM, Nucleotide and cofactor metabolism; LPC, Lipids, lipopolysaccharides, lipoproteins, peptidoglycan, cell wall; NAM, Nucleic acid metabolism; PMS, Protein metabolism and secretion; MSM, Miscellaneous metabolism; BEN, Bioenergetics; SMC, Signalling, motility, chemotaxis; RCD, Expression regulators and cell cycle/division; MTR, Membrane transporters; and PHT, Phage and transposase related). ORFs lacking specific functional annotations in Swiss-Prot as of July 2002 were considered uncategorized (UNC). Evolutionary retention index (ERI) for each ORF was determined as described in Experimental procedures.
Essentiality assertions were determined automatically from the number of transposon insertions detected within an ORF and from the relative intensity of the electrophoretic bands corresponding to each transposition event. These automatic calls were then curated manually using a graphical chromosomal viewer. All ORFs were sorted into four categories based on the following criteria:
E = essential. Includes genes with no detectable insertions within their coding sequence (cds), and genes with only a few insertions within the 3'-most 20% or 5'-most 5% of the gene. A numerical measure of confidence in the assertion was calculated for each essential ORF (column "assertion error"). The assertion error is the probability that an ORF would receive no insertions by chance if insert locations were completely random. It was calculated as p0(L) = exp(-rL), where r is the local insertion density (inserts per base pair) and L is the length of the ORF in base pairs. In our case, r was determined by counting the number of inserts within a 10 kb-long region centered on each ORF, excluding the ORF's own coding sequence as well as all essential ORFs and unanalyzed regions (gaps) in that region (see also Materials and Methods in the main article).
N = non-essential. Genes with one or more insertions located within the central 5 to 80% of the cds length were considered to be non-essential, except for a few relatively long ORFs (>1000 bp). These were asserted as "ambiguous" if the insertion density within the cds was significantly below the genomic average (3.2 inserts per 1000 bp).
X = not covered. Includes genes for which no reliable PCR data could be obtained for various technical reasons.
? = ambiguous. ORFs for which experimental evidence was insufficient to draw a specific conclusion about essentiality were asserted as ambiguous.
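
The assertion error defined under category E, exp(-rL), can be illustrated with a short Python sketch. The function names are hypothetical and are not taken from the authors' software. Normalizing the insert count by the window length that remains after the exclusions is an assumption; the text only states which regions were excluded from the count.

```python
import math


def local_insertion_density(n_inserts: int, window_bp: int, excluded_bp: int) -> float:
    """Return inserts per base pair in the window around an ORF.

    Assumption: the count is normalized by the window length left after
    removing the ORF's own cds, essential ORFs, and gaps.
    """
    usable_bp = window_bp - excluded_bp
    if usable_bp <= 0:
        raise ValueError("no usable sequence left in the window")
    return n_inserts / usable_bp


def assertion_error(r: float, orf_length_bp: int) -> float:
    """Probability of zero inserts in an ORF of the given length, exp(-r * L)."""
    return math.exp(-r * orf_length_bp)


# Illustrative numbers only: 16 inserts in 5,000 usable bp of a 10 kb window
# gives r = 0.0032 inserts/bp, which equals the genomic average quoted in the text.
r = local_insertion_density(n_inserts=16, window_bp=10_000, excluded_bp=5_000)
print(round(assertion_error(r, orf_length_bp=1000), 3))  # prints 0.041
```

At the genomic average density, a 1000 bp ORF would therefore be missed by chance about 4% of the time, while a 300 bp ORF would be missed about 38% of the time (exp(-0.96)), which is why short ORFs without inserts carry a larger assertion error.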
For a more detailed description of these criteria and a discussion of potential sources of erroneous assertions, follow this link. For consistency, all essentiality calls in Table S1 are based exclusively on our experimental data, without any corrections by context or otherwise. For example, accD is called nonessential and pdxJ is called essential regardless of convincing arguments to the contrary. Table S1 includes raw genetic footprinting data: the number of transposon insertions within each ORF and their locations (in amino acid coordinates) relative to the translational start. These data can be used to refine essentiality assessments. More detailed information, including inserts in non-coding regions, relative intensities of PCR products, and raw experimental data (primer positions, gel images), is available from the authors upon request.
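
Because Table S1 reports insert locations in amino acid coordinates, the positional criteria for the four categories can be re-applied to the raw data. The following Python sketch shows one way to do this. It is not the authors' procedure: it ignores band intensities and manual curation, and the two thresholds standing in for "only a few" and "significantly below" are hypothetical values chosen for illustration, as is the decision to call an ORF ambiguous when it has many terminal inserts but no central ones.

```python
from typing import Optional, Sequence

GENOMIC_AVG_PER_BP = 3.2 / 1000   # genomic average insertion density (from the text)
LONG_ORF_BP = 1000                # "relatively long" ORF cutoff (from the text)
MAX_TERMINAL_INSERTS = 3          # hypothetical stand-in for "only a few"
LOW_DENSITY_FRACTION = 0.25       # hypothetical stand-in for "significantly below"


def call_essentiality(insert_positions_aa: Optional[Sequence[int]],
                      orf_length_aa: int) -> str:
    """Return 'E', 'N', '?' or 'X' for one ORF.

    insert_positions_aa: insert locations in amino acid coordinates relative
    to the translational start, or None if the ORF was not covered.
    """
    if insert_positions_aa is None:
        return "X"  # no reliable PCR data
    if orf_length_aa <= 0:
        raise ValueError("ORF length must be positive")

    # Fractional position of each insert along the coding sequence.
    fractions = [pos / orf_length_aa for pos in insert_positions_aa]
    # Inserts in the central 5 to 80% of the cds argue against essentiality.
    central = [f for f in fractions if 0.05 <= f <= 0.80]

    if not central:
        # No inserts at all, or only a few near the termini.
        return "E" if len(fractions) <= MAX_TERMINAL_INSERTS else "?"

    orf_length_bp = 3 * orf_length_aa  # approximate cds length, stop codon ignored
    density = len(fractions) / orf_length_bp
    if orf_length_bp > LONG_ORF_BP and density < LOW_DENSITY_FRACTION * GENOMIC_AVG_PER_BP:
        return "?"  # long ORF with unusually sparse inserts
    return "N"


print(call_essentiality([], 300))        # prints E
print(call_essentiality([150], 300))     # prints N
print(call_essentiality([500], 1000))    # prints ?
print(call_essentiality(None, 300))      # prints X
```

Changing the two hypothetical thresholds shifts borderline ORFs between the N, E and ? categories, so any reclassification along these lines should be checked against the published calls.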


Blattner, F. R., Plunkett, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., et al. (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453-1462.

Overbeek, R., Larsen, N., Walunas, T., D'Souza, M., Pusch, G., Selkov, E. J., Liolios, K., Joukov, V., Kaznadzey, D., Anderson, I., et al. (2003). The ERGO(TM) genome analysis and discovery system. Nucleic Acids Res 31, 164-171.