DoriC: an updated database of bacterial and archaeal replication origins


INTRODUCTION

Identifying replication origins helps to reveal the regulatory mechanisms of the initiation step in DNA replication (Mott and Berger, Nat. Rev. Microbiol, 2007; Katayama et al., Nat. Rev. Microbiol, 2010) and to discover new broad-spectrum antibacterial drugs (Robinson et al., Current Drug Targets, 2012). Based on the Z-curve theory (Zhang and Zhang, Archaea, 2005), we developed Ori-Finder, a web-based system for finding oriCs in bacterial genomes with high accuracy and reliability (Gao and Zhang, BMC Bioinformatics, 2008), and organized the predicted oriC regions into the online database DoriC (Gao and Zhang, Bioinformatics, 2007). Using Ori-Finder together with comparative genomics, the origins of replication in Sorangium cellulosum, Microcystis aeruginosa (Gao and Zhang, DNA Research, 2008) and Cyanothece 51142 (Gao and Zhang, Proc. Natl. Acad. Sci. USA, 2008), which could not be determined by standard GC skew analysis, have been identified. Applying the proposed oriC selection criteria and comparing different cyanobacterial strains may also give insight into the origins of replication in other Cyanobacteria (Welsh et al., Proc. Natl. Acad. Sci. USA, 2008). Since the database was constructed in 2007, the replication origins of Anabaena sp. PCC 7120 (Zhou et al., Microbiology, 2011), Cytophaga hutchinsonii ATCC 33406 (Xu et al., Appl. Microbiol. Biotechnol., 2012) and Synechococcus elongatus PCC 7942 (Watanabe et al., Molecular Microbiology, 2012) have been confirmed experimentally, and all are consistent with our predictions in DoriC.

Because it is continuously updated, the database has been widely used in comparative genomics analyses. For example, DoriC has served as a data source in the study of the relationship between the functionality of essential genes and gene strand-bias in bacterial genomes (Lin, Gao and Zhang, Biochem. Biophys. Res. Commun., 2010), in the analysis of nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes (Qu et al., Research in Microbiology, 2010), in the investigation of the association between growth-related traits and minimal generation times (Vieira-Silva and Rocha, PLoS Genetics, 2010), in an algorithm for predicting putative essential and core-essential genes in Mycoplasma genomes (Lin and Zhang, Scientific Reports, 2011), in research on the coordination of spatiotemporal gene expression during the bacterial growth cycle (Sobetzko et al., Proc. Natl. Acad. Sci. USA, 2012), and in the study of variation in the percentage of leading-strand genes across different bacteria (Mao et al., Nucleic Acids Research, 2012).
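
For readers unfamiliar with the "standard GC skew" baseline mentioned above, the following is a minimal sketch of a cumulative GC skew calculation. It is not the Ori-Finder or Z-curve method, and the function names, the window size and the toy sequence are assumptions made for illustration. In many bacterial genomes the minimum of the cumulative curve lies near the origin, but, as noted above, this signal is not reliable for every genome.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the cumulative GC skew, one value per non-overlapping window.

    The skew of a window is (G - C) / (G + C); windows with no G or C
    contribute zero. Names and defaults here are illustrative assumptions.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def skew_minimum_position(seq, window=1000):
    """Return the approximate coordinate where the cumulative skew is lowest.

    The value at index i covers everything up to the end of window i, so the
    reported coordinate is the end of that window (capped at sequence length).
    """
    curve = cumulative_gc_skew(seq, window)
    idx = min(range(len(curve)), key=curve.__getitem__)
    return min((idx + 1) * window, len(seq))


# Toy sequence (hypothetical): C-rich first half, G-rich second half.
toy = "ACCC" * 100 + "AGGG" * 100
position = skew_minimum_position(toy, window=100)  # 400, the switch point
```

On real genomes the window size has to be chosen with the genome length in mind, and a flat or noisy curve is a sign that the skew alone does not locate the origin.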

DATABASE DESCRIPTION

DoriC is built on a relational database (MySQL), which allows rapid data retrieval and makes the resource easy to maintain. Entries can be sorted by species, by GenBank accession number, or by DoriC accession number. In general, one entry in DoriC corresponds to one chromosome. However, in some bacteria, e.g., Bacillus subtilis, the oriC region is split into two sub-regions by the dnaA gene (Moriya et al., Molecular Microbiology, 1992), which results in two entries for one origin; for example, entries ORI10010005 and ORI10010006 together compose the single oriC region of Bacillus subtilis. Consequently, the number of entries does not match the number of chromosomes. In fact, only one origin is predicted for bacteria in DoriC. For archaea, our results in DoriC show one replication origin in the genomes within the order Methanococcales (11 genomes) and within the class Thermococci (12 genomes), and three replication origins in Sulfolobus species (13 genomes). Our results and the Z-curves also show that archaea within the phylum Crenarchaeota contain multiple origins, although some origins cannot currently be determined at the sequence level. For example, Pyrobaculum calidifontis has been shown experimentally to have four replication origins, the highest number detected in a prokaryotic organism; however, only one origin can be determined at the sequence level (Pelve et al., Mol. Microbiol., 2012). More details on the number of oriCs for the archaea in DoriC are presented as a phylogenetic tree generated by CVTree (see the linked figure).
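
The difference between entry counts and chromosome counts can be illustrated with a small relational sketch. The table name, column names, the chromosome placeholders and the third row below are hypothetical, and Python's built-in sqlite3 module stands in for MySQL; the actual DoriC schema is not described here. Only the two Bacillus subtilis entry identifiers come from the text above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a hypothetical DoriC-like layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oric_entry ("
        " doric_ac TEXT PRIMARY KEY,"   # DoriC accession (one per entry)
        " organism TEXT NOT NULL,"
        " chromosome TEXT NOT NULL)"    # placeholder chromosome identifier
    )
    conn.executemany(
        "INSERT INTO oric_entry VALUES (?, ?, ?)",
        [
            # Two entries that together form one split oriC region.
            ("ORI10010005", "Bacillus subtilis", "CHR_BSUB_PLACEHOLDER"),
            ("ORI10010006", "Bacillus subtilis", "CHR_BSUB_PLACEHOLDER"),
            # Invented row representing an ordinary one-entry chromosome.
            ("ORI_EXAMPLE", "Example organism", "CHR_EXAMPLE_PLACEHOLDER"),
        ],
    )
    return conn


def count_entries_and_chromosomes(conn):
    """Return (number of entries, number of distinct chromosomes)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT chromosome) FROM oric_entry"
    ).fetchone()


def chromosomes_with_split_oric(conn):
    """List chromosomes represented by more than one entry."""
    return conn.execute(
        "SELECT chromosome, COUNT(*) FROM oric_entry"
        " GROUP BY chromosome HAVING COUNT(*) > 1"
        " ORDER BY chromosome"
    ).fetchall()
```

With this toy data the table holds three entries for two chromosomes, and the grouping query singles out the chromosome whose oriC is stored as two sub-regions.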

 

Funding: The present work was supported in part by the National Natural Science Foundation of China (Grant Nos. 90408028, 31171238, 30800642 and 10747150).