ZCURVE_CoV 2.0 is a new system for recognizing protein-coding genes in coronavirus genomes. It is especially suitable for SARS coronaviruses, for which it also accurately predicts the proteinase cleavage sites within the polyprotein encoded by ORF1ab. For more detail, refer to the paper.
 
Submission Form
Please enter or paste the complete genome DNA sequence into the sequence input window below. Here is an example of an input sequence.
Sequence Name
Enter or Paste a Complete Genome Sequence in FASTA format.
* Warning: The software works correctly only for complete coronavirus genome sequences. Partial genome sequences will produce incorrect results.

Running Options bp is selected as the minimum ORF length.
  Predict the -1 frameshift site based on the heptamer UUUAAAC and protease cleavage sites in polyproteins.
Output Options Translate predicted genes into protein primary sequence.
  Output predicted genes in FASTA format.
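
The running options above mention the heptamer UUUAAAC as the signal for the -1 frameshift site. As a rough illustration only, the following Python sketch locates every occurrence of that heptamer in a sequence. The function name `find_slippery_sites` is hypothetical and this is not the ZCURVE_CoV implementation, which also uses protease cleavage sites and is not described on this page.

```python
def find_slippery_sites(genome: str, heptamer: str = "UUUAAAC") -> list[int]:
    """Return the 0-based start position of every occurrence of the heptamer.

    Hypothetical helper for illustration; it only does plain motif matching.
    """
    # Normalise both strings to upper-case DNA letters so that RNA (U)
    # and DNA (T) input are treated the same way.
    seq = genome.upper().replace("U", "T")
    motif = heptamer.upper().replace("U", "T")

    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one base further along so overlapping hits are not missed.
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence invented for this example, not a real coronavirus genome.
toy_genome = "ATGGC" + "TTTAAAC" + "GGGTTTGCG"
sites = find_slippery_sites(toy_genome)
```

A motif hit alone is only a candidate: a complete genome may contain the heptamer at several positions, so a real predictor would need further evidence to choose among them.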

 
Reference
F. Gao, H.-Y. Ou, L.-L. Chen, W.-X. Zheng and C.-T. Zhang* (2003). Prediction of proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing SARS-CoV genomes. FEBS Letters, 553, 451-456.


* Corresponding author. E-mail: ctzhang@tju.edu.cn
Abstract
Recently, we have developed a coronavirus-specific gene-finding system, ZCURVE_CoV 1.0. In this paper, the system is further improved by taking the prediction of cleavage sites of viral proteinases in polyproteins into account. The cleavage sites of the 3C-like proteinase and papain-like proteinase are highly conserved. Based on the method of traditional positional weight matrix trained by the peptides around cleavage sites, the present method also sufficiently considers the length conservation of nonstructural proteins cleaved by the 3C-like proteinase and papain-like proteinase to reduce the false positive prediction rate. The improved system, ZCURVE_CoV 2.0, has been run for each of the 24 completely sequenced coronavirus genomes in GenBank. Consequently, all the nonstructural proteins in the 24 genomes are accurately predicted. Compared with known annotations, the performance of the present method is satisfactory.
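
The abstract refers to a positional weight matrix trained on peptides around cleavage sites. The Python sketch below shows the general technique under stated assumptions: a uniform background over the 20 standard amino acids, a pseudocount of 1, and invented toy peptides that are not taken from the paper. The names `train_pwm`, `score_window` and `best_window` are hypothetical, and the length-conservation criterion mentioned in the abstract is not modelled here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumption)


def train_pwm(peptides, pseudocount=1.0):
    """Build a log-odds positional weight matrix from equal-length peptides."""
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for i in range(width):
        counts = Counter(p[i] for p in peptides)
        # One column per position: log2(observed frequency / background).
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one peptide window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def best_window(pwm, protein):
    """Return (score, start) of the highest-scoring window in the protein."""
    width = len(pwm)
    return max(
        (score_window(pwm, protein[i:i + width]), i)
        for i in range(len(protein) - width + 1)
    )


# Invented toy peptides for illustration only; not real training data.
toy_peptides = ["AVLQSG", "TVLQAG", "ATLQSG", "SVLQGG"]
toy_pwm = train_pwm(toy_peptides)
top_score, top_start = best_window(toy_pwm, "MKKAVLQSGFRK")
```

In this toy case the highest-scoring window starts at index 3, where the sequence matches one of the training peptides. Sequences containing letters outside the 20 standard residues would raise a `KeyError` in this sketch and would need separate handling.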
Acknowledgments
We are indebted to Prof. Jingchu Luo in Peking University for the timely updated SARS-related information provided. We are also grateful to both referees for their constructive comments, which are very useful to improve the quality of the paper. Invaluable assistances from Ren Zhang are gratefully acknowledged. The present study was supported in part by the 973 Project of China (grant 1999075606).
TUBIC 2003