ZCURVE_CoV 2.0 is a system for recognizing protein-coding genes in coronavirus genomes. It is especially suited to SARS coronaviruses, and it accurately predicts the proteinase cleavage sites within the polyprotein encoded by ORF1ab. For more detail, refer to the paper.
We recently developed a coronavirus-specific gene-finding system, ZCURVE_CoV 1.0. In this paper, the system is improved to also predict the cleavage sites of viral proteinases in polyproteins. The cleavage sites of the 3C-like proteinase and the papain-like proteinase are highly conserved. The method builds on a traditional positional weight matrix trained on the peptides around cleavage sites, and it additionally takes into account the length conservation of the nonstructural proteins cleaved by the two proteinases, which reduces the false positive prediction rate. The improved system, ZCURVE_CoV 2.0, was run on each of the 24 completely sequenced coronavirus genomes in GenBank, and all the nonstructural proteins in the 24 genomes were accurately predicted. Compared with known annotations, the performance of the present method is satisfactory.
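
The combination of a positional weight matrix with a length constraint can be illustrated with a minimal sketch. This is not the ZCURVE_CoV 2.0 implementation: the function names, the uniform background, the pseudocount, the score threshold, the length range, and the training peptides are all assumptions made up for illustration, and the peptides are toy data rather than real cleavage sites.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Build a log-odds PWM from equal-length peptides (uniform background assumed)."""
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(width):
        counts = Counter(p[pos] for p in peptides)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pwm, window))


def candidate_sites(pwm, protein, cut_offset, threshold):
    """Return (cut_position, score) for every window scoring at least threshold.

    cut_position is the 0-based index of the first residue after the cut.
    """
    width = len(pwm)
    hits = []
    for start in range(len(protein) - width + 1):
        score = score_window(pwm, protein[start:start + width])
        if score >= threshold:
            hits.append((start + cut_offset, score))
    return hits


def filter_by_length(hits, previous_cut, min_len, max_len):
    """Keep hits whose product length (distance from the previous cut) is plausible."""
    return [(pos, s) for pos, s in hits if min_len <= pos - previous_cut <= max_len]


# Toy training peptides (hypothetical), with the cut assumed after the third residue.
toy_peptides = ["TLQSG", "VLQAG", "ALQSA", "RLQGT"]
toy_pwm = build_pwm(toy_peptides)

# Toy polyprotein (hypothetical) containing two motif-like windows.
toy_protein = "MK" + "A" * 10 + "TLQSG" + "A" * 20 + "VLQAG" + "A" * 5

hits = candidate_sites(toy_pwm, toy_protein, cut_offset=3, threshold=5.0)
# Assume the first product is expected to be 10 to 20 residues long.
kept = filter_by_length(hits, previous_cut=0, min_len=10, max_len=20)
print("PWM hits:", [pos for pos, _ in hits])
print("after length filter:", [pos for pos, _ in kept])
```

In this sketch the weight matrix alone accepts both motif-like windows, and the length filter discards the one that would produce an implausibly long product. That is the sense in which a length constraint can lower the false positive rate of a purely sequence-based score.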
