
This page summarizes the accuracy of GENSCAN on invertebrate (Drosophila) and plant (maize and Arabidopsis) genomic sequences. The first table gives accuracy statistics for the vertebrate version of the program on test sets of sequences from these organisms (see the table legend for details). For Drosophila and maize, the nucleotide-level accuracy is generally comparable to the high levels achieved on vertebrate sequences, but the exon-level accuracy is lower to varying degrees. The vertebrate version is therefore recommended for Drosophila sequences and may be used for maize, but it is not recommended for Arabidopsis because of its unacceptably low nucleotide-level sensitivity (0.81) and the high proportion of missed exons (0.25). Organism-specific versions of GENSCAN that perform better on maize and Arabidopsis sequences are discussed below.
| Organism | No. of sequences | Sn (bp) | Sp (bp) | AC (bp) | CC (bp) | Sn (exon) | Sp (exon) | (Sn+Sp)/2 (exon) | ME (exon) | WE (exon) |
|---|---|---|---|---|---|---|---|---|---|---|
| Drosophila | 202 | 0.96 | 0.92 | 0.89 | 0.90 | 0.68 | 0.68 | 0.68 | 0.11 | 0.10 |
| Maize | 42 | 0.94 | 0.93 | 0.90 | 0.90 | 0.67 | 0.71 | 0.69 | 0.09 | 0.08 |
| Arabidopsis | 120 | 0.81 | 0.93 | 0.78 | 0.84 | 0.57 | 0.72 | 0.66 | 0.25 | 0.04 |
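The accuracy statistics are the standard measures for gene prediction programs: Sn, sensitivity; Sp, specificity; AC, approximate correlation; CC, correlation coefficient; ME, missing exons (the proportion of annotated exons not overlapped by any predicted exon); WE, wrong exons (the proportion of predicted exons that do not overlap any annotated exon).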
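The per-bp columns can be illustrated with a short sketch. It assumes the definitions commonly used in gene-prediction benchmarks (Burset and Guigó, 1996), where each base is classified as coding or non-coding in both the annotation and the prediction. The names `NucleotideCounts`, `count_bases` and `nucleotide_accuracy`, and the boolean-mask input format, are hypothetical and are not part of GENSCAN.

```python
import math
from dataclasses import dataclass


@dataclass
class NucleotideCounts:
    """Per-base confusion counts (hypothetical helper): annotated vs. predicted coding."""
    tp: int  # coding bases predicted as coding
    fp: int  # non-coding bases predicted as coding
    tn: int  # non-coding bases predicted as non-coding
    fn: int  # coding bases predicted as non-coding


def count_bases(actual: list[bool], predicted: list[bool]) -> NucleotideCounts:
    """Tally per-base agreement between two equal-length coding masks."""
    if len(actual) != len(predicted):
        raise ValueError("masks must have equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    return NucleotideCounts(tp, fp, tn, fn)


def nucleotide_accuracy(c: NucleotideCounts) -> dict[str, float]:
    """Sn, Sp, AC and CC; assumes no marginal total is zero."""
    sn = c.tp / (c.tp + c.fn)  # sensitivity
    sp = c.tp / (c.tp + c.fp)  # specificity (fraction of predicted coding bases that are coding)
    # Correlation coefficient between the annotated and predicted masks.
    denom = math.sqrt((c.tp + c.fn) * (c.tn + c.fp) * (c.tp + c.fp) * (c.tn + c.fn))
    cc = (c.tp * c.tn - c.fn * c.fp) / denom
    # Average conditional probability, rescaled from [0, 1] to [-1, 1] to give AC.
    acp = 0.25 * (c.tp / (c.tp + c.fn) + c.tp / (c.tp + c.fp)
                  + c.tn / (c.tn + c.fp) + c.tn / (c.tn + c.fn))
    ac = (acp - 0.5) * 2
    return {"Sn": sn, "Sp": sp, "AC": ac, "CC": cc}


# Toy 10-bp example: one coding base missed, one non-coding base over-predicted.
actual = [False, False, True, True, True, True, False, False, False, False]
predicted = [False, True, True, True, True, False, False, False, False, False]
stats = nucleotide_accuracy(count_bases(actual, predicted))
print({k: round(v, 2) for k, v in stats.items()})
# {'Sn': 0.75, 'Sp': 0.75, 'AC': 0.58, 'CC': 0.58}
```

AC is shown alongside CC because CC is undefined whenever a sequence has no predicted or no annotated coding bases; a fuller implementation would average only the defined terms of the conditional-probability sum rather than assume all four denominators are non-zero, as this sketch does.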
The vertebrate version of GENSCAN was tested on the sequence sets described below. The set of 202 Drosophila melanogaster GenBank loci was constructed by D. Kulp (UC Santa Cruz) and M. G. Reese (Lawrence Berkeley National Laboratory) on 12 December 1996 as a standard for training and testing gene prediction methods, and is available by anonymous ftp. The set of 41 Zea mays GenBank loci was constructed by V. Brendel at Stanford University and is available by email on request. The set of 120 Arabidopsis thaliana GenBank loci was also constructed by V. Brendel and is described in:
Kleffe, J., Hermann, K., Vahrson, W., Wittig, B. and Brendel, V. (1996) Nucl. Acids Res. 24, 4718-4728.
All three sets consist exclusively of nonredundant complete genes whose annotation appears reliable by a variety of tests.
Accuracy of the organism-specific versions of GENSCAN on the maize and Arabidopsis test sets described above:
| Organism | No. of sequences | Sn (bp) | Sp (bp) | AC (bp) | CC (bp) | Sn (exon) | Sp (exon) | (Sn+Sp)/2 (exon) | ME (exon) | WE (exon) |
|---|---|---|---|---|---|---|---|---|---|---|
| Maize | 42 | 0.86 | 0.96 | 0.86 | 0.90 | 0.78 | 0.87 | 0.84 | 0.15 | 0.04 |
| Arabidopsis | 120 | 0.91 | 0.93 | 0.86 | 0.89 | 0.67 | 0.69 | 0.69 | 0.11 | 0.08 |
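The exon-level columns can likewise be sketched under the usual benchmark definitions: an exon counts as correctly predicted only if both of its boundaries match exactly, whereas ME and WE require no overlap at all. Representing exons as `(start, end)` tuples with inclusive coordinates, and the names `overlaps` and `exon_accuracy`, are assumptions made for illustration only.

```python
Exon = tuple[int, int]  # hypothetical representation: (start, end), inclusive coordinates


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_accuracy(actual: list[Exon], predicted: list[Exon]) -> dict[str, float]:
    """Exon-level Sn, Sp, (Sn+Sp)/2, ME and WE; assumes both lists are non-empty."""
    actual_set, predicted_set = set(actual), set(predicted)
    exact = len(actual_set & predicted_set)  # exons with both boundaries correct
    sn = exact / len(actual_set)
    sp = exact / len(predicted_set)
    # ME: annotated exons not overlapped by any predicted exon.
    missed = sum(not any(overlaps(a, p) for p in predicted_set) for a in actual_set)
    # WE: predicted exons that overlap no annotated exon.
    wrong = sum(not any(overlaps(p, a) for a in actual_set) for p in predicted_set)
    return {
        "Sn": sn,
        "Sp": sp,
        "(Sn+Sp)/2": (sn + sp) / 2,
        "ME": missed / len(actual_set),
        "WE": wrong / len(predicted_set),
    }


# Toy gene: two exons exact, one with a wrong acceptor, one missed, two spurious.
actual = [(100, 200), (300, 400), (500, 600), (800, 900)]
predicted = [(100, 200), (300, 420), (650, 700), (800, 900), (950, 1000)]
result = exon_accuracy(actual, predicted)
print({k: round(v, 2) for k, v in result.items()})
```

In this toy case the partially correct exon `(300, 420)` lowers exon-level Sn and Sp but is counted as neither missed nor wrong, which is why ME and WE can both be small while exon-level Sn and Sp remain well below the per-bp values.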