![The thinking man's vertebrate](images/puffer.jpg)
GENSCAN was tested on the set of 570 vertebrate gene sequences constructed by Burset & Guigó (1996) as a standard for comparison of gene finding methods. The table below lists accuracy statistics for GENSCAN and for other gene prediction programs which do not use protein sequence homology information. Statistics for all other programs are from Table 1 of Burset, M. & Guigó, R. (1996) Genomics 34, 353-367.
| Method | Sn (per nucleotide) | Sp (per nucleotide) | AC (per nucleotide) | Sn (per exon) | Sp (per exon) | (Sn+Sp)/2 (per exon) | ME (per exon) | WE (per exon) |
|---|---|---|---|---|---|---|---|---|
| GENSCAN | 0.93 | 0.93 | 0.91 | 0.78 | 0.81 | 0.80 | 0.09 | 0.05 |
| FGENEH | 0.77 | 0.85 | 0.78 | 0.61 | 0.61 | 0.61 | 0.15 | 0.11 |
| GeneID | 0.63 | 0.81 | 0.67 | 0.44 | 0.45 | 0.45 | 0.28 | 0.24 |
| GeneParser2 | 0.66 | 0.79 | 0.66 | 0.35 | 0.39 | 0.37 | 0.29 | 0.17 |
| GenLang | 0.72 | 0.75 | 0.69 | 0.50 | 0.49 | 0.50 | 0.21 | 0.21 |
| GRAILII | 0.72 | 0.84 | 0.75 | 0.36 | 0.41 | 0.38 | 0.25 | 0.10 |
| SORFIND | 0.71 | 0.85 | 0.73 | 0.42 | 0.47 | 0.45 | 0.24 | 0.14 |
| Xpound | 0.61 | 0.82 | 0.68 | 0.15 | 0.17 | 0.16 | 0.32 | 0.13 |
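
The column headings in the table above are not defined on this page. The following is a minimal sketch, assuming they follow the conventions of Burset & Guigó (1996): Sn is sensitivity, Sp is specificity (in this literature, the fraction of predictions that are correct), AC is the approximate correlation, ME is the fraction of annotated exons missed, and WE is the fraction of predicted exons that are wrong. The function names and the example counts are hypothetical and serve only to illustrate the arithmetic; they are not GENSCAN output.

```python
def nucleotide_accuracy(tp, fp, tn, fn):
    """Per-nucleotide Sn, Sp and AC from coding/non-coding base counts.

    tp: coding bases predicted as coding      fp: non-coding bases predicted as coding
    tn: non-coding bases predicted non-coding fn: coding bases predicted non-coding
    """
    sn = tp / (tp + fn)  # share of true coding bases that were predicted
    sp = tp / (tp + fp)  # share of predicted coding bases that are correct
    # Average conditional probability (ACP), rescaled to the range [-1, 1]
    acp = 0.25 * (tp / (tp + fn) + tp / (tp + fp)
                  + tn / (tn + fp) + tn / (tn + fn))
    ac = (acp - 0.5) * 2
    return sn, sp, ac


def exon_accuracy(n_annotated, n_predicted, n_exact, n_missed, n_wrong):
    """Per-exon measures from exon counts (assumed definitions, see lead-in)."""
    sn = n_exact / n_annotated  # annotated exons predicted exactly
    sp = n_exact / n_predicted  # predicted exons that are exactly right
    return {
        "Sn": sn,
        "Sp": sp,
        "(Sn+Sp)/2": (sn + sp) / 2,
        "ME": n_missed / n_annotated,  # annotated exons with no overlapping prediction
        "WE": n_wrong / n_predicted,   # predicted exons overlapping no annotated exon
    }


# Hypothetical counts, for illustration only
sn, sp, ac = nucleotide_accuracy(tp=900, fp=100, tn=8900, fn=100)
exon_stats = exon_accuracy(n_annotated=10, n_predicted=8,
                           n_exact=6, n_missed=2, n_wrong=1)
print(sn, sp, round(ac, 2))
print(exon_stats)
```

Under these definitions a program can score well per nucleotide while scoring poorly per exon, because an exon counts as correct only when both of its boundaries are predicted exactly.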
Performance varies little with C+G content or across vertebrate subgroups: per-nucleotide AC stays between 0.89 and 0.93 in every subset.
GENSCAN accuracy statistics for distinct subsets of the Burset/Guigó test set are shown below.
| Subset | No. Seq | Sn (per bp) | Sp (per bp) | AC (per bp) | Sn (per exon) | Sp (per exon) | (Sn+Sp)/2 (per exon) | ME (per exon) | WE (per exon) |
|---|---|---|---|---|---|---|---|---|---|
| C+G < 40% | 86 | 0.90 | 0.95 | 0.90 | 0.78 | 0.87 | 0.84 | 0.14 | 0.05 |
| C+G 40 - 50% | 220 | 0.94 | 0.92 | 0.91 | 0.80 | 0.82 | 0.82 | 0.08 | 0.05 |
| C+G 50 - 60% | 208 | 0.93 | 0.93 | 0.90 | 0.75 | 0.77 | 0.77 | 0.08 | 0.05 |
| C+G > 60% | 56 | 0.97 | 0.89 | 0.90 | 0.76 | 0.77 | 0.76 | 0.07 | 0.08 |
| Primates | 237 | 0.96 | 0.94 | 0.93 | 0.81 | 0.82 | 0.82 | 0.07 | 0.05 |
| Rodents | 191 | 0.90 | 0.93 | 0.89 | 0.75 | 0.80 | 0.78 | 0.11 | 0.05 |
| Non-mam. vert. | 72 | 0.93 | 0.93 | 0.90 | 0.81 | 0.85 | 0.84 | 0.11 | 0.06 |
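The accuracy of GENSCAN is somewhat sensitive to exon length.
The table below summarizes this dependence for exons from the Burset/Guigó test set, grouped by length.
The fraction of annotated exons predicted exactly rises from 38% for exons of 24 bp or less to about 90% for exons of 175 to 299 bp, then drops to 66% for exons of 300 bp or more.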
| Length range (bp) | Annotated exons: No. | Annotated exons: %Exact | Annotated exons: %Part | Annotated exons: %Miss | Predicted exons: No. | Predicted exons: %Exact | Predicted exons: %Part | Predicted exons: %Wrong |
|---|---|---|---|---|---|---|---|---|
| <= 24 | 89 | 38 | 8 | 52 | 44 | 77 | 11 | 11 |
| 25 - 49 | 163 | 58 | 15 | 25 | 124 | 76 | 6 | 18 |
| 50 - 74 | 248 | 70 | 12 | 16 | 204 | 85 | 9 | 6 |
| 75 - 99 | 382 | 85 | 8 | 6 | 389 | 84 | 6 | 10 |
| 100 - 124 | 351 | 84 | 9 | 7 | 366 | 81 | 8 | 11 |
| 125 - 149 | 425 | 88 | 8 | 4 | 460 | 81 | 10 | 7 |
| 150 - 174 | 261 | 88 | 9 | 2 | 283 | 81 | 11 | 7 |
| 175 - 199 | 167 | 91 | 7 | 2 | 188 | 81 | 12 | 7 |
| 200 - 299 | 353 | 90 | 8 | 1 | 390 | 82 | 8 | 8 |
| >= 300 | 211 | 66 | 19 | 1 | 204 | 69 | 20 | 10 |
| Total | 2650 | 81 | 10 | 8 | 2678 | 81 | 10 | 9 |
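
As a quick consistency check on the table above, the Total row for annotated exons should be recoverable as a count-weighted average of the per-bin percentages. The sketch below does this for the annotated-exon %Exact column only. The variable names are illustrative, and because the published percentages are rounded to whole numbers, the result is expected to match the Total row only after rounding.

```python
# (length range in bp, number of annotated exons, %Exact), copied from the table above
annotated_bins = [
    ("<= 24", 89, 38),
    ("25 - 49", 163, 58),
    ("50 - 74", 248, 70),
    ("75 - 99", 382, 85),
    ("100 - 124", 351, 84),
    ("125 - 149", 425, 88),
    ("150 - 174", 261, 88),
    ("175 - 199", 167, 91),
    ("200 - 299", 353, 90),
    (">= 300", 211, 66),
]

# Total number of annotated exons across all length bins
total_exons = sum(count for _, count, _ in annotated_bins)

# Weight each bin's %Exact by the number of exons in that bin
weighted_exact = sum(count * pct for _, count, pct in annotated_bins) / total_exons

print(total_exons)            # 2650
print(round(weighted_exact))  # 81
```

Weighting by bin size matters here: the shortest and longest bins have the lowest exact-match rates but hold relatively few exons, so they pull the overall figure down only slightly.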
![What makes a vertebrate a vertebrate](images/ribcage.jpg)
Some of the limitations of GENSCAN are discussed here.
Address any comments/questions/suggestions to:
cburge@mit.edu