GENSCAN Performance Data


For Burset/Guigó Set of Vertebrate Genes



-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-


[The thinking man's vertebrate]


-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-

GENSCAN was tested on the set of 570 vertebrate gene sequences constructed by Burset & Guigó (1996) as a standard for comparing gene-finding methods. The table below lists accuracy statistics for GENSCAN and for other gene prediction programs that do not use protein sequence homology information. Statistics for all programs other than GENSCAN are from Table 1 of Burset, M. & Guigó, R. (1996) Genomics 34, 353-367.


              Accuracy per nucleotide  Accuracy per exon
Method        Sn     Sp     AC         Sn     Sp     (Sn+Sp)/2   ME     WE
GENSCAN       0.93   0.93   0.91       0.78   0.81   0.80        0.09   0.05
FGENEH        0.77   0.85   0.78       0.61   0.61   0.61        0.15   0.11
GeneID        0.63   0.81   0.67       0.44   0.45   0.45        0.28   0.24
GeneParser2   0.66   0.79   0.66       0.35   0.39   0.37        0.29   0.17
GenLang       0.72   0.75   0.69       0.50   0.49   0.50        0.21   0.21
GRAILII       0.72   0.84   0.75       0.36   0.41   0.38        0.25   0.10
SORFIND       0.71   0.85   0.73       0.42   0.47   0.45        0.24   0.14
Xpound        0.61   0.82   0.68       0.15   0.17   0.16        0.32   0.13


Accuracy statistics for gene prediction programs are described here.
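
As a rough illustration of the column headings used in these tables, the Python sketch below computes the statistics as they are commonly defined following Burset & Guigó (1996). It is an assumption that these definitions match the linked description exactly. The function names and the representation of exons as (start, end) tuples are hypothetical, chosen only for this example, and the code assumes that no denominator is zero.

```python
def nucleotide_accuracy(tp, fp, tn, fn):
    """Per-nucleotide Sn, Sp and AC from base counts.

    tp/fn: truly coding bases predicted as coding / non-coding.
    fp/tn: truly non-coding bases predicted as coding / non-coding.
    Assumes every denominator below is non-zero.
    """
    sn = tp / (tp + fn)  # fraction of coding bases that were predicted coding
    sp = tp / (tp + fp)  # fraction of predicted coding bases that are coding
    # Average of the four conditional probabilities, rescaled to [-1, 1].
    acp = 0.25 * (tp / (tp + fn) + tp / (tp + fp)
                  + tn / (tn + fp) + tn / (tn + fn))
    ac = 2.0 * (acp - 0.5)
    return sn, sp, ac


def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_accuracy(annotated, predicted):
    """Per-exon Sn, Sp, ME and WE for non-empty lists of (start, end) exons."""
    predicted_set = set(predicted)
    # An exon counts as correct only if both boundaries match exactly.
    correct = sum(1 for exon in annotated if exon in predicted_set)
    # Missing exons: annotated exons that no predicted exon overlaps.
    missing = sum(1 for a in annotated
                  if not any(overlaps(a, p) for p in predicted))
    # Wrong exons: predicted exons that overlap no annotated exon.
    wrong = sum(1 for p in predicted
                if not any(overlaps(p, a) for a in annotated))
    sn = correct / len(annotated)
    sp = correct / len(predicted)
    return {
        "Sn": sn,
        "Sp": sp,
        "(Sn+Sp)/2": (sn + sp) / 2,
        "ME": missing / len(annotated),
        "WE": wrong / len(predicted),
    }
```

Note that Sp here is the fraction of predictions that are correct, which is the usual convention in gene-finding evaluations, and not the true-negative rate.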


-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-

Accuracy vs C+G% Content and Vertebrate Group


Performance varies little with C+G% content or across subgroups of vertebrates: the per-nucleotide AC stays between 0.89 and 0.93 in every subset. GENSCAN accuracy statistics for distinct subsets of the Burset/Guigó test set are shown below.

                           Accuracy per bp          Accuracy per exon
Subset           No. Seq   Sn     Sp     AC         Sn     Sp     (Sn+Sp)/2   ME     WE
C+G < 40%        86        0.90   0.95   0.90       0.78   0.87   0.84        0.14   0.05
C+G 40 - 50%     220       0.94   0.92   0.91       0.80   0.82   0.82        0.08   0.05
C+G 50 - 60%     208       0.93   0.93   0.90       0.75   0.77   0.77        0.08   0.05
C+G > 60%        56        0.97   0.89   0.90       0.76   0.77   0.76        0.07   0.08
Primates         237       0.96   0.94   0.93       0.81   0.82   0.82        0.07   0.05
Rodents          191       0.90   0.93   0.89       0.75   0.80   0.78        0.11   0.05
Non-mam. vert.   72        0.93   0.93   0.90       0.81   0.85   0.84        0.11   0.06

Legend:

The Burset/Guigó test set was partitioned according to (1) the C+G% content of the GenBank sequence and (2) the organism of origin. The second column lists the number of sequences in each subset. The Primate group consists primarily of human sequences; the Rodent group, mostly of mouse and rat sequences; Non-mam. vert. is a small but diverse group of non-mammalian vertebrates: 22 fish, 17 amphibian, 5 reptilian and 28 avian sequences.
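
A minimal Python sketch of the C+G% partition described in the legend is given below. The function names are hypothetical. The handling of ambiguous bases and of sequences falling exactly on 50% is an assumption, since the page does not specify either.

```python
def cg_percent(seq):
    """C+G content of a nucleotide sequence, as a percentage.

    Only A, C, G and T are counted; other symbols such as N are ignored
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("C") + seq.count("G")) / counted


def cg_bin(seq):
    """Assign a sequence to one of the four C+G% subsets.

    Exactly 40% goes to the 40 - 50% subset and exactly 60% to the
    50 - 60% subset, following the strict "<" and ">" labels. Placing
    exactly 50% in the 50 - 60% subset is an assumption.
    """
    pct = cg_percent(seq)
    if pct < 40:
        return "C+G < 40%"
    if pct < 50:
        return "C+G 40 - 50%"
    if pct <= 60:
        return "C+G 50 - 60%"
    return "C+G > 60%"
```

For example, cg_bin("AAAT") falls in the lowest subset and cg_bin("GGCC") in the highest.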
 
-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-
 

Accuracy as a Function of Exon Length


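The accuracy of GENSCAN is somewhat sensitive to exon length: annotated exons shorter than 75 bp, or of 300 bp or more, are predicted exactly less often (38-70%) than exons of intermediate length (84-91%). This dependence is summarized in the table below, which shows accuracy statistics for exons from the Burset/Guigó test set grouped according to length.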

Length       Annotated exons                   Predicted exons
range (bp)   No.    %Exact   %Part   %Miss     No.    %Exact   %Part   %Wrong
<= 24        89     38       8       52        44     77       11      11
25 - 49      163    58       15      25        124    76       6       18
50 - 74      248    70       12      16        204    85       9       6
75 - 99      382    85       8       6         389    84       6       10
100 - 124    351    84       9       7         366    81       8       11
125 - 149    425    88       8       4         460    81       10      7
150 - 174    261    88       9       2         283    81       11      7
175 - 199    167    91       7       2         188    81       12      7
200 - 299    353    90       8       1         390    82       8       8
>= 300       211    66       19      1         204    69       20      10
Total        2650   81       10      8         2678   81       10      9
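
The grouping behind this table can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to produce the figures above. Exons are taken to be 1-based inclusive (start, end) tuples. "Exact" is taken to mean both boundaries match, "Part" to mean any overlap without an exact match, and "Miss" to mean no overlap at all. All function and variable names are hypothetical.

```python
from collections import Counter

# (lower bound, upper bound or None, label) for each length range in the table.
LENGTH_BINS = [
    (0, 24, "<= 24"), (25, 49, "25 - 49"), (50, 74, "50 - 74"),
    (75, 99, "75 - 99"), (100, 124, "100 - 124"), (125, 149, "125 - 149"),
    (150, 174, "150 - 174"), (175, 199, "175 - 199"),
    (200, 299, "200 - 299"), (300, None, ">= 300"),
]


def length_bin(length):
    """Return the table label of the range containing an exon length in bp."""
    for low, high, label in LENGTH_BINS:
        if length >= low and (high is None or length <= high):
            return label
    raise ValueError("exon length must be non-negative")


def classify(exon, others):
    """Classify one exon against a list of exons from the other set."""
    if exon in others:
        return "exact"  # both boundaries agree
    start, end = exon
    if any(start <= o_end and o_start <= end for o_start, o_end in others):
        return "part"   # overlaps something, but not an exact match
    return "miss"       # no overlap with any exon in the other set


def tabulate(exons, others):
    """Count exons by (length range, outcome)."""
    counts = Counter()
    for start, end in exons:
        length = end - start + 1  # inclusive coordinates
        counts[(length_bin(length), classify((start, end), others))] += 1
    return counts
```

Calling tabulate(annotated, predicted) gives counts for the annotated side of the table. Swapping the arguments gives the predicted side, where the "miss" outcome plays the role of %Wrong.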


-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-


[What makes a vertebrate a vertebrate]


-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X||
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-

Some of the limitations of GENSCAN are discussed here.

Back to the GENSCAN Web site.

Address any comments/questions/suggestions to: cburge@mit.edu



The graphic above came from the POV-Ray archive.