An important feature of GENSCAN is that, because it is based on a probabilistic model of genomic sequence composition and gene structure, it can assign meaningful probabilities to particular events, e.g., the event E that a particular predicted exon is correct. This probability, P(E), is defined as the sum of the probabilities, under the model, of all possible "parses" (gene structure descriptions) that contain exactly the exon E in the correct reading frame. Although the number of parses in this sum is typically far too large to enumerate exhaustively, the sum can be calculated in a reasonable amount of time using the "forward-backward" procedure (see Rabiner, L. (1989) Proc. IEEE 77, 257-285 for a general discussion of this method, or my thesis for a description of the streamlined version used in GENSCAN). The probability of each predicted exon, calculated in this fashion, is displayed in the second-to-last column of the text output (headed by the letter "P").

Interestingly, such probabilities provide a useful quantitative guide to the likelihood that a given exon is correct. This was demonstrated by partitioning the exons predicted in the Burset & Guigó (1996) set of 570 vertebrate gene sequences according to exon probability and then determining accuracy statistics for each group separately. These data are shown in the table below, where the last four columns give the percentage of exons in each probability range that fall into each accuracy class (see also Burge, C. & Karlin, S. (1997) J. Mol. Biol. 268, 78-94).
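The idea behind the forward-backward calculation of P(E) can be illustrated on a much simpler model. The following is a minimal sketch assuming an ordinary two-state HMM with made-up parameters ("non-coding" and "coding"). It is not GENSCAN's generalized (semi-Markov) model and it ignores reading frame.

In this toy model, the analogue of "exactly the exon E" is a maximal run of the coding state over positions i..j. Its probability has three parts:

- the forward mass entering the run from a different state,
- the probability of the run itself,
- the backward mass leaving it.

Their product is divided by P(sequence). The brute-force function checks on a short sequence that this equals the sum over every parse containing that exact segment. The names `segment_probability` and `brute_force` and all parameter values are hypothetical. For long sequences, a real implementation would use scaling or log-space arithmetic (discussed by Rabiner) to avoid underflow.

```python
import itertools

# Hypothetical toy HMM (NOT GENSCAN's model): state 0 = non-coding, 1 = coding.
STATES = (0, 1)
START = (0.8, 0.2)
TRANS = ((0.9, 0.1),
         (0.2, 0.8))
EMIT = ({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2})


def forward(seq):
    """alpha[t][k] = P(x_0..x_t, state at t = k)."""
    alpha = [[START[k] * EMIT[k][seq[0]] for k in STATES]]
    for t in range(1, len(seq)):
        prev = alpha[-1]
        alpha.append([sum(prev[a] * TRANS[a][k] for a in STATES) * EMIT[k][seq[t]]
                      for k in STATES])
    return alpha


def backward(seq):
    """beta[t][k] = P(x_{t+1}..x_{n-1} | state at t = k)."""
    n = len(seq)
    beta = [[1.0] * len(STATES) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(TRANS[k][b] * EMIT[b][seq[t + 1]] * beta[t + 1][b] for b in STATES)
                   for k in STATES]
    return beta


def segment_probability(seq, i, j, k=1):
    """Posterior probability that positions i..j (inclusive) form one maximal
    run of state k: state k on i..j, and a different state (or the sequence
    boundary) immediately before i and immediately after j."""
    alpha, beta = forward(seq), backward(seq)
    total = sum(alpha[-1])  # P(seq): sum over all parses
    # Forward mass entering state k at position i from a non-k state.
    if i == 0:
        run = START[k] * EMIT[k][seq[0]]
    else:
        run = sum(alpha[i - 1][a] * TRANS[a][k] for a in STATES if a != k) * EMIT[k][seq[i]]
    # Remain in state k for positions i+1..j.
    for t in range(i + 1, j + 1):
        run *= TRANS[k][k] * EMIT[k][seq[t]]
    # Backward mass leaving state k right after j (or the sequence ends at j).
    if j == len(seq) - 1:
        leave = 1.0
    else:
        leave = sum(TRANS[k][b] * EMIT[b][seq[j + 1]] * beta[j + 1][b]
                    for b in STATES if b != k)
    return run * leave / total


def brute_force(seq, i, j, k=1):
    """Same quantity by explicitly enumerating every state path (exponential)."""
    num = den = 0.0
    for path in itertools.product(STATES, repeat=len(seq)):
        p = START[path[0]] * EMIT[path[0]][seq[0]]
        for t in range(1, len(seq)):
            p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][seq[t]]
        den += p
        inside = all(path[t] == k for t in range(i, j + 1))
        before_ok = i == 0 or path[i - 1] != k
        after_ok = j == len(seq) - 1 or path[j + 1] != k
        if inside and before_ok and after_ok:
            num += p
    return num / den


if __name__ == "__main__":
    seq = "ACGCGTTA"
    # The two values agree up to floating-point rounding.
    print(segment_probability(seq, 2, 5), brute_force(seq, 2, 5))
```

The forward and backward tables cost time linear in sequence length. The enumeration grows as 2^n, which is why exhaustive summation is impractical for real sequences.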
| Probability Range | Predicted Exons | Exactly Correct | Partially Correct | Overlapping | Wrong |
|---|---|---|---|---|---|
| 0.00 - 0.50 | 248 | 29.8% | 27.8% | 4.0% | 38.3% |
| 0.50 - 0.75 | 362 | 54.1% | 26.2% | 2.2% | 17.4% |
| 0.75 - 0.90 | 337 | 74.8% | 16.0% | 1.2% | 8.0% |
| 0.90 - 0.95 | 263 | 87.8% | 6.1% | 0.4% | 5.7% |
| 0.95 - 0.99 | 551 | 92.4% | 3.4% | 0.2% | 4.0% |
| 0.99 - 1.00 | 917 | 97.7% | 0.9% | 0.0% | 1.4% |
| Total | 2,678 | 80.6% | 9.7% | 0.9% | 8.8% |
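The Total row is the exon-count-weighted average of the six probability ranges. The partitioning step described above amounts to a simple binning function. Here is a small Python sketch of both.

The table does not say which range an exon with probability exactly 0.50 (for example) belongs to. The half-open `[lo, hi)` bin convention, with 1.0 placed in the last bin, is therefore an assumption. The function names `pooled_percentages`, `bin_index` and `accuracy_by_bin` are hypothetical.

```python
from collections import Counter

# Rows of the table above: (probability range, predicted exons,
# % exactly correct, % partially correct, % overlapping, % wrong).
ROWS = [
    ("0.00 - 0.50", 248, 29.8, 27.8, 4.0, 38.3),
    ("0.50 - 0.75", 362, 54.1, 26.2, 2.2, 17.4),
    ("0.75 - 0.90", 337, 74.8, 16.0, 1.2, 8.0),
    ("0.90 - 0.95", 263, 87.8, 6.1, 0.4, 5.7),
    ("0.95 - 0.99", 551, 92.4, 3.4, 0.2, 4.0),
    ("0.99 - 1.00", 917, 97.7, 0.9, 0.0, 1.4),
]
CLASSES = ("exactly correct", "partially correct", "overlapping", "wrong")
EDGES = (0.0, 0.50, 0.75, 0.90, 0.95, 0.99, 1.0)


def pooled_percentages(rows):
    """Weight each range's class percentages by its number of predicted exons."""
    n_total = sum(r[1] for r in rows)
    pooled = [sum(r[1] * r[2 + c] for r in rows) / n_total for c in range(len(CLASSES))]
    return n_total, pooled


def bin_index(p, edges=EDGES):
    """Assumed convention: bins are [lo, hi); p == 1.0 goes in the last bin."""
    for b in range(len(edges) - 1):
        if p < edges[b + 1]:
            return b
    return len(edges) - 2


def accuracy_by_bin(exons):
    """exons: iterable of (exon probability, accuracy class) pairs.
    Returns {bin index: {class: percent of that bin's exons}} for non-empty bins."""
    counts = [Counter() for _ in range(len(EDGES) - 1)]
    for p, cls in exons:
        counts[bin_index(p)][cls] += 1
    result = {}
    for b, c in enumerate(counts):
        n = sum(c.values())
        if n:
            result[b] = {cls: 100.0 * c[cls] / n for cls in CLASSES}
    return result


n, pct = pooled_percentages(ROWS)
print(n, [round(x, 1) for x in pct])  # → 2678 [80.6, 9.7, 0.9, 8.8]
```

The pooled percentages reproduce the Total row to the precision shown. For example, about 74 of the 248 exons in the 0.00 - 0.50 range (29.8%) were exactly correct, compared with about 896 of the 917 exons in the 0.99 - 1.00 range (97.7%).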
Some implications of the data shown in the table above are as follows.