Measures of predictive accuracy



To calculate accuracy statistics at the nucleotide level, each nucleotide of a test sequence is classified as predicted positive (PP) if it lies in a predicted coding region and predicted negative (PN) otherwise. Independently, it is classified as actual positive (AP) or actual negative (AN) according to the sequence annotation. Comparing the two assignments gives the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Using PP and AP also to denote the number of nucleotides in each class, AP = TP + FN and PP = TP + FP. Accuracy is then measured by:

Sensitivity, Sn = TP / AP

Specificity, Sp = TP / PP

and Approximate Correlation, AC, defined as:

AC = ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) / 2 - 1
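As an illustration only, the nucleotide-level counts and the three measures above could be computed as in the following minimal Python sketch. It is not the code used by GENSCAN. It assumes each sequence is given as two equal-length lists of booleans, one entry per nucleotide, with True meaning "coding". It also treats AC as undefined whenever any of its four ratios has a zero denominator, which is an assumption rather than part of the definition above.

```python
from typing import Optional, Sequence, Tuple


def nucleotide_accuracy(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (Sn, Sp, AC) for one sequence; None where a measure is undefined."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    # Count each nucleotide as TP, FP, TN or FN.
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> Optional[float]:
        # A ratio with a zero denominator is undefined for this sequence.
        return num / den if den else None

    sn = ratio(tp, tp + fn)  # TP / AP, since AP = TP + FN
    sp = ratio(tp, tp + fp)  # TP / PP, since PP = TP + FP
    terms = [sn, sp, ratio(tn, tn + fp), ratio(tn, tn + fn)]
    # Assumption: AC is undefined if any of its four ratios is undefined.
    ac = None if any(t is None for t in terms) else sum(terms) / 2 - 1
    return sn, sp, ac
```

For example, with predicted = [True, True, False, False, True, False] and actual = [True, False, False, True, True, False], the counts are TP = 2, FP = 1, TN = 2 and FN = 1, so all four ratios equal 2/3 and AC = (4 × 2/3) / 2 - 1 = 1/3.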


At the exon level, predicted exons (PE) are compared to annotated exons (AE), with PE and AE also denoting the number of exons of each kind. The number of true exons (TE) is the number of predicted exons that exactly match an annotated exon, i.e. with both endpoints correct. Accuracy is again measured by:

Sensitivity, Sn = TE / AE

Specificity, Sp = TE / PE
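A minimal Python sketch of the exon-level Sn and Sp follows. It assumes each exon is represented as a hypothetical (start, end) coordinate pair, so that an exact match is simply equality of both endpoints, and that no exon is listed twice within a sequence.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair.
Exon = Tuple[int, int]


def exon_accuracy(
    predicted: List[Exon], annotated: List[Exon]
) -> Tuple[Optional[float], Optional[float]]:
    """Return exon-level (Sn, Sp) for one sequence; None where undefined."""
    annotated_set = set(annotated)
    # A predicted exon is a true exon only if both endpoints match an annotated exon.
    te = sum(1 for exon in predicted if exon in annotated_set)
    sn = te / len(annotated) if annotated else None  # TE / AE
    sp = te / len(predicted) if predicted else None  # TE / PE
    return sn, sp
```

A prediction that gets one endpoint wrong, such as (20, 30) against an annotated (20, 31), contributes nothing to TE, which is why exon-level accuracy is usually much lower than nucleotide-level accuracy for the same predictions.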

The average of Sn and Sp is typically used as an overall measure of exon-level accuracy in place of a correlation measure. Two additional accuracy measures are also calculated at the exon level: Missing Exons (ME), the fraction of annotated exons not overlapped by any predicted exon, and Wrong Exons (WE), the fraction of predicted exons not overlapped by any annotated exon.

Accuracy measures for a set of sequences are calculated by averaging the values obtained for each sequence separately, taking the average over all sequences for which the measure is defined. For example, nucleotide-level Sp is undefined for a sequence with no predicted coding nucleotides (PP = 0), so such a sequence does not contribute to the average Sp.
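The overlap-based measures and the per-sequence averaging can be sketched in Python as well. This sketch assumes exons are hypothetical (start, end) pairs with inclusive coordinates, and the function names are illustrative rather than part of GENSCAN. An undefined measure is represented as None so that it can be skipped when averaging over sequences.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: an exon is a (start, end) pair, inclusive.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two inclusive intervals share at least one nucleotide."""
    return a[0] <= b[1] and b[0] <= a[1]


def fraction_unoverlapped(exons: List[Exon], others: List[Exon]) -> Optional[float]:
    """Fraction of `exons` overlapping none of `others`; None if `exons` is empty."""
    if not exons:
        return None
    misses = sum(1 for e in exons if not any(overlaps(e, o) for o in others))
    return misses / len(exons)


def missing_and_wrong(
    predicted: List[Exon], annotated: List[Exon]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ME, WE) for one sequence."""
    me = fraction_unoverlapped(annotated, predicted)  # annotated exons with no overlapping prediction
    we = fraction_unoverlapped(predicted, annotated)  # predicted exons with no overlapping annotation
    return me, we


def average_defined(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average a per-sequence measure over the sequences where it is defined."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None
```

For instance, averaging per-sequence values [0.5, None, 1.0] ignores the sequence where the measure is undefined and gives 0.75.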

Address any comments/questions/suggestions to: cburge@mit.edu