This is the GENOA file server at MIT. It provides access to genome alignments that are used to detect gene loci, and the alternative transcript structures of those genes, in human genomic sequence.

GENOA-related publications and online supplementary material:

D. Holste, C.B. Burge, et al. The making of mRNAs: computationally dissecting the alternative splicing of precursors by using the Hollywood database. In preparation. World-wide web Supplementary material

D. Holste, G. Huo, V. Tung, and C.B. Burge. Hollywood: a comparative genomics relational database of alternative splicing. In preparation. World-wide web Supplementary material

E. van Nostrand, D. Holste, and C.B. Burge. Orthology-based characterization of human intron retention. In preparation. World-wide web

G.W. Yeo, E. van Nostrand, D. Holste, T. Poggio, and C.B. Burge. Identification and analysis of alternative splicing events conserved in human and mouse. Proc Natl Acad Sci USA 102(8):2850 (2005). World-wide web Supplementary material

G. Yeo*, D. Holste*, G. Kreimann, and C.B. Burge. Variation in alternative splicing across human tissues. Genome Biol 5(10):R74 (2004). World-wide web Supplementary material

W.G. Fairbrother, D. Holste, C.B. Burge, and P.A. Sharp. Single nucleotide polymorphism-based validation of exonic splicing enhancers. PLoS Biol 2(9):E268 (2004). World-wide web Supplementary material

Genome Annotation (GENOA) pipeline
Identification of pre-mRNA alternative splice forms
GENOA aligns spliced cDNA and EST sequences to the human genome and computationally identifies, for each locus, constitutive and alternative exons (see supportive schematic plot). To this end, GENOA uses BLASTN to detect significant blocks of identity between repeat-masked cDNAs (rm-cDNAs) and genomic DNA, and then aligns the cDNAs to the genomic loci identified by BLASTN using the spliced alignment algorithm MRNAVSGEN. MRNAVSGEN is similar in concept to SIM4, but was developed specifically to align high-quality cDNAs rather than ESTs; it therefore requires higher alignment quality (at least 93% identity) and consensus terminal dinucleotides at the ends of all introns. ESTs were aligned using SIM4 to those genomic regions that had significant BLASTN hits to rm-cDNAs and aligned cDNAs. Again, stringent alignment criteria were imposed: (1) ESTs were required to overlap cDNAs (so every gene studied was supported by at least one cDNA:genomic alignment); (2) the first and last aligned segments of each EST were required to be at least 30 nucleotides in length, with at least 90% sequence identity; and (3) the entire EST alignment was required to extend over at least 90% of the length of the EST, with at least 90% sequence identity.

[ supportive schematic plot | data | webpage | information ]
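The three EST alignment criteria described above can be illustrated with a short Python sketch. It is not GENOA code: the `Segment` record, the function names, and the coordinate conventions are hypothetical, and the overall identity in criterion (3) is assumed here to be the length-weighted mean over aligned segments, which the description does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One aligned block of an EST (hypothetical record, not the SIM4 output format)."""
    est_start: int    # start on the EST, 0-based, inclusive
    est_end: int      # end on the EST, exclusive
    gen_start: int    # start on the genome, inclusive
    gen_end: int      # end on the genome, exclusive
    identity: float   # fraction of identical positions, 0.0 to 1.0


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def passes_est_filters(
    segments: List[Segment],
    est_length: int,
    cdna_exons: List[Tuple[int, int]],
    min_terminal_len: int = 30,
    min_identity: float = 0.90,
    min_coverage: float = 0.90,
) -> bool:
    """Apply criteria (1) to (3) to one EST alignment, given in EST order."""
    if not segments or est_length <= 0:
        return False

    # (1) The EST must overlap an aligned cDNA (assumed: same locus and strand).
    if not any(
        overlaps(s.gen_start, s.gen_end, c_start, c_end)
        for s in segments
        for c_start, c_end in cdna_exons
    ):
        return False

    # (2) First and last segments: at least 30 nt and at least 90% identity.
    for s in (segments[0], segments[-1]):
        if s.est_end - s.est_start < min_terminal_len or s.identity < min_identity:
            return False

    # (3) Whole alignment: at least 90% of the EST length, at least 90% identity.
    aligned = sum(s.est_end - s.est_start for s in segments)
    if aligned == 0:
        return False
    weighted_identity = sum(
        (s.est_end - s.est_start) * s.identity for s in segments
    ) / aligned
    return aligned / est_length >= min_coverage and weighted_identity >= min_identity
```

For example, a 100-nucleotide EST aligned as two 50-nucleotide segments at 98% and 95% identity, with the first segment overlapping a cDNA exon, would pass all three checks under these assumptions.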
An internal exon was identified as a skipped exon (SE) if it was included in one or more transcripts and skipped in one or more other transcripts, and if the boundaries of both the 5' and 3' flanking exons were shared between the transcripts that included and the transcripts that skipped that exon (see supportive schematic plot). Similarly, an internal exon was identified as an alternative 3' splice site (ss) exon (A3E) or an alternative 5'ss exon (A5E) if that exon was altered in another transcript at the corresponding 3'ss (5'ss). The exon core sequence of an A3E (A5E) is defined as the shortest exonic sequence common to the transcripts used to infer the A3E (A5E) event, and the exon extension sequence of an A3E (A5E) is the exonic sequence added to the core by the alternative 3'ss (5'ss).

[ supportive schematic plot ]
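The SE, A3E, and A5E definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GENOA implementation: transcripts are represented as sorted lists of half-open (start, end) exon intervals on the plus strand, "shared flanking exon boundaries" is interpreted as a shared intron from the upstream 5'ss to the downstream 3'ss, and all function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Exon = Tuple[int, int]      # (start, end), half-open, plus-strand genomic coordinates
Transcript = List[Exon]     # exons sorted by start


def introns(transcript: Transcript) -> Set[Tuple[int, int]]:
    """Return introns as (upstream exon end, downstream exon start) pairs."""
    return {(a[1], b[0]) for a, b in zip(transcript, transcript[1:])}


def skipped_exons(transcripts: List[Transcript]) -> Set[Exon]:
    """Internal exons that some other transcript skips via a single intron
    joining the same flanking 5'ss and 3'ss."""
    all_introns = [introns(t) for t in transcripts]
    found: Set[Exon] = set()
    for t in transcripts:
        for up, exon, down in zip(t, t[1:], t[2:]):
            skipping_intron = (up[1], down[0])
            if any(skipping_intron in s for s in all_introns):
                found.add(exon)
    return found


def core_and_extension(a: Exon, b: Exon) -> Optional[Tuple[str, Exon, Exon]]:
    """For two variants of one exon that differ at exactly one splice site,
    return (event, core, extension); otherwise return None.
    On the plus strand the exon start is its 3'ss and the exon end its 5'ss."""
    if a[1] == b[1] and a[0] != b[0]:
        # Same 5'ss, different 3'ss: alternative 3'ss exon.
        near, far = sorted((a[0], b[0]))
        return "A3E", (far, a[1]), (near, far)
    if a[0] == b[0] and a[1] != b[1]:
        # Same 3'ss, different 5'ss: alternative 5'ss exon.
        near, far = sorted((a[1], b[1]))
        return "A5E", (a[0], near), (near, far)
    return None
```

With transcripts `[(100, 200), (300, 350), (500, 600)]` and `[(100, 200), (500, 600)]`, the exon `(300, 350)` is reported as skipped because the second transcript joins position 200 directly to position 500. Minus-strand genes would need the 3'ss and 5'ss roles of start and end swapped, which this sketch does not handle.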

Please address comments/questions/suggestions regarding this webpage
to Dirk Holste or Chris Burge

Copyright © Chris Burge Lab, MIT, 2005