Mouse neural stem cell genomic data

RNA-Seq and ribosome profiling data from neural stem cells of Musashi mouse models are available on GEO (Accession: GSE58423).

Cancer datasets

The Cancer Genome Atlas (TCGA)

Tumor annotations

Gene expression and splicing data

Gene expression and splicing data for tumors from TCGA are available below, for each cancer type.

File names follow the same pattern for every cancer type. Below, X stands for a cancer type abbreviation (e.g. BRCA). The files are:

  • genes.X.txt: gene counts/expression for X tumors (unnormalized)
  • genes.counts_tmm_norm.X.txt: gene counts/expression for X tumors (TMM-normalized)
  • genes.counts_quantile_norm.X.txt: gene counts/expression for X tumors (upper quantile normalization)
  • exons.X.txt: exon counts/expression for X tumors
  • junctions.X.txt: exon-exon junction counts for X tumors
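As a minimal sketch, the naming pattern above can be expressed as a small Python helper that builds the five file names for a given cancer type. The function name `tcga_file_names` and the dictionary keys are hypothetical; only the file-name patterns come from the list above, and the directory holding the files is not specified here.

```python
# Hypothetical helper: the file-name patterns are copied from the list above.
FILE_PATTERNS = {
    "genes_raw": "genes.{x}.txt",                                  # unnormalized
    "genes_tmm": "genes.counts_tmm_norm.{x}.txt",                  # TMM-normalized
    "genes_upper_quantile": "genes.counts_quantile_norm.{x}.txt",  # upper quantile
    "exons": "exons.{x}.txt",
    "junctions": "junctions.{x}.txt",                              # exon-exon junctions
}


def tcga_file_names(cancer_type):
    """Return {kind: file name} for a cancer type abbreviation such as 'BRCA'."""
    return {kind: pattern.format(x=cancer_type) for kind, pattern in FILE_PATTERNS.items()}


if __name__ == "__main__":
    for kind, name in tcga_file_names("BRCA").items():
        print(kind, name)
```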

Files beginning with matched.* contain data only for matched patients, i.e. patients with RNA-Seq data from both a tumor sample and a non-tumor sample of the same tissue. For example, matched.genes.counts_tmm_norm.X.txt contains the TMM-normalized gene expression data for matched patients with cancer type X. The file matched_samples_annotation.X.txt contains clinical metadata for the matched tumor samples of cancer type X.
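The pairing behind the matched.* files can be approximated from TCGA barcodes, assuming each sample is identified by its barcode: the first 12 characters identify the participant, and the two-digit sample type code that follows distinguishes primary solid tumors (01) from solid tissue normals (11). The sketch below is one way to find such tumor/normal pairs and is not necessarily how the matched files were built; `pair_tumor_normal`, `matched_name`, and the TCGA-XX-… barcodes in the example are hypothetical.

```python
from collections import defaultdict

PRIMARY_TUMOR = "01"
SOLID_TISSUE_NORMAL = "11"


def matched_name(file_name):
    """Name of the matched-patients version of a per-cancer file."""
    return "matched." + file_name


def pair_tumor_normal(barcodes):
    """Return {participant: (tumor_barcode, normal_barcode)} for participants
    having both a primary solid tumor and a solid tissue normal sample.
    If a participant has several samples of one type, the first seen is kept."""
    by_participant = defaultdict(dict)
    for barcode in barcodes:
        participant = barcode[:12]    # e.g. TCGA-A7-A0D9
        sample_type = barcode[13:15]  # e.g. 01
        by_participant[participant].setdefault(sample_type, barcode)
    return {
        participant: (types[PRIMARY_TUMOR], types[SOLID_TISSUE_NORMAL])
        for participant, types in by_participant.items()
        if PRIMARY_TUMOR in types and SOLID_TISSUE_NORMAL in types
    }


if __name__ == "__main__":
    # Hypothetical barcodes: participant 0001 has tumor + normal, 0002 tumor only.
    barcodes = [
        "TCGA-XX-0001-01A-01R-0000-00",
        "TCGA-XX-0001-11A-01R-0000-00",
        "TCGA-XX-0002-01A-01R-0000-00",
    ]
    print(pair_tumor_normal(barcodes))
    print(matched_name("genes.counts_tmm_norm.BRCA.txt"))
```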

TMM and upper quantile normalizations were performed with normpy.
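normpy's own code is not reproduced here. Purely to illustrate the idea, the sketch below implements one common form of upper-quartile (75th percentile) normalization in plain Python: each sample's counts are divided by the 75th percentile of its non-zero counts and multiplied by the mean of those percentiles across samples. The exact variant used by normpy (which genes are included, how the percentile is computed, any final scaling) is an assumption here, and TMM, which instead trims extreme log-ratios against a reference sample, is not shown. The function name `upper_quartile_normalize` is hypothetical.

```python
import statistics


def upper_quartile_normalize(counts):
    """counts: {sample: [count for each gene]}, same gene order in every sample.

    Divides each sample by the 75th percentile of its non-zero counts and
    multiplies by the mean of those percentiles across samples.
    """
    uq = {}
    for sample, values in counts.items():
        nonzero = [v for v in values if v > 0]
        uq[sample] = statistics.quantiles(nonzero, n=4)[2]  # 75th percentile
    mean_uq = statistics.mean(uq.values())
    return {
        sample: [v / uq[sample] * mean_uq for v in values]
        for sample, values in counts.items()
    }


if __name__ == "__main__":
    toy = {"A": [1, 2, 3, 4, 0], "B": [2, 4, 6, 8, 0]}  # B is A at twice the depth
    print(upper_quartile_normalize(toy))
```

In the toy example, sample B is sample A sequenced at twice the depth, so after normalization both samples have the same values (up to floating-point rounding).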

Splicing data are in the subdirectory splicing_data/, which contains counts for alternatively spliced exon trios (based on the hg19 genome). Each sample has its own directory named by its TCGA barcode, which can be looked up in the Biospecimen Metadata Browser or in the annotation files provided above. For example, the barcode TCGA-A7-A0D9-01A-31R-A056-07 corresponds to a BRCA primary solid tumor.
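A full TCGA barcode is a dash-separated string whose fields are, in order: project, tissue source site, participant, sample type plus vial, portion plus analyte, plate, and center. The sketch below splits a barcode into those fields and builds the path of the sample's directory under splicing_data/. The function names are hypothetical, the sample-type table is deliberately partial (a few standard codes only), and the files inside each sample directory are not described on this page, so the sketch stops at the directory path.

```python
from pathlib import PurePosixPath

# Partial table of standard TCGA sample type codes.
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}


def parse_tcga_barcode(barcode):
    """Split a full TCGA aliquot barcode into its named fields."""
    project, tss, participant, sample_vial, portion_analyte, plate, center = barcode.split("-")
    return {
        "participant": f"{project}-{tss}-{participant}",
        "tissue_source_site": tss,
        "sample_type": sample_vial[:2],
        "sample_type_name": SAMPLE_TYPES.get(sample_vial[:2], "other/unknown"),
        "vial": sample_vial[2:],
        "portion": portion_analyte[:2],
        "analyte": portion_analyte[2:],  # 'R' = RNA
        "plate": plate,
        "center": center,
    }


def splicing_sample_dir(barcode, root="splicing_data"):
    """Directory holding one sample's splicing counts (named by its barcode)."""
    return PurePosixPath(root) / barcode


if __name__ == "__main__":
    bc = "TCGA-A7-A0D9-01A-31R-A056-07"
    print(parse_tcga_barcode(bc)["sample_type_name"])
    print(splicing_sample_dir(bc))
```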

The GFF annotation for the hg19 alternatively spliced events is here:

Cancer cell lines

Gene expression tables

Sample labels for the tables are listed below, using this abbreviated notation:

  • kal indicates Kallioniemi lab dataset, Edgren et al., Genome Biology (2011)
  • fou indicates Foulkes lab dataset, Ha et al., BMC Medical Genomics (2011)
  • tho indicates Thompson lab dataset, Sun et al., PLoS One (2011)
  • Suffix numbers denote replicates, e.g. SKBR3-kal1 and SKBR3-kal2 are replicates
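The notation above is regular enough to parse: a cell line name, a dash, a lab abbreviation, and an optional replicate number. A minimal sketch, assuming labels always follow that shape; `parse_sample_label` is a hypothetical name, and labels that do not use a lab suffix (such as the Burge lab MCF7-1 and MCF7-2) return None.

```python
import re

LABS = {"kal": "Kallioniemi", "fou": "Foulkes", "tho": "Thompson"}
LABEL_RE = re.compile(r"^(?P<line>.+)-(?P<lab>kal|fou|tho)(?P<rep>\d*)$")


def parse_sample_label(label):
    """Split e.g. 'SKBR3-kal1' into cell line, lab, and replicate number."""
    m = LABEL_RE.match(label)
    if m is None:
        return None  # label without a kal/fou/tho suffix
    return {
        "cell_line": m["line"],
        "lab": LABS[m["lab"]],
        "replicate": int(m["rep"]) if m["rep"] else None,  # None if unnumbered
    }


if __name__ == "__main__":
    for label in ["SKBR3-kal1", "SUM149PT-fou", "MCF7-1"]:
        print(label, parse_sample_label(label))
```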

Luminal cell lines:

  • counts_SKBR3_B: SKBR3-kal1
  • counts_SKBR3_kal: SKBR3-kal2
  • counts_MCF7_kal: MCF7-kal
  • counts_BT20_tho: BT20-tho
  • counts_BT474_kal: BT474-kal
  • counts_MCF7_tho: MCF7-tho
  • counts_ZR751_tho: ZR751-tho
  • counts_shGFP_1: MCF7-1 (with shRNA against GFP, replicate 1), Burge lab
  • counts_shGFP_2: MCF7-2 (with shRNA against GFP, replicate 2), Burge lab
  • counts_T47D_tho: T47D-tho
  • counts_BT474_tho: BT474-tho

Basal cell lines:

  • counts_MDAMB231_tho: MDAMB231-tho
  • counts_MDAMB468_tho: MDAMB468-tho
  • counts_SUM149PT_fou: SUM149PT-fou
  • counts_SUM1315O2_fou: SUM1315O2-fou
  • counts_HCC1937_fou: HCC1937-fou
  • counts_HCC3153_fou: HCC3153-fou

Normal-like cell lines and normal breast tissue:

  • counts_MCF10A_tho: MCF10A-tho
  • counts_MCF10A_fou: MCF10A-fou
  • counts_NormBreast_kal: NormBreast-kal (normal breast tissue)

Other cell lines:

  • counts_epithelial: HMLE cells with control empty vector, from Shapiro et al., PLoS Genetics (2011)
  • counts_mesenchymal: HMLE cells induced with Twist, from Shapiro et al., PLoS Genetics (2011)
  • counts_KPL4_kal: KPL4-kal
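Assuming the column names in the expression tables are exactly the counts_* identifiers listed above, a lookup table makes it easy to select, for example, all luminal and basal columns for a comparison. The dictionary below copies the groupings from these lists; `CELL_LINE_GROUPS`, `columns_for`, and `group_of` are hypothetical names.

```python
# Groupings copied from the lists above; column names ASSUMED to match the tables.
CELL_LINE_GROUPS = {
    "luminal": [
        "counts_SKBR3_B", "counts_SKBR3_kal", "counts_MCF7_kal", "counts_BT20_tho",
        "counts_BT474_kal", "counts_MCF7_tho", "counts_ZR751_tho", "counts_shGFP_1",
        "counts_shGFP_2", "counts_T47D_tho", "counts_BT474_tho",
    ],
    "basal": [
        "counts_MDAMB231_tho", "counts_MDAMB468_tho", "counts_SUM149PT_fou",
        "counts_SUM1315O2_fou", "counts_HCC1937_fou", "counts_HCC3153_fou",
    ],
    "normal_like": ["counts_MCF10A_tho", "counts_MCF10A_fou", "counts_NormBreast_kal"],
    "other": ["counts_epithelial", "counts_mesenchymal", "counts_KPL4_kal"],
}


def columns_for(*groups):
    """All column names belonging to the requested groups, in list order."""
    return [col for group in groups for col in CELL_LINE_GROUPS[group]]


def group_of(column):
    """Group name for a single column; raises KeyError if it is not listed."""
    for group, columns in CELL_LINE_GROUPS.items():
        if column in columns:
            return group
    raise KeyError(f"unknown column: {column}")


if __name__ == "__main__":
    print(columns_for("luminal", "basal"))
    print(group_of("counts_HCC1937_fou"))
```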