This page documents the datasets from Katz et al., "Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state," eLife (2014).
The main website for the paper is musashi-genes.org.
RNA-Seq and ribosome profiling data from mouse neural stem cells obtained from Musashi mouse models are available on GEO (accession GSE58423).
Gene expression and splicing data for TCGA tumors are available below for each cancer type.
The file formats are the same for all cancer types. In the descriptions below, X stands for a cancer type abbreviation (e.g. BRCA). The file formats are as follows:
Files beginning with matched.* contain data only for matched patients, i.e. patients with RNA-Seq data from both a tumor sample and a non-tumor sample of the same tissue. For example, matched.genes.counts_tmm_norm.X.txt contains the TMM-normalized gene expression data for matched patients in cancer type X. The file matched_samples_annotation.X.txt contains clinical metadata for the matched tumor samples in cancer type X.
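To make the notion of a "matched patient" concrete, here is a minimal sketch that finds matched participants in a list of TCGA barcodes. It assumes that a tumor sample is any sample type code 01-09 and that the same-tissue non-tumor sample is code 11 (Solid Tissue Normal); the exact selection criteria used to build the matched.* files are not stated here, so treat this as an illustration only. The function name `matched_participants` and all barcodes in the usage example are hypothetical.

```python
from collections import defaultdict

# Assumption: tumor samples use TCGA sample type codes 01-09, and the
# same-tissue non-tumor sample is code 11 ("Solid Tissue Normal").
TUMOR_CODES = {f"{i:02d}" for i in range(1, 10)}
NORMAL_CODE = "11"


def matched_participants(barcodes):
    """Return participant IDs (first three barcode fields, e.g. 'TCGA-A7-A0D9')
    that have at least one tumor sample and one solid-tissue normal sample."""
    seen = defaultdict(set)
    for bc in barcodes:
        fields = bc.split("-")
        pid = "-".join(fields[:3])   # project-TSS-participant
        code = fields[3][:2]         # two-digit sample type code, e.g. "01"
        if code in TUMOR_CODES:
            seen[pid].add("tumor")
        elif code == NORMAL_CODE:
            seen[pid].add("normal")
    return sorted(pid for pid, kinds in seen.items() if kinds == {"tumor", "normal"})


# Hypothetical barcodes: participant 0001 has tumor + normal, 0002 is tumor-only.
barcodes = [
    "TCGA-XX-0001-01A-11R-A000-07",
    "TCGA-XX-0001-11A-11R-A000-07",
    "TCGA-XX-0002-01A-11R-A000-07",
]
print(matched_participants(barcodes))  # ['TCGA-XX-0001']
```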
TMM and upper-quartile normalization were performed with normpy.
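As a rough illustration of what upper-quartile normalization does, the sketch below uses plain NumPy rather than normpy (whose API is not described here, so none of its functions are used). It shows one common variant: each sample's counts are divided by that sample's 75th percentile over genes expressed in at least one sample, then rescaled by the mean upper quartile. normpy's exact procedure may differ, and TMM is a separate, more involved method not shown. The function name `upper_quartile_normalize` is hypothetical.

```python
import numpy as np


def upper_quartile_normalize(counts):
    """Upper-quartile scale a genes x samples count matrix (one common variant).

    Genes with zero counts in every sample are excluded when computing each
    sample's 75th percentile; every column is then divided by its upper
    quartile and multiplied by the mean upper quartile across samples.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts.sum(axis=1) > 0                   # drop all-zero genes
    uq = np.percentile(counts[expressed], 75, axis=0)    # one value per sample
    if np.any(uq == 0):
        raise ValueError("a sample has an upper quartile of zero")
    return counts / uq * uq.mean()                       # broadcasts over columns


# Toy data: sample 2 is sample 1 sequenced twice as deeply.
counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80], [0, 0]])
norm = upper_quartile_normalize(counts)
print(np.allclose(norm[:, 0], norm[:, 1]))  # True
```

In the toy example the two samples differ only by sequencing depth, so after scaling their columns agree.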
Splicing data are in the subdirectory splicing_data/, which contains counts for alternatively spliced exon trios (from the hg19 genome). Each sample has a directory named by its TCGA barcode, which can be looked up in the Biospecimen Metadata Browser or in the annotation files provided above. For example, the barcode TCGA-A7-A0D9-01A-31R-A056-07 corresponds to a BRCA primary solid tumor.
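Because sample directories are named by barcode, it can help to split a barcode into its fields programmatically. The sketch below follows the standard seven-field TCGA aliquot barcode layout (project, tissue source site, participant, sample type + vial, portion + analyte, plate, center). The sample type lookup is deliberately partial; check unlisted codes against the official TCGA code tables. The names `TCGABarcode` and `parse_barcode` are hypothetical helpers, not part of any released tool.

```python
from dataclasses import dataclass

# Partial lookup of TCGA sample type codes (not exhaustive).
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "06": "Metastatic",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}


@dataclass
class TCGABarcode:
    project: str
    tss: str          # tissue source site
    participant: str
    sample_type: str  # two-digit code, e.g. "01"
    vial: str
    portion: str
    analyte: str      # e.g. "R" for RNA
    plate: str
    center: str

    @property
    def sample_type_name(self) -> str:
        return SAMPLE_TYPES.get(self.sample_type, "unknown")


def parse_barcode(barcode: str) -> TCGABarcode:
    """Split a full seven-field TCGA aliquot barcode into its components."""
    fields = barcode.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(fields)}: {barcode}")
    project, tss, participant, sample_vial, portion_analyte, plate, center = fields
    return TCGABarcode(
        project=project,
        tss=tss,
        participant=participant,
        sample_type=sample_vial[:2],
        vial=sample_vial[2:],
        portion=portion_analyte[:2],
        analyte=portion_analyte[2:],
        plate=plate,
        center=center,
    )


bc = parse_barcode("TCGA-A7-A0D9-01A-31R-A056-07")
print(bc.sample_type, bc.sample_type_name)  # 01 Primary Solid Tumor
```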
The GFF annotation for the hg19 alternatively spliced events is here:
Gene expression tables
Sample labels for the table are listed below using the following abbreviated notation:
Luminal cell lines:
Basal cell lines:
Normal-like cell lines and normal breast tissue:
Other cell lines: