Generating a BioMart Other RNA annotations BBMap database¶
This page describes how to generate an annotations database containing “Other RNAs” such as snoRNAs and lncRNAs.
If you do not wish to supply a real BioMart file, simply download and point Prost! at the following fake BioMart file: fake_biomart_file.fa
Download the sequences from BioMart¶
Open http://ensembl.org/biomart in your browser.
Make the following selections:
- Click - CHOOSE DATABASE - and select Ensembl Genes xx where xx is the current version of Ensembl.
- Click - CHOOSE DATASET - and select your species of interest (e.g. Danio rerio genes (GRCz10)).
- Click Filters
- Click the + button next to GENE to expand that box.
- Under Gene Type, select all of the following if present (you may have
to ctrl-click or cmd-click to select multiples at the same time):
- lincRNA
- miRNA
- misc_RNA
- Mt_rRNA
- Mt_tRNA
- rRNA
- snoRNA
- snRNA
- Click Attributes
- Click the Sequences radio button.
- Click the + button next to SEQUENCES to expand that box.
- Click the cDNA sequences radio button.
- Click the + button next to HEADER INFORMATION to expand that box.
- Uncheck all the checkboxes.
- Check the these boxes in the following order:
- Gene Name
- Transcript type
- Transcript stable ID
Optional: Click Count to see how many sequences will be returned.
Click Results to generate the fasta file.
Click Go to download the fasta file containing the other RNAs.
- Move the downloaded file (typically named
mart_export.txt
) somewhere appropriate on your computer (or onto another computer if that is where you’ll be running Prost!). - Rename the downloaded file (typically named
mart_export.txt
) to something more appropriate (e.g.YOUR_SPECIES_biomart_other_rnas.fa
). - Build a BBMap database from the FASTA file (see Building BBMap Databases).
- Update the Prost! configuration file (usually
prost.config
) to reflect the newly created BBMap database, for example:
[AnnotationAlignment3]
type: BiomartOtherRNAAnnotation
name: other
tool: bbmap
db: /path/to/YOUR_SPECIES_biomart_other_rnas
max_3p_mismatches: 0
max_non_3p_mismatches: 0
allow_indels: no