FTP Archive

The FTP archive provides bulk downloads of the data that RNAcentral produces and uses. Downloading these files is useful if you want to run your own processing on the data.

The objects in the FTP archive are produced during each RNAcentral release, so they are updated with every release. The archive keeps the data for previous releases back to version 1.0beta (look in the releases folder).

Most objects stored here are gzip-compressed; these files have a .gz suffix. To decompress one, use a command such as gzip -d <file path>. The file formats are documented at the links provided in the table below.
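Compressed files can also be read directly, without decompressing them on disk first. The following is a minimal Python sketch using only the standard library; iter_lines is a hypothetical helper name, and the code assumes the file is UTF-8 text (reasonable for the TSV and FASTA files, but not confirmed for every object in the archive).

```python
import gzip
from typing import Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield lines from a gzip-compressed text file, one at a time.

    Assumption: the file is UTF-8 encoded text. Reading line by line
    keeps memory use low even for very large files.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Drop only the trailing newline so that tabs are preserved.
            yield line.rstrip("\n")
```

For example, `for line in iter_lines("md5.tsv.gz"): ...` would walk through a downloaded copy of the md5 file one row at a time.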

Objects available

Name	Description	Format	Link
Database files	A dump of the PostgreSQL database. The dump is removed from old releases.	pg_dump	pg_dump
Genome Coordinates	Coordinates of RNAs in RNAcentral in each model organism, as annotated by expert databases	BED, GFF3	BED, GFF3
GO annotations	Mappings of RNAcentral entries to Gene Ontology terms	TSV	rnacentral_rfam_annotations.tsv.gz
GPI	Gene product information for selected rRNAs	GPI	rnacentral.gpi.gz
ID mapping	Mapping of RNAcentral IDs to expert database IDs. Also available per database	TSV	id_mapping.tsv.gz, per database
json	JSON files containing RNAcentral IDs and their cross-references to Ensembl. Each file contains 10,000 sequences	JSON	json
md5	RNAcentral ID mapped to MD5 sum of each sequence	TSV	md5.tsv.gz
rfam	RNAcentral IDs with their associated Rfam annotations.	TSV	rfam_annotations.tsv.gz
sequences - active	RNAcentral IDs with their corresponding sequences. Active sequences are present in at least one expert database.	FASTA	rnacentral_active.fasta.gz
sequences - inactive	RNAcentral IDs with their corresponding sequences. Inactive sequences are not currently present in any expert database.	FASTA	rnacentral_inactive.fasta.gz
sequences - species specific	Species-specific RNAcentral IDs (URS) mapped to their sequences	FASTA	rnacentral_species_specific_ids.fasta.gz
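The sequence files in the table can be processed with any FASTA parser. As an illustration, here is a minimal Python sketch that streams (identifier, sequence) pairs from a gzip-compressed FASTA file. The helper name read_fasta is hypothetical, and the code assumes that the RNAcentral ID is the first whitespace-delimited token on each header line, which should be checked against the actual files.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a gzip-compressed FASTA file."""
    identifier: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if identifier is not None:
                    yield identifier, "".join(chunks)
                # Assumption: the ID is the first token after ">".
                fields = line[1:].split(None, 1)
                identifier = fields[0] if fields else ""
                chunks = []
            elif identifier is not None:
                # Sequences may be wrapped over several lines.
                chunks.append(line)
    # Emit the final record once the file ends.
    if identifier is not None:
        yield identifier, "".join(chunks)
```

Because the function is a generator, a file such as rnacentral_active.fasta.gz can be scanned without loading it into memory.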

Most directories contain readme files that explain their contents further.

Previous releases are also available in the releases folder. They contain largely the same objects, although the set of files has changed as the database has evolved.

Directory structure

The structure of the FTP archive is shown below.

rnacentral
|
+- current_release
|   |
|   +- database_files
|   |   |
|   |   +- pg_dump.sql.gz
|   |
|   +- genome_coordinates
|   |   |
|   |   +- bed
|   |   |   |
|   |   |   +- one gzip compressed BED file per model organism
|   |   |
|   |   +- gff3
|   |   |   |
|   |   |   +- one gzip compressed GFF3 file per model organism
|   |   |
|   |   +- readme.txt
|   |
|   +- go_annotations
|   |   |
|   |   +- rnacentral_rfam_annotations.tsv.gz
|   |
|   +- gpi
|   |   |
|   |   +- rnacentral.gpi
|   |   |
|   |   +- rnacentral.gpi.gz
|   |
|   +- id_mapping
|   |   |
|   |   +- database_mappings
|   |   |   |
|   |   |   +- one uncompressed tsv file per expert database
|   |   |
|   |   +- id_mapping.tsv.gz
|   |
|   +- json
|   |   |
|   |   +- JSON files
|   |
|   +- md5
|   |   |
|   |   +- md5.tsv.gz
|   |
|   +- rfam
|   |   |
|   |   +- rfam_annotations.tsv.gz
|   |
|   +- sequences
|       |
|       +- by_database
|       |   |
|       |   +- one uncompressed fasta file per expert database
|       |
|       +- rnacentral_active.fasta.gz
|       |
|       +- rnacentral_inactive.fasta.gz
|       |
|       +- rnacentral_species_specific_ids.fasta.gz
|
+- releases
    |
    +- Archive of releases back to 1.0beta (2014)
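
The tree can be turned into a small lookup for scripted downloads. The Python sketch below builds paths relative to the archive root for some of the single-file objects; the dictionary keys and the archive_path helper are hypothetical names, and the address of the FTP server is not given here, so the caller has to prepend it.

```python
import posixpath

# Paths relative to current_release, copied from the directory tree.
# The keys are illustrative labels, not official names.
CURRENT_RELEASE_FILES = {
    "pg_dump": "database_files/pg_dump.sql.gz",
    "go_annotations": "go_annotations/rnacentral_rfam_annotations.tsv.gz",
    "gpi": "gpi/rnacentral.gpi.gz",
    # The objects table lists this file as id_mapping.tsv.gz.
    "id_mapping": "id_mapping/id_mapping.tsv.gz",
    "md5": "md5/md5.tsv.gz",
    "rfam": "rfam/rfam_annotations.tsv.gz",
    "active": "sequences/rnacentral_active.fasta.gz",
    "inactive": "sequences/rnacentral_inactive.fasta.gz",
    "species_specific": "sequences/rnacentral_species_specific_ids.fasta.gz",
}


def archive_path(name: str, root: str = "rnacentral") -> str:
    """Return the path of an object in current_release, relative to the archive root."""
    # posixpath always joins with forward slashes, as FTP and URL paths require.
    return posixpath.join(root, "current_release", CURRENT_RELEASE_FILES[name])
```

For instance, archive_path("md5") gives rnacentral/current_release/md5/md5.tsv.gz, which matches the md5 branch of the tree.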
