In addition to downloadable files, an API, and the text search, RNAcentral provides a public Postgres database that can be used to query the data using SQL syntax. The database is updated with every RNAcentral release and contains a copy of the data available through the RNAcentral website.
Main database
hh-pgsql-public.ebi.ac.uk
5432
pfmegrnargs
reader
NWDMCE5xdipIjRrp
To connect to the database using command line:
psql postgres://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs
Pro tip: if you don't have psql
installed on your machine, consider using Docker to get started with a pre-configured Postgres image:
docker pull postgres
docker run -it postgres psql postgres://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs
Alternatively, you can use a Postgres client like DBeaver or PgAdmin.
If your computer is behind a firewall, please ensure that outgoing TCP/IP connections to the corresponding ports are allowed.
The following diagram was generated based on the latest version of RNAcentral.
The entire RNAcentral schema contains more than 40 tables, but the following tables are good starting points for exploring the data:
rna
- contains RNA sequences and URS identifiersxref
- contains cross-references to Expert Databasesrnc_database
- contains a list of Expert Databasesrnc_accessions
- contains metadata associated with each cross-referencernc_rna_precomputed
- contains RNA types and descriptions for all sequencesAlthough the VEGA database has been archived and its identifiers are no longer searchable using the RNAcentral text search, you can still query RNAcentral using VEGA identifiers:
SELECT
upi, -- RNAcentral URS identifier
taxid, -- NCBI taxid
ac -- external accession
FROM xref
WHERE ac IN ('OTTHUMT00000106564.1', 'OTTHUMT00000416802.1')
Example output:
URS00000B15DA 9606 OTTHUMT00000106564.1
URS00000A54A6 9606 OTTHUMT00000416802.1
Exporting millions of sequences via the public database can be faster than using the Text Search. Here's an example of how you can explore this service:
Create a file query.sql
:
SELECT
precomputed.id
FROM rnc_rna_precomputed precomputed
JOIN rnc_taxonomy tax
ON
tax.id = precomputed.taxid
WHERE
tax.lineage LIKE 'cellular organisms; Bacteria; %'
AND precomputed.is_active = true -- exclude sequences without active cross-references
AND rna_type = 'rRNA'
Run the following command to execute the query:
docker run -v `pwd`:/rnacentral -it postgres /bin/sh -c 'cd /rnacentral && psql -t -A -f query.sql postgres://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs > ids.txt'
The command will create a file ids.txt
with a list of RNAcentral identifiers.
Download the following RNAcentral FASTA file:
Extract the sequences using seqkit:
seqkit grep -f ids.txt rnacentral_species_specific_ids.fasta.gz > output.fasta
The file output.fasta
will contain the desired subset of RNAcentral sequences in FASTA format.
Requires psycopg2 to connect to Postgres:
pip install psycopg2
The public Postgres database allows anyone to run the RNAcentral website locally and contribute new code to RNAcentral using the instructions available on GitHub.