NONCODE (current version v6.0) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs).
NONCODE currently covers 39 species: 16 animals and 23 plants.
The source of NONCODE includes literature and other public databases.
We searched PubMed using the keywords ‘ncrna’, ‘noncoding’, ‘non-coding’, ‘no code’, ‘non-code’, ‘lncrna’ or ‘lincrna’.
We retrieved the newly identified lncRNAs and their annotations from the Supplementary Material or websites of these articles.
These data, together with the newest data from Ensembl, RefSeq, lncRNAdb and GENCODE, were processed through a standard pipeline for each species.
The pipeline includes seven steps:
Format normalization. All input data were converted into BED or GTF format based on a single assembly version. For example, TAIR10 and TAIR9 are two different assembly versions of A. thaliana, so all of the related data were converted to TAIR10.
Multi-source data combination. All of the normalized data files were combined using the Cuffcompare program in the Cufflinks suite.
Protein-coding RNA filtration. We filtered out protein-coding RNAs using two methods. First, all RNAs were compared with the coding RNAs in RefSeq and Ensembl. Second, CNIT (Coding-NonCoding Identifying Tool) was used to filter the RNAs, and only the RNAs considered noncoding by CNIT were kept.
General information presentation. Location, exons, length, assembly sequence and source are listed for each transcript.
Expression profile and function prediction in plants. The corresponding information is shown for four common plants out of the 23. Their expression profiles were curated from multiple tissues. Detailed data sources are listed in Supplementary Table 1. Functions of lncRNAs were predicted by co-expression with coding genes.
Conservation analysis at transcript level. Plant lncRNA conservation analysis was conducted with BLAST. The E-value cutoff was 1e-10. Each transcript in a plant species was searched with BLAST against all transcripts of the other 22 plant species.
Web presence. New web pages dedicated to plants were constructed in NONCODEV6, and more annotation information has been added.
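As an illustration of the conservation step above, the following is a minimal sketch of applying the 1e-10 E-value cutoff to BLAST hits. It assumes the hits are in BLAST tabular output (12 tab-separated columns with the E-value in column 11), which the description above does not specify. The function name conserved_pairs and the transcript IDs are hypothetical, and this is not the NONCODE pipeline's actual code.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # cutoff quoted in the conservation step


def conserved_pairs(blast_tabular, cutoff=EVALUE_CUTOFF):
    """Return the (query, subject) pairs with a hit at or below the cutoff.

    Assumption: input is BLAST tabular output with 12 tab-separated
    columns, where column 11 holds the E-value.
    """
    pairs = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        # Ignore self-hits; keep only hits that pass the cutoff.
        if query != subject and evalue <= cutoff:
            pairs.add((query, subject))
    return pairs


# Made-up hits for illustration only; these are not real NONCODE accessions.
example = io.StringIO(
    "tx_A\ttx_B\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "tx_A\ttx_C\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t40\n"
)
print(sorted(conserved_pairs(example)))
```

In this toy input only the tx_A/tx_B hit passes the cutoff, so the tx_A/tx_C pair is dropped.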
In summary, NONCODE aims to present the most complete collection and annotation of non-coding RNAs.
It provides not only basic information on lncRNAs, such as location, strand, exon number, length and sequence, but also advanced information, such as expression profiles, exosome expression profiles, conservation information, predicted functions and disease relations.
The genome version of each species in current NONCODE version