An integrated knowledge database dedicated to ncRNAs, especially lncRNAs.

What is NONCODE?

NONCODE (current version v5.0) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs). NONCODE currently covers 17 species: human, mouse, cow, rat, chicken, fruit fly, zebrafish, C. elegans, yeast, Arabidopsis, chimpanzee, gorilla, orangutan, rhesus macaque, opossum, platypus and pig. Its data come from the literature and from other public databases. We searched PubMed using the keywords ‘ncrna’, ‘noncoding’, ‘non-coding’, ‘no code’, ‘non-code’, ‘lncrna’ or ‘lincrna’, and retrieved newly identified lncRNAs and their annotation from the Supplementary Material or websites of the resulting articles. These data, together with the newest data from Ensembl, RefSeq, lncRNAdb and GENCODE, were processed through a standard pipeline for each species. The pipeline includes six steps:

  1. Format normalization. All input data were converted to BED or GTF format on a single assembly version per species, for example hg38 for human and mm10 for mouse.
  2. Combination. All normalized data files were combined using the Cuffcompare program in the Cufflinks suite. After redundancy was eliminated, each new transcript ID and its corresponding sources were extracted.
  3. Filtering protein-coding RNA. Protein-coding RNAs were filtered out in two ways. First, transcripts were compared with the coding RNAs in RefSeq and Ensembl, and those with class code “=” (complete intron-chain match) or “c” (contained within a coding transcript) were discarded. Second, the remaining transcripts were scored with the Coding-Non-Coding Index (CNCI) program, and only RNAs classified as non-coding by CNCI were kept.
  4. Information retrieval. Each transcript was assigned a name following the NONCODEv4 naming criteria, and its basic information, such as location, exons, length, sequence and source, was compiled.
  5. Advanced annotation. Advanced annotation includes expression profiles, predicted function, conservation, disease information, etc. Human expression profiles came from 16 tissues of the Human BodyMap 2.0 data (ENA archive: ERP000546) and 8 cell lines (GEO accession no. GSE30554), while mouse expression profiles came from six different tissues (ENA archive: ERP000591). Functions of lncRNA genes were predicted by lnc-GFP, a global function predictor based on a coding–non-coding co-expression network, and by ncFANs.
  6. Web presentation. NONCODE provides a user-friendly interface.
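Step 1 (format normalization) can be illustrated with a minimal Python sketch that groups GTF exon records by transcript and writes one BED12 line per transcript. It assumes the input already uses a single assembly (lifting coordinates between assemblies is not shown), that each exon line carries a transcript_id attribute, and that all exons of a transcript lie on one chromosome and strand. The function name gtf_to_bed12 is hypothetical and is not part of NONCODE's pipeline.

```python
import re
from collections import defaultdict

# Extracts the value of the transcript_id attribute from GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Convert GTF exon records into BED12 lines, one line per transcript.

    Assumes every record is already on a single assembly (e.g. hg38) and that
    all exons of a transcript share one chromosome and strand.
    """
    exons = defaultdict(list)  # transcript_id -> [(chrom, strand, start0, end)]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define the transcript structure
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        # GTF is 1-based inclusive; BED is 0-based half-open, so shift the start only.
        start0, end = int(fields[3]) - 1, int(fields[4])
        exons[match.group(1)].append((chrom, strand, start0, end))

    bed_lines = []
    for tid, recs in exons.items():
        recs.sort(key=lambda r: r[2])  # BED12 expects blocks in ascending order
        chrom, strand = recs[0][0], recs[0][1]
        tx_start = recs[0][2]
        tx_end = max(r[3] for r in recs)
        block_sizes = ",".join(str(e - s) for _, _, s, e in recs) + ","
        block_starts = ",".join(str(s - tx_start) for _, _, s, _ in recs) + ","
        bed_lines.append("\t".join(str(x) for x in [
            chrom, tx_start, tx_end, tid, 0, strand,
            tx_start, tx_start,  # thickStart == thickEnd marks no coding region
            0, len(recs), block_sizes, block_starts,
        ]))
    return bed_lines


if __name__ == "__main__":
    example = [
        'chr1\tdemo\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    for bed in gtf_to_bed12(example):
        # tab-separated: chr1 100 400 t1 0 + 100 100 0 2 100,100, 0,200,
        print(bed)
```

The main subtlety is the coordinate shift: GTF is 1-based and inclusive while BED is 0-based and half-open, so only the start coordinate changes by one.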
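The redundancy elimination in step 2 is performed by Cuffcompare, which applies its own, more tolerant intron-chain-based matching rules. As a simplified stand-in, the hypothetical function below treats two transcripts as redundant only when chromosome, strand and every exon coordinate match exactly, and it records which source databases contributed each merged transcript. The MERGED_ ID prefix and the example accessions are illustrative only; they are neither Cuffcompare's nor NONCODE's naming scheme.

```python
from collections import defaultdict


def collapse_redundant(transcripts):
    """Merge transcripts that share chromosome, strand and exact exon coordinates.

    `transcripts` yields (source, transcript_id, chrom, strand, exons) tuples,
    where `exons` is a list of (start, end) pairs in any order. Returns a dict
    mapping a new, hypothetical ID to the shared structure and to every
    (source, transcript_id) pair merged into it.
    """
    groups = defaultdict(list)
    for source, tid, chrom, strand, exons in transcripts:
        key = (chrom, strand, tuple(sorted(exons)))  # order-independent structure key
        groups[key].append((source, tid))

    merged = {}
    # Sorting the keys makes the new IDs deterministic across runs.
    for n, (key, members) in enumerate(sorted(groups.items()), start=1):
        new_id = f"MERGED_{n:08d}"  # illustrative ID format only
        merged[new_id] = {"structure": key, "sources": sorted(members)}
    return merged


if __name__ == "__main__":
    records = [
        ("RefSeq", "NR_A", "chr1", "+", [(100, 200), (300, 400)]),
        ("GENCODE", "ENST_B", "chr1", "+", [(300, 400), (100, 200)]),
        ("lncRNAdb", "L_C", "chr1", "-", [(100, 200)]),
    ]
    for new_id, info in collapse_redundant(records).items():
        print(new_id, info["sources"])
    # MERGED_00000001 [('GENCODE', 'ENST_B'), ('RefSeq', 'NR_A')]
    # MERGED_00000002 [('lncRNAdb', 'L_C')]
```

Keeping the list of contributing sources for each merged transcript is what allows the "according resources" of every new transcript ID to be reported after redundancy is removed.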
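The two protein-coding filters in step 3 can be combined as sketched below. The sketch assumes that the class codes (from comparing transcripts against RefSeq and Ensembl coding RNAs) and the CNCI calls have already been parsed into dictionaries keyed by transcript ID; parsing the actual output files is not shown, and the label strings "coding" and "noncoding" are an assumption about CNCI's output. The function name filter_noncoding is hypothetical.

```python
# Class codes treated as protein-coding matches:
# "=" complete intron-chain match, "c" contained within a reference transcript.
CODING_CLASS_CODES = {"=", "c"}


def filter_noncoding(transcript_ids, class_codes, cnci_labels):
    """Return the transcripts that pass both protein-coding filters, in input order.

    class_codes: transcript_id -> class code versus RefSeq/Ensembl coding RNAs;
                 transcripts with no reference match may be absent.
    cnci_labels: transcript_id -> "coding" or "noncoding" (assumed label strings).
    """
    kept = []
    for tid in transcript_ids:
        if class_codes.get(tid) in CODING_CLASS_CODES:
            continue  # filter 1: matches or lies within a known coding transcript
        if cnci_labels.get(tid) != "noncoding":
            continue  # filter 2: CNCI called it coding, or it was not scored
        kept.append(tid)
    return kept


if __name__ == "__main__":
    ids = ["t1", "t2", "t3", "t4", "t5"]
    codes = {"t1": "=", "t2": "c", "t3": "u", "t4": "j"}
    labels = {"t1": "noncoding", "t2": "noncoding", "t3": "noncoding",
              "t4": "coding", "t5": "noncoding"}
    print(filter_noncoding(ids, codes, labels))  # ['t3', 't5']
```

Treating a missing CNCI call as coding is a conservative choice in this sketch: a transcript is kept only when both filters agree that it is non-coding.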

All in all, NONCODE aims to present the most complete collection and annotation of non-coding RNAs. It provides not only basic information about each lncRNA, such as location, strand, exon number, length and sequence, but also advanced information such as its expression profile, exosome expression profile, conservation, predicted function and disease associations.

The genome version of each species in the current NONCODE version