NCBI RefSeq Track Settings
 
RefSeq gene predictions from NCBI (All Genes and Gene Predictions tracks)

Subtracks:
  • RefSeq All – NCBI RefSeq genes, curated and predicted sets (NM_*, XM_*, NR_*, XR_*, and YP_*)
  • RefSeq Curated – NCBI RefSeq genes, curated subset (NM_*, NR_*, and YP_*)
  • RefSeq Predicted – NCBI RefSeq genes, predicted subset (XM_* and XR_*)
  • RefSeq Other – NCBI RefSeq other annotations (not NM_*, NR_*, XM_*, XR_*, or YP_*)
  • RefSeq Alignments – RefSeq alignments of RNAs
  • UCSC RefSeq – UCSC annotations of RefSeq RNAs (NM_* and NR_*)

Description

The NCBI RefSeq Genes composite track shows human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and those provided by NCBI. See the Methods section for more details about how the different tracks were created.

Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records.

Display Conventions and Configuration

This is a multi-view composite track that groups several views of the underlying data sets. Instructions for configuring multi-view tracks are here. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide.

The views available for this track include:
RefSeq annotations and alignments
  • RefSeq All – all curated and predicted annotations provided by RefSeq.
  • RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, or YP.
  • RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR.
  • RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks.
  • RefSeq Alignments – alignments of RefSeq RNAs to the human genome provided by the RefSeq group.
UCSC annotations
  • UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track.
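The accession-prefix rules in the list above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on this page, not the code used to build the tracks; the function name `classify_accession` and the sample accessions are hypothetical.

```python
# Minimal sketch: assign a RefSeq accession to a subtrack by its prefix,
# following the rules stated on this page. The function name is hypothetical.

CURATED_PREFIXES = ("NM_", "NR_", "YP_")   # RefSeq Curated
PREDICTED_PREFIXES = ("XM_", "XR_")        # RefSeq Predicted


def classify_accession(accession: str) -> str:
    """Return the subtrack name implied by the accession prefix."""
    if accession.startswith(CURATED_PREFIXES):
        return "RefSeq Curated"
    if accession.startswith(PREDICTED_PREFIXES):
        return "RefSeq Predicted"
    # Anything else falls under RefSeq Other by the rule as stated.
    return "RefSeq Other"


if __name__ == "__main__":
    # Made-up accessions, used only to exercise each branch.
    for acc in ("NM_000000.1", "XR_000000.2", "NG_000000.3"):
        print(acc, "->", classify_accession(acc))
```

Both the curated and predicted sets are also part of RefSeq All, so a caller interested in that track would accept either of the first two results.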

The RefSeq All, RefSeq Curated, RefSeq Predicted and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark).

The RefSeq Alignments track follows the display conventions for PSL tracks.

The item labels and codon display properties for features within this track can be configured through the controls at the top of the track description page. Click the view name (NCBI RefSeq or UCSC RefSeq) to globally modify the settings for all subtracks in the view. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list (available only for views containing more than one track).

  • Label: By default, items are labeled by gene name. Use the Label options to show the accession or OMIM identifier instead, to show any combination of gene name, accession, and OMIM identifier, or to turn labels off entirely.
  • Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page.

Methods

Tracks contained in the RefSeq annotation and RefSeq RNA alignment views were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here.

The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the human genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.
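The filtering rules in the paragraph above can be illustrated with a short Python sketch. It is not the UCSC pipeline itself and rests on several assumptions: the 15% threshold is read as the percentage of the RNA covered by the alignment, "within 0.1% of the best" is read as an absolute difference in percent identity, and the record layout (`rna`, `chrom`, `coverage`, `identity`) is hypothetical.

```python
from collections import defaultdict


def filter_alignments(alignments, min_coverage=15.0, min_identity=96.0, near_best=0.1):
    """Keep near-best alignments per RNA (all thresholds in percent).

    Assumptions, not confirmed by the page: 'coverage' is the percentage of
    the RNA that aligned, and 'within 0.1% of the best' is an absolute
    difference in percent identity.
    """
    by_rna = defaultdict(list)
    for aln in alignments:
        # Discard alignments covering less than min_coverage of the RNA.
        if aln["coverage"] >= min_coverage:
            by_rna[aln["rna"]].append(aln)

    kept = []
    for alns in by_rna.values():
        best = max(a["identity"] for a in alns)
        for a in alns:
            # Keep only alignments close to the best one and above the floor.
            if a["identity"] >= min_identity and best - a["identity"] <= near_best:
                kept.append(a)
    return kept


if __name__ == "__main__":
    # Made-up alignments for illustration only.
    example = [
        {"rna": "NM_A", "chrom": "chr1", "coverage": 100.0, "identity": 99.9},
        {"rna": "NM_A", "chrom": "chr2", "coverage": 100.0, "identity": 98.0},
    ]
    print([a["chrom"] for a in filter_alignments(example)])
```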

Data Access

The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing.

The data in the RefSeq Other track are stored in bigBed format; instructions for accessing this bigBed file are given below. The other subtracks are associated with database tables as follows:

genePred format:
  • RefSeq All - ncbiRefSeq
  • RefSeq Curated - ncbiRefSeqCurated
  • RefSeq Predicted - ncbiRefSeqPredicted
  • UCSC RefSeq - refGene
PSL format:
  • RefSeq Alignments - ncbiRefSeqPsl

The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system here.
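As a small illustration of ignoring the "bin" column, the sketch below parses one tab-separated row from a genePred table dump and drops the first field. The column names follow the standard genePred layout, which is assumed here rather than taken from this page; the function name `parse_genepred_row` and the sample row are hypothetical.

```python
# Standard genePred columns that follow "bin" (assumed layout). Any extra
# trailing columns in the table are ignored by this sketch.
GENEPRED_COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
]


def parse_genepred_row(line: str) -> dict:
    """Parse one tab-separated table row, skipping the leading bin column."""
    fields = line.rstrip("\n").split("\t")[1:]  # drop "bin"
    row = dict(zip(GENEPRED_COLUMNS, fields))   # zip stops after ten columns
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        row[key] = int(row[key])
    for key in ("exonStarts", "exonEnds"):
        # Exon coordinate lists are comma-separated with a trailing comma.
        row[key] = [int(x) for x in row[key].split(",") if x]
    return row


if __name__ == "__main__":
    # Made-up row for illustration; not a real RefSeq record.
    sample = "585\tNM_000000.0\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,"
    print(parse_genepred_row(sample))
```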

The annotations in the RefSeq Other track are stored in a bigBed file, ncbiRefSeqOther.bb, that can be obtained from our downloads server here. Individual regions or the whole set of genome-wide annotations can be extracted with our bigBedToBed tool, which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory linked above. For example, to extract only the annotations in a given region, you could use the following command:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout
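For scripted use, the same command can be assembled and run from Python. The sketch below assumes the bigBedToBed binary is installed and on your PATH; the helper names `build_bigbedtobed_cmd` and `fetch_region` are hypothetical, and the arguments simply mirror the command shown above.

```python
import shutil
import subprocess


def build_bigbedtobed_cmd(source, chrom, start, end, output="stdout"):
    """Build the argument list for the bigBedToBed command shown above."""
    return [
        "bigBedToBed", source,
        f"-chrom={chrom}", f"-start={start}", f"-end={end}",
        output,
    ]


def fetch_region(source, chrom, start, end):
    """Run bigBedToBed and return the BED rows as lists of string fields."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed was not found on PATH")
    result = subprocess.run(
        build_bigbedtobed_cmd(source, chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


if __name__ == "__main__":
    url = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeqOther.bb"
    print(" ".join(build_bigbedtobed_cmd(url, "chr16", 34990190, 36727467)))
```

Keeping the command construction separate from its execution makes it easy to check the arguments before any network access takes place.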

The genePred format tracks can also be downloaded in GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. The utility can be run from the command line like so:

genePredToGtf hg38 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf

Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section.
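Once a GTF file has been written, its lines can be read with a few lines of Python. The sketch below assumes the output follows the usual nine-column, tab-separated GTF layout with quoted attribute values; the function name `parse_gtf_line` and the sample line are hypothetical.

```python
import re

# Matches attributes of the form: key "value";
# Unquoted attribute values, if present, are not handled by this sketch.
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line: str) -> dict:
    """Parse one GTF feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),   # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": dict(_ATTRIBUTE.findall(attrs)),
    }


if __name__ == "__main__":
    # Made-up line for illustration; not real genePredToGtf output.
    sample = ('chr1\tncbiRefSeqPredicted\texon\t1000\t1200\t.\t+\t.\t'
              'gene_id "GENE1"; transcript_id "XM_000000.1";')
    print(parse_gtf_line(sample)["attributes"])
```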

A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server here.

If you have questions, please refer to our mailing list archives.

Credits

This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project.

References

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518

Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018

Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979