Schema for DNase Clusters - DNase I Hypersensitivity Peak Clusters from ENCODE (95 cell types)
Database: hg38 Primary Table: wgEncodeRegDnaseClustered Row Count: 2,076,756
Format description: BED5+ with a count, list of sources, and list of source scores for combined data
|field||example||SQL type ||description |
|bin ||585||smallint(5) unsigned ||Indexing field to speed chromosome range queries.|
|chrom ||chr1||varchar(255) ||Reference sequence chromosome or scaffold|
|chromStart ||9980||int(10) unsigned ||Start position in chromosome|
|chromEnd ||10410||int(10) unsigned ||End position in chromosome|
|name ||8||varchar(255) ||Name of item|
|score ||72||int(10) unsigned ||Display score (0-1000)|
|sourceCount ||8||int(10) unsigned ||Number of sources|
|sourceIds ||88,71,65,9,11,17,66,87,||longblob ||Source ids|
|sourceScores ||72,38,31,29,56,55,38,48,||longblob ||Source scores|
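As a rough illustration of how one row of this table might be read, the sketch below parses a tab-separated line holding the example values above. The `DnaseCluster` class and the `parse_int_list` and `parse_row` functions are hypothetical names introduced here, and the tab-separated layout is an assumption about how the table is exported. Note that the sourceIds and sourceScores lists end with a trailing comma, which the parser has to skip.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DnaseCluster:
    """One row of wgEncodeRegDnaseClustered (hypothetical container)."""
    bin: int
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    source_count: int
    source_ids: List[int]
    source_scores: List[int]


def parse_int_list(field: str) -> List[int]:
    """Split a comma-separated list, skipping the empty item after the trailing comma."""
    return [int(item) for item in field.split(",") if item]


def parse_row(line: str) -> DnaseCluster:
    """Parse one tab-separated row, columns in the order of the schema table."""
    f = line.rstrip("\n").split("\t")
    return DnaseCluster(
        bin=int(f[0]), chrom=f[1], chrom_start=int(f[2]), chrom_end=int(f[3]),
        name=f[4], score=int(f[5]), source_count=int(f[6]),
        source_ids=parse_int_list(f[7]), source_scores=parse_int_list(f[8]),
    )


# Example values taken from the schema table above.
example = "585\tchr1\t9980\t10410\t8\t72\t8\t88,71,65,9,11,17,66,87,\t72,38,31,29,56,55,38,48,"
row = parse_row(example)
print(row.chrom, row.chrom_start, row.chrom_end, row.source_count)
```

In the example row, the number of parsed source ids matches sourceCount, and the score equals the largest value in sourceScores.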
Note: all start coordinates in our database are 0-based, not
1-based. See explanation
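To make the 0-based convention concrete, the sketch below converts the example row's coordinates to a 1-based position string and recomputes its bin value. It assumes the standard UCSC binning scheme (128 kb finest bins, five levels, valid for positions below 512 Mb) and treats chromEnd as exclusive. `bin_from_range` and `to_one_based` are names chosen for illustration and are not part of the schema.

```python
# Offset of the first bin at each level, finest level first (standard scheme).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # finest bins cover 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level covers 8 times more


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


def to_one_based(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open interval as a 1-based, inclusive position."""
    return f"{chrom}:{start + 1}-{end}"


print(bin_from_range(9980, 10410))        # 585, matching the example row
print(to_one_based("chr1", 9980, 10410))  # chr1:9981-10410
```

A range that crosses a 128 kb boundary falls into a coarser bin, so a range query has to check one or more bins at every level rather than a single bin.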
DNase Clusters (wgEncodeRegDnaseClustered) Track Description
This track shows clusters of DNaseI hypersensitivity derived from assays in 95 cell types
by the John Stamatoyannopoulos lab
at the University of Washington from September 2007 to January 2011, as part of the
ENCODE project first production phase.
Regulatory regions in general, and promoters in particular, tend to be DNase-sensitive.
Additional views of this data are available from the
DNaseI HS track.
The peaks in that track are the basis for the clusters shown here,
which combine data from peaks from the different cell lines.
Please note that track colors for the DNase tracks are based on similarity of cell types,
while there is different coloring for cell types on the ENCODE hg38
Layered H3K4Me1 track,
Layered H3K4Me3 track, and
Layered H3K27Ac track,
which match the coloring used in their previous versions lifted from the hg19 assembly.
Display Conventions and Configuration
A gray box indicates the extent of the hypersensitive region.
The darkness is proportional to the maximum signal strength observed in any cell line.
The number to the left of the box shows how many cell lines are hypersensitive in the region.
The track can be configured to restrict the display to elements above a specified score
in the range 1-1000 (where score is based on signal strength).
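The score filter and shading rule described above could be mimicked outside the browser along these lines. This is a sketch only: the linear mapping from score to an 8-bit gray level is an assumption, since the exact shading steps used by the browser are not specified here, and `filter_by_score` and `gray_level` are hypothetical helpers. The sample clusters are made up for illustration.

```python
def filter_by_score(clusters, min_score):
    """Keep only clusters whose score is at or above the chosen threshold."""
    return [c for c in clusters if c["score"] >= min_score]


def gray_level(score, max_score=1000):
    """Map a score to an 8-bit gray value: 255 is white, 0 is black.

    Assumes darkness grows linearly with score (an illustrative choice).
    """
    clamped = max(0, min(score, max_score))
    return round(255 * (1 - clamped / max_score))


# Made-up clusters; only the score field matters for this sketch.
clusters = [
    {"name": "a", "score": 72},
    {"name": "b", "score": 640},
    {"name": "c", "score": 1000},
]

strong = filter_by_score(clusters, 500)
print([c["name"] for c in strong])       # ['b', 'c']
print(gray_level(0), gray_level(1000))   # 255 0
```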
Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline (July 2014
specification).
Pipeline diagram credit: Qian Alvin Qin, X. Liu lab
Briefly, sequence files were aligned to the hg38 (GRCh38) genome assembly augmented with 'sponge'
sequence (ref). Multi-mapped reads were removed, as were reads that aligned to 'sponge' or
mitochondrial sequence. Results from all replicates were pooled, and further processed by
the Hotspot program to call peaks.
Peaks of DNaseI hypersensitivity from the ENCODE DNase Analysis Pipeline at UCSC
were assigned normalized scores (by UCSC regClusterMakeTableOfTables) in the range 0-1000 based
on signalValue and then clustered on score (by UCSC regCluster) to generate singly-linked clusters.
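The general idea of single-linkage clustering of peaks can be sketched as follows. This is not the regCluster implementation: the text above states only that the clusters are singly linked, so the overlap rule, the use of the highest source score as the cluster score, and every name in the code (`Peak`, `single_linkage_clusters`) are assumptions made for illustration. Input scores are assumed to be already normalized to 0-1000, and the sample peaks are made up.

```python
from collections import namedtuple

# Hypothetical input record: one scored peak from one source (cell type).
Peak = namedtuple("Peak", "chrom start end source_id score")


def single_linkage_clusters(peaks):
    """Chain together peaks that overlap, directly or through intermediate peaks."""
    clusters = []
    current = None
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        overlaps = (current is not None
                    and p.chrom == current["chrom"]
                    and p.start < current["chromEnd"])
        if overlaps:
            # Extend the open cluster to cover this peak.
            current["chromEnd"] = max(current["chromEnd"], p.end)
        else:
            current = {"chrom": p.chrom, "chromStart": p.start,
                       "chromEnd": p.end, "best": {}}
            clusters.append(current)
        # Remember the best score seen for each source in this cluster.
        best = current["best"]
        best[p.source_id] = max(best.get(p.source_id, 0), p.score)
    for c in clusters:
        best = c.pop("best")
        c["score"] = max(best.values())
        c["sourceCount"] = len(best)
        c["sourceIds"] = list(best)
        c["sourceScores"] = list(best.values())
    return clusters


peaks = [
    Peak("chr1", 100, 200, 1, 500),
    Peak("chr1", 150, 300, 2, 700),
    Peak("chr1", 290, 400, 1, 300),
    Peak("chr1", 1000, 1100, 3, 100),
]
clusters = single_linkage_clusters(peaks)
for c in clusters:
    print(c["chrom"], c["chromStart"], c["chromEnd"], c["score"], c["sourceCount"])
```

The first and third peaks do not overlap each other, but both overlap the second, so all three end up in one cluster spanning 100 to 400. That chaining is what makes the linkage single rather than complete.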
Additional documentation on the methods used to identify hypersensitive sites is
available from the
DNaseI HS track.
This track is based on sequence data from the University of Washington ENCODE group,
with subsequent processing by UCSC.
For additional credits and references, see the
DNaseI HS track.