Annotating sets of genomic intervals with genomic annotations such as chromHMM

Datetime:2016-08-23 01:45:50          Topic:          Share

Genomation is an R package to summarize, annotate and visualize genomic intervals. It contains a collection of tools for visualizing and analyzing genome-wide data sets, i.e. RNA-seq, bisulfite sequencing or chromatin-immunoprecipitation followed by sequencing (ChIP-seq) data.

Recently we added new features to genomation The new functionalities are available in the latest version of genomation that can be found on its github website.

# install the package from github
library(devtools)
install_github("BIMSBbioinfo/genomation",build_vignettes=FALSE)

This demo shows the new annotation functions in genomation. The functions can be used to annotate target regions or a list of target regions with a given set of genomic features. The genomic features to be used should be in named GRangesList format.

Get data to R

We will get p300, SP1 and Nanog peaks,and chromHMM annotations from ENCODE. We wil use the H1-Esc cells. The aim is to annotate the peaks with chromHMM annotations.

p300.url="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsHaibH1hescP300V0416102UniPk.narrowPeak.gz"

Nanog.url="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsHaibH1hescNanogsc33759V0416102UniPk.narrowPeak.gz"

SP1.url="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsHaibH1hescSp1Pcr1xUniPk.narrowPeak.gz"

chrHMM.url="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmH1hescHMM.bed.gz"
#http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgSegmentation/wgEncodeAwgSegmentationChromhmmH1hesc.bed.gz"

# get chromHMM annotation and make a list out of it
library(genomation)
chrHMM=readBed(chrHMM.url)
chrHMM.list=GenomicRanges::split(chrHMM, chrHMM$name,drop=TRUE)

# get peaks
p300=readBed(p300.url)
SP1=readBed(SP1.url)
NANOG=readBed(Nanog.url)

The new functions

here are the new functions annotateWithFeatures and heatTargetAnnotation . These functions are available as of version 1.5.6. There were similar functions within the package before but these are more generalized versions of the old ones.

annotateWithFeatures() will calculate the percentage of overlaps in a given GRanges object with a GRangesList object where each element correspods to a different category of genomic features such as promoters, exons and introns.

heatTargetAnnotation() will plot a heatmap of annotations returned from annotateWithFeatures() or can return the matrix that is used the create the heatmap.

Annotate peaks with chromHMM segments

We can annote a given peak set like this:

peak2ann=annotateWithFeatures(p300,chrHMM.list)
peak2ann
## summary of target set annotation with feature annotation:
## Rows in target set: 8934
## ----------------------------
## percentage of target elements overlapping with features:
## 1_Active_Promoter 10_Txn_Elongation       11_Weak_Txn      12_Repressed 
##             28.16              0.07              2.10              1.33 
## 13_Heterochrom/lo 14_Repetitive/CNV 15_Repetitive/CNV   2_Weak_Promoter 
##              2.74              0.08              0.18             23.86 
## 3_Poised_Promoter 4_Strong_Enhancer 5_Strong_Enhancer   6_Weak_Enhancer 
##              8.69             11.91             16.33             32.65 
##   7_Weak_Enhancer       8_Insulator  9_Txn_Transition 
##              8.03              5.72              1.00
##
## percentage of feature elements overlapping with target:
## 1_Active_Promoter 10_Txn_Elongation       11_Weak_Txn      12_Repressed 
##             19.81              0.04              0.16              0.68 
## 13_Heterochrom/lo 14_Repetitive/CNV 15_Repetitive/CNV   2_Weak_Promoter 
##              0.27              0.19              0.57              6.76 
## 3_Poised_Promoter 4_Strong_Enhancer 5_Strong_Enhancer   6_Weak_Enhancer 
##              6.45             20.36             11.49              3.46 
##   7_Weak_Enhancer       8_Insulator  9_Txn_Transition 
##              0.53              0.84              0.71
##

We can annotate the GRangesList of different peak sets like this:

peak.list=GenomicRanges::GRangesList(p300=p300,SP1=SP1,NANOG=NANOG)
peak2ann.l=annotateWithFeatures(peak.list,chrHMM.list)
## Working on: p300
## Working on: SP1
## Working on: NANOG

Make a heatmap of percentage of peaks overlapping with different chromHMM annotations

We can make a heatmap using heatTargetAnnotation() function. The function also returns a ggplot2 object, so it can be further manipulated via ggplot2.

heatTargetAnnotation(peak2ann.l)

We can also get the matrix that is used to make the heatmap and use it in other heatmap functions.

mat=heatTargetAnnotation(peak2ann.l,plot=FALSE)
library(gplots)
heatmap.2(mat,col="topo.colors",cellnote=round(mat),
          notecol="black",trace="none",cexCol=0.5,cexRow=0.8)

sessionInfo()

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.5 (El Capitan)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] devtools_1.11.1  gplots_3.0.1     genomation_1.5.6
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.6                formatR_1.4               
##  [3] GenomeInfoDb_1.8.3         plyr_1.8.4                
##  [5] XVector_0.12.1             bitops_1.0-6              
##  [7] tools_3.3.0                zlibbioc_1.18.0           
##  [9] digest_0.6.10              memoise_1.0.0             
## [11] tibble_1.1                 evaluate_0.9              
## [13] gtable_0.2.0               BSgenome_1.40.1           
## [15] gridBase_0.4-7             yaml_2.1.13               
## [17] parallel_3.3.0             withr_1.0.1               
## [19] rtracklayer_1.32.2         stringr_1.0.0             
## [21] knitr_1.13.1               caTools_1.17.1            
## [23] gtools_3.5.0               Biostrings_2.40.2         
## [25] S4Vectors_0.10.2           IRanges_2.6.1             
## [27] stats4_3.3.0               Biobase_2.32.0            
## [29] data.table_1.9.6           impute_1.46.0             
## [31] plotrix_3.6-3              XML_3.98-1.4              
## [33] BiocParallel_1.6.5         seqPattern_1.4.0          
## [35] rmarkdown_0.9.6            gdata_2.17.0              
## [37] reshape2_1.4.1             readr_1.0.0               
## [39] ggplot2_2.1.0              magrittr_1.5              
## [41] codetools_0.2-14           matrixStats_0.50.2        
## [43] Rsamtools_1.24.0           scales_0.4.0              
## [45] htmltools_0.3.5            BiocGenerics_0.18.0       
## [47] GenomicRanges_1.24.2       GenomicAlignments_1.8.4   
## [49] assertthat_0.1             SummarizedExperiment_1.2.3
## [51] colorspace_1.2-6           labeling_0.3              
## [53] KernSmooth_2.23-15         stringi_1.1.1             
## [55] RCurl_1.95-4.8             munsell_0.4.3             
## [57] chron_2.3-47