See the HiC-Pro Utilities. The splitting utility is based on the Unix split command. For more information, see man split.
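As a minimal sketch of the underlying idea (not the HiC-Pro utility itself), a fastq file can be cut into chunks with split, as long as the number of lines per chunk is a multiple of 4 so that no read record is broken. The file names and the chunk size below are hypothetical.

```shell
# Hypothetical demo input: 3 reads = 12 lines (a fastq record is 4 lines).
workdir=$(mktemp -d)
for i in 1 2 3; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$workdir/sample_R1.fastq"

# Reads per chunk; the line count given to split must be a multiple of 4.
reads_per_chunk=2
split -l $((reads_per_chunk * 4)) "$workdir/sample_R1.fastq" "$workdir/sample_R1_part_"

# Chunks get alphabetic suffixes: ..._part_aa, ..._part_ab, ...
wc -l "$workdir"/sample_R1_part_*
```

For paired-end data, both mate files should be split with the same chunk size so that the chunks stay in sync.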
Simply put the full path of your annotation files in the configuration file. By default, HiC-Pro will check whether the file exists and, if not, will look for it in its annotation folder.
HiC-Pro requires two annotation files: a table of chromosome sizes and a BED file of restriction fragments.
Chromosome sizes are usually available from annotation websites, such as the UCSC Genome Browser.
Another way to generate this file is, for instance, to use the R environment.
###
## How to generate chromosome size files ?
###
require(BSgenome.Hsapiens.UCSC.hg19)
## Keep chr1-chr22, chrX, chrY and chrM
human_chr <- seqlevels(BSgenome.Hsapiens.UCSC.hg19)[1:25]
chrom_size <- seqlengths(BSgenome.Hsapiens.UCSC.hg19)[human_chr]
## Write a two-column, tab-separated table: chromosome name, length
write.table(chrom_size, file="chrom_hg19.sizes", quote=FALSE, col.names=FALSE, sep="\t")
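If R is not at hand, a two-column chromosome size table can also be derived directly from the reference fasta file. The following is a minimal sketch using awk. The file names are hypothetical, and it assumes that the sequence name is the first word of each header line.

```shell
# Hypothetical demo fasta; replace with your reference genome.
workdir=$(mktemp -d)
printf '>chr1 demo\nACGTAC\nGT\n>chr2\nACG\n' > "$workdir/genome.fa"

# Sum the sequence length under each header and print "name<TAB>length".
awk '
  /^>/ { if (name != "") print name "\t" len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' "$workdir/genome.fa" > "$workdir/chrom.sizes"

cat "$workdir/chrom.sizes"
```

Unlike the R example, this sketch keeps every sequence found in the fasta file, so unplaced contigs may need to be filtered out afterwards.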
###
## How to generate restriction fragment files with HiTC ?
###
require(HiTC)
require(rtracklayer)
require(BSgenome.Hsapiens.UCSC.hg19)
human_chr <- seqlevels(BSgenome.Hsapiens.UCSC.hg19)[1:25]
resFrag <- getRestrictionFragmentsPerChromosome(resSite="AAGCTT", chromosomes=human_chr, overhangs5=1, genomePack="BSgenome.Hsapiens.UCSC.hg19")
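## Concatenate the per-chromosome fragments into a single object
allRF <- do.call("c",resFrag)
## Name each fragment HIC_<chromosome>_<index>
names(allRF) <- unlist(sapply(resFrag, function(x){paste0("HIC_", seqlevels(x), "_", seq_along(x))}))
export(allRF, format="bed", con="HindIII_resfrag_hg19.bed")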
The HiC-Pro pipeline is divided into two main steps. The first step is ‘fastq’ based, meaning that the same analysis is performed on every fastq file. It can easily be parallelized per fastq file and ends with a list of valid interactions for each one. The second step is ‘sample’ based: all lists of valid interactions from the same sample are merged in order to build and normalize the maps. At present, this second step is not time consuming, so we do not parallelize it, although a per-sample parallelization might be a good idea. Because the two steps are either ‘fastq’ based or ‘sample’ based, they have to be separated during parallel processing.
No. HiC-Pro is only based on the bowtie2 mapper. However, note that HiC-Pro can be run from aligned data. In this case, the input path (-i) must be a BAM folder, and the analysis has to be run step-by-step.
The allele-specific mode of HiC-Pro is based on an N-masked genome, meaning that all SNPs that can be used to distinguish the parental haplotypes have to be masked. This masking can be performed in 3 steps:
1. Extract the relevant SNP information. See the extract_snps.py utility for Mouse Sanger data. For Human data, you can use phasing data or SNP information available from public resources, such as the Illumina Platinum Project, the 1000 Genomes Project or the GATK resource bundle.
2. Mask the fasta genome. To do so, simply use the bedtools maskfasta utility.
3. Then, create your bowtie2 indexes from the masked fasta file.
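The masking and indexing steps (2 and 3) might look like the sketch below. All file names are hypothetical. The commands are only printed unless DRY_RUN is set to 0, so the block can be read and adapted without bedtools or bowtie2 being installed.

```shell
# Hypothetical file names; adapt to your own data.
genome_fa="genome.fa"            # reference genome
snps_vcf="snps.vcf"              # SNPs distinguishing the two parental haplotypes
masked_fa="genome_Nmasked.fa"    # output of the masking step
index_prefix="genome_Nmasked"    # bowtie2 index basename

DRY_RUN=${DRY_RUN:-1}
run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 2: replace every SNP position with N in the reference.
run bedtools maskfasta -fi "$genome_fa" -bed "$snps_vcf" -fo "$masked_fa"

# Step 3: build the bowtie2 indexes from the masked genome.
run bowtie2-build "$masked_fa" "$index_prefix"
```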
The matrix format is a standard sparse triplet format which can easily be loaded in the R or MATLAB environment. For instance, the matrix can be loaded in R using the HiTC Bioconductor package.
require(HiTC)
## Load Hi-C data
x<-importC("mydata.matrix", xgi.bed="mydata_abs.bed")
show(x)
## Plot X intra-chromosomal map
mapC(HTClist(x$chrXchrX), trim.range=.9)
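For a quick look without R, the triplet file can also be processed with standard Unix tools. The sketch below assumes three tab-separated columns (bin index i, bin index j, count) and a BED file whose fourth column is the bin index. The toy files are hypothetical. It sums the contacts that fall within chrX.

```shell
workdir=$(mktemp -d)
# Toy bin annotation: chromosome, start, end, bin index.
printf 'chr1\t0\t1000000\t1\nchrX\t0\t1000000\t2\nchrX\t1000000\t2000000\t3\n' > "$workdir/mydata_abs.bed"
# Toy matrix in triplet form: bin i, bin j, contact count.
printf '1\t1\t10\n1\t2\t4\n2\t2\t7\n2\t3\t5\n' > "$workdir/mydata.matrix"

# First pass (NR == FNR) maps each bin index to its chromosome;
# second pass sums the counts whose two bins both lie on chrX.
chrx_total=$(awk -v chrom="chrX" '
  NR == FNR { chr[$4] = $1; next }
  chr[$1] == chrom && chr[$2] == chrom { total += $3 }
  END { print total + 0 }
' "$workdir/mydata_abs.bed" "$workdir/mydata.matrix")

echo "chrX intra-chromosomal contacts: $chrx_total"
```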
The maps can be plotted in the R environment; see the HiTC Bioconductor package and the previous question. The HiC-Pro results are also compatible with the HiCPlotter software (Akdemir et al. 2015). The source code of HiCPlotter is available on GitHub. Here is a small example of how to use HiCPlotter.
## Plot the genome-wide map at 1Mb resolution
python HiCPlotter.py -f hic_results/matrix/sample1/iced/1000000/sample1_1000000_iced.matrix -o Examplegw -r 1000000 -tri 1 -bed hic_results/matrix/sample1/raw/1000000/sample1_1000000_ord.bed -n hES -wg 1 -chr chrX
## Plot the chrX at 150Kb resolution
python HiCPlotter.py -f hic_results/matrix/sample1/iced/150000/sample1_150000_iced.matrix -o Exemple -r 150000 -tri 1 -bed hic_results/matrix/sample1/raw/150000/sample1_150000_ord.bed -n Test -chr chrX -ptr 1
Since version 2.7.6, HiC-Pro is compatible with the Juicebox viewer. See the hicpro2juicebox utility to generate a Juicebox input file from the list of valid interactions.