Using demultiplexed FastQ files as input, this function performs all the steps needed to produce a TSV file summarizing the restriction enzyme fragments and the number of UMIs supporting each fragment's contact with the viewpoint (bait) of interest.
contactsUMI4C(
  fastq_dir,
  wk_dir,
  file_pattern = NULL,
  bait_seq,
  bait_pad,
  res_enz,
  cut_pos,
  digested_genome,
  bowtie_index,
  threads = 1,
  numb_reads = 1e+09,
  rm_tmp = TRUE,
  min_flen = 20,
  filter_bp = 1e+07,
  ref_gen,
  sel_seqname = NULL
)
fastq_dir | Path of the directory containing the FastQ files (compressed or uncompressed). |
---|---|
wk_dir | Working directory in which to save the outputs generated by the UMI-4C analysis. |
file_pattern | Character that can be used to filter the files you want to analyze in the FastQ directory. Default=NULL. |
bait_seq | Character containing the bait primer sequence. |
bait_pad | Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz | Character containing the restriction enzyme sequence. |
cut_pos | Numeric indicating the zero-based nucleotide position at which the restriction enzyme cuts (for example, 0 for DpnII). |
digested_genome | Path to the digested genome file generated using the digestGenome function. |
bowtie_index | Path and prefix of the bowtie index to use for the alignment. |
threads | Number of threads to use in the analysis. Default=1. |
numb_reads | Number of lines from the FastQ file to load in each loop. If you run into memory problems, set it to a smaller number. Default=1e9. |
rm_tmp | Logical indicating whether to remove temporary files (sam and intermediate bams). TRUE or FALSE. Default=TRUE. |
min_flen | Minimum fragment length to use for selecting the fragments. Default=20. |
filter_bp | Integer indicating the number of bp upstream and downstream of the viewpoint to select for further analysis. Default=1e7. |
ref_gen | A BSgenome object of the reference genome. |
sel_seqname | Character with the name of the chromosome to which the search for the viewpoint sequence is restricted. Default=NULL. |
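
To make `res_enz`, `cut_pos` and `min_flen` more concrete, the following base R sketch digests a made-up sequence by hand. It is an illustration only and not the package's internal code: `toy_seq`, `site_starts`, `frag_starts`, `frag_ends`, `fragments` and `kept` are hypothetical names, and it assumes that a zero-based `cut_pos` of 0 places the cut immediately before the first base of the recognition site and that fragments shorter than `min_flen` are discarded.

```r
# Toy illustration only; not how the package digests a genome internally.
toy_seq  <- paste0("AAAA", "GATC", "CCCCCCCC", "GATC", "TT")
res_enz  <- "GATC"  # recognition sequence, as in the DpnII example
cut_pos  <- 0       # zero-based offset of the cut within the site
min_flen <- 5       # small value chosen for this toy sequence

# 1-based start positions of every occurrence of the recognition site
site_starts <- as.integer(gregexpr(res_enz, toy_seq, fixed = TRUE)[[1]])

# With cut_pos = 0, each cut falls just before the first base of a site
frag_starts <- c(1L, site_starts + cut_pos)
frag_ends   <- c(site_starts + cut_pos - 1L, nchar(toy_seq))
fragments   <- substring(toy_seq, frag_starts, frag_ends)

# Assumed filter: keep fragments whose length is at least min_flen
kept <- fragments[nchar(fragments) >= min_flen]
print(fragments)
print(kept)
```

With a different enzyme, only `res_enz` and `cut_pos` would change in this sketch; the real digestion is expected to come from the file passed as `digested_genome`.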
This function is a combination of calls to other functions that perform the necessary steps for processing UMI-4C data.
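
As an illustration of how the bait-related arguments fit together, the base R sketch below builds the sequence that a read originating at the viewpoint would be expected to start with: the bait primer, then the pad, then the restriction enzyme sequence. The reads are invented, `bait_prefix`, `toy_reads` and `has_bait` are hypothetical names, and exact prefix matching is an assumption made for the example; the package's own read selection may differ.

```r
# Toy illustration only; the reads are made up and this is not the
# package's internal code.
bait_seq <- "GGACAAGCTCCCTGCAACTCA"  # bait primer sequence
bait_pad <- "GGACTTGCA"              # sequence between primer and enzyme site
res_enz  <- "GATC"                   # restriction enzyme sequence

# Expected start of a read that originates at the viewpoint
bait_prefix <- paste0(bait_seq, bait_pad, res_enz)

toy_reads <- c(
  paste0(bait_prefix, "TTTACGGA"),                     # full bait prefix
  paste0(bait_seq, "AAAAAAAAA", res_enz, "TTTACGGA"),  # wrong pad
  "ACGTACGTACGTACGTACGTACGTACGTACGTACGT"               # unrelated read
)

# Assumed rule for this sketch: an exact match at the start of the read
has_bait <- startsWith(toy_reads, bait_prefix)
print(has_bait)
```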
if (interactive()) {
    # Download the example data
    path <- downloadUMI4CexampleData()

    # Digest the hg19 reference genome with DpnII
    hg19_dpnii <- digestGenome(
        cut_pos = 0,
        res_enz = "GATC",
        name_RE = "DpnII",
        ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19,
        out_path = file.path(path, "digested_genome")
    )

    # Process the FastQ files for the CIITA viewpoint
    raw_dir <- file.path(path, "CIITA", "fastq")

    contactsUMI4C(
        fastq_dir = raw_dir,
        wk_dir = file.path(path, "CIITA"),
        bait_seq = "GGACAAGCTCCCTGCAACTCA",
        bait_pad = "GGACTTGCA",
        res_enz = "GATC",
        cut_pos = 0,
        digested_genome = hg19_dpnii,
        bowtie_index = file.path(path, "ref_genome", "ucsc.hg19.chr16"),
        threads = 1,
        numb_reads = 1e9,
        ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19,
        sel_seqname = "chr16"
    )

    # Remove the downloaded example data
    unlink(path, recursive = TRUE)
}