Prepares the FastQ files for further analysis by selecting the reads that contain the bait and adding the corresponding UMI identifier to the header of each read.
prepUMI4C(fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, numb_reads = 1e+09)
Argument | Description |
---|---|
fastq_dir | Path of the directory containing the FastQ files (compressed or uncompressed). |
wk_dir | Working directory where to save the outputs generated by the UMI-4c analysis. |
file_pattern | Character that can be used to filter the files you want to analyze. |
bait_seq | Character containing the bait primer sequence. |
bait_pad | Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz | Character containing the restriction enzyme sequence. |
numb_reads | Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
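
The three sequence arguments describe consecutive parts of a bait read: the bait primer, then the pad, then the restriction enzyme site. The sketch below shows one way such a filter could work in base R. It is not the code used by prepUMI4C: the helpers `expected_bait()` and `has_bait()` are hypothetical, and exact matching at the start of the read is an assumption made for illustration.

```r
# Hypothetical helper, not part of the package: concatenates the three
# arguments in the order described above (bait primer, pad, enzyme site).
expected_bait <- function(bait_seq, bait_pad, res_enz) {
  paste0(bait_seq, bait_pad, res_enz)
}

# Hypothetical helper: TRUE for reads that start with the expected sequence.
# Assumption: exact matching at the 5' end of the read.
has_bait <- function(reads, bait_seq, bait_pad, res_enz) {
  startsWith(reads, expected_bait(bait_seq, bait_pad, res_enz))
}

# Two toy reads (made up for this example).
reads <- c(
  "GGACAAGCTCCCTGCAACTCAGGACTTGCAGATCTTTTACGT",  # bait + pad + site + extra bases
  "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"   # unrelated read
)

keep <- has_bait(
  reads,
  bait_seq = "GGACAAGCTCCCTGCAACTCA",
  bait_pad = "GGACTTGCA",
  res_enz = "GATC"
)
reads[keep]
```

Only the first toy read is kept, because it begins with the concatenated bait, pad and enzyme sequence.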
Creates a compressed FastQ file in wk_dir/prep named basename(fastq).fq.gz, containing the filtered reads with the UMI sequence in the header. A log file with the statistics is also generated in wk_dir/logs, named umi4c_stats.txt.
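
After a run, the outputs described above can be located from the working directory alone. The following is a minimal sketch in base R; `umi4c_outputs()` is a hypothetical helper, not a package function, and the working directory is a placeholder. The layout of umi4c_stats.txt is not documented here, so the log is simply printed line by line.

```r
# Hypothetical helper, not part of the package: returns the locations
# described above for a given working directory.
umi4c_outputs <- function(wk_dir) {
  list(
    prep_dir = file.path(wk_dir, "prep"),
    stats_log = file.path(wk_dir, "logs", "umi4c_stats.txt")
  )
}

wk_dir <- file.path(tempdir(), "CIITA")  # placeholder working directory
out <- umi4c_outputs(wk_dir)

# List the prepared FastQ files; empty if prepUMI4C has not been run yet.
prepared <- list.files(out$prep_dir, pattern = "\\.fq\\.gz$", full.names = TRUE)

# Print the statistics log only when it is present.
if (file.exists(out$stats_log)) {
  print(readLines(out$stats_log))
}
```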
if (interactive()) {
    path <- downloadUMI4CexampleData(reduced = TRUE)
    raw_dir <- file.path(path, "CIITA", "fastq")

    prepUMI4C(
        fastq_dir = raw_dir,
        wk_dir = file.path(path, "CIITA"),
        bait_seq = "GGACAAGCTCCCTGCAACTCA",
        bait_pad = "GGACTTGCA",
        res_enz = "GATC"
    )
}