Prepares the FastQ files for further analysis by selecting the reads that contain the bait and adding each read's UMI identifier to its header.
prepUMI4C(fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, numb_reads = 1e+09)
| Argument | Description |
|---|---|
| fastq_dir | Path of the directory containing the FastQ files (compressed or uncompressed). |
| wk_dir | Working directory in which to save the outputs generated by the UMI-4C analysis. |
| file_pattern | Character that can be used to filter the files you want to analyze. |
| bait_seq | Character containing the bait primer sequence. |
| bait_pad | Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
| res_enz | Character containing the restriction enzyme sequence. |
| numb_reads | Number of lines from the FastQ file to load in each loop. If you run into memory problems, reduce this value. Default: 1e9. |
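
The way `bait_seq`, `bait_pad`, `res_enz` and `numb_reads` fit together can be pictured with a small base-R sketch. This is not the internal code of `prepUMI4C()`. It assumes that a read "has the bait" when its sequence starts with the three sequences concatenated in that order, and it counts chunks in FASTQ records (4 lines each) rather than raw lines. The function name `filter_bait_chunked` and the argument `chunk_records` are hypothetical, and the UMI annotation step is left out.

```r
# Illustrative sketch only, not the implementation used by prepUMI4C().
# Assumption: a read is kept when it starts with bait_seq + bait_pad + res_enz.
filter_bait_chunked <- function(fastq_path, bait_seq, bait_pad, res_enz,
                                chunk_records = 1e5) {
  full_bait <- paste0(bait_seq, bait_pad, res_enz)

  # gzfile() reads gzip-compressed files and also plain uncompressed ones.
  con <- gzfile(fastq_path, "r")
  on.exit(close(con))

  kept <- character(0)
  total <- 0
  repeat {
    # Load one chunk; each FASTQ record is 4 lines
    # (header, sequence, "+", qualities).
    chunk <- readLines(con, n = 4 * chunk_records)
    if (length(chunk) == 0) break

    n_rec <- length(chunk) %/% 4
    recs <- matrix(chunk[seq_len(4 * n_rec)], nrow = 4)

    # Row 2 holds the sequences; keep the records that start with the bait.
    has_bait <- startsWith(recs[2, ], full_bait)
    kept <- c(kept, as.vector(recs[, has_bait, drop = FALSE]))
    total <- total + n_rec
  }

  list(records = kept, total = total, with_bait = length(kept) / 4)
}

# Toy input: three made-up reads, two of which start with the full bait.
full <- paste0("GGACAAGCTCCCTGCAACTCA", "GGACTTGCA", "GATC")
tmp <- tempfile(fileext = ".fq")
writeLines(c(
  "@read1", paste0(full, "ACGTACGT"), "+", strrep("I", nchar(full) + 8),
  "@read2", "TTTTTTTTTTTT", "+", strrep("I", 12),
  "@read3", paste0(full, "TTGCA"), "+", strrep("I", nchar(full) + 5)
), tmp)

res <- filter_bait_chunked(tmp, "GGACAAGCTCCCTGCAACTCA", "GGACTTGCA", "GATC",
                           chunk_records = 2)
```

A smaller `chunk_records` keeps fewer lines in memory at once, at the cost of more read iterations. That is the same trade-off the `numb_reads` argument exposes.
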
Creates a compressed FASTQ file in wk_dir/prep named
basename(fastq).fq.gz, containing the filtered reads with the UMI
sequence in the header. A log file with the statistics is also generated
in wk_dir/logs, named umi4c_stats.txt.
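
    if (interactive()) {
      # Download the reduced example dataset and locate the raw FastQ files
      path <- downloadUMI4CexampleData(reduced = TRUE)
      raw_dir <- file.path(path, "CIITA", "fastq")

      # Select reads containing the bait and add the UMI to each header
      prepUMI4C(
        fastq_dir = raw_dir,
        wk_dir = file.path(path, "CIITA"),
        bait_seq = "GGACAAGCTCCCTGCAACTCA",
        bait_pad = "GGACTTGCA",
        res_enz = "GATC"
      )
    }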
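
The output locations described above (`wk_dir/prep` for the filtered FASTQ files and `wk_dir/logs/umi4c_stats.txt` for the statistics) can be checked with a small helper. This is a minimal sketch in base R. The helper name `list_prep_outputs` is hypothetical and not part of the package, and the demo below builds a mock directory with a made-up file name instead of running the real preparation step.

```r
# Hypothetical helper: collect the files that the preparation step is
# documented to write under wk_dir. Not part of the package.
list_prep_outputs <- function(wk_dir) {
  prep_dir <- file.path(wk_dir, "prep")
  log_file <- file.path(wk_dir, "logs", "umi4c_stats.txt")

  list(
    # list.files() returns character(0) if the directory does not exist.
    fastq = list.files(prep_dir, pattern = "\\.fq\\.gz$", full.names = TRUE),
    log = if (file.exists(log_file)) log_file else NA_character_
  )
}

# Mock working directory with empty placeholder files (illustration only).
wk_dir <- file.path(tempdir(), "umi4c_demo")
dir.create(file.path(wk_dir, "prep"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(wk_dir, "logs"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(wk_dir, "prep", "sample_R1.fq.gz"))
file.create(file.path(wk_dir, "logs", "umi4c_stats.txt"))

out <- list_prep_outputs(wk_dir)
```

On a real run, `readLines(out$log)` would show the log contents. Its exact layout is not described here, so the sketch does not attempt to parse it.
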