Have you ever run ccs with different cutoffs, e.g. tuning --min-rq, out of fear of missing out on yield? Similar to the CLR instrument mode, in which subreads are accompanied by a scraps file, ccs offers a new mode that never loses a single read to filtering, without the massive run time increase that polishing low-pass productive ZMWs would cause.
Starting with SMRT Link v10.0 and Sequel IIe, ccs v5.0 or newer is able to generate one representative sequence per productive ZMW, irrespective of quality and passes. This ensures no yield loss due to filtering and enables users to have maximum control over their data. Never fear again that SMRT Link or the Sequel IIe HiFi mode filtered precious data.
Attention: If you work with the reads.bam file directly, be aware that CCS reads of all qualities are present. This file needs to be understood before piping it into your typical HiFi application.
The default command-line behavior has not changed; ccs still generates only HiFi quality reads by default. But the new --all mode has been set as default when running the Circular Consensus Sequencing SMRT Link application or selecting the on-instrument Sequel IIe capabilities. In this mode, the output contains:
- HiFi reads with predicted accuracy ≥Q20 (rq ≥ 0.99)
- Lower-quality but still polished consensus reads with predicted accuracy <Q20 (rq < 0.99)
- Unpolished consensus reads (rq = -1)
- Partial or single full-length subreads, unaltered (rq = -1)
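The categories above can be told apart from the rq value alone. The following is a minimal sketch, not part of ccs: the function names and category labels are hypothetical, and it assumes that any negative rq marks an unpolished read or unaltered subread. It also shows why Q20 corresponds to rq = 0.99.

```python
HIFI_MIN_RQ = 0.99  # Q20 threshold, as stated in the list above


def phred_to_accuracy(q):
    """Convert a Phred-scaled quality value to predicted accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def classify_read(rq):
    """Map a read's rq value to one of the categories listed above.

    The labels are illustrative only; ccs does not emit them.
    """
    if rq < 0:
        # rq = -1: unpolished consensus or unaltered subread
        return "unpolished_or_subread"
    if rq >= HIFI_MIN_RQ:
        return "hifi"
    return "polished_below_q20"
```

Note that unpolished consensus reads and unaltered subreads share rq = -1, so rq alone cannot separate those two categories.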
If you want to only use HiFi reads, SMRT Link automatically generates additional files for your convenience that only contain HiFi reads:
The following tools can be installed with
conda install -c bioconda tool_name
We provide a simple tool, called extracthifi, to generate a HiFi-only BAM from a reads.bam file. Usage is:
extracthifi reads.bam extracthifi.bam
Alternatively, the same filter can be applied with bamtools:
bamtools filter -in reads.bam -out hifi_reads.bam -tag "rq":">=0.99"
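To make the rq filter explicit, here is a minimal Python sketch that applies the same threshold to SAM-formatted text lines, for example the output of samtools view -h reads.bam. It is not how extracthifi or bamtools are implemented; the function names are hypothetical, and it assumes the rq tag is written as a float tag (rq:f:...).

```python
def read_quality(sam_line):
    """Return the rq value of one SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("rq:f:"):
            return float(field[len("rq:f:"):])
    return None


def keep_hifi(sam_lines, min_rq=0.99):
    """Yield header lines and records whose rq is at least min_rq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass header lines through unchanged
            continue
        rq = read_quality(line)
        if rq is not None and rq >= min_rq:
            yield line
```

Records without an rq tag are dropped in this sketch, which is a design choice rather than documented tool behavior.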
We strongly advise against filtering by anything other than predicted accuracy; the BAM tag rq is the best predictor for read quality. Number of passes is not reliable enough and you might discard too much data. The np tag is an implementation detail that is not guaranteed to be present in future ccs versions.