What is a heteroduplex?

A heteroduplex is a double-stranded molecule composed of two strands that are not fully complementary. During the annealing step of PCR, highly similar but non-identical DNA strands can anneal to each other and form a heteroduplex. In other words, heteroduplexes are a byproduct of amplifying different templates in the same reaction.

A heteroduplex is not a PCR chimera, which Wikipedia defines as follows:

It occurs when the extension of an amplicon is aborted, and the aborted product functions as a primer in the next PCR cycle. The aborted product anneals to the wrong template and continues to extend, thereby synthesizing a single sequence sourced from two different templates.

What is heteroduplex splitting?

Starting with ccs v6.3.0, --hd-finder activates algorithms that detect heteroduplexes during HiFi generation. Substitutions and large insertions (>20 bp) with a significant strand bias are detected at the subread level. Subreads are aligned to the draft, and a pileup is generated. Divergent substitution sites are identified, and Fisher's exact test is used to determine whether a substitution has strand bias. ZMWs labeled as heteroduplex are split, on the fly, into single-stranded CCS reads. As a consequence, ccs distinguishes between double-stranded (DS) and single-stranded (SS) ZMWs and their consensus reads. Implications:

  • Heteroduplex splitting is non-reversible
  • The BAM output file will have three read groups instead of one
  • Summary logs report double-strand and single-strand metrics
  • ccs_reports.txt file contains two columns, double-strand and single-strand reads
  • --by-strand and --hd-finder are not equivalent; results can differ for the same ZMW
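
The strand-bias test mentioned above can be illustrated with a minimal sketch. It is not the ccs implementation: the layout of the 2x2 table (strand by base call), the function name, and the example counts are assumptions made for illustration, and the source does not state which significance threshold ccs applies. The sketch computes a two-sided Fisher's exact test using only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
        row 1 = forward-strand subreads, row 2 = reverse-strand subreads
        col 1 = subreads with the substitution, col 2 = subreads matching the draft
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x substitutions on the forward strand
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Substitution seen only on forward-strand subreads: strong strand bias
biased = fisher_exact_two_sided(10, 0, 0, 10)
# Substitution spread evenly over both strands: no strand bias
balanced = fisher_exact_two_sided(5, 5, 5, 5)
print(f"biased p-value:   {biased:.3g}")
print(f"balanced p-value: {balanced:.3g}")
```

A very small p-value, as in the first table, indicates that the substitution is confined to one strand, which is the signature expected at a heteroduplex site. A p-value near 1, as in the second table, indicates no strand bias.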

Additional read groups in BAM

The BAM file contains two different kinds of reads, single-strand and double-strand reads. Single-strand reads follow the by-strand scheme with /fwd and /rev name suffixes, and ccs generates up to two single-strand reads per ZMW. Double-strand reads have no distinguishing suffix. Each of the three read types (forward single-strand, reverse single-strand, and double-strand) has its own read group. Read groups of single-strand reads have an additional STRAND field in the DS tag. Simplified example:

@RG ID:793f140b PL:PACBIO DS:READTYPE=CCS;STRAND=FORWARD  <- single-strand reads /fwd
@RG ID:36fc54d5 PL:PACBIO DS:READTYPE=CCS;STRAND=REVERSE  <- single-strand reads /rev
@RG ID:5d30364d PL:PACBIO DS:READTYPE=CCS                 <- double-strand reads
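
As a minimal sketch, the strand of a read can be derived either from the DS tag of its read group or from its name suffix. The helper names below are hypothetical, the DS strings are the simplified ones from the example above (real headers carry more fields), and the read names are made up for illustration.

```python
def parse_ds_tag(ds_value):
    """Split a DS value such as 'READTYPE=CCS;STRAND=FORWARD' into a dict."""
    return dict(
        item.split("=", 1) for item in ds_value.split(";") if "=" in item
    )


def strand_from_read_group(ds_value):
    """Return 'fwd', 'rev', or 'double' based on the STRAND field of the DS tag."""
    strand = parse_ds_tag(ds_value).get("STRAND")
    if strand == "FORWARD":
        return "fwd"
    if strand == "REVERSE":
        return "rev"
    # No STRAND field: the read group holds double-strand reads
    return "double"


def strand_from_read_name(name):
    """Return 'fwd', 'rev', or 'double' based on the /fwd and /rev name suffixes."""
    if name.endswith("/fwd"):
        return "fwd"
    if name.endswith("/rev"):
        return "rev"
    return "double"


print(strand_from_read_group("READTYPE=CCS;STRAND=FORWARD"))
print(strand_from_read_group("READTYPE=CCS"))
# Hypothetical read names, for illustration only
print(strand_from_read_name("movie/42/ccs/rev"))
print(strand_from_read_name("movie/42/ccs"))
```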

Summary logs

At the end of each execution, ccs prints a summary when run with --log-level INFO. This summary contains combined and individual metrics for DS and SS.

-------------------------------------------------
Summary stats abbreviations:
ZMW         - A productive Zero-Mode Waveguide
DS          - Double Strand
SS          - Single Strand
DS-ZMW      - All subreads were used from a single ZMW
SS-ZMW      - ZMW is split into fwd and rev strands,
              each strand is polished individually
DS-Read     - CCS read of a DS-ZMW
SS-Read     - CCS read of one strand of a SS-ZMW
HiFi        - CCS reads with predicted accuracy >=Q20
UMY         - Unique Molecular Yield of all reads passing filters
HiFi Yield  - UMY of >=Q20 DS- and SS-ZMWs, longest read per ZMW
-------------------------------------------------
ZMWs Input    : 53895
ZMWs Written  : 22684
 - DS / SS    : 22644 / 40
UMY           : 413.2 MBases (6.8 GBases/hr)
 - DS / SS    : 412.4 MBases / 733.7 KBases
HiFi Yield    : 413.5 MBases (6.8 GBases/hr)
 - DS / SS    : 412.4 MBases / 1.0 MBases
HiFi Reads    : 22701
 - DS / SS    : 22644 / 57
HiFi Avg Size : 18.2 KBases
HiFi Avg QV   : 30.2

Strand-aware ccs_reports.txt

The following shows typical content of the strand-aware ccs_reports.txt file. Unlike the default output, this file reports counts of DS and SS reads rather than ZMWs. Per-ZMW accounting is not possible for SS ZMWs, as one strand might fail while the other succeeds. The Inputs percentages are relative to the number of ZMWs; all other percentages are relative to reads in their column.

                           Double-Strand Reads  Single-Strand Reads
Inputs                   :      53590 (99.43%)         609 (0.564%)

Passed                   :      22644 (42.25%)          57 (9.360%)
Failed                   :      30946 (57.75%)         552 (90.64%)

Tandem repeats           :        461 (1.490%)           0 (0.000%)

Exclusive failed counts
Below SNR threshold      :        870 (2.811%)           0 (0.000%)
Median length filter     :          0 (0.000%)           0 (0.000%)
Shortcut filters         :          0 (0.000%)           0 (0.000%)
Lacking full passes      :      26226 (84.75%)           0 (0.000%)
Coverage drops           :         30 (0.097%)           0 (0.000%)
Insufficient draft cov   :         61 (0.197%)         310 (56.16%)
Draft too different      :          0 (0.000%)           0 (0.000%)
Draft generation error   :        173 (0.559%)          54 (9.783%)
Draft above --max-length :          0 (0.000%)           0 (0.000%)
Draft below --min-length :          0 (0.000%)           0 (0.000%)
Reads failed polishing   :          0 (0.000%)           0 (0.000%)
Empty coverage windows   :          3 (0.010%)           0 (0.000%)
CCS did not converge     :          2 (0.006%)           0 (0.000%)
CCS below minimum RQ     :       3581 (11.57%)         188 (34.06%)
Unknown error            :          0 (0.000%)           0 (0.000%)
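
The percentage conventions can be checked against the numbers in the table above. The short sketch below recomputes a few of them. In this example, the Passed and Failed percentages are relative to the Inputs of their column, and the exclusive failure counts match percentages of the Failed row of their column. The latter is an observation from these numbers, not a statement from the ccs documentation.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Values copied from the example report above
ds_inputs, ss_inputs = 53590, 609
ds_passed, ss_passed = 22644, 57
ds_failed, ss_failed = 30946, 552

# Passed and Failed add up to Inputs within each column
assert ds_passed + ds_failed == ds_inputs
assert ss_passed + ss_failed == ss_inputs

# Passed rows: relative to the reads in their column
print(f"DS passed: {percent(ds_passed, ds_inputs):.2f}%")  # 42.25%
print(f"SS passed: {percent(ss_passed, ss_inputs):.2f}%")  # 9.36%

# Exclusive failure counts: relative to the failed reads in their column
print(f"DS lacking full passes:    {percent(26226, ds_failed):.2f}%")  # 84.75%
print(f"SS insufficient draft cov: {percent(310, ss_failed):.2f}%")    # 56.16%
```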

Can I combine it with HiFi kinetics?

Yes! Check out the kinetics FAQ.

