What BAM tags are generated?
Tag | Type | Description |
---|---|---|
ac | B,i | Detected and missing adapter counts |
ec | f | Effective coverage |
fi | B,C | Double-strand forward IPD (codec V1) |
fn | i | Double-strand forward number of complete passes (zero or more) |
fp | B,C | Double-strand forward PulseWidth (codec V1) |
ip | B,C | Single-strand IPD (codec V1) |
ma | i | Missing adapters bitmask |
np | i | Number of full-length subreads |
pw | B,C | Single-strand PulseWidth (codec V1) |
ri | B,C | Double-strand reverse IPD (codec V1) |
rn | i | Double-strand reverse number of complete passes (zero or more) |
rp | B,C | Double-strand reverse PulseWidth (codec V1) |
rq | f | Predicted average read accuracy |
sn | B,f | Signal-to-noise ratios for each nucleotide |
zm | i | ZMW hole number |
RG | Z | Read group |
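As an illustration of how the scalar per-read tags from the table might be read back, here is a minimal sketch. It assumes the third-party `pysam` library is installed and that the ccs output is an unaligned BAM; the helper names `collect_read_metrics` and `read_metrics_from_bam` are hypothetical and not part of ccs.

```python
from typing import Any, Dict, Iterable, Iterator

# Scalar per-read tags taken from the table above.
SCALAR_TAGS = ("zm", "np", "rq", "ec")


def collect_read_metrics(reads: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per read; tags missing from a read map to None.

    Works on any object exposing has_tag()/get_tag(), such as
    pysam.AlignedSegment.
    """
    for read in reads:
        yield {
            tag: read.get_tag(tag) if read.has_tag(tag) else None
            for tag in SCALAR_TAGS
        }


def read_metrics_from_bam(path: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical convenience wrapper; assumes pysam is available."""
    import pysam  # imported lazily so the helper above has no dependency

    # check_sq=False: an unaligned BAM has no @SQ header lines.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        # until_eof=True iterates over all records without needing an index.
        yield from collect_read_metrics(bam.fetch(until_eof=True))
```

Mapping absent tags to `None` keeps the sketch usable on files where optional tags were not written.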
How does the output BAM file size scale with yield?
Before compression, the output BAM file size scales per base as follows:
- 0.5 byte/base for the actual base (4-bit encoding)
- 1 byte/base for the QV
- 1 byte/base for the forward PW
- 1 byte/base for the forward IPD
- 1 byte/base for the reverse PW
- 1 byte/base for the reverse IPD
For a normal ccs run without kinetics, the upper bound is 1.5 bytes/base. If ccs is run with kinetics, the upper bound is 5.5 bytes/base.
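The per-base arithmetic above can be sketched in a few lines. This is only an illustration of the stated upper bounds (uncompressed, per-base contributions only); the function name `upper_bound_bytes` is hypothetical.

```python
# Per-base contributions listed above, in bytes.
BASE_AND_QV = 0.5 + 1.0  # 4-bit base encoding plus one QV byte
KINETICS = 4 * 1.0       # forward/reverse PW and forward/reverse IPD


def upper_bound_bytes(num_bases: int, kinetics: bool = False) -> float:
    """Uncompressed upper bound in bytes, ignoring per-read metadata."""
    per_base = BASE_AND_QV + (KINETICS if kinetics else 0.0)
    return num_bases * per_base


print(upper_bound_bytes(1_000_000))                 # without kinetics
print(upper_bound_bytes(1_000_000, kinetics=True))  # with kinetics
```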
Per-read meta information adds a fixed amount of 32 bytes per read:
- ec, rq: float, 4 bytes each
- sn: float array, 4x4 bytes
- np, zm: int32_t, 4 bytes each
- RG: string of length 8, 8x1 bytes
The actual output BAM that ccs generates is compressed. Compression is data-dependent, so upper bounds cannot be provided. For a 19 kb insert library and 30 h movie time, ccs BAM file sizes scale on average as follows:
Read types | Kinetics | Options | Bytes/Base | Bytes/HiFi base | Example (GBytes) | Example (GBytes) |
---|---|---|---|---|---|---|
HiFi | None | | 0.7 | 0.7 | 100 | 63 |
HiFi | HiFi | --hifi-kinetics | 3.7 | 3.7 | 528 | 336 |
HiFi + LQ CCS + unpolished | None | --all | 0.55 | 1.1 | 157 | 100 |
HiFi + LQ CCS + unpolished | HiFi | --all --hifi-kinetics | 2.3 | 4.5 | 642 | 409 |
HiFi + LQ CCS + unpolished | HiFi + LQ CCS | --all --all-kinetics | 2.9 | 5.7 | 814 | 518 |
HiFi + LQ CCS + fallback | HiFi + LQ CCS + fallback | --all --all-kinetics --subread-fallback | 3.0 | 5.8 | 828 | 527 |
Legend:
- HiFi: Polished CCS reads with predicted accuracy of at least Q20, optionally with kinetics
- LQ CCS: Polished CCS reads with predicted accuracy below Q20, optionally with kinetics
- unpolished: Unpolished consensus sequence with two or fewer passes, no kinetics possible
- fallback: One representative subread for ZMWs, instead of an unpolished consensus sequence, optionally with kinetics
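To relate the Q20 threshold in the legend to the `rq` tag, the sketch below converts a predicted accuracy to a Phred-scaled value. It assumes the standard Phred convention, Q = -10 · log10(1 - accuracy), under which Q20 corresponds to an accuracy of 0.99. The function names and the cap of Q60 for an accuracy of exactly 1.0 are illustrative choices, not ccs behavior.

```python
import math

HIFI_MIN_RQ = 0.99  # Q20 under the Phred convention assumed here


def rq_to_phred(rq: float, cap: float = 60.0) -> float:
    """Convert predicted accuracy in [0, 1] to a Phred-scaled value.

    The cap is an arbitrary choice for this sketch; it avoids an
    infinite result when rq is exactly 1.0.
    """
    if not 0.0 <= rq <= 1.0:
        raise ValueError(f"rq must lie in [0, 1], got {rq}")
    if rq == 1.0:
        return cap
    return min(cap, -10.0 * math.log10(1.0 - rq))


def is_hifi(rq: float) -> bool:
    """Compare on the accuracy scale to avoid rounding issues at exactly Q20."""
    return rq >= HIFI_MIN_RQ
```

The HiFi check compares accuracies directly because converting 0.99 to a Phred value in floating point can land marginally below 20.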
The Sequel IIe system runs with --all by default, or optionally with --all --all-kinetics --subread-fallback.
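For rough capacity planning, the Bytes/HiFi base column of the table above can be turned into a small lookup. This is a sketch only: the averages apply to the 19 kb, 30 h scenario described earlier, the function name `estimate_bam_gbytes` is hypothetical, and 1 GByte is assumed to mean 10^9 bytes.

```python
# Average compressed bytes per HiFi base, copied from the table above.
BYTES_PER_HIFI_BASE = {
    "": 0.7,
    "--hifi-kinetics": 3.7,
    "--all": 1.1,
    "--all --hifi-kinetics": 4.5,
    "--all --all-kinetics": 5.7,
    "--all --all-kinetics --subread-fallback": 5.8,
}


def estimate_bam_gbytes(hifi_bases: float, options: str = "") -> float:
    """Estimate compressed BAM size in GBytes (assuming 1 GByte = 1e9 bytes)."""
    if options not in BYTES_PER_HIFI_BASE:
        raise KeyError(f"no table entry for options: {options!r}")
    return hifi_bases * BYTES_PER_HIFI_BASE[options] / 1e9


# Example: 100 Gbases of HiFi yield with the --all option set.
print(round(estimate_bam_gbytes(100e9, "--all"), 1))
```

Because compression is data-dependent, the result is an average-case estimate rather than a bound.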