What BAM tags are generated?
Tag | Type | Description |
---|---|---|
ac | B,i | Detected and missing adapter counts |
ec | f | Effective coverage |
fi | B,C | Double-strand forward IPD (codec V1) |
ff | i | Fail reads |
fn | i | Double-strand forward number of complete passes (zero or more) |
fp | B,C | Double-strand forward PulseWidth (codec V1) |
ip | B,C | Single-strand IPD (codec V1) |
ma | i | Missing adapters bitmask |
np | i | Number of full-length subreads |
pw | B,C | Single-strand PulseWidth (codec V1) |
ri | B,C | Double-strand reverse IPD (codec V1) |
rn | i | Double-strand reverse number of complete passes (zero or more) |
rp | B,C | Double-strand reverse PulseWidth (codec V1) |
rq | f | Predicted average read accuracy |
sa | B,I | Run-length encoded per-base coverage by subread alignments in form of |
sm | B,C | Per-base number of aligned matches |
sx | B,C | Per-base number of aligned mismatches |
sn | B,f | Signal-to-noise ratios for each nucleotide |
zm | i | ZMW hole number |
RG | Z | Read group |
How does the output BAM file size scale with yield?
For each base, the output BAM file size scales as follows:
- 0.5 byte/base for the actual base (4-bit encoding)
- 1 byte/base for the QV
- 1 byte/base for the forward PW
- 1 byte/base for the forward IPD
- 1 byte/base for the reverse PW
- 1 byte/base for the reverse IPD
For a normal ccs run without kinetics, the upper bound is 1.5 bytes/base. If ccs is run with kinetics, the upper bound is 5.5 bytes/base.
Per-read meta information adds a fixed amount of 32 bytes per read:
- ec, rq: float, 4 bytes each
- sn: float array, 4x4 bytes
- np, zm: int32_t, 4 bytes each
- RG: string of length 8, 8x1 bytes
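As a rough illustration, the per-base and per-read figures above can be combined into an estimate of the uncompressed upper bound. This is a minimal sketch, not part of ccs: the function name and parameters are hypothetical, and the per-read overhead is simply taken as the 32 bytes stated above.

```python
def uncompressed_bam_upper_bound(num_reads, total_bases, kinetics=False,
                                 per_read_bytes=32):
    """Estimate the uncompressed upper bound in bytes (hypothetical helper).

    per_read_bytes defaults to the fixed per-read figure quoted in the text.
    """
    per_base = 0.5 + 1.0  # 4-bit base encoding plus 1 byte for the QV
    if kinetics:
        # forward PW, forward IPD, reverse PW, reverse IPD: 1 byte each
        per_base += 4.0
    return total_bases * per_base + num_reads * per_read_bytes


if __name__ == "__main__":
    # Made-up example yield: 2 million reads of 15 kb each.
    reads = 2_000_000
    bases = reads * 15_000
    print(uncompressed_bam_upper_bound(reads, bases))
    print(uncompressed_bam_upper_bound(reads, bases, kinetics=True))
```

For the made-up yield in the example, the bound is roughly 45 GB without kinetics and roughly 165 GB with kinetics, before compression.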
The actual output BAM that ccs generates is compressed. Because compression is data-dependent, no upper bound can be given for the compressed file size.