File compression

File compression increases the number of genomes that can be stored per hard drive and reduces the time needed to transfer files. We offer two approaches that substantially reduce file size.

QV binning

Compressing per-base quality values (QVs) is an effective way to reduce file size. We apply per-base QV binning in ccs, which can reduce BAM file size by up to 40% while keeping SNV and InDel detection performance similar.

How it works: QV binning is applied to the per-base QVs after the read quality (rq) has been computed. For each consensus read, every per-base QV is assigned to one of seven bins and replaced by that bin's fixed average QV:

QV bin      Mean QV   ASCII (Phred+33)
[0, 6]      Q3        $
[7, 13]     Q10       +
[14, 19]    Q17       2
[20, 24]    Q22       7
[25, 29]    Q27       <
[30, 39]    Q35       D
[40, 93]    Q40       I
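The binning step can be sketched in Python as follows. This is a minimal illustration built only from the table above, not the ccs implementation; the names `QV_BINS`, `bin_qv`, `to_phred33` and `shannon_entropy` are hypothetical. It maps each QV to its bin's fixed value and shows the Phred+33 characters that appear in SAM or FASTQ text (BAM itself stores the raw QV bytes). The empirical Shannon entropy of the QV string is used here only as a rough proxy for why fewer, more repetitive symbols compress better.

```python
from collections import Counter
import math

# (lowest QV, highest QV, fixed QV assigned to the bin), from the table above.
QV_BINS = [
    (0, 6, 3),
    (7, 13, 10),
    (14, 19, 17),
    (20, 24, 22),
    (25, 29, 27),
    (30, 39, 35),
    (40, 93, 40),
]


def bin_qv(qv: int) -> int:
    """Return the fixed QV of the bin that contains qv."""
    for low, high, value in QV_BINS:
        if low <= qv <= high:
            return value
    raise ValueError(f"QV {qv} is outside 0..93")


def to_phred33(qvs) -> str:
    """Encode QVs as Phred+33 ASCII characters, as in SAM/FASTQ text."""
    return "".join(chr(q + 33) for q in qvs)


def shannon_entropy(symbols) -> float:
    """Bits per symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


raw = [93, 41, 38, 30, 22, 15, 7, 2, 40, 35]
binned = [bin_qv(q) for q in raw]
print(binned)                                # [40, 40, 35, 35, 22, 17, 10, 3, 40, 35]
print(to_phred33(binned))                    # IIDD72+$ID
print(round(shannon_entropy(raw), 3))        # 3.322
print(round(shannon_entropy(binned), 3))     # 2.371
```

Because every QV collapses to one of seven values, the binned string can never exceed log2(7) ≈ 2.81 bits per symbol, whereas unbinned QVs can take up to 94 distinct values.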

Sorting by sequence similarity

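In addition to QV binning, sorting unaligned HiFi BAM files by sequence similarity can reduce file size by up to an additional 30%. In samtools sort, the -M option orders unaligned reads by their sequence minimiser and -K sets the minimiser k-mer size (20 here):

samtools sort -M -K 20 -o hifi_reads.sorted.bam hifi_reads.bam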
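The idea behind -M can be illustrated with a simplified Python sketch: give each read a single key, the smallest k-mer found on either strand, and sort by that key so reads sharing sequence end up next to each other, where a block compressor is more likely to find the repeats. This is an assumption-laden toy, not samtools' code. In particular, ranking k-mers lexicographically is a choice made here for readability, and samtools may rank them differently. The names `reverse_complement`, `minimiser` and `sort_by_minimiser`, and the representation of reads as (name, sequence) pairs, are hypothetical.

```python
# Simplified illustration of minimiser-based grouping; not samtools' implementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (other characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def minimiser(seq: str, k: int = 20) -> str:
    """Smallest k-mer, in lexicographic order, across both strands of seq.

    Assumption: lexicographic ranking is used only to keep the sketch simple.
    Reads shorter than k fall back to the smaller of the read and its
    reverse complement.
    """
    rc = reverse_complement(seq)
    if len(seq) < k:
        return min(seq, rc)
    return min(s[i:i + k] for s in (seq, rc) for i in range(len(s) - k + 1))


def sort_by_minimiser(reads, k: int = 20):
    """Stable sort of (name, sequence) pairs by their sequence minimiser."""
    return sorted(reads, key=lambda read: minimiser(read[1], k))


reads = [
    ("read_a", "GATTACAGGT"),
    ("read_b", "CCCGGGCCCG"),
    ("read_c", "GATTACAGGA"),  # differs from read_a only at the last base
    ("read_d", "ACCTGTAATC"),  # reverse complement of read_a
]
# k=4 keeps the toy reads short; the samtools command above uses -K 20.
print([name for name, _ in sort_by_minimiser(reads, k=4)])
# ['read_a', 'read_c', 'read_d', 'read_b']
```

Here read_a, its near-duplicate read_c and its reverse complement read_d all share the minimiser AATC, so the sort places them together and moves the unrelated read_b after them.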
