A novel compression tool for efficient storage of genome resequencing data
When tested on the first Korean personal genome sequence data set, GRS achieved ∼159-fold compression, reducing the data from 2986.8 MB to 18.8 MB. On sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB and the 115.1 MB A. thaliana genome data to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.
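Each fold-compression figure above is the raw size divided by the compressed size, with both expressed in the same unit. The small check below makes that arithmetic explicit. It assumes binary units (1 MB = 1024 KB), because that is the convention that reproduces the whole-genome A. thaliana ratio in the table. The function name `fold_compression` is illustrative only.

```python
# Fold compression = raw size / compressed size, both in the same unit.
# Assumption: binary units (1 MB = 1024 KB); this reproduces the table's
# whole-genome A. thaliana ratio.
KB_PER_MB = 1024


def fold_compression(raw_mb: float, compressed_kb: float) -> float:
    """Return how many times smaller the compressed file is than the raw file."""
    return raw_mb * KB_PER_MB / compressed_kb


# (raw size in MB, compressed size in KB) for the three data sets in the text
datasets = {
    "Korean personal genome": (2986.8, 18.8 * KB_PER_MB),
    "Rice": (361.0, 4.4 * KB_PER_MB),
    "A. thaliana": (115.1, 6.5),
}

for name, (raw_mb, comp_kb) in datasets.items():
    print(f"{name}: {fold_compression(raw_mb, comp_kb):,.1f}-fold")
```

The Korean genome works out to about 158.9-fold, which the text rounds to ∼159. The rice data set comes out at about 82-fold.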
Chromosome | Varied sequence (%) | Raw file size (MB) | Compressed file size | Compression ratio (fold) |
---|---|---|---|---|
1 | 0.016314 | 29.4 | 715.0 B | 43,116.3 |
2 | 0.036145 | 19.0 | 385.0 B | 51,747.9 |
3 | 0.046910 | 22.7 | 2.9 KB | 6,709.0 |
4 | 0.000301 | 17.9 | 1.9 KB | 9,647.2 |
5 | 0.063888 | 26.1 | 604.0 B | 45,311.0 |
Whole genome | 0.032712 | 115.1 | 6.5 KB | 18,132.7 |
Compression of the A. thaliana genome by chromosome. The table shows the varied sequence percentage of each chromosome, the sizes of the raw and compressed sequence files, and the compression ratio.
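The low varied-sequence percentages suggest why resequencing data compress so well. A resequenced genome differs from an existing reference at only a tiny fraction of positions, so storing just those differences is far smaller than storing every base.

The following is a minimal sketch of that general reference-based idea. It is not GRS's actual format or algorithm. It assumes equal-length sequences and substitutions only, and all names (`Substitution`, `diff_substitutions`, `apply_substitutions`, `varied_percentage`) are hypothetical. A practical tool would also need to handle insertions and deletions and would entropy-code the stored differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    position: int  # 0-based offset into the reference
    base: str      # base observed in the resequenced genome


def diff_substitutions(reference: str, sample: str) -> list[Substitution]:
    """List the positions where sample differs from reference (substitutions only)."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    return [Substitution(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def apply_substitutions(reference: str, subs: list[Substitution]) -> str:
    """Rebuild the sample from the reference plus the stored substitutions."""
    seq = list(reference)
    for sub in subs:
        seq[sub.position] = sub.base
    return "".join(seq)


def varied_percentage(reference: str, subs: list[Substitution]) -> float:
    """Share of reference positions that differ in the sample, as a percentage."""
    return 100.0 * len(subs) / len(reference)


# Toy example: a 20-base reference and a sample differing at two positions
reference = "ACGTACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGTACGA"
subs = diff_substitutions(reference, sample)
print(subs)
print(varied_percentage(reference, subs))
assert apply_substitutions(reference, subs) == sample  # lossless round trip
```

In this toy case only two of the 20 bases need to be stored. At real chromosome scale, where the varied fraction is well under 0.1%, the same principle yields ratios in the thousands-fold range. Actual figures depend on how the differences are encoded.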
http://nar.oxfordjournals.org/content/early/2011/01/25/nar.gkr009.full