Tuesday, June 14, 2011

New compression tools for genomic data

A novel compression tool for efficient storage of genome resequencing data


When tested on the first Korean personal genome sequence data set, GRS achieved ∼159-fold compression, reducing the data from 2986.8 MB to 18.8 MB. On sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.
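The fold figures quoted above follow directly from the reported file sizes. A minimal sketch of that arithmetic, assuming binary units (1 MB = 1024 KB); the function name `fold_compression` is made up for illustration and is not part of GRS:

```python
# Assumption: binary units, 1 KB = 1024 bytes and 1 MB = 1024 KB.
KB = 1024
MB = 1024 * KB


def fold_compression(raw_bytes: float, compressed_bytes: float) -> float:
    """Return how many times smaller the compressed file is than the raw file."""
    return raw_bytes / compressed_bytes


# Sizes as reported in the abstract.
korean = fold_compression(2986.8 * MB, 18.8 * MB)
rice = fold_compression(361.0 * MB, 4.4 * MB)
arabidopsis = fold_compression(115.1 * MB, 6.5 * KB)

print(f"Korean genome: {korean:.1f}-fold")
print(f"Rice:          {rice:.1f}-fold")
print(f"A. thaliana:   {arabidopsis:.1f}-fold")
```

With these inputs the Korean genome comes out at roughly 159-fold, rice at roughly 82-fold, and A. thaliana at roughly 18,000-fold.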

Table 4.
Performance of GRS in compressing A. thaliana genome of TAIR9 using TAIR8 as the reference
Chromosome        Varied sequence (%)   Raw file size (MB)   Compressed file size   Compression rate
1                 0.016314              29.4                 715.0 B                43116.3
2                 0.036145              19.0                 385.0 B                51747.9
3                 0.046910              22.7                 2.9 KB                 6709.0
4                 0.000301              17.9                 1.9 KB                 9647.2
5                 0.063888              26.1                 604.0 B                45311.0
The whole genome  0.032712              115.1                6.5 KB                 18132.7
  • The varied sequence percentage of each chromosome, the sizes of the raw and compressed sequence files, and the compression rate are shown.
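To see why a small varied percentage against a reference can give such small files, here is a toy sketch of reference-based encoding. This is not GRS's algorithm, which the excerpt above does not describe. It assumes equal-length sequences and substitutions only, and the names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (0-based position, base in the target sequence)


def encode(reference: str, target: str) -> List[Diff]:
    """Keep only the positions where the target differs from the reference."""
    if len(reference) != len(target):
        raise ValueError("this toy sketch handles equal-length sequences only")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the target by applying the stored substitutions to the reference."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


# Made-up sequences for illustration only.
reference = "ACGTACGTACGT"
target = "ACGTACCTACGA"

diffs = encode(reference, target)
varied_percentage = 100 * len(diffs) / len(target)

print(diffs)
print(f"varied: {varied_percentage:.2f}%")
print(decode(reference, diffs) == target)
```

Only the differing positions need to be stored, so the encoded size grows with the number of variations rather than with the length of the genome. A practical tool would also have to handle insertions and deletions, which this sketch ignores.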


http://nar.oxfordjournals.org/content/early/2011/01/25/nar.gkr009.full
