JI Zhen, ZHOU Jia-rui, ZHU Ze-xuan, et al. Bioinformatics Features Based DNA Sequence Data Compression Algorithm[J]. Acta Electronica Sinica, 2011, 39(5): 991-995.
Bioinformatics Features Based DNA Sequence Data Compression Algorithm
This paper proposes BioLZMA, a novel DNA sequence data compression algorithm based on bioinformatics features. In BioLZMA, the DNA sequence data is sliced and regrouped into four clusters according to biological meaning: the coding sequence cluster, the intron cluster, the RNA cluster, and the residual cluster. Targeted compression strategies are applied to each cluster during data pre-processing, and the clusters are then compressed separately with the LZMA algorithm. Experimental results on benchmark sequences show that BioLZMA outperforms existing DNA compression algorithms. In particular, on long DNA sequences with significant bioinformatics features, BioLZMA achieves a higher compression ratio with little computation time.
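The cluster-then-compress idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the annotation format (a list of `(start, end, label)` tuples), the function names, and the cluster keys are all assumptions made for illustration, and the cluster-specific pre-processing strategies mentioned in the abstract are omitted because the abstract does not specify them. Only the standard-library `lzma` module is used.

```python
import lzma

# Cluster names follow the abstract; every other name here is hypothetical.
CLUSTERS = ("coding", "intron", "rna", "residual")


def split_into_clusters(sequence, annotations):
    """Slice `sequence` into the four clusters.

    `annotations` is an assumed format: (start, end, label) tuples with
    half-open, non-overlapping ranges. Unannotated bases go to "residual".
    """
    parts = {name: [] for name in CLUSTERS}
    pos = 0
    for start, end, label in sorted(annotations):
        if start > pos:
            # Gap between annotated regions belongs to the residual cluster.
            parts["residual"].append(sequence[pos:start])
        parts[label].append(sequence[start:end])
        pos = end
    if pos < len(sequence):
        parts["residual"].append(sequence[pos:])
    return {name: "".join(chunks) for name, chunks in parts.items()}


def compress_clusters(clusters):
    """Compress each cluster independently with LZMA."""
    return {name: lzma.compress(text.encode("ascii"))
            for name, text in clusters.items()}


def decompress_clusters(blobs):
    """Inverse of compress_clusters: recover each cluster's text."""
    return {name: lzma.decompress(blob).decode("ascii")
            for name, blob in blobs.items()}


if __name__ == "__main__":
    seq = "AAAA" + "ATGGCCTGA" + "GTAAGTAG" + "TTTT" + "GGCCGGCC" + "CC"
    ann = [(4, 13, "coding"), (13, 21, "intron"), (25, 33, "rna")]
    clusters = split_into_clusters(seq, ann)
    blobs = compress_clusters(clusters)
    assert decompress_clusters(blobs) == clusters
    for name in CLUSTERS:
        print(name, len(clusters[name]), "bases ->", len(blobs[name]), "bytes")
```

In this sketch the annotations would have to be stored alongside the compressed clusters to rebuild the original sequence order. On a toy input this short, the LZMA container overhead dominates, so any size benefit would only be expected on long sequences, which is consistent with the abstract's emphasis on long DNA sequences.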