TAN Li, SUN Ji-feng. High-Throughput DNA Sequence Data Compression Method Based on Codebook Index Transformation[J]. Acta Electronica Sinica, 2015, 43(5): 1007-1013. DOI: 10.3969/j.issn.0372-2112.2015.05.026.
A novel high-throughput DNA sequence compression method based on codebook index transformation (CITD) is proposed. In CITD, the codebook index transformation (CIT) model replaces the traditional representation of codebook indexes with quaternary values expressed using the four standard base characters, and a simple encoding method distinguishes replaced substrings from non-replaced ones. The information entropy of the data then determines whether the Burrows-Wheeler transform (BWT) is applied. Finally, the move-to-front (MTF) transform and Huffman entropy coding compress the data. Experimental results on several sequencing data sets show that CITD outperforms the high-throughput DNA sequence compression algorithms cited in this paper.
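
The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The digit-to-base mapping (`A`=0, `C`=1, `G`=2, `T`=3), the fixed index width, the entropy threshold of 1.9 bits and the direction of the comparison are all assumptions made for illustration, since the abstract specifies none of them. The function names are hypothetical, and the final Huffman coding stage is omitted.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumed digit-to-base mapping; not given in the abstract


def index_to_bases(index, width):
    """Spell a codebook index as a fixed-width base-4 number in A/C/G/T."""
    if index < 0 or index >= 4 ** width:
        raise ValueError("index does not fit in the given width")
    digits = []
    for _ in range(width):
        index, d = divmod(index, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))


def shannon_entropy(seq):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def bwt(seq, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations (short inputs only)."""
    s = seq + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def mtf(seq, alphabet):
    """Move-to-front: emit each symbol's current position, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in seq:
        pos = table.index(ch)
        out.append(pos)
        table.insert(0, table.pop(pos))
    return out


def preprocess(seq, threshold=1.9):
    """Entropy-gated BWT followed by MTF.

    Assumption: BWT is applied when entropy exceeds `threshold`; the real
    rule and threshold used by CITD are not stated in the abstract.
    """
    use_bwt = shannon_entropy(seq) > threshold
    staged = bwt(seq) if use_bwt else seq
    return use_bwt, mtf(staged, sorted(set(staged)))
```

For example, `index_to_bases(27, 4)` spells the index 27 (base-4 digits 0, 1, 2, 3) as `"ACGT"`, and `mtf("AACGA", "ACGT")` yields `[0, 0, 1, 2, 2]`. The small integers produced by MTF are what an entropy coder such as Huffman would then compress.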