Bioinformatics Features Based DNA Sequence Data Compression Algorithm

JI Zhen; ZHOU Jia-rui; ZHU Ze-xuan; Q H Wu

Academic paper | Updated: 2025-07-16

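- Journal: Acta Electronica Sinica, 2011, Vol. 39, No. 5, pp. 991-995
- Author affiliations:
  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060
  2. College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027
  3. Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3GJ, UK
- Funding:
  National Natural Science Foundation of China (No. 60872125); Fok Ying Tung Education Foundation, Young Teachers Fund for Higher Education Institutions (basic research project); Shenzhen Basic Research Project (Outstanding Young Scholars Award); Natural Science Foundation of Guangdong Province (2021A1515012233)
- CLC number: TP391
- Print publication: 2011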
JI Zhen, ZHOU Jia-rui, ZHU Ze-xuan, et al. Bioinformatics Features Based DNA Sequence Data Compression Algorithm[J]. Acta Electronica Sinica, 2011, 39(5): 991-995.

Abstract

This paper proposes BioLZMA, a DNA sequence data compression algorithm that brings biological features and biological meaning into the compression process. In BioLZMA, a DNA sequence is sliced according to the biological meaning of its components and regrouped into four sets: the coding sequence (CDS) set, the intron set, the RNA set, and the set of residual sequences. Each set is preprocessed with a compression strategy targeted at the biological features of its sequences and is then encoded with the LZMA algorithm. Experimental results show that BioLZMA outperforms existing DNA sequence compression methods on benchmark sequences. In particular, on long sequences with clear bioinformatics features, the algorithm achieves a high compression ratio in a short time.
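The slice-and-regroup step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the annotation format (`(start, end, kind)` tuples with half-open intervals), the function names, and the stream labels are all assumptions made for the example, and the set-specific preprocessing strategies are omitted because the abstract does not specify them. Only Python's standard `lzma` module is used for the final encoding.

```python
import lzma

# Stream labels are hypothetical names for the four sets in the abstract.
STREAMS = ("cds", "intron", "rna", "residual")


def split_by_annotation(seq, annotations):
    """Regroup seq into four strings keyed by stream name.

    annotations: non-overlapping (start, end, kind) tuples, half-open,
    with kind in {"cds", "intron", "rna"} (assumed format).
    """
    parts = {name: [] for name in STREAMS}
    pos = 0
    for start, end, kind in sorted(annotations):
        if start < pos or end < start:
            raise ValueError("annotations must be valid and non-overlapping")
        parts["residual"].append(seq[pos:start])  # unannotated gap
        parts[kind].append(seq[start:end])
        pos = end
    parts["residual"].append(seq[pos:])  # tail after the last annotation
    return {name: "".join(chunks) for name, chunks in parts.items()}


def compress_streams(streams):
    """LZMA-encode each stream separately (no preprocessing in this sketch)."""
    return {name: lzma.compress(s.encode("ascii")) for name, s in streams.items()}


def decompress_streams(blobs):
    return {name: lzma.decompress(b).decode("ascii") for name, b in blobs.items()}


def reassemble(streams, annotations):
    """Invert split_by_annotation using the same annotations."""
    cursor = {name: 0 for name in streams}

    def take(name, n):
        i = cursor[name]
        cursor[name] = i + n
        return streams[name][i:i + n]

    out = []
    pos = 0
    for start, end, kind in sorted(annotations):
        out.append(take("residual", start - pos))
        out.append(take(kind, end - start))
        pos = end
    out.append(streams["residual"][cursor["residual"]:])
    return "".join(out)


# Toy example with made-up coordinates.
seq = "ACGTACGTAC" + "ATGGCCAAATAG" + "TTTT" + "GTAAGTCCAG" + "GGCTAGCTAG"
annotations = [(10, 22, "cds"), (26, 36, "intron"), (36, 46, "rna")]

streams = split_by_annotation(seq, annotations)
blobs = compress_streams(streams)
restored = reassemble(decompress_streams(blobs), annotations)
assert restored == seq  # the regrouping is lossless
```

In this sketch the annotations must be stored alongside the compressed streams so that the original order can be restored. On a toy input this short, the fixed overhead of four LZMA containers outweighs any gain; the abstract reports its benefit on long sequences with clear bioinformatics features.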
