Multiple Motif Discovery in Biological Sequences by Mixture Gibbs Sampling

LIU Li-fang, HUO Hong-wei, WANG Bao-shu

Acta Electronica Sinica, 2008, Vol. 36, Issue 4: 750-755.

Abstract

This paper presents a mixture Gibbs sampling algorithm for discovering motifs in biological sequences. The algorithm learns a mixture motif model and follows a greedy strategy: new motifs are added to the mixture one at a time by maximizing the likelihood. Two sampling methods, site sampling and motif sampling, are designed and applied alternately. To speed up the search, the input dataset is partitioned hierarchically using kd-trees. Experimental results indicate that the proposed algorithm has a clear advantage in identifying large groups of motifs characteristic of biological sequence families. It also builds motif models with stronger statistical characteristics, which improves sequence classification accuracy.
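As a rough illustration of the site-sampling idea mentioned in the abstract, the sketch below implements a generic single-motif Gibbs site sampler over DNA strings. It is not the authors' mixture algorithm: the greedy addition of motifs, the motif-sampling step, and the kd-tree partitioning are all omitted. The names `build_profile` and `site_sampler`, the uniform background, the pseudocount, and the fixed motif width are assumptions made for this example.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA input; the paper's data may differ


def build_profile(seqs, starts, width, skip, pseudo=1.0):
    """Per-position letter frequencies from all motif sites except sequence `skip`."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[s + j]] += 1.0
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({a: c / total for a, c in col.items()})
    return profile


def site_sampler(seqs, width, iters=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    background = 1.0 / len(ALPHABET)  # uniform background (assumption)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Hold out sequence i and rebuild the profile from the others.
            profile = build_profile(seqs, starts, width, skip=i)
            # Weight every candidate start by its motif/background likelihood ratio.
            weights = []
            for s in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= profile[j][seq[s + j]] / background
                weights.append(w)
            # Sample a new site for sequence i in proportion to those weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `site_sampler(["ACGTTGACAGGT", "TTGACACCGTAA", "GGCATTGACATC"], width=6)` returns a list with one start index per sequence. Because each site is sampled rather than chosen greedily, different seeds can give different answers, which is the usual trade-off of a stochastic search.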

Key words

bioinformatics / motif discovery / Gibbs sampling / mixture motif model

Cite this article

LIU Li-fang, HUO Hong-wei, WANG Bao-shu. Multiple Motif Discovery in Biological Sequences by Mixture Gibbs Sampling[J]. Acta Electronica Sinica, 2008, 36(4): 750-755.
CLC number: Q811.4; TP301.6

Funding

National Natural Science Foundation of China (No. 60705004); Natural Science Foundation of Shaanxi Province (No. 2005F33)