Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030, China
Published in print: 2002
MA Ying-hua, WANG Yong-cheng, SU Gui-yang. A Fast Approach of Extracting Repeated String from Chinese Text (一种在汉语文本中抽取重复字串的快速算法)[J]. Acta Electronica Sinica (电子学报), 2002, 30(S1): 2177-2180.
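Handling words that are not listed in the dictionary is an indispensable research topic in natural language processing. Extracting strings that occur repeatedly in a text is the most direct and simple way to extract such unlisted words. Earlier algorithms run slowly and cannot meet the demands of fast processing of massive text collections. Following the left-association-first and longest-match principles, this paper proposes a fast algorithm, Positional Remembering and Jump Matching. Its worst-case time complexity is O(t²), where t is the number of times a repeated string occurs. Comparative experiments show that the method is markedly faster, uses simple data structures, and completes processing in a single scan.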
The processing of words not listed in dictionaries is necessary in natural language processing. Extracting repeated strings from text is the most direct and convenient method, and it is rather effective. Existing algorithms cannot meet the high-speed requirements of large-scale text processing systems. According to the principles of left first and longest first, a fast approach named Positional Remembering and Jump Matching is put forward, which runs in O(t²) time in the worst case, where t is the number of times a substring repeats. Experimental results show that, compared with previous methods, this method offers advantages such as high speed, simple data structures, and simultaneous scanning and matching.
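
The abstract names the technique but does not spell out its steps, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes that "position remembering" means keeping, for each character, the earlier positions where it occurred, and that "jump matching" means skipping past a matched string once the longest match is found. The function name `repeated_strings` and the parameter `min_len` are hypothetical, and the sketch makes no claim to reach the O(t²) bound stated above.

```python
from collections import defaultdict


def repeated_strings(text, min_len=2):
    """Collect repeated substrings in one left-to-right pass.

    Illustrative only: an assumed reading of "position remembering"
    (a per-character list of earlier positions) and "jump matching"
    (skipping over a matched string), not the paper's procedure.
    """
    positions = defaultdict(list)  # character -> earlier indices of that character
    counts = {}                    # repeated substring -> number of occurrences seen
    n = len(text)
    i = 0
    while i < n:
        best = 0
        # Try every remembered earlier position of the current character.
        for p in positions[text[i]]:
            k = 0
            # Extend the match; the earlier copy may not run into the current one.
            while i + k < n and p + k < i and text[p + k] == text[i + k]:
                k += 1
            best = max(best, k)  # longest match wins
        if best >= min_len:
            s = text[i:i + best]
            counts[s] = counts.get(s, 1) + 1  # first repeat means two occurrences
            step = best                       # jump over the matched string
        else:
            step = 1
        # Remember the positions just consumed before moving on.
        for j in range(i, i + step):
            positions[text[j]].append(j)
        i += step
    return counts
```

For example, `repeated_strings("abcXabcYabc")` returns `{"abc": 3}`. Counts for strings nested inside longer repeats can be incomplete in this simplified version, because only the longest match at each position is recorded.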