

Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030, China
Published:2002
MA Ying-hua, WANG Yong-cheng, SU Gui-yang. A Fast Approach of Extracting Repeated String from Chinese Text[J]. Acta Electronica Sinica, 2002, 30(S1): 2177-2180.
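Handling words that are not listed in the dictionary (unlisted words) is an indispensable research direction in natural language processing. Extracting strings that occur repeatedly in a text is the most direct and simple way to extract unlisted words. Previous algorithms run slowly and cannot meet the demands of fast processing of massive text collections. Following the left-association-first and longest-match principles, this paper proposes a fast algorithm, Positional Remembering and Jump Matching. Its worst-case time complexity is $O(t^2)$, where $t$ is the number of times a repeated string occurs. Comparative experiments show that the method is markedly faster, uses simple data structures, and completes processing in a single scan.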
The processing of words unlisted in dictionaries is necessary in natural language processing. Extracting repeated strings that appear in text is the most direct and convenient method, and it is rather effective. Existing algorithms cannot meet the high-speed requirements of systems that process vast amounts of text. According to the principles of left first and longest first, a fast approach named Positional Remembering and Jump Matching is put forward, which runs in $O(t^2)$ time in the worst case, where $t$ is the number of times a substring repeats. Experimental results show that, compared with previous methods, this method offers advantages such as high speed, simple data structures, and simultaneous scanning and matching.
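
The abstract names the technique but does not spell out its steps. As a rough illustration only, the following Python sketch shows one way a single left-to-right scan could remember character positions and jump past the longest repeat it finds. It is an assumption-laden reading of "left first" and "longest first", not the authors' algorithm; the function name `extract_repeated_strings` and the `min_len` parameter are hypothetical, and no claim is made that this sketch meets the stated complexity bound.

```python
from collections import defaultdict


def extract_repeated_strings(text, min_len=2):
    """Return repeated substrings of `text` in order of first detection.

    Hypothetical sketch, not the paper's method: scan once from the left,
    remember where each character has been seen, and when the current
    position repeats earlier text, take the longest such repeat and jump
    past it.
    """
    positions = defaultdict(list)  # character -> indices already scanned
    found = {}                     # dict used as an ordered set of results
    i, n = 0, len(text)
    while i < n:
        best = 0
        # Try every remembered position of the current character.
        for p in positions[text[i]]:
            k = 0
            # Extend the match; `p + k < i` keeps the earlier copy from
            # overlapping the current one.
            while i + k < n and p + k < i and text[p + k] == text[i + k]:
                k += 1
            best = max(best, k)
        if best >= min_len:
            found.setdefault(text[i:i + best], None)
            step = best            # jump over the matched repeat
        else:
            step = 1
        # Remember the positions just consumed before moving on.
        for j in range(i, i + step):
            positions[text[j]].append(j)
        i += step
    return list(found)


if __name__ == "__main__":
    print(extract_repeated_strings("上海交通大学在上海，交通大学很大"))
```

Because the scan jumps over each matched repeat, a shorter string nested inside a longer repeat is not reported separately at that position; that is a simplification of this sketch rather than a property claimed by the paper.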