A Fast Approach of Extracting Repeated String from Chinese Text

MA Ying-hua; WANG Yong-cheng; SU Gui-yang

Updated: 2025-07-16

- Acta Electronica Sinica Vol. 30, Issue S1, Pages: 2177-2180 (2002)
- Affiliation: Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030
- CLC: TP391.1
- Published: 2002
MA Ying-hua, WANG Yong-cheng, SU Gui-yang. A Fast Approach of Extracting Repeated String from Chinese Text[J]. Acta Electronica Sinica, 2002, 30(S1): 2177-2180.

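Abstract (translated from the Chinese)

Handling words that are not listed in the dictionary is an indispensable research direction in natural language processing. Extracting strings that occur repeatedly in a text is the most direct and simple way to extract such words. Earlier algorithms run slowly and cannot meet the need for fast processing of massive amounts of text. Following the left-association-first and longest-match principles, this paper proposes a fast algorithm: positional remembering and jump matching (位置记忆跳跃匹配). Its worst-case time complexity is o(t ), where t is the number of times a repeated string recurs. Comparative experiments show that the method is markedly faster, uses simple data structures, and completes its processing in a single scan.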

Abstract

Processing words that are not listed in dictionaries is a necessary part of natural language processing. Extracting strings that recur in a text is the most direct and convenient way to find such words, and it is rather effective. Existing algorithms cannot meet the speed requirements of large-scale text processing systems. Following the left-first and longest-first principles, this paper puts forward a fast approach named Positional Remembering and Jump Matching, which runs in o(t ) time in the worst case, where t is the number of times a substring repeats. Experimental results show that, compared with previous methods, this method offers high speed, simple data structures, and simultaneous scanning and matching.
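
The abstract names the technique but does not give its steps, so the following is only a rough illustrative sketch, in Python, of the general idea of extracting repeated strings in one left-to-right pass: remember the positions where each character has been seen, extend the longest match against those remembered positions, and jump past the matched span. The function name `extract_repeated_strings`, the `min_len` parameter, and the counting scheme are assumptions made for illustration. This is not the authors' Positional Remembering and Jump Matching algorithm, and it makes no attempt to reproduce the complexity bound reported above.

```python
from collections import defaultdict


def extract_repeated_strings(text, min_len=2):
    """Illustrative sketch (hypothetical helper, not the paper's algorithm).

    Scans `text` once from left to right and returns a dict mapping each
    repeated string found to an approximate occurrence count.
    """
    positions = defaultdict(list)  # character -> positions already scanned
    counts = {}
    n = len(text)
    i = 0
    while i < n:
        best = 0
        # Compare against every remembered position of the current character.
        for p in positions[text[i]]:
            k = 0
            # Extend the match; the earlier copy must end before position i.
            while i + k < n and p + k < i and text[p + k] == text[i + k]:
                k += 1
            best = max(best, k)  # longest match wins
        if best >= min_len:
            s = text[i:i + best]
            # The first time a repeat is seen, it accounts for two occurrences.
            counts[s] = counts.get(s, 1) + 1
            step = best  # jump over the matched span
        else:
            step = 1
        # Remember the positions just consumed before moving on.
        for j in range(i, i + step):
            positions[text[j]].append(j)
        i += step
    return counts
```

For example, on the string `自然语言处理和自然语言理解` the sketch returns `{'自然语言': 2}`. Because a longer match can absorb a shorter one, the counts should be read as approximate.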
