Acta Electronica Sinica, 2006, Vol. 34, Issue 10: 1752-1757.



An Information Filtering Method for Chinese Web Pages Based on UCL

XING Ling (1, 2), MA Jian-guo (2), LI You-ping (3), LIU Zhi-wen (1)

1. Department of Electronic Engineering, Beijing Institute of Technology, Beijing 100081, China; 2. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; 3. Institute of Applied Physics and Computational Mathematics, China Academy of Engineering Physics, Beijing 100088, China
Received: 2006-01-19; Revised: 2006-07-21; Online: 2006-10-25; Published: 2006-10-25

Abstract: To find the content users are interested in among the vast number of Chinese web pages, this paper proposes a two-stage filtering method based on UCL (Uniform Content Locator). Information in the media space is represented by semantic cases based on UCL (SCU). In the first stage, a semantic vector space model (SVSM) analyzes the semantic matrix of each web page to coarsely screen out the pages the user is likely to be interested in. In the second stage, the content of these pages is interpreted sentence by sentence using fine-grained semantic features, and the information the user cares about is extracted. Changes in the user's interests are tracked dynamically from the user's reading behavior, an ontology model of user interest is built, and a measure of the user's degree of interest is analyzed and defined. Experiments verify the effectiveness of the method, and in comparison tests against the vector space model (VSM) it performs clearly better than VSM.

Key words: UCL, information filtering, semantic cases based on UCL (SCU), semantic vector space model (SVSM), interest ontology profile
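The abstract does not give the SCU representation or the SVSM formulas, so what follows is only a minimal Python sketch of the general two-stage idea under stated assumptions. Stage 1 uses the plain VSM cosine score (the baseline the paper compares against, not the authors' SVSM); stage 2 is a simple sentence-level term match standing in for the fine-grained semantic pass; and `update_profile` is a hypothetical dwell-time rule for tracking interest drift. All function names, thresholds, and the update rule are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: \w+ tokens. Real Chinese text would first need word
    # segmentation, since an unsegmented Chinese run matches as one token here.
    return re.findall(r"\w+", text.lower())


def term_vector(text: str) -> Counter:
    # Raw term-frequency vector (no IDF weighting in this sketch).
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity, the standard VSM relevance score.
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_filter(pages: list[str], profile: Counter, threshold: float = 0.2) -> list[str]:
    # Stage 1 (hypothetical): keep pages whose whole-page vector is close
    # enough to the user's interest profile.
    return [p for p in pages if cosine(term_vector(p), profile) >= threshold]


def fine_extract(page: str, profile: Counter, min_hits: int = 1) -> list[str]:
    # Stage 2 (hypothetical): read the page sentence by sentence and keep the
    # sentences that mention at least `min_hits` terms with positive weight.
    sentences = [s.strip() for s in re.split(r"[.!?。！？]", page) if s.strip()]
    return [s for s in sentences
            if sum(1 for t in tokenize(s) if profile[t] > 0) >= min_hits]


def update_profile(profile: Counter, read_page: str, dwell_seconds: float,
                   decay: float = 0.9, gain: float = 0.01) -> Counter:
    # Hypothetical interest-drift rule: decay all old weights, then reinforce
    # the terms of a page in proportion to how long the user spent reading it.
    updated = Counter({t: w * decay for t, w in profile.items()})
    for t, count in term_vector(read_page).items():
        updated[t] += gain * dwell_seconds * count
    return updated


if __name__ == "__main__":
    profile = Counter({"filter": 2.0, "semantic": 1.0})
    pages = ["Semantic filter for web pages. The weather is nice.",
             "Football match results today."]
    for page in coarse_filter(pages, profile):
        print(fine_extract(page, profile))
```

Run as a script, it keeps only the first page and prints `['Semantic filter for web pages']`. The design choice in this sketch is that the cheap whole-page score runs on every page, while the sentence-level pass only runs on the pages that survive it.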
