

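1. School of Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026
2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
3. Department of Applied Mathematics, Dalian University of Technology, Dalian, Liaoning 116024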
Published:2006
LU Ming-yu, SHEN Dou, GUO Chong-hui, et al. Web-page Summarization Methods for Web-page Classification[J]. Acta Electronica Sinica, 2006, 34(8): 1475-1480.
Web-page classification is an important research topic in web mining and is considerably harder than plain-text classification. Removing the noisy information embedded in web pages can improve classification accuracy, and summarization-based web-page classification builds on this idea. This paper analyzes and improves three traditional web-page summarization methods, and proposes the Content Body summarization method together with an ensemble summarization method that combines the four summarization methods. Extensive summarization-based classification experiments show that every summarization method improves classification performance. The ensemble method performs best, improving the F1 measure by 12.9% over pure-text based methods.
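The headline result is reported as an improvement in F1. As a minimal sketch of how such a figure can be computed, the Python below derives F1 from precision and recall and then a relative gain over a baseline. The abstract does not say whether the 12.9% is a relative or an absolute gain, so the relative form here is an assumption, and the function names and all numeric inputs are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Hypothetical numbers for illustration only; they are not taken from the paper.
baseline_f1 = f1_score(precision=0.60, recall=0.60)   # pure-text classifier
summary_f1 = f1_score(precision=0.70, recall=0.66)    # summary-based classifier

print(f"baseline F1      = {baseline_f1:.3f}")
print(f"summary-based F1 = {summary_f1:.3f}")
print(f"relative gain    = {relative_improvement(baseline_f1, summary_f1):.1%}")
```

If the reported figure were instead an absolute gain, the comparison would be the plain difference `improved - baseline`.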