Optimization of Chinese Word Segmentation in Named Entity Recognition and Word Alignment
YIN Cun-yan1,2, HUANG Shu-jian1,2, DAI Xin-yu1,2, CHEN Jia-jun1,2
1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210023, China;
2. Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu 210023, China
Bilingual named entity recognition and alignment are important for many natural language processing tasks. Named entity translation can substantially improve the performance of systems such as statistical machine translation and cross-language information retrieval. The quality of Chinese word segmentation has a large impact on named entity (NE) recognition and bilingual NE extraction. Bilingual alignment information, in turn, provides cues for NE recognition and word segmentation. Accordingly, based on the characteristics of NE recognition, NE alignment, and word segmentation, this paper proposes an optimization algorithm for Chinese word segmentation. By correcting word segmentation errors and adjusting word segmentation granularity, the algorithm improves the extraction of Chinese-English NE translations and the performance of statistical machine translation. Experimental results on a Chinese-English news corpus demonstrate the effectiveness of the algorithm.