Research of Address Information Automatic Annotation Based on Deep Learning

LING Guang-ming; XU Ai-ping; WANG Wei

doi:10.3969/j.issn.0372-2112.2020.11.001

- Acta Electronica Sinica Vol. 48, Issue 11, Pages: 2081-2091(2020)
- Author affiliations:
  
  1. School of Computer Science, Wuhan University, Wuhan, Hubei 430072
  2. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei 430079
- Funding:
  
  Supported by National Key Research and Development Program of China (No.2017YFC0803700)
- DOI: 10.3969/j.issn.0372-2112.2020.11.001
  CLC: TP183
- Published Online: 25 November 2020
  
  Published: 2020
LING Guang-ming, XU Ai-ping, WANG Wei. Research of Address Information Automatic Annotation Based on Deep Learning[J]. Acta Electronica Sinica, 2020, 48(11): 2081-2091. DOI： 10.3969/j.issn.0372-2112.2020.11.001.

Abstract

Automatic annotation of text sequences can relieve the high cost of manual labeling that deep learning commonly faces. Targeting the way entities are expressed in address information, this paper constructs a representation model based on the entity boundary matrix (EBM) and, on that basis, proposes an automatic annotation algorithm that combines deep learning with a KNN label correction algorithm (K-Nearest Neighbours Correction Algorithm, KNN-CA) and requires no manually annotated training set. First, a preset residential-estate dataset is collected, from which an offline feature library is built and an online feature library is initialized. Next, the EBM is solved by a matching algorithm and optimized with KNN-CA, and data augmentation then yields an automatically annotated training set. A BiLSTM-CRF deep learning model is then trained and used to predict sequence labels for all address records not yet annotated. Finally, KNN-CA is applied again to optimize the sequence labels for which an EBM can be solved, producing a sequence-annotated corpus suitable for Chinese Geospatial Named Entity (CGSNE) recognition and related research. Experiments show that the F1 score of the annotated data reaches 95.35%.
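
The abstract does not spell out how the EBM is defined or how the matching algorithm works, so the following is only a minimal sketch under stated assumptions. It treats the boundary matrix as an n x n table whose cell (i, j) holds an entity type when the substring from character i to character j is found in a feature library, and it decodes that table into BIO tags by greedy longest match. The function names `build_boundary_matrix` and `matrix_to_bio`, the labels `CITY`, `DISTRICT` and `ROAD`, the sample lexicon, and the BIO scheme are all hypothetical; KNN-CA, data augmentation and the BiLSTM-CRF model are not covered.

```python
from typing import Dict, List, Optional

Matrix = List[List[Optional[str]]]


def build_boundary_matrix(text: str, lexicon: Dict[str, str]) -> Matrix:
    """Assumed EBM form: matrix[i][j] is the entity type of text[i..j]
    (inclusive) when that substring is in the lexicon, otherwise None."""
    n = len(text)
    matrix: Matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            label = lexicon.get(text[i:j + 1])
            if label is not None:
                matrix[i][j] = label
    return matrix


def matrix_to_bio(text: str, matrix: Matrix) -> List[str]:
    """Decode the matrix into per-character BIO tags, scanning left to
    right and taking the longest matched span at each start position."""
    n = len(text)
    tags = ["O"] * n
    i = 0
    while i < n:
        end = None
        for j in range(n - 1, i - 1, -1):  # try the longest span first
            if matrix[i][j] is not None:
                end = j
                break
        if end is None:
            i += 1
            continue
        label = matrix[i][end]
        tags[i] = f"B-{label}"
        for k in range(i + 1, end + 1):
            tags[k] = f"I-{label}"
        i = end + 1
    return tags


if __name__ == "__main__":
    # Hypothetical feature library and address, for illustration only.
    lexicon = {"武汉市": "CITY", "洪山区": "DISTRICT", "珞喻路": "ROAD"}
    address = "武汉市洪山区珞喻路129号"
    ebm = build_boundary_matrix(address, lexicon)
    for char, tag in zip(address, matrix_to_bio(address, ebm)):
        print(char, tag)
```

Greedy longest match is chosen here only for brevity. Overlapping or ambiguous spans are exactly the cases where a correction step such as KNN-CA would presumably be needed, and this sketch makes no attempt to resolve them.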
