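1. School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China
2. School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, Shandong 264005, China
3. Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, Shandong 264005, China
4. Key Laboratory of Intelligent Information Processing in Universities of Shandong (Shandong Technology and Business University), Yantai, Shandong 264005, China
5. School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China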
Published online: 2021-03-25
Published in print: 2021
TANG Huan-ling, ZHENG Han, LIU Yan-hong, et al. Tr-SLDA: A Transfer Topic Model for Cross-Domains[J]. Acta Electronica Sinica, 2021, 49(3): 605-613. DOI: 10.12263/DZXB.20200210.
When the target domain lacks sufficient labeled data, transfer learning uses labeled data from a related source domain to help improve learning performance in the target domain. However, the data in the two domains usually do not satisfy the independent and identically distributed (i.i.d.) assumption, which easily leads to negative transfer. Building on the supervised topic model (Supervised LDA, SLDA), this paper integrates transfer learning to propose Tr-SLDA (Transfer SLDA), a transfer topic model that shares topic knowledge across domains. A new topic sampling method, Tr-SLDA-Gibbs, is also proposed: under the constraint of category labels, it applies different sampling strategies to words in documents from different domains, and it does not require the number of topics to be specified. The source and target domains share a latent topic space, and Tr-SLDA transfers knowledge from the source domain by discovering the semantic associations between the shared latent topics and the categories of each domain, which effectively addresses negative transfer. Based on the Tr-SLDA model, a text classification method, Tr-SLDA-TC (Tr-SLDA Text Categorization), is proposed. Comparative experiments show that the method can effectively use source-domain knowledge to improve classification performance in the target domain.