Acta Electronica Sinica ›› 2018, Vol. 46 ›› Issue (5): 1047-1055. DOI: 10.3969/j.issn.0372-2112.2018.05.004

• Research Paper •

The Rapid Calculation Method of DINA Model for Large Scale Cognitive Diagnosis

WANG Chao1, LIU Qi1, CHEN En-hong1, HUANG Zhen-ya1, ZHU Tian-yu1, SU Yu2, HU Guo-ping3

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China;
  2. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230039, China;
  3. USTC iFLYTEK Co., Ltd., Hefei, Anhui 230088, China
  • Received: 2016-12-20 Revised: 2017-03-07 Online: 2018-05-25 Published: 2018-05-25
  • Corresponding author: CHEN En-hong
  • About the authors:
    • WANG Chao, male, born in 1995 in Huainan, Anhui. Master's student at the School of Computer Science and Technology, University of Science and Technology of China. Research interests: machine learning and recommender systems. E-mail: wdyx2012@mail.ustc.edu.cn
    • LIU Qi, male, born in 1986 in Linyi, Shandong. Ph.D., associate professor. Research interests: data mining and knowledge discovery, machine learning methods and their applications. E-mail: qiliuql@ustc.edu.cn
  • Supported by: National High-tech R&D Program of China (863 Program) (No.2015AA015409); National Natural Science Foundation of China for Distinguished Young Scholars (No.61325010); National Natural Science Foundation of China (No.61672483, No.U1605251); Youth Innovation Promotion Association CAS (Member No. 2014299)

Abstract: Diagnosing students' knowledge levels is an important problem in education. Traditionally, teachers judge each student manually from classroom performance and scores, which is inefficient and subjective and makes personalized diagnosis for large numbers of students difficult. In recent years, the DINA model, a cognitive diagnosis model with good interpretability, has been widely used to diagnose each student's individual knowledge proficiency. However, traditional DINA models are mostly based on small samples. Faced with the large-scale data produced by online education, they converge slowly and are hard to apply in practice. To address this long computation time, we first give a proof of convergence for the DINA model and then propose three algorithms that accelerate its solution: (1) Incremental DINA (I-DINA), which partitions the student data into blocks and accesses only one block per iteration; (2) Maximum-Entropy DINA (ME-DINA), which accesses only the student data that has a large influence in the process of maximizing the model entropy; and (3) Incremental Maximum Entropy DINA (IME-DINA), a hybrid of the first two. Experiments on a real-world dataset and on simulated data show that all three methods achieve speedups ranging from several-fold to tens-fold while preserving the effectiveness of the DINA model.

Key words: educational data mining, cognitive diagnosis, DINA model, expectation maximization (EM) algorithm, convergence acceleration
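The block-wise idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it combines the standard EM updates for DINA slip and guess parameters with a generic incremental-EM scheme in which each iteration recomputes posteriors for one student block only and swaps that block's contribution into running sufficient statistics. The function names, the `EPS` constant, the initial values of 0.2, and the random block assignment are all assumptions made for illustration; the paper's I-DINA may differ in its details.

```python
import itertools
import numpy as np

EPS = 1e-6  # assumed smoothing constant; keeps divisions and logs finite


def all_patterns(n_skills):
    """Enumerate every binary skill pattern, shape (2**K, K)."""
    return np.array(list(itertools.product([0, 1], repeat=n_skills)))


def ideal_response(patterns, Q):
    """eta[l, j] = 1 iff pattern l masters every skill that item j requires."""
    return ((1 - patterns) @ Q.T == 0).astype(float)


def posterior(X, eta, slip, guess, prior):
    """E-step for one block: P(pattern | responses), shape (n_students, L)."""
    p_correct = np.where(eta == 1, 1 - slip, guess)  # (L, J)
    log_lik = X @ np.log(p_correct).T + (1 - X) @ np.log(1 - p_correct).T
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def fit_incremental_dina(X, Q, n_blocks=4, n_sweeps=20, seed=0):
    """Hypothetical incremental EM for DINA; one block per iteration."""
    n_students, n_items = X.shape
    eta = ideal_response(all_patterns(Q.shape[1]), Q)
    n_patterns = eta.shape[0]

    slip = np.full(n_items, 0.2)
    guess = np.full(n_items, 0.2)
    prior = np.full(n_patterns, 1.0 / n_patterns)

    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n_students), n_blocks)
    # Per-block and running sufficient statistics:
    # N[l] = expected number of students with pattern l,
    # R[l, j] = expected number of those who answered item j correctly.
    block_N = np.zeros((n_blocks, n_patterns))
    block_R = np.zeros((n_blocks, n_patterns, n_items))
    total_N = np.zeros(n_patterns)
    total_R = np.zeros((n_patterns, n_items))

    for _ in range(n_sweeps):
        for b, idx in enumerate(blocks):
            # Partial E-step: only this block's students are accessed.
            post = posterior(X[idx], eta, slip, guess, prior)
            new_N, new_R = post.sum(axis=0), post.T @ X[idx]
            total_N += new_N - block_N[b]
            total_R += new_R - block_R[b]
            block_N[b], block_R[b] = new_N, new_R

            # M-step from the running statistics.
            I1, R1 = total_N @ eta, (total_R * eta).sum(axis=0)
            I0, R0 = total_N @ (1 - eta), (total_R * (1 - eta)).sum(axis=0)
            slip = np.clip(1 - R1 / (I1 + EPS), EPS, 1 - EPS)
            guess = np.clip(R0 / (I0 + EPS), EPS, 1 - EPS)
            prior = (total_N + EPS) / (total_N + EPS).sum()
    return slip, guess, prior
```

With `n_blocks=1` the loop reduces to ordinary batch EM, so the block count is the only knob that distinguishes the two in this sketch. Enumerating all 2**K patterns is practical only for a small number of skills.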

CLC number: