Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (1): 153-163. DOI: 10.3969/j.issn.0372-2112.2020.01.019

• Research Paper •

Private Time Series Pattern Mining Method Based on Sequence Lattice

PENG Hui-li1,2, JIN Kai-zhong1, FU Cong-cong1, FU Nan1, ZHANG Xiao-jian1

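  1. School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou, Henan 450002, China;
    2. School of Information Engineering, Henan Radio & Television University, Zhengzhou, Henan 450046, China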
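  • Received: 2018-10-18 Revised: 2019-05-23 Published: 2020-01-25
    • Corresponding author:
    • ZHANG Xiao-jian
    • About the authors:
    • PENG Hui-li, female, born in 1981 in Zhoukou, Henan. Her main research interests are databases and privacy protection. E-mail: phl81@163.com; JIN Kai-zhong, male, born in 1991 in Kaifeng, Henan. Master's degree, Henan University of Economics and Law. His main research interests are differential privacy and databases. E-mail: kaizhong@huel.edu.cn; FU Cong-cong, female, born in 1995 in Shangqiu, Henan. Master's student. Her main research interests are differential privacy and image processing. E-mail: congf@huel.edu.cn; FU Nan, male, born in 1988 in Kaifeng, Henan. Master's student. His main research interests are differential privacy and databases. E-mail: funan@huel.edu.cn
    • Supported by:
    • National Natural Science Foundation of China (No.61502146, No.91646203, No.91746115, No.61772131, No.61702161); Natural Science Foundation of Henan Province (No.162300410006); Science and Technology Research and Development Program of Henan Province (No.162102310411); Key Science Research Program of Universities of Education Department of Henan Province (No.16A520002); Funding Project for Young Backbone Teachers of Universities in Henan Province (No.2017GGJS084); Funding Program for Youth Top-notch Talents of Henan University of Economics and Law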

Private Time Series Pattern Mining with Sequential Lattice

PENG Hui-li1,2, JIN Kai-zhong1, FU Cong-cong1, FU Nan1, ZHANG Xiao-jian1   

  1. School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou, Henan 450002, China;
    2. School of Information Engineering, Henan Radio & Television University, Zhengzhou, Henan 450046, China
  • Received:2018-10-18 Revised:2019-05-23 Online:2020-01-25 Published:2020-01-25
    • Corresponding author:
    • ZHANG Xiao-jian
    • Supported by:
    • National Natural Science Foundation of China (No.61502146, No.91646203, No.91746115, No.61772131, No.61702161); Natural Science Foundation of Henan Province, China (No.162300410006); Science and Technology Research and Development Program of Henan Province (No.162102310411); Key Science Research Program of Universities of Education Department of Henan Province (No.16A520002); Funding Project for Young Backbone Teachers of Universities in Henan Province (No.2017GGJS084); Funding Program for Youth Top-notch Talents of Henan University of Economics and Law

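Abstract: In differentially private time series pattern mining, the maximum sequence length and the amount of Laplace noise added directly constrain the utility of the mining results. To address the excessively high global sensitivity and the low utility of results in existing time series pattern mining methods, a sequence-lattice-based method for mining time series patterns under differential privacy, PrivTSM (Differentially Private Time Series Pattern Mining), is proposed. The method first truncates the original database using a longest-path strategy; on the truncated database, it then uses table join operations to generate a sequence lattice that satisfies differential privacy; finally, it exploits the properties of the lattice structure itself to allocate the privacy budget reasonably and improve the utility of the output patterns. Theoretical analysis shows that PrivTSM satisfies ε-differential privacy, and experiments on real databases show that its TPR (True Positive Rate) and ARE (Average Relative Error) are clearly better than those of the N-gram and Prefix-Hybrid methods.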
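The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their formulas, so the definitions below are assumptions based on common usage in differentially private pattern mining: TPR as the fraction of the true top-k patterns that also appear in the noisy top-k, and ARE as the mean of |noisy count - true count| / true count over patterns with a positive true count. The function names are hypothetical.

```python
def true_positive_rate(true_topk, noisy_topk):
    """Fraction of the true top-k patterns recovered in the noisy top-k.

    Assumed definition: |true ∩ noisy| / |true|.
    """
    true_set, noisy_set = set(true_topk), set(noisy_topk)
    if not true_set:
        raise ValueError("true_topk must not be empty")
    return len(true_set & noisy_set) / len(true_set)


def average_relative_error(true_counts, noisy_counts):
    """Mean of |noisy - true| / true over all patterns in true_counts.

    Assumed definition; every true count must be positive, and a pattern
    missing from noisy_counts is treated as having a noisy count of 0.
    """
    if not true_counts:
        raise ValueError("true_counts must not be empty")
    errors = []
    for pattern, true_count in true_counts.items():
        if true_count <= 0:
            raise ValueError("true counts must be positive")
        noisy = noisy_counts.get(pattern, 0.0)
        errors.append(abs(noisy - true_count) / true_count)
    return sum(errors) / len(errors)


# Patterns are represented as tuples of items (an illustrative choice).
tpr = true_positive_rate(
    [("a",), ("b",), ("a", "b"), ("b", "c")],
    [("a",), ("b",), ("a", "c"), ("c",)],
)
are = average_relative_error(
    {("a",): 10, ("a", "b"): 20},
    {("a",): 12.0, ("a", "b"): 15.0},
)
print(tpr, are)
```

Under these assumed definitions a higher TPR and a lower ARE both indicate better utility.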

Keywords: differential privacy, time series, global sensitivity, data mining, data truncation, sequence lattice

Abstract: Many methods for differentially private time series pattern mining have been proposed, but in those methods the length of the sequence patterns and the Laplace noise directly constrain the utility of the mining results. To address the high global query sensitivity and low utility of existing work, an efficient method called PrivTSM (Differentially Private Time Series Pattern Mining) is proposed, which is based on a sequence lattice for mining time series patterns with differential privacy. The method relies on a longest-path strategy to truncate the original database; based on the truncated database, it uses the table join operation to construct a differentially private sequence lattice. Furthermore, it uses the properties of the sequence lattice structure itself to allocate the privacy budget reasonably and boost the accuracy of the noisy counts. Theoretical analysis shows that PrivTSM satisfies ε-differential privacy. Experimental results on real datasets show that the true positive rate (TPR) and average relative error (ARE) of PrivTSM are better than those of the N-gram and Prefix-Hybrid algorithms.
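To illustrate why truncation lowers the noise, the following Python sketch releases noisy support counts for single items only. It is not the PrivTSM algorithm: plain prefix truncation stands in for the longest-path strategy, the whole budget ε is spent on length-1 patterns instead of being split across lattice levels, and the names `truncate`, `laplace_noise`, and `noisy_item_supports` are hypothetical. The item alphabet is assumed to be public and fixed.

```python
import random
from collections import Counter


def truncate(sequence, l_max):
    """Keep at most l_max items.

    Prefix truncation is an assumption made for this sketch; the paper's
    longest-path strategy is not reproduced here.
    """
    return sequence[:l_max]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_supports(database, alphabet, l_max, epsilon, rng):
    """Return a noisy support count for every item in the public alphabet."""
    truncated = [truncate(seq, l_max) for seq in database]
    # Support = number of truncated sequences containing the item at least once.
    support = Counter(item for seq in truncated for item in set(seq))
    # A truncated sequence holds at most l_max distinct items, so adding or
    # removing one sequence changes the support vector by at most l_max in L1.
    scale = l_max / epsilon
    # Noise is added for every alphabet item, including those with support 0.
    return {item: support[item] + laplace_noise(scale, rng) for item in alphabet}


db = [["a", "b", "c", "d"], ["a", "c"], ["b", "a", "e"]]
print(noisy_item_supports(db, "abcde", l_max=3, epsilon=1.0, rng=random.Random(0)))
```

In this sketch the Laplace scale is `l_max / epsilon`, so a shorter truncation length gives less noise per count at the cost of discarding items beyond position `l_max`. This is the trade-off between sequence length and noise that the abstract refers to.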

Key words: differential privacy, time series, global sensitivity, data mining, data truncation, sequential lattice

CLC number: