Acta Electronica Sinica ›› 2019, Vol. 47 ›› Issue (11): 2337-2343. DOI: 10.3969/j.issn.0372-2112.2019.11.015

• Academic Paper •

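    • About the author:
    • WU Ning-bo, male, born in 1989 in Zhumadian, Henan Province. He is a Ph.D. candidate whose main research interests are data security and privacy protection. E-mail: hn_dragon@163.com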

Information Entropy Metric Methods of Association Attributes for Differential Privacy

WU Ning-bo¹,², PENG Chang-gen¹,³, MOU Qi-lin²

  1. College of Computer Science & Technology, Guizhou University, Guiyang, Guizhou 550025, China;
    2. CETC Big Data Research Institute Co., Ltd., Guiyang, Guizhou 550022, China;
    3. National Key Laboratory of Public Big Data, Guiyang, Guizhou 550025, China
  • Received: 2018-10-08 Revised: 2019-03-22 Online: 2019-11-25 Published: 2019-11-25
    • Corresponding author:
    • PENG Chang-gen
    • Supported by:
    • National Natural Science Foundation of China (No.U1836205, No.61662009, No.61772008, No.11761020); Science and Technology Major Project of Guizhou Province (No.20183001); Natural Science Foundation of Guizhou Province, China (黔科合基础[2017]1045); Graduate Research Fund Project of Guizhou Province (No.KYJJ2017005); National Cryptography Development Fund during the 13th Five-year Plan (No.MMJJ20170129); Science and Technology Project of Guizhou Province (黔科合重大专项字[2017]3002, 黔科合平台人才[2017]5788, 黔科合重大专项字[2018]3007)

Abstract: Privacy leakage and data utility are major concerns when synthetic datasets with multiple associated attributes are published under non-interactive differential privacy. Based on information entropy and Hamming distortion, this paper proposes methods for quantifying the privacy level, data utility, and privacy leakage risk of a published dataset. First, mutual information is used to analyze the degree of correlation between attributes, and the associations are represented by an association dependency graph model. Second, a Markov privacy leakage chain is constructed from the key privacy leakage paths in the graph and combined with information entropy to obtain a privacy metric model and method for associated attributes, which can effectively measure the amount of privacy leakage caused by attribute association. Finally, a concrete example verifies the effectiveness of the model and method, and a comparative analysis shows the advantages of the approach over other methods.

Key words: differential privacy, information entropy, privacy metric, association attributes
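
As an illustration of the first step described in the abstract (using mutual information to gauge how strongly two attributes are associated), a minimal sketch might look like the following. The function name `mutual_information`, the toy attribute columns, and the use of plain empirical frequencies are assumptions made for illustration only; the abstract does not specify the paper's own estimator or data.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two attribute columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical toy records: one column per attribute.
region = ["A", "A", "B", "B"]
disease = ["flu", "flu", "cold", "cold"]  # fully determined by region
gender = ["m", "f", "m", "f"]             # independent of region

print(mutual_information(region, disease))  # 1.0 bit: strongly associated
print(mutual_information(region, gender))   # 0.0 bits: no association
```

In this sketch, attribute pairs with high mutual information would be candidates for edges in an association dependency graph, while pairs near zero would not; how the paper thresholds or weights such edges is not stated in the abstract.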
