Differential Private Data Publishing Method Based on Generative Adversarial Network

方晨; 郭渊博; 王娜; 甄帅辉; 唐国栋

doi:10.3969/j.issn.0372-2112.2020.10.016

Research article | Updated: 2023-02-24

- Differential Private Data Publishing Method Based on Generative Adversarial Network
- Acta Electronica Sinica, 2020, Vol. 48, No. 10, pp. 1983-1992
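- Author affiliations:
  1. Information Engineering University, Zhengzhou, Henan 450001
  3. Unit 93808 of the Chinese People's Liberation Army, Lanzhou, Gansu 730000
  4. Unit 75775 of the Chinese People's Liberation Army, Guangzhou, Guangdong 510000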
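- Author biographies:
  FANG Chen (male), born in 1993, a native of Anqing, Anhui. Ph.D. candidate at the Strategic Support Force Information Engineering University. Research interest: privacy and security of machine learning.
  WANG Na (female), born in 1970, a native of Zhengzhou, Henan. Associate professor at the Strategic Support Force Information Engineering University. Research interest: network information security.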
- Funding:
  National Natural Science Foundation of China (No. 61501515, No. 61601515); Open Fund of the Key Laboratory of Information Assurance Technology (No. KJ-15-108)
- DOI: 10.3969/j.issn.0372-2112.2020.10.016
  CLC number: TP301
- Published online: 2020-10-25
  Print publication: 2020

FANG Chen, GUO Yuan-bo, WANG Na, et al. Differential Private Data Publishing Method Based on Generative Adversarial Network[J]. Acta Electronica Sinica, 2020, 48(10): 1983-1992. DOI: 10.3969/j.issn.0372-2112.2020.10.016.

Abstract

The rapid development of machine learning has made it one of the most effective tools in data mining. However, training these algorithms often requires large amounts of user data, which exposes users to a serious risk of privacy leakage. Because the statistical characteristics of data are complex and semantically rich, traditional private data publishing methods usually have to over-sanitize the original data, which leaves it with low utility and makes it unsuitable for data mining tasks. This paper therefore proposes a differentially private data publishing method based on the generative adversarial network (GAN). Differential privacy is achieved by adding carefully designed noise to the gradients during GAN training, so that the GAN can generate an unlimited amount of synthetic data that conforms to the statistical characteristics of the source data without disclosing private information. To address the low synthetic data quality and slow model convergence of existing similar methods, several optimization strategies are designed to flexibly adjust the privacy budget allocation and reduce the overall noise scale. It is also proven theoretically that the synthetic data strictly satisfies differential privacy. Experimental comparisons with existing methods on public datasets show that the proposed method generates privacy-preserving data of higher quality more efficiently and is suitable for a variety of data analysis tasks.
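
The abstract does not spell out how the noise is designed, so the following is only a minimal sketch of the generic clip-and-add-Gaussian-noise step commonly used to make gradient-based training differentially private. It is not the authors' algorithm, and it omits their budget-allocation and noise-reduction strategies. The function name `privatize_gradients` and the parameters `clip_norm` and `noise_multiplier` are assumptions made for illustration. In GAN settings of this kind, such a step is usually applied to the discriminator's gradients, since the discriminator is the component trained directly on real data.

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Hypothetical helper: a generic Gaussian-mechanism step, not the paper's
    exact noise design.
    """
    grads = np.asarray(per_example_grads, dtype=np.float64)
    batch_size = grads.shape[0]

    # L2 norm of every example's gradient (flattened per example).
    norms = np.linalg.norm(grads.reshape(batch_size, -1), axis=1)

    # Scale is 1 for gradients already within the bound, clip_norm / norm otherwise.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale.reshape((batch_size,) + (1,) * (grads.ndim - 1))

    # Noise standard deviation is proportional to the clipping bound,
    # which caps the influence of any single example on the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1:])
    return (clipped.sum(axis=0) + noise) / batch_size


# Usage with synthetic gradients: 8 examples, 5 parameters each.
rng = np.random.default_rng(0)
example_grads = rng.normal(size=(8, 5)) * 10.0
noisy = privatize_gradients(example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)

# With the noise switched off, only the clipping and averaging remain.
no_noise = privatize_gradients(
    np.array([[3.0, 4.0], [0.3, 0.4]]), clip_norm=1.0, noise_multiplier=0.0, rng=rng
)
```

A smaller `clip_norm` or a larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the quality and convergence trade-off that the abstract says the proposed optimization strategies aim to ease.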
