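1. Information Engineering University, Zhengzhou, Henan 450001, China
3. Unit 93808 of the Chinese People's Liberation Army, Lanzhou, Gansu 730000, China
4. Unit 75775 of the Chinese People's Liberation Army, Guangzhou, Guangdong 510000, China
FANG Chen, male, born in 1993 in Anqing, Anhui. Ph.D. candidate at the Strategic Support Force Information Engineering University. His research interest is privacy and security in machine learning.
WANG Na, female, born in 1970 in Zhengzhou, Henan. Associate professor at the Strategic Support Force Information Engineering University. Her research interest is network information security.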
Published online: 2020-10-25
Published in print: 2020
FANG Chen, GUO Yuan-bo, WANG Na, et al. Differential Private Data Publishing Method Based on Generative Adversarial Network[J]. Acta Electronica Sinica, 2020, 48(10): 1983-1992. DOI: 10.3969/j.issn.0372-2112.2020.10.016.
Rapid progress in machine learning has made it one of the most effective tools in data mining. However, training these algorithms often requires large amounts of user data, which exposes users to a serious risk of privacy leakage. Because real data has complex statistical characteristics and rich semantics, traditional private data publishing methods tend to sanitize the original data excessively, leaving it with low utility and poorly suited to data mining tasks. This paper proposes a differentially private data publishing method based on the generative adversarial network (GAN). Differential privacy is achieved by adding carefully designed noise to the gradients during GAN training, so that the GAN can generate an unlimited amount of synthetic data that conforms to the statistical characteristics of the source data without disclosing private information. To address the low synthetic-data quality and slow convergence of existing similar methods, several optimization strategies are designed to adjust the privacy budget allocation flexibly and reduce the overall noise scale. We also prove theoretically that the synthetic data strictly satisfies differential privacy. Experimental comparisons with existing methods on public datasets show that the proposed method generates higher-quality privacy-preserving data more efficiently and is suitable for a variety of data analysis tasks.
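The abstract does not spell out the noise design or the budget allocation strategies, so the following is only a minimal sketch of the generic clip-and-perturb gradient step that gradient-noise approaches to differential privacy are commonly built on. It is not the authors' method. The function names, the `clip_norm` and `noise_multiplier` parameters, and the toy gradients are all assumptions made for illustration.

```python
import math
import random


def clip_by_l2_norm(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm,
    the usual choice when clip_norm bounds each example's contribution.
    """
    dim = len(per_example_grads[0])
    clipped = [clip_by_l2_norm(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Toy per-example gradients for a two-parameter model (illustrative values only).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
print(noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can change the summed gradient, and that bound is what lets Gaussian noise of a fixed scale yield a privacy guarantee. A smaller `noise_multiplier` gives a less noisy update but spends more privacy budget per step, which is the trade-off the optimization strategies mentioned in the abstract aim to manage.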