Abstract: The rapid development of machine learning has made it one of the most effective tools in the data mining research community. However, training such algorithms often requires large amounts of user data, which exposes users to a serious risk of privacy leakage. Because of the complex statistical characteristics and semantic richness of the data, traditional private data publishing methods usually sanitize the original data so heavily that the published data has low availability and is of little use in data mining tasks. In this paper, a differentially private data publishing method based on a generative adversarial network (GAN) is proposed. Differential privacy of the GAN model is achieved by adding carefully designed noise to the gradients during training, so that the GAN can generate an unlimited amount of synthetic data that conforms to the original statistical characteristics without disclosing private information. To address the low quality of synthetic data and the slow convergence of existing similar methods, several optimization strategies are designed to adjust the privacy budget allocation and reduce the overall noise scale. Moreover, we provide a rigorous proof that the synthetic data satisfies differential privacy. Comparisons with existing methods on public datasets show that the proposed method generates higher-quality private data more efficiently and is suitable for a variety of data analysis tasks.
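The gradient perturbation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the noise distribution or the clipping rule, so this sketch assumes the common recipe of per-example L2 clipping followed by Gaussian noise. The function name `privatize_gradients` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and are not taken from the paper; the paper's privacy budget allocation and noise reduction strategies are not reproduced here.

```python
import math
import random


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * clip_norm, a common choice that the abstract
    itself does not confirm.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale is 1 when the gradient is within the bound,
        # clip_norm / norm otherwise (guarding against a zero norm).
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            total[i] += g * scale
    sigma = noise_multiplier * clip_norm
    # Noise is added once to the summed gradient, not per example.
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
batch = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
update = privatize_gradients(batch, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds the influence of any single training record on the update, which is what allows the added noise to be calibrated to `clip_norm`. In a GAN setting this step would typically be applied to the network that sees the real data; how the paper allocates it is not stated in the abstract.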