OPEN ACCESS
Despite its huge benefits, the release of big data carries a severe risk of privacy leakage. To address this problem, this paper proposes a deep neural network (DNN)-based algorithm for the safe release of big data under random noise disturbance. Specifically, random noise drawn from a given probability distribution is added to the released data, such that the public output does not change significantly whether or not an individual data record is in the dataset, and the published data remain essentially the same as the original dataset. The algorithm is then optimized in light of the attributes of the correlated datasets in big data. Finally, the proposed algorithm is shown to outperform the traditional algorithm in large-scale searches of correlated datasets, and to ensure privacy at a lower privacy budget.
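The noise-addition idea described above can be illustrated with the standard Laplace mechanism from the differential privacy literature. The sketch below is a minimal illustration under stated assumptions and is not the DNN-based algorithm proposed in this paper: the function names `laplace_noise` and `private_count`, the counting query, and the sensitivity of 1 are all hypothetical choices made for the example, and the sensitivity of 1 holds only when records are independent, which is not the case for the correlated datasets the paper targets.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a noisy count of the records that satisfy `predicate`.

    Assumption: adding or removing one record changes the true count
    by at most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    A smaller epsilon (privacy budget) gives more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # assumed; correlated records would need a larger value
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29]  # illustrative data only
    noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
    print(f"noisy count of ages >= 40: {noisy:.2f}")
```

Each call returns the true count plus zero-mean noise, so a single release reveals little about any one record while repeated releases average out to the true value; this is why the privacy budget must be accounted for across queries.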
deep neural network (DNN), big data, privacy preserving, differential privacy
This work is supported by {2018,2019} Foundation of Improving Academic Ability in University for Young Scholars of Guangxi.