Differential Privacy and Applications

Preface
Acknowledgments
Contents
1 Introduction
1.1 Privacy Preserving Data Publishing and Analysis
1.2 Privacy Violations
1.3 Privacy Models
1.4 Differential Privacy
1.5 Outline and Book Overview
2 Preliminary of Differential Privacy
2.1 Notations
2.2 Differential Privacy Definition
2.2.1 The Privacy Budget
2.3 The Sensitivity
2.3.1 The Global Sensitivity
2.3.2 The Local Sensitivity
2.4 The Principal Differential Privacy Mechanisms
2.4.1 The Laplace Mechanism
2.4.1.1 The Gaussian Mechanism
2.4.2 The Exponential Mechanism
2.4.2.1 Mechanism Example
2.5 Utility Measurement of Differential Privacy
3 Differentially Private Data Publishing: Settings and Mechanisms
3.1 Interactive and Non-interactive Settings
3.2 Publishing Mechanism
4 Differentially Private Data Publishing: Interactive Setting
4.1 Transaction Data Publishing
4.1.1 Laplace
4.1.2 Transformation
4.1.3 Query Separation
4.1.4 Iteration
4.1.5 Discussion
4.2 Histogram Publishing
4.2.1 Laplace
4.2.2 Partition of Dataset
4.2.3 Consistency of Histogram
4.3 Stream Data Publishing
4.3.1 Laplace
4.3.2 Partition of Dataset
4.3.3 Iteration
4.3.4 Discussion
4.4 Graph Data Publishing
4.4.1 Edge Differential Privacy
4.4.2 Node Differential Privacy
4.4.3 Discussion
4.5 Summary on Interactive Setting
5 Differentially Private Data Publishing: Non-interactive Setting
5.1 Batch Queries Publishing
5.1.1 Laplace
5.1.2 Transformation
5.1.3 Partition of Dataset
5.1.4 Iteration
5.1.5 Discussion
5.2 Contingency Table Publishing
5.2.1 Laplace
5.2.2 Iteration
5.2.3 Transformation
5.3 Anonymized Dataset Publishing
5.4 Synthetic Dataset Publishing
5.4.1 Synthetic Dataset Publishing Based on Learning Theory
5.4.1.1 Learning Theory in Differential Privacy
5.4.1.2 Synthetic Publishing
5.4.2 High Dimensional Synthetic Dataset Publishing
5.5 Summary on Non-interactive Setting
6 Differentially Private Data Analysis
6.1 Laplace/Exponential Framework
6.1.1 SuLQ and PINQ Interface
6.1.1.1 SuLQ
6.1.1.2 PINQ
6.1.2 Specific Algorithms in the Laplace/Exponential Framework
6.1.2.1 Supervised Learning
6.1.2.2 Unsupervised Learning
6.1.2.3 Frequent Itemset Mining
6.1.3 Summary on Laplace/Exponential Framework
6.2 Private Learning Framework
6.2.1 Foundation of ERM
6.2.2 Private Learning in ERM
6.2.2.1 Output Perturbation
6.2.2.2 Objective Perturbation
6.2.2.3 Risk Bound in Different Learning Algorithms
6.2.2.4 Discussion
6.2.3 Sample Complexity Analysis
6.2.3.1 Relaxing Privacy Requirement
6.2.3.2 Relaxing Hypothesis
6.2.3.3 Semi-Supervised Learning
6.2.3.4 Discussion
6.2.4 Summary on Private Learning Framework
6.3 Summary of Differentially Private Data Analysis
7 Differentially Private Deep Learning
7.1 Introduction
7.2 Preliminary
7.2.1 Deep Learning Structure
7.2.2 Stochastic Gradient Descent
7.2.2.1 Deep Auto-Encoder
7.3 Differentially Private Deep Learning
7.3.1 Basic Laplace Method
7.3.2 Private SGD Method
7.3.2.1 Norm Clipping
7.3.2.2 Grouping Batches
7.3.2.3 Privacy Composition
7.3.3 Deep Private Auto-Encoder Method
7.3.3.1 Deep Private Auto-Encoder Algorithm
7.3.3.2 Functional Mechanism
7.3.3.3 Sensitivity Measurements
7.3.4 Distributed Private SGD
7.3.4.1 Sparse Vector Technique
7.4 Experimental Methods
7.4.1 Benchmark Datasets
7.4.2 Learning Objectives
7.4.3 Computing Frameworks
7.5 Summary
8 Differentially Private Applications: Where to Start?
8.1 Solving a Privacy Problem in an Application
8.2 Challenges in Differentially Private Applications
8.2.1 High Sensitivity Challenge
8.2.2 Dataset Sparsity Challenge
8.2.3 Large Query Set Challenge
8.2.4 Correlated Data Challenge
8.2.5 Computational Complexity Challenge
8.2.6 Summary
8.3 Useful Public Datasets in Applications
8.3.1 Recommender System Datasets
8.3.2 Online Social Network Datasets
8.3.3 Location Based Datasets
8.3.4 Other Datasets
8.4 Application Settings
9 Differentially Private Social Network Data Publishing
9.1 Introduction
9.2 Preliminaries
9.3 Basic Differentially Private Social Network Data Publishing Methods
9.3.1 Node Differential Privacy
9.3.1.1 Truncation and Smooth Sensitivity
9.3.1.2 Lipschitz Extension
9.3.1.3 Iterative Based Mechanism
9.3.2 Edge Differential Privacy
9.4 Graph Update Method
9.4.1 Overview of Graph Update
9.4.2 Graph Update Method
9.4.3 Update Function
9.4.4 Privacy and Utility Analysis
9.4.4.1 Privacy Analysis
9.4.4.2 Utility Analysis
9.4.5 Experimental Evaluation
9.4.5.1 Datasets and Configuration
9.4.5.2 Performance Evaluation on Diverse Size of Query Sets
9.5 Summary
10 Differentially Private Recommender System
10.1 Introduction
10.2 Preliminaries
10.2.1 Collaborative Filtering
10.2.2 Neighborhood-Based Methods: k Nearest Neighbors
10.2.3 Model-Based Methods: Matrix Factorization
10.3 Basic Differentially Private Recommender Systems
10.3.1 Differentially Private Untrustworthy Recommender System
10.3.2 Differentially Private Trustworthy Recommender System
10.3.2.1 Matrix Factorization with Private Input Perturbation
10.3.2.2 Private Stochastic Gradient Perturbation
10.3.2.3 ALS with Output Perturbation
10.4 Private Neighborhood-Based Collaborative Filtering Method
10.4.1 KNN Attack to Collaborative Filtering
10.4.2 The Private Neighbor Collaborative Filtering Algorithm
10.4.2.1 The Private Neighbor Selection
10.4.2.2 Recommendation-Aware Sensitivity
10.4.2.3 Private Neighbor Selection Implementation
10.4.3 Privacy and Utility Analysis
10.4.3.1 Utility Analysis
10.4.3.2 Privacy Analysis
10.4.4 Experiment Analysis
10.4.4.1 Datasets and Measurements
10.4.4.2 Performance of PNCF
10.5 Summary
11 Privacy Preserving for Tagging Recommender Systems
11.1 Introduction
11.2 Preliminaries
11.2.1 Notations
11.2.2 Tagging Recommender Systems
11.2.3 Related Work
11.3 Private Tagging Publishing Method
11.3.1 User Profiles
11.3.2 Private Tagging Release Algorithm Overview
11.3.3 Private Topic Model Generation
11.3.3.1 LDA Model Construction
11.3.3.2 Private Model Generation
11.3.3.3 Topic-Based Profile Generation
11.3.4 Topic Weight Perturbation
11.3.5 Private Tag Selection
11.3.6 Privacy and Utility Analysis
11.3.6.1 Privacy Analysis
11.3.6.2 Utility Analysis
11.3.7 Experimental Evaluation
11.3.7.1 Datasets
11.3.7.2 Performance of Tagging Recommendation
11.4 Summary
12 Differential Location Privacy
12.1 Introduction
12.2 Preliminary
12.3 Basic Location Privacy Methods
12.3.1 Snapshot Location Privacy: Geo-Indistinguishability
12.3.1.1 Probabilistic Model
12.3.1.2 Geo-Indistinguishability Definition
12.3.1.3 Geo-Indistinguishability Method
12.3.2 Trajectory Privacy
12.4 Hierarchical Snapshot Location Publishing
12.4.1 Hierarchical Sensitivity
12.4.2 Overview of Private Location Release
12.4.3 Private Location Release Algorithm
12.4.3.1 Private Location Clustering
12.4.3.2 Cluster Weight Perturbation
12.4.3.3 Private Location Selection
12.4.4 Utility and Privacy
12.4.4.1 Utility Analysis
12.4.4.2 Privacy Analysis
12.4.5 Experimental Evaluation
12.4.5.1 Datasets
12.4.5.2 Estimation of Distance Error
12.5 Summary
13 Differentially Private Spatial Crowdsourcing
13.1 Introduction
13.2 Basic Method
13.2.1 Background of Crowdsourcing
13.2.2 Differentially Private Crowdsourcing Methods
13.3 Differential Privacy in Reward-Based Crowdsourcing
13.3.1 Problem Statement
13.3.2 Building a Contour Plot with DP Guarantee
13.3.3 Task Assignment
13.3.3.1 Modeling Acceptance Probability and ASP
13.3.3.2 Optimized Strategy and Radius Estimation
13.3.4 Experimental Evaluation
13.3.4.1 Performance on DE
13.3.4.2 Performance on RejR
13.3.4.3 Performance on EC
13.4 Summary
14 Correlated Differential Privacy for Non-IID Datasets
14.1 Introduction
14.2 An Example: Correlated Records in a Dataset
14.3 Basic Methods
14.3.1 Pufferfish
14.3.2 Blowfish
14.4 Correlated Differential Privacy
14.4.1 Correlated Differential Privacy Problem
14.4.2 Research Issues and Challenges
14.4.3 Correlated Dataset Analysis
14.4.4 Correlated Sensitivity
14.4.5 Correlated Iteration Mechanism
14.4.5.1 Overview of Correlated Iteration Mechanism
14.4.5.2 Correlated Update Function
14.4.5.3 Parameters Discussion
14.4.6 Mechanism Analysis
14.4.6.1 Privacy Analysis
14.4.6.2 Utility Analysis
14.4.7 Experiment and Analysis
14.4.7.1 Datasets and Configuration
14.4.7.2 The Performance of CIM
14.4.7.3 Correlated Sensitivity vs. Global Sensitivity
14.5 Summary
15 Future Directions and Conclusion
15.1 Adaptive Data Analysis: Generalization in Machine Learning
15.2 Personalized Privacy
15.3 Secure Multiparty Computations with Differential Privacy
15.4 Differential Privacy and Mechanism Design
15.5 Differential Privacy in Genetic Data
15.6 Local Differential Privacy
15.7 Learning Model Publishing
15.8 Conclusion
References
Index
Advances in Information Security 69
Tianqing Zhu • Gang Li • Wanlei Zhou • Philip S. Yu
Differential Privacy and Applications
Advances in Information Security
Volume 69
Series editor: Sushil Jajodia, George Mason University, Fairfax, VA, USA
More information about this series at http://www.springer.com/series/5576
Tianqing Zhu • Gang Li • Wanlei Zhou • Philip S. Yu
Differential Privacy and Applications
Tianqing Zhu, Deakin University, Melbourne, Australia
Wanlei Zhou, Deakin University, Melbourne, Australia
Gang Li, Deakin University, Melbourne, Australia
Philip S. Yu, University of Illinois at Chicago, Chicago, IL, USA

ISSN 1568-2633 (Advances in Information Security)
ISBN 978-3-319-62002-2
ISBN 978-3-319-62004-6 (eBook)
DOI 10.1007/978-3-319-62004-6
Library of Congress Control Number: 2017946488

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature.
The registered company is Springer International Publishing AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface

Corporations, organizations, and governments have collected, digitized, and stored information in digital forms since the invention of computers, and the speed of such data collection and the volumes of stored data have increased dramatically over the past few years, thanks to the pervasiveness of computing devices and the associated applications that are closely linked to our daily lives. For example, hospitals collect records of patients, search engines collect online behaviors of users, social network sites collect connected friends of people, e-commerce sites collect shopping habits of customers, and toll road authorities collect travel details of vehicles. Such huge datasets provide excellent opportunities for businesses and governments to improve their services and to bring economic and social benefits, especially through the use of technologies dealing with big data, including data mining, machine learning, artificial intelligence, data visualization, and data analytics. For example, releasing some statistics of hospital records may help medical research to combat diseases. However, as most of the collected datasets are personally related and contain private or sensitive information, data release may also provide fertile ground for adversaries to obtain certain private and sensitive information, even when simple anonymization techniques are used to hide such information. Privacy preservation has, therefore, become an urgent issue that needs to be addressed in the digital age.

Differential privacy is a new and promising privacy framework and has become a popular research topic in both academia and industry. It is one of the most prevalent privacy models, as it provides a rigorous and provable privacy notion that can be applied in various research areas and can potentially be implemented in various application scenarios. The goal of this book is to summarize and analyze the state-of-the-art research and investigations in the field of differential privacy and its applications in privacy-preserving data publishing and releasing, so as to provide an approachable strategy for researchers and engineers to implement this new framework in real-world applications.

This is the first book with a balanced view on differential privacy theory and its applications, as most existing books related to privacy preservation either do not touch the topic of differential privacy or only focus on the theoretical analysis of differential privacy. Instead of using abstract and complex notions to describe the concepts, methods, algorithms, and analysis of differential privacy, this book presents these difficult topics in combination with applications, in order to help students, researchers, and engineers with less mathematical background understand the new concepts and framework, enabling a wider adoption and implementation of differential privacy in the real world. The striking features of this book, which set it apart from others, can be illustrated from three basic aspects:

• A detailed coverage of differential privacy from the perspective of engineering, rather than computing theory. The most difficult part in comprehending differential privacy is the complexity and the level of abstraction of the theory. This book presents the theory of differential privacy in a more natural and easy-to-understand way.
• A rich set of state-of-the-art examples in various application areas that helps readers understand how to implement differential privacy in real-world scenarios. Each application example includes a brief introduction to the problem and its challenges, a detailed implementation of differential privacy to solve the problem, and an analysis of privacy and utility.
• A comprehensive collection of contemporary research results and issues in differential privacy. Apart from the basic theory, most of the contents of the book are drawn from publications in the last 5 years, reflecting the state of the art of research and development in the area of differential privacy.

This book intends to enable readers, especially postgraduate and senior undergraduate students, to study up-to-date concepts, methods, algorithms, and analytic skills for building modern privacy-preserving applications through differential privacy. It enables students not only to master the concepts and theories in relation to differential privacy but also to readily put the material introduced into implementation practice. Therefore, the book is divided into two parts: theory and applications. In the theory part, after an introduction to the differential privacy preliminaries, the book presents detailed descriptions, from an engineering viewpoint, of differentially private data publishing and differentially private data analysis, the areas where research on differential privacy has been conducted. In the applications part, after a summary of the steps to follow when solving a privacy-preserving problem in a particular application, the book presents a number of state-of-the-art application areas of differential privacy, including differentially private social network data publishing, differentially private recommender systems, differential location privacy, spatial crowdsourcing with differential privacy preservation, and correlated differential privacy for non-IID datasets. The book also includes a final chapter on future directions of differential privacy and its applications.
Acknowledgments

We are grateful to the many research students and colleagues at Deakin University in Melbourne and at the University of Illinois at Chicago whose comments on our presentations inspired us to write this book. We would like to acknowledge the support from the research grants we have received, in particular Australian Research Council grants DP1095498, LP120200266, and DP140103649, NSF grants IIS-1526499 and CNS-1626432, and NSFC grants 61672313 and 61502362. Some of the research results presented in the book are taken from our research papers, which were (partially) supported by these grants. We would also like to express our appreciation to the editors at Springer, especially Susan Lagerstrom-Fife, for their excellent professional support. Finally, we are grateful to our families for their consistent and persistent support. Without their support, this book might have remained a set of unpublished discussions.

Melbourne, Australia    Tianqing Zhu
Melbourne, Australia    Wanlei Zhou
Melbourne, Australia    Gang Li
Chicago, IL, USA        Philip S. Yu
May 2017