P4P: A Practical Framework for Privacy-Preserving Distributed Computation

Project Overview

Modern networked applications and services, such as e-voting, e-commerce, e-bidding, data mining etc., often call for collaborative computation on shared user data. Privacy concerns have become a major obstacle hindering the acceptance/success of such applications. Existing solutions suffer from prohibitive cost and are not practical. P4P (Peers for Privacy) is a framework for carrying out such computations with adequate performance on today's commodity hardware at reasonable, realistically large scale. P4P leverages the natural incentives on the user side. Most users want better privacy, but don't understand the risks and generally are not willing to pay much for better protection. On the other hand, user communities always contain a fraction of altruistic users who provide resources to the rest of the community - this is how most peer-to-peer computing actually works. Our approach uses resources from a few of these users which allows a very simple, efficient solution that places little or no additional demands on either the server or the other users.

The P4P framework features a unique hybrid architecture combining P2P and existing client-server paradigm, and takes advantage of players' heterogeneity that exists in real world systems. The bulk of the computation and storage are still based on the server, but a (small) subset of users, denoted Privacy Peers, participate in the computation, removing data control from the server. The privacy peers do not gain control of the data, and need not be trusted - they only provide compute cycles. This arrangement allows us to take advantage of the different resources and protections offered by different players. In terms of security and privacy, P4P relies on the server, who is typically protected behind firewalls and maintained by professional administrators, for defending against outside attacks and uses the privacy peers, who are mostly client machines that are maintained by regular users, to protect user privacy against a curious server. Performance-wise, this architecture allows us to use a verifiable secret sharing (VSS) paradigm for computation. The advantage of a VSS private computation scheme is that it works with any sized field. And arithmetic operations with small field elements are extremely efficient, when an element fits in a single memory cell (in contrast, arithmetic operations with large field numbers, as is required by almost all existing MPC protocols and ZKPs, are typically 10^6 times slower). This means private computation in P4P is almost as efficient as the centralized implementation if the computations are composed mostly of additions. In a sense, for these applications, P4P can provide privacy almost for free (in terms of server's computation overhead). Addition-based algorithms are more general than would appear, and include non-linear gradient approaches such as Singular-Value Decomposition and many data mining algorithms based on EM (Expectation Maximization), etc. Thus P4P provides efficient solutions for a large number of real world applications.

The P4P framework also supports a very efficient user data validation protocol that verifies, in zero-knowledge, that the L2 norm of the vector a user inputs into the computation is bounded by a predefined limit L. This prevents a malicious user from exerting too much influence on the computation, which is a serious threat in any realistic application but generally lacks scalable solutions. The protocol uses only a logarithmic number of large field (1024 bits or more) operations. In experiments with our implementation, we show that verification of a million-element vector (e.g. during an SVD calculation) takes a few seconds of server or client time on commodity PCs as shown by the following figure (in contrast, using standard techniques takes hours). Overall, the protocol is dominated by the (linear) time to do small field operations that one has to pay even when the computation is done directly on user data without privacy. This makes privacy protection almost free from the vendor's point of view, which is essential for wide adoption. In addition, P4P can also deal with malicious privacy peers who participate in the computation. However, this is done without resorting to expensive ZKPs or homomorphism. Instead, we introduce a new VSS that takes advantage of the existence of honest majority among the players and relies on consensus for identifying correct behavior. The resulting scheme preserves the feature of "keeping the number of large integer operations small".

Figure 1: (a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.

P4P also provides a set of tools that can be used to compose diverse applications. Such tools include data protection scheme, secure multicast, etc. All together, P4P provides a comprehensive and realistic solution for preserving privacy in real-world applications.

The basic P4P tools are being actively developed. Most components are ready. If you are interested in testing out the code, please send me an email: duan AT cs DOT berkeley DOT edu.

People

Yitao Duan

John Canny

Talks

SDM 2008 (4/24/2008) (.ppt)
Dissertation Talk(11/27/2007) (.ppt)
PODC 2007 (8/12/2007) (.ppt)
INFOCOM 2007 (5/09/2007) (.ppt)
CT-RSA 2006 (2/16/2006) (.ppt)

Publications

Yitao Duan and John Canny. Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining. To appear in SIAM International Conference on Data Mining (SDM08). April 24-26, 2008. Atlanta, Georgia, USA. (.pdf)
Yitao Duan and John F. Canny. Brief Announcement: Practical Private Computation of Vector Addition-Based Functions. PODC 2007: 326-327, August 12-15, Portland, OR, USA. Full version appears in SDM 08 and can be found here.
Yitao Duan and John Canny. Scalable Secure Bidirectional Group Communication. INFOCOM 2007, 6-12 May 2007, Anchorage, Alaska, USA.(.pdf)
Yitao Duan, John Canny and Justin Zhan. Efficient Privacy-Preserving Association Rule Mining: P4P Style. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2007), April 1-5, 2007, Honolulu, Hawaii.
Yitao Duan and John Canny. From Commodity to Value: A Privacy-Preserving e-Business Architecture. In 2006 IEEE International Conference on e-Business Engineering (ICEBE 2006), Oct. 24 - 26, Shanghai, China. (.pdf. Full version here)
Yitao Duan and John Canny. Zero-knowledge Test of Vector Equivalence and Granulation of User Data with Privacy. In 2006 IEEE International Conference on Granular Computing (GrC 2006), May 10 - 12, Atlanta, USA. (.pdf)
Yitao Duan and John Canny. How to Construct Multicast Cryptosystems Provably Secure against Adaptive Chosen Ciphertext Attack. In CT-RSA 2006. February 13 - 17, 2006, McEnery Convention Center, San Jose, USA. (.pdf) Presentation slides can be found here.
Yitao Duan, Jingtao Wang, Matthew Kam, John Canny. Privacy Preserving Link Analysis on Dynamic Weighted Graph, Computational & Mathematical Organization Theory, Volume 11, Issue 2, Jul 2005, Pages 141 - 159.
Yitao Duan, Jingtao Wang, Matthew Kam and John Canny. A Secure Online Algorithm for Link Analysis on Weighted Graph, SIAM Workshop on Link Analysis, Counterterrorism and Security, (at the SIAM Int. Conf. on Data Mining), Sutton Place Hotel, Newport Beach, California, USA, 23rd April, 2005. (.pdf)
Yitao Duan and John Canny. Protecting User Data in Ubiquitous Computing: Towards Trustworthy Environments. Privacy-Enhancing Technologies (PET) 2004, Toronto, CA, May 2004. (.pdf)
Ben Y. Zhao, Yitao Duan, Ling Huang, Anthony D. Joseph, and John D. Kubiatowicz, Brocade: Landmark Routing on Overlay Networks, IPTPS'02. (.pdf)

last updated 01/04/2008