P4P: A Practical Framework for Privacy-Preserving Distributed Computation
Project Overview
Modern networked applications and services, such as
e-voting, e-commerce, e-bidding, data mining etc., often call for collaborative
computation on shared user data. Privacy concerns have become a major obstacle
hindering the acceptance/success of such applications. Existing solutions
suffer from
prohibitive cost and are not practical. P4P (Peers for Privacy) is a framework
for carrying out such computations with adequate performance on today's commodity
hardware at reasonable, realistically large scale. P4P leverages
the natural incentives on the user side. Most users want better
privacy, but don't understand the risks and generally are not
willing to pay much for better protection. On the other hand, user
communities always contain a fraction of altruistic users who
provide resources to the rest of the community - this is how most
peer-to-peer computing actually works. Our approach uses resources
from a few of these users which allows a very simple, efficient
solution that places little or no additional demands on either the
server or the other users.
The P4P framework features a unique hybrid architecture combining
P2P and existing client-server paradigm, and takes advantage of
players' heterogeneity that exists in real world systems. The bulk
of the computation and storage are still based on the server, but
a (small) subset of users, denoted Privacy Peers, participate in
the computation, removing data control from the server. The
privacy peers do not gain control of the data, and need not be
trusted - they only provide compute cycles. This arrangement
allows us to take advantage of the different resources and
protections offered by different players. In terms of security and
privacy, P4P relies on the server, who is typically protected
behind firewalls and maintained by professional administrators,
for defending against outside attacks and uses the privacy peers,
who are mostly client machines that are maintained by regular
users, to protect user privacy against a curious server.
Performance-wise, this architecture allows us to use a verifiable
secret sharing (VSS) paradigm for computation. The advantage of a
VSS private computation scheme is that it works with any sized
field. And arithmetic operations with small field elements are
extremely efficient, when an element fits in a single memory cell
(in contrast, arithmetic operations with large field
numbers, as is required by almost all existing MPC protocols and
ZKPs, are typically 10^6 times slower). This means private
computation in P4P is almost as efficient as the centralized
implementation if the computations are composed mostly of
additions. In a sense, for these applications, P4P can provide
privacy almost for free (in terms of server's computation
overhead). Addition-based algorithms are more general than would
appear, and include non-linear gradient approaches such as
Singular-Value Decomposition and many data mining algorithms based
on EM (Expectation Maximization), etc. Thus P4P provides efficient
solutions for a large number of real world applications.
The P4P framework also supports a very efficient user data
validation protocol that verifies, in zero-knowledge, that the L2
norm of the vector a user inputs into the computation is bounded
by a predefined limit L. This prevents a malicious user from
exerting too much influence on the computation, which is a serious
threat in any realistic application but generally lacks scalable
solutions. The protocol uses only a logarithmic number of large
field (1024 bits or more) operations. In experiments
with our implementation, we show that verification of a
million-element vector (e.g. during an SVD calculation) takes a
few seconds of server or client time on commodity PCs as shown by the
following figure (in
contrast, using standard techniques takes hours). Overall, the
protocol is dominated by the (linear) time to do small field
operations that one has to pay even when the computation is done
directly on user data without privacy. This makes privacy
protection almost free from the vendor's point of view, which is
essential for wide adoption. In addition, P4P can also deal with
malicious privacy peers who participate in the computation.
However, this is done without resorting to expensive ZKPs or
homomorphism. Instead, we introduce a new VSS that takes advantage
of the existence of honest majority among the players and relies
on consensus for identifying correct behavior. The resulting
scheme preserves the feature of "keeping the number of large
integer operations small".
Figure 1: (a) Verifier and (b) prover times in seconds for the
validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits.
The x-axis is the vector length.
P4P also provides a set of tools that can be used to
compose diverse applications. Such tools include data protection scheme, secure
multicast, etc. All together, P4P provides
a comprehensive and realistic solution for preserving privacy in real-world
applications.
The basic P4P tools are being actively developed. Most components are ready. If you are
interested in testing out the code, please send me an email: duan AT cs DOT berkeley DOT edu.
People
Talks
- SDM 2008 (4/24/2008) (.ppt)
- Dissertation Talk(11/27/2007) (.ppt)
- PODC 2007 (8/12/2007) (.ppt)
- INFOCOM 2007 (5/09/2007) (.ppt)
- CT-RSA 2006 (2/16/2006) (.ppt)
Publications
- Yitao Duan and John Canny. Practical Private Computation and Zero-Knowledge Tools
for Privacy-Preserving Distributed Data Mining. To appear in SIAM International
Conference on Data Mining (SDM08). April 24-26, 2008. Atlanta, Georgia, USA.
(.pdf)
- Yitao Duan and John F. Canny. Brief Announcement: Practical Private Computation of Vector
Addition-Based Functions. PODC 2007: 326-327, August 12-15, Portland, OR, USA. Full version
appears in SDM 08 and can be found here.
- Yitao Duan and John Canny. Scalable Secure Bidirectional Group Communication.
INFOCOM 2007, 6-12 May 2007, Anchorage, Alaska, USA.(.pdf)
- Yitao Duan, John Canny and Justin Zhan. Efficient Privacy-Preserving Association Rule
Mining: P4P Style. In IEEE Symposium on Computational Intelligence and Data
Mining (CIDM 2007), April 1-5, 2007, Honolulu, Hawaii.
- Yitao Duan and John Canny. From Commodity to Value: A Privacy-Preserving e-Business
Architecture. In 2006 IEEE International Conference on e-Business Engineering
(ICEBE 2006), Oct. 24 - 26, Shanghai, China. (.pdf. Full
version here)
- Yitao Duan and John Canny. Zero-knowledge Test of Vector Equivalence and
Granulation of User Data with Privacy. In 2006 IEEE International
Conference on Granular Computing (GrC 2006), May 10 - 12, Atlanta, USA.
(.pdf)
- Yitao Duan and John Canny. How to Construct Multicast Cryptosystems
Provably Secure against Adaptive Chosen Ciphertext Attack. In CT-RSA 2006.
February 13 - 17, 2006, McEnery Convention Center, San Jose,
USA. (.pdf) Presentation slides can be found
here.
- Yitao Duan, Jingtao Wang, Matthew Kam, John Canny. Privacy Preserving Link Analysis
on Dynamic Weighted Graph, Computational & Mathematical Organization Theory,
Volume 11, Issue 2, Jul 2005, Pages 141 - 159.
- Yitao Duan, Jingtao Wang, Matthew Kam and John Canny. A Secure
Online Algorithm for Link Analysis on Weighted Graph, SIAM Workshop on
Link Analysis, Counterterrorism and Security, (at the SIAM Int. Conf. on
Data Mining), Sutton Place Hotel, Newport Beach, California, USA, 23rd April,
2005. (.pdf)
- Yitao Duan and John Canny. Protecting User Data in Ubiquitous Computing:
Towards Trustworthy Environments. Privacy-Enhancing Technologies (PET)
2004, Toronto, CA, May 2004. (.pdf)
- Ben Y. Zhao, Yitao Duan, Ling Huang, Anthony D. Joseph, and John D. Kubiatowicz,
Brocade: Landmark Routing on Overlay Networks, IPTPS'02. (.pdf)
last updated 01/04/2008