Publications


PREPRINTS

Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release [paper link]
Palak Jain, Mark Bun, Marco Carmosino, Gabriel Kaptchuk, Satchit Sivakumar.

[ Abstract ]

The technical literature about data privacy largely consists of two complementary approaches: formal definitions of conditions sufficient for privacy preservation and attacks that demonstrate privacy breaches. Differential privacy is an accepted standard in the former sphere. However, differential privacy’s powerful adversarial model and worst-case guarantees may make it too stringent in some situations, especially when achieving it comes at a significant cost to data utility. Meanwhile, privacy attacks aim to expose real and worrying privacy risks associated with existing data release processes but often face criticism for being unrealistic. Moreover, the literature on attacks generally does not identify what properties are necessary to defend against them.

We address the gap between these approaches by introducing demographic coherence, a condition inspired by privacy attacks that we argue is necessary for data privacy. This condition captures privacy violations arising from inferences about individuals that are incoherent with respect to the demographic patterns in the data. Our framework focuses on confidence-rated predictors, which can in turn be distilled from almost any data-informed process. Thus, we capture privacy threats that exist even when no attack is explicitly being carried out. Our framework not only provides a condition with respect to which data release algorithms can be analysed but also suggests natural experimental evaluation methodologies that could be used to build practical intuition and make tangible assessments of risk. Finally, we argue that demographic coherence is weaker than differential privacy: we prove that every differentially private (DP) data release is also demographically coherent, and that there are demographically coherent algorithms which are not differentially private.

Synopsis: Secure and Private Trend Inference from Encrypted Semantic Embeddings
Madelyne Xiao, Palak Jain, Micha Gorelick, Sarah Scheffler.

[ Abstract ]

WhatsApp and many other commonly used communication platforms guarantee end-to-end encryption (E2EE), which requires that service providers lack the cryptographic keys to read communications on their own platforms. This privacy-preserving design makes it difficult to study important phenomena like the spread of misinformation or political messaging, as users have a clear expectation and desire for privacy and little incentive to forfeit it by handing over raw data to researchers or journalists.

We introduce Synopsis, a secure architecture for analyzing messaging trends in consensually donated E2EE messages. Designed specifically for investigative workflows, Synopsis leverages techniques from cryptography, differential privacy, and NLP to give platform users the ability to donate their data for the public good without exposing the contents of their messages, and journalists the ability to look for messaging trends without ever accessing individual texts or being in a position to leak information about them.

To facilitate both exploratory and targeted analysis—a challenge for differentially private systems—Synopsis combines techniques used in the local and central models of differential privacy. Meanwhile, malicious-secure multi-party computation (MPC) ensures that the differentially private (DP) query architecture is the only way to access messages, preventing any party from directly viewing even message embeddings.

Evaluations on a dataset of Hindi-language WhatsApp messages (3,500 messages represented as 300-dimensional embeddings) demonstrate the efficiency and accuracy of our approach. Queries on this data run in about 3 seconds, and the accuracy of the fine-grained interface exceeds 94% on benchmark tasks for an epoch of one day.
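As a rough illustration of the central-model DP ingredient described above (a sketch only, not Synopsis's actual mechanism; the function name, similarity threshold, and parameters are hypothetical), a trend query over message embeddings can be answered by computing a similarity-based count and adding Laplace noise calibrated to the query's sensitivity:

```python
import numpy as np

def dp_trend_count(embeddings, query, threshold, epsilon, rng=None):
    """Toy central-model DP trend query: count messages whose embedding
    has cosine similarity >= threshold with a query vector, then add
    Laplace noise. Each message changes the count by at most one, so
    the sensitivity is 1 and noise of scale 1/epsilon suffices."""
    if rng is None:
        rng = np.random.default_rng()
    # Normalize rows and the query so the dot product is cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    true_count = int(np.sum(emb @ q >= threshold))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise
```

In the actual system, per the abstract, this kind of query is combined with local-model DP techniques and runs inside malicious-secure MPC, so that no party ever directly views even the message embeddings.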

Privacy’s Odd Couple: Privacy Law and Privacy Engineering on Inference and Information Recovery
Jeremy Seeman, Palak Jain, and Daniel Susser.

This work appeared at PLSC 2024.


PUBLIC COMMENTS ON GOVERNMENT CALLS

Response to RFI on Executive Branch Agency Handling of Commercially Available Information Containing Personally Identifiable Information.
Rachel Cummings, Shlomi Hod, Palak Jain, Gabriel Kaptchuk, Tamalika Mukherjee, Priyanka Nanayakkara, and Jayshree Sarathy
[link to response]

[more info]

The Office of Management and Budget (OMB) issued this Request for Information (RFI) as part of its implementation of the Biden Administration’s Executive Order 14110, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.”


CONFERENCE PUBLICATIONS

NeurIPS
2023

Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
Palak Jain, Iden Kalemaj, Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith.

This work also appeared at FORC ’24 and TPDP ’23.
[paper link]

ICML
2023

The Price of Differential Privacy under Continual Observation
Palak Jain, Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith.

This work also appeared at FORC ’23 and TPDP ’22.
[paper link] [spotlight talk at ICML ’23]

CRYPTO
2022

Universally Composable End-to-End Secure Messaging
Ran Canetti, Palak Jain, Marika Swanberg, Mayank Varia.
[paper link] [talk link]