Secure communication, that is, allowing Alice and Bob to exchange messages over an untrusted, asynchronous channel, is perhaps the quintessential cryptographic problem. Since the early days of the internet, cryptographers have been modelling security concerns for its emerging communication patterns. Most recently, securing communication over instant messaging applications has posed a set of challenges very different from those previously addressed by protocols like IPsec, TLS, and PGP. Building on predecessors like Off-The-Record, the Signal messaging protocol was designed to address these specific challenges of secure messaging, and in doing so it has revolutionised secure communication over the Internet in many ways: the protocol is currently used to transmit hundreds of billions of messages per day.
It is well documented that a standalone security analysis of a protocol is not always sufficient to capture its security when the protocol is used as a component within a larger system. This concern is particularly relevant to secure messaging and the Signal protocol: people typically participate concurrently in several conversations spanning multiple multi-platform chat services, and the subtleties of the interaction between a chat service and the underlying messaging protocol have already led to network and systems security issues. A composable analysis of Signal is not straightforward, however: Signal's process of updating the shared keys depends crucially on feedback from the "downstream" authenticated encryption module, creating a seemingly inherent circularity between its key exchange and authenticated encryption modules that makes modular, composable analyses tricky.
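To illustrate the kind of key-update step at the heart of this circularity, here is a minimal sketch of a symmetric KDF ratchet, the general mechanism by which messaging protocols such as Signal refresh message keys. This is an illustrative sketch, not Signal's actual key schedule; the labels and seed are hypothetical.

```python
import hmac
import hashlib

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance a symmetric KDF chain: derive the next chain key and a
    one-time message key from the current chain key, using HMAC-SHA256
    with distinct labels so the two outputs are independent."""
    next_chain_key = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"message", hashlib.sha256).digest()
    return next_chain_key, message_key

# Both parties advance the chain in lockstep, one step per message.
# Old message keys are deleted after use, so compromising the current
# chain key does not expose keys of previously sent messages.
ck = hashlib.sha256(b"shared secret from key exchange").digest()
message_keys = []
for _ in range(3):
    ck, mk = ratchet_step(ck)
    message_keys.append(mk)
```

The one-way derivation is what gives forward secrecy, but it is also why the key exchange module needs feedback from the encryption layer: both sides must agree on how far the chain has advanced.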
In joint work with Mayank Varia, Ran Canetti, and Marika Swanberg, I give the first model of the overall security guarantees of the Signal architecture, as well as those of each of its individual components, in a universally composable framework. In this work, (1) we provide a security model for end-to-end secure messaging against an adversarial network and adaptive corruptions; (2) we argue the security of the overall Signal architecture with respect to this model; and (3) we model and argue the security of each individual component of the Signal architecture in a composable framework. Our modular analysis, together with the composability guarantees afforded by the framework, will hopefully facilitate the use of both Signal as a whole and its individual components within future cryptographic applications.
THE CONTINUAL RELEASE MODEL OF DIFFERENTIAL PRIVACY
In fields ranging from healthcare to criminal justice, sensitive data are analysed to identify patterns and draw population-level conclusions. Differentially private (DP) data analysis, introduced by Dwork et al. in 2006, injects carefully calibrated noise to enable accurate computation of statistical information about a dataset while formally bounding how much anyone can learn about individual data points. Differential privacy has been studied extensively, and DP algorithms have been deployed in both industry and government; notably, the US Census Bureau used it for the 2020 decennial census. My work studies the theoretical and conceptual limitations of differential privacy in specific settings, one of which is the continual release model.
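As a concrete illustration of the noise-injection idea (not tied to any particular deployment), here is the classic Laplace mechanism applied to a counting query. The dataset, predicate, and parameter choices below are hypothetical.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(dataset, predicate, epsilon: float) -> float:
    """Release an epsilon-DP answer to "how many records satisfy predicate?".
    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for record in dataset if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: a noisy count of ages >= 65 in a toy dataset.
random.seed(0)
ages = [23, 67, 45, 81, 34, 70, 29, 66]
release = dp_count(ages, lambda a: a >= 65, epsilon=1.0)
```

Any single release is noisy, but the noise has mean zero, so repeated or larger-scale statistics remain accurate while each individual's contribution stays masked.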
Current government deployments of differential privacy, notably at the US Census Bureau, operate in the batch model: they collect their input all at once and produce a single output. In many situations, however, the data are collected over time and the published statistics must be updated regularly, e.g., the number of COVID-19 cases. To investigate differential privacy in these situations, Dwork et al. and Chan et al. introduced the continual release model. In this model, a mechanism receives a sensitive dataset as a stream of T input records and, after receiving each record, produces an accurate output on the inputs received so far. Intuitively, the mechanism is differentially private if releasing the entire vector of T outputs satisfies differential privacy. The main challenge for privacy in this setting is that each individual record contributes to outputs at multiple time steps.
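The canonical algorithm in this model is the binary (tree) counter of Dwork et al. and Chan et al., which releases all T prefix sums of a bit stream while letting each record affect only O(log T) noisy partial sums. The sketch below simplifies the bookkeeping, and its noise calibration follows the original analyses only up to constants.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def binary_mechanism(stream, epsilon: float):
    """Continual counter: after each bit x_t, release a noisy count of the
    ones seen so far.  Partial sums are arranged in a binary tree, so each
    record lands in at most O(log T) partial sums and each output combines
    at most O(log T) noisy values."""
    T = len(stream)
    levels = T.bit_length()
    scale = (levels + 1) / epsilon    # noise per partial sum, up to constants
    alpha = [0.0] * (levels + 1)      # exact partial sums, one per tree level
    alpha_hat = [0.0] * (levels + 1)  # their noisy counterparts
    outputs = []
    for t in range(1, T + 1):
        i = (t & -t).bit_length() - 1  # level of the lowest set bit of t
        alpha[i] = sum(alpha[:i]) + stream[t - 1]
        alpha_hat[i] = alpha[i] + laplace_noise(scale)
        for j in range(i):  # levels below i are now folded into alpha[i]
            alpha[j] = 0.0
            alpha_hat[j] = 0.0
        # the binary expansion of t selects the partial sums covering [1, t]
        outputs.append(sum(alpha_hat[j] for j in range(levels + 1) if (t >> j) & 1))
    return outputs

random.seed(1)
stream = [1, 0, 1, 1, 0, 1, 1, 0]
outputs = binary_mechanism(stream, epsilon=200.0)
```

Because each record sits in only logarithmically many partial sums, the per-output error grows polylogarithmically in T, rather than linearly as it would if fresh noise were added to every prefix sum independently.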
Together with Adam Smith, Sofya Raskhodnikova, and Satchit Sivakumar, I study the price differentially private (DP) algorithms must pay in accuracy to solve a problem in the continual release model rather than the batch model, providing the first strong lower bounds. We consider two fundamental problems that are widely studied in the batch model and show a gap in accuracy between the two models that is exponentially larger than what was previously known for the related problem of summation. Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive" setting). We also formulate a model that allows adaptively selected inputs; it captures dependencies that arise in many applications of continual release. In general, both privacy and accuracy are harder to attain in this model. Nevertheless, we analyse several algorithms in the new model and, in particular, show that our lower bounds are matched by the error of simple algorithms whose privacy holds even for adaptively selected streams. That is, for the problems we consider, there is no overhead in accuracy when the input stream is selected adaptively.
MEASURING POTENTIAL IMPACTS OF DISCLOSURE AVOIDANCE SYSTEMS
My colleagues and I were recently awarded a seed grant from the Boston University Center for Antiracist Research to study the impact of differential privacy in the census on marginalised populations. As a team of computer scientists, we do not aim to reason directly about social impact. Instead, we build on the literature on multi-calibration to develop a concrete framework for understanding the privacy-utility trade-offs of different privacy-preserving systems for large data releases.
While differential privacy is mathematically elegant and computationally efficient, it can be difficult to understand: its parameters are hard to explain to a non-technical audience, and in particular to data stakeholders. Our framework is intended to combat this difficulty by enabling census stakeholders and social science experts to better understand and contrast the potential privacy impacts of a differentially private data release with those of other data-release methods.
CONTEXTUALISING DIFFERENTIAL PRIVACY
Contextual integrity (CI), introduced by Helen Nissenbaum in 2004, allows us to reason about privacy violations within a system by evaluating the flow of information within it. By contextualising the information flow in a privacy-preserving system, this privacy notion bridges ethical, legal, and policy approaches with scientific and technical ones, providing guidance on whether a system is privacy preserving in a normative sense.
It seems natural, then, to combine this framework with that of differential privacy, but work combining the two notions remains scarce. Perhaps one reason is that contextual integrity models every information flow as binary, either occurring or not, which leaves out the kind of partial, noisy information flow that differentially private systems produce.
My collaboration with contextual integrity researchers Helen Nissenbaum and Ero Balsa contributes to closing this gap. We ask whether and how CI may guide us in identifying settings that call for the use of DP, as well as scenarios where using DP may be misguided with respect to the expectations of privacy within a system. As a sneak peek, we revisit the popular claim that "statistical inference is not a privacy violation", showing how a CI analysis may suggest otherwise.