The Data Minimization Principle in Machine Learning

A Google TechTalk, presented by Ferdinando Fioretto, 2024-04-10
ABSTRACT: The principle of data minimization aims to reduce the amount of data collected and retained to minimize the potential for misuse, unauthorized access, or data breaches. While endorsed by various global data protection regulations, its practical implementation in machine learning remains elusive due to the lack of a clear formulation.

We begin the talk by reviewing the principle of data minimization as presented in several data protection regulations and examining the challenges in formalizing this principle for machine learning tasks. We then propose an optimization-based formalization that attempts to closely follow the legal language of this principle. However, our empirical analysis reveals a potentially overlooked gap between the privacy expectations and actual benefits of data minimization, highlighting the need for approaches that address privacy in a more holistic framework.
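
The abstract does not state the formalization itself. As a rough sketch only, one way an optimization-based reading of data minimization could be written is shown below. The symbols are assumptions introduced for illustration: $D$ is the full dataset, $D'$ the retained subset, $L$ a loss function, $\theta_S$ a model trained on dataset $S$, and $\alpha$ a tolerated drop in performance.

```latex
\begin{aligned}
\min_{D' \subseteq D} \quad & |D'| \\
\text{s.t.} \quad & L(\theta_{D'}; D) - L(\theta_{D}; D) \le \alpha, \\
& \theta_{S} \in \arg\min_{\theta} L(\theta; S) \quad \text{for } S \in \{D, D'\}.
\end{aligned}
```

Read this way, the objective captures "limited to what is necessary" and the constraint captures "adequate for the purpose"; nothing in it bounds what an adversary can infer from $D'$, which is one way to picture the gap between privacy expectations and actual benefits that the talk reports.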

Next, we shift gears and discuss the application of data minimization in inference tasks. In high-stakes domains such as law, recruitment, and healthcare, machine learning models frequently rely on sensitive user data for inference and require each individual to provide the complete set of features. This not only poses significant privacy risks for individuals but also demands substantial human effort from organizations to verify that the information is accurate. We ask whether a model really needs all input features to produce accurate or nearly accurate predictions during inference. We present a sequential algorithm to identify the minimal set of attributes that each individual should reveal, and an empirical assessment showing that individuals often need to disclose only a very small subset of their features without compromising decision-making accuracy.
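
The abstract does not describe the sequential algorithm in detail, so the following is only a minimal sketch of the general idea, not the speaker's method. It assumes a linear classifier with known weights and features that lie in known ranges; the function names, the greedy ordering, and the example numbers are all hypothetical. Features are requested one at a time, and the process stops as soon as the prediction can no longer change, whatever the undisclosed features turn out to be.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def prediction_is_certain(
    weights: Sequence[float],
    bias: float,
    bounds: Bounds,
    revealed: Dict[int, float],
) -> Tuple[bool, Optional[int]]:
    """Check whether the label of sign(w.x + b) is fixed given only the revealed features."""
    lo = hi = bias
    for j, w in enumerate(weights):
        if j in revealed:
            # A disclosed feature contributes an exact amount to the score.
            lo += w * revealed[j]
            hi += w * revealed[j]
        else:
            # An undisclosed feature contributes anything within its range.
            a, b = bounds[j]
            lo += min(w * a, w * b)
            hi += max(w * a, w * b)
    if lo >= 0:
        return True, 1  # score is non-negative in every case
    if hi < 0:
        return True, 0  # score is negative in every case
    return False, None


def sequential_disclosure(
    weights: Sequence[float],
    bias: float,
    bounds: Bounds,
    x: Sequence[float],
) -> Tuple[int, List[int]]:
    """Reveal features one by one until the prediction is certain (hypothetical greedy rule)."""
    revealed: Dict[int, float] = {}
    # Assumed heuristic: ask first for the feature with the largest score uncertainty.
    order = sorted(
        range(len(weights)),
        key=lambda j: abs(weights[j]) * (bounds[j][1] - bounds[j][0]),
        reverse=True,
    )
    for j in order:
        certain, label = prediction_is_certain(weights, bias, bounds, revealed)
        if certain:
            return label, sorted(revealed)
        revealed[j] = x[j]
    # With every feature revealed the score is exact, so the label is always certain.
    _, label = prediction_is_certain(weights, bias, bounds, revealed)
    return label, sorted(revealed)


# Illustrative numbers only: four features scaled to [0, 1].
weights = [3.0, -0.5, 0.2, 0.1]
bias = -1.0
bounds = [(0.0, 1.0)] * 4
label, disclosed = sequential_disclosure(weights, bias, bounds, [1.0, 0.3, 0.9, 0.5])
print(label, disclosed)
```

In this toy setting the first feature dominates the score, so disclosing it alone already fixes the decision and the other three never need to be shared. The greedy order is not guaranteed to find the smallest possible set, and a real model would need a different certainty test than this interval bound.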

Finally, I will conclude with a call to action and collaboration, seeking additional efforts to formalize legal privacy principles in a way that makes them actionable and deployable.

Speaker: Ferdinando Fioretto (University of Virginia)
