[The 81st TrustML Young Scientist Seminar] Talk by Sayak Ray Chowdhury (Microsoft Research, India) "Provably Robust DPO: Aligning Language Models with Noisy Feedback"

May 17, 2024 15:06

Abstract

Date and Time:
June 10, 2024, 3:00 pm – 4:00 pm (JST)
Venue: Online only

Title:
Provably Robust DPO: Aligning Language Models with Noisy Feedback

Speaker:
Sayak Ray Chowdhury (Microsoft Research, India)

Abstract:
Learning from preference-based feedback has recently gained traction as a promising approach to align language models with human interests. These aligned models demonstrate impressive capabilities across various tasks. However, noisy preference data can negatively impact alignment. Practitioners have recently proposed heuristics to mitigate the effect, but theoretical underpinnings of these methods have remained elusive. In this work, we aim to bridge this gap by introducing a general framework for policy optimization in the presence of random preference flips. We propose rDPO, a robust version of the popular direct preference optimization method, show that it is provably tolerant to noise, and characterize its sub-optimality gap as a function of noise rate, dimension of the policy parameter, and sample size. Experiments on two real datasets show that rDPO is robust to noise in preferences compared to vanilla DPO and heuristics proposed by practitioners.

This is joint work with Anush Kini and Nagarajan Natarajan.
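
The abstract does not spell out the rDPO objective, so the following is only an illustrative sketch of one standard way to make a pairwise loss tolerant to random preference flips: the unbiased-loss correction from noisy-label learning, applied here to a DPO-style logistic loss. It assumes the flip rate is known and below 0.5. The names `dpo_loss`, `robust_dpo_loss`, `margin`, and `flip_rate` are hypothetical and not taken from the talk; `margin` stands for the scaled implicit-reward difference between the response labeled as preferred and the one labeled as rejected.

```python
import math


def dpo_loss(margin: float) -> float:
    """DPO-style logistic loss -log(sigmoid(margin)) for one preference pair.

    Computed as softplus(-margin) in a numerically stable form.
    """
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def robust_dpo_loss(margin: float, flip_rate: float) -> float:
    """Noise-corrected loss under random preference flips (illustrative sketch).

    Assumption: each observed preference is flipped independently with a
    known probability flip_rate < 0.5. The weighted combination below is
    built so that its expectation over the noisy label equals the loss
    on the clean label.
    """
    if not 0.0 <= flip_rate < 0.5:
        raise ValueError("flip_rate must lie in [0, 0.5)")
    loss_as_labeled = dpo_loss(margin)    # loss if the observed label is correct
    loss_if_flipped = dpo_loss(-margin)   # loss if the observed label was flipped
    return ((1.0 - flip_rate) * loss_as_labeled
            - flip_rate * loss_if_flipped) / (1.0 - 2.0 * flip_rate)


if __name__ == "__main__":
    clean_margin, eps = 1.3, 0.2
    # Average the corrected loss over the two possible noisy observations.
    expected = ((1.0 - eps) * robust_dpo_loss(clean_margin, eps)
                + eps * robust_dpo_loss(-clean_margin, eps))
    print(expected, dpo_loss(clean_margin))
```

With `flip_rate = 0` the corrected loss reduces to the plain logistic loss. For any admissible flip rate, averaging the corrected loss over the noisy label recovers the clean loss, which is the sense in which this kind of correction is unbiased.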

Bio:
Sayak Ray Chowdhury is a postdoctoral researcher at Microsoft Research, India. Prior to this, he was a postdoctoral fellow at Boston University, USA. He obtained his PhD from the Dept of ECE, Indian Institute of Science, where he was a recipient of the Google PhD Fellowship. His research interests include reinforcement learning, Bayesian optimization, multi-armed bandits, and differential privacy. Recently, he has been working towards a mathematical and empirical understanding of language models. More details about his research can be found here.

More Information

Date	June 10, 2024 (Mon) 15:00 - 16:00
URL	https://c5dc59ed978213830355fc8978.doorkeeper.jp/events/173590

Related Laboratories

Last updated on June 12, 2025 11:09

Laboratory

Imperfect Information Learning Team

Center for Advanced Intelligence Project