2022/6/21 19:02

Summary

Speaker: Dorian Baudry (CNRS/INRIA)

Title: Optimal Thompson Sampling Strategies for Support-Aware CVaR Bandits

Abstract:
In this presentation we will introduce a multi-armed bandit algorithm proposed in Baudry et al.
(2021). A multi-armed bandit is a sequential decision-making problem in which, at each time step,
a learner: (1) selects an action, (2) observes a reward corresponding to this action, and (3)
updates her policy to choose future actions in order to maximize the expected sum of rewards. The
main difficulty is then to find a strategy with the right balance between exploration and
exploitation. Motivated by an application of bandits in agriculture, we consider a risk-aware
variant of this problem in which the quality of each action is evaluated by its Conditional Value at
Risk (CVaR) at some given quantile of the reward distribution. After describing the problem and
illustrating the potential applications in agriculture in the first part of the talk, we will
introduce the Bounded CVaR Thompson Sampling algorithm (B-CVTS), which we prove to be the first
asymptotically optimal algorithm for CVaR bandits for distributions with bounded support. We will
then showcase the main theorems and elements of analysis presented in the paper. Finally, we will
discuss the experiments we implemented using the Decision Support System for Agrotechnology
Transfer (DSSAT), illustrating empirically the benefit of Thompson Sampling approaches in a
realistic environment simulating a use-case in agriculture.
Link to the article: https://proceedings.mlr.press/v139/baudry21a.html
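
To make the CVaR criterion and the sampling idea more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation from the paper: the function names (cvar, sample_index, choose_arm) are hypothetical, rewards are assumed to lie below a known upper bound, and each arm's index is assumed to be the CVaR of its observed rewards (plus the upper bound) reweighted by uniform Dirichlet weights. Please refer to the linked article for the exact algorithm and its guarantees.

```python
import random


def cvar(values, weights, alpha):
    """CVaR at level alpha of a discrete reward distribution:
    the mean of the worst alpha-fraction of the probability mass."""
    remaining = alpha
    total = 0.0
    # Walk through the atoms from the lowest reward upwards.
    for value, weight in sorted(zip(values, weights)):
        take = min(weight, remaining)  # mass of this atom inside the lower tail
        total += value * take
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def sample_index(rewards, upper_bound, alpha, rng):
    """Randomized index for one arm (assumed form): CVaR of the observed
    rewards plus the upper bound, reweighted by Dirichlet(1, ..., 1) weights."""
    support = list(rewards) + [upper_bound]
    # Normalized Gamma(1, 1) draws give a uniform Dirichlet sample.
    gammas = [rng.gammavariate(1.0, 1.0) for _ in support]
    norm = sum(gammas)
    weights = [g / norm for g in gammas]
    return cvar(support, weights, alpha)


def choose_arm(history, upper_bound, alpha, rng):
    """Pick the arm whose sampled CVaR index is largest."""
    indices = [sample_index(r, upper_bound, alpha, rng) for r in history]
    return max(range(len(history)), key=indices.__getitem__)
```

For example, with history = [[0.2, 0.4], [0.1, 0.9]] (rewards observed so far for two arms), choose_arm(history, 1.0, 0.5, random.Random(0)) returns the index of the arm to pull next. Including the upper bound in the support gives every arm some chance of an optimistic index, which is what drives exploration in this sketch.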

Details

Date and time: 2022/07/04 (Mon) 15:00 - 16:00
URL https://c5dc59ed978213830355fc8978.doorkeeper.jp/events/138914

Related laboratories

last updated on 2023/6/26 10:54