Talk by Arun Verma (Indian Institute of Technology Bombay)

November 17, 2020 10:45

Abstract

Title: Sequential Decision Problems with Weak Feedback

Abstract: Many variants of sequential decision problems considered in the literature differ in the type of feedback and the amount of information it reveals about the associated rewards. Most prior work studies the case where feedback from actions reveals the rewards associated with those actions. However, in many areas like crowd-sourcing, medical diagnosis, and adaptive resource allocation, feedback from actions may be weak, i.e., may not reveal any information about rewards at all. Without any information about rewards, it is not possible to learn which action is optimal. Clearly, learning an optimal action is only possible if the problem structure is such that an optimal action can be identified without explicitly knowing the rewards. Our goal is to study the class of problems where an optimal action can be inferred without explicitly knowing the rewards. Specifically, we study Unsupervised Sequential Selection (USS), where rewards/losses for selected actions are never revealed, but the problem structure is amenable to identifying the optimal actions. We also introduce a novel setup named Censored Semi-Bandits (CSB), where the reward observed from an action depends on the amount of resources allocated to it. We develop provably optimal algorithms for the USS and CSB problems and validate their empirical performance on different problem instances derived from synthetic and real datasets.

This talk is based on the following papers:
1. Arun Verma, Manjesh K. Hanawal, Csaba Szepesvari, and Venkatesh Saligrama, ‘Online Algorithm for Unsupervised Sensor Selection,’ AISTATS 2019.
2. Arun Verma, Manjesh K. Hanawal, and N. Hemachandra, ‘Thompson Sampling for Unsupervised Sequential Selection,’ ACML 2020.
3. Arun Verma, Manjesh K. Hanawal, Csaba Szepesvari, and Venkatesh Saligrama, ‘Online Algorithm for Unsupervised Sequential Selection with Contextual Information,’ NeurIPS 2020.
4. Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, and Raman Sankaran, ‘Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback,’ NeurIPS 2019.
5. Arun Verma and Manjesh K. Hanawal, ‘Stochastic Network Utility Maximization with Unknown Utility: Multi-Armed Bandits Approach,’ IEEE INFOCOM 2020.

More Information

Date	November 27, 2020 (Fri) 17:00 - 18:00
URL	https://c5dc59ed978213830355fc8978.doorkeeper.jp/events/114225


Center for Advanced Intelligence Project
