Unit Leader: Junya Honda
Visiting Scientist: Kazuaki Toyoura
We study algorithms for problems that require sequential decision making. In most decision-making situations, we do not have enough data or knowledge about the target, so we have to search for the best choice by trial and error. Such problems are formulated as bandit problems, in which an agent tries to maximize the cumulative reward, or to find the action with the maximum expected reward, based on information obtained only from the actions it actually chooses. We are establishing the achievable limits for these problems and constructing algorithms that achieve these limits.
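To make the bandit formulation concrete, the following is a minimal sketch of the cumulative-reward setting using UCB1, a standard textbook algorithm. It is chosen here only for illustration and is not necessarily one of the unit's own methods. The function name `ucb1`, the Bernoulli reward model, and the arm means are all assumptions introduced for this example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms.

    `means` are hypothetical success probabilities. They are used only
    to simulate rewards and to report regret; the agent never sees them.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # how many times each arm has been pulled
    sums = [0.0] * k   # total reward observed from each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm is pulled more often.
            def index(i):
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                return sums[i] / counts[i] + bonus

            arm = max(range(k), key=index)

        # Only the reward of the chosen arm is observed.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    # Pseudo-regret: expected loss relative to always pulling the best arm.
    best = max(means)
    regret = sum(n * (best - m) for n, m in zip(counts, means))
    return counts, total_reward, regret


if __name__ == "__main__":
    counts, total_reward, regret = ucb1([0.1, 0.9], horizon=2000)
    print("pulls per arm:", counts)
    print("total reward:", total_reward)
    print("pseudo-regret:", regret)
```

The agent learns only from the rewards of the arms it actually pulls, which is the trial-and-error constraint described above. The pseudo-regret returned at the end is the kind of quantity for which achievable limits are studied: it measures how much expected reward was lost compared with always choosing the best action.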