2025/7/16 17:57

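Two papers from the AIP Center have been accepted to the international conference COLM 2025 (Conference on Language Modeling), to be held in Montreal, Canada, from October 7 to 10, 2025. The accepted papers are listed below.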

[COLM 2025] https://colmweb.org/

  • Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human
    Feedback
    Johannes Ackermann (The University of Tokyo / RIKEN AIP)
    Takashi Ishida (RIKEN AIP / The University of Tokyo)
    Masashi Sugiyama (RIKEN AIP / The University of Tokyo)
  • When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A
    Study with Context-Free Grammars
    Rei Higuchi (The University of Tokyo / RIKEN AIP)
    Ryotaro Kawata (The University of Tokyo / RIKEN AIP)
    Naoki Nishikawa (The University of Tokyo / RIKEN AIP)
    Kazusato Oko (UC Berkeley / RIKEN AIP)
    Shoichiro Yamaguchi (Preferred Networks, Inc.)
    Sosuke Kobayashi (Preferred Networks, Inc. / Tohoku University)
    Seiya Tokui (Preferred Networks, Inc.)
    Kohei Hayashi (The University of Tokyo)
    Daisuke Okanohara (Preferred Networks, Inc.)
    Taiji Suzuki (The University of Tokyo / RIKEN AIP)
