2025/7/16 17:57
The following two papers from the AIP Center have been accepted at COLM 2025 (Conference on Language Modeling), an international conference to be held in Montreal, Canada, from October 7 to 10, 2025.
[COLM 2025] https://colmweb.org/
- Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann (The University of Tokyo / RIKEN AIP)
Takashi Ishida (RIKEN AIP / The University of Tokyo)
Masashi Sugiyama (RIKEN AIP / The University of Tokyo)
- When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi (The University of Tokyo / RIKEN AIP)
Ryotaro Kawata (The University of Tokyo / RIKEN AIP)
Naoki Nishikawa (The University of Tokyo / RIKEN AIP)
Kazusato Oko (UC Berkeley / RIKEN AIP)
Shoichiro Yamaguchi (Preferred Networks, Inc.)
Sosuke Kobayashi (Preferred Networks, Inc. / Tohoku University)
Seiya Tokui (Preferred Networks, Inc.)
Kohei Hayashi (The University of Tokyo)
Daisuke Okanohara (Preferred Networks, Inc.)
Taiji Suzuki (The University of Tokyo / RIKEN AIP)
Last updated on 2025/7/18 11:05