July 16, 2025 17:57
Two papers have been accepted at COLM 2025 (Conference on Language Modeling), to be held October 7-10, 2025, in Montreal, Canada. For more details, please refer to the link below.
[COLM 2025] https://colmweb.org/
- Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann (The University of Tokyo / RIKEN AIP)
Takashi Ishida (RIKEN AIP / The University of Tokyo)
Masashi Sugiyama (RIKEN AIP / The University of Tokyo)
- When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi (The University of Tokyo / RIKEN AIP)
Ryotaro Kawata (The University of Tokyo / RIKEN AIP)
Naoki Nishikawa (The University of Tokyo / RIKEN AIP)
Kazusato Oko (UC Berkeley / RIKEN AIP)
Shoichiro Yamaguchi (Preferred Networks, Inc.)
Sosuke Kobayashi (Preferred Networks, Inc. / Tohoku University)
Seiya Tokui (Preferred Networks, Inc.)
Kohei Hayashi (The University of Tokyo)
Daisuke Okanohara (Preferred Networks, Inc.)
Taiji Suzuki (The University of Tokyo / RIKEN AIP)