July 16, 2025 17:57

Two papers have been accepted at COLM 2025 (Conference on Language Modeling), to be held October 7-10, 2025, in Montreal, Canada. For more details, please refer to the link below.

[COLM 2025] https://colmweb.org/

  • Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human
    Feedback
    Johannes Ackermann (The University of Tokyo / RIKEN AIP)
    Takashi Ishida (RIKEN AIP / The University of Tokyo)
    Masashi Sugiyama (RIKEN AIP / The University of Tokyo)
  • When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A
    Study with Context-Free Grammars
    Rei Higuchi (The University of Tokyo / RIKEN AIP)
    Ryotaro Kawata (The University of Tokyo / RIKEN AIP)
    Naoki Nishikawa (The University of Tokyo / RIKEN AIP)
    Kazusato Oko (UC Berkeley / RIKEN AIP)
    Shoichiro Yamaguchi (Preferred Networks, Inc.)
    Sosuke Kobayashi (Preferred Networks, Inc. / Tohoku University)
    Seiya Tokui (Preferred Networks, Inc.)
    Kohei Hayashi (The University of Tokyo)
    Daisuke Okanohara (Preferred Networks, Inc.)
    Taiji Suzuki (The University of Tokyo / RIKEN AIP)
