[Brief Survey] Process Supervision
[Presentation] Scalable Oversight & Weak-to-Strong Generalization
[Presentation] Data-Constrained Scaling Laws
Scaling up k in Pass@k on MATH500 yields an extremely high pass rate
“Ex-hard” math problems that LLMs fail to solve with up to thousands of trials
Evaluate reward models on multiple-choice benchmarks