RLHF

Self-Correction

Supervision

AI Application

Meta-Generation

Scalable non-distillation methods (to improve frontier models and push their upper limit)

Reward Model / Verifier