RLHF
Self-Correction
Supervision
AI Application
Meta-Generation
Scalable non-distillation methods (to improve frontier models / raise the upper limit)
Reward Model / Verifier