Benchmarks: MMLU-Redux | GPQA | MMLU-Pro
Default Setting: 0-shot CoT JSON | 0-shot CoT | 5-shot CoT
Models | MMLU-Redux | GPQA Main (448) | GPQA Diamond (198) | GPQA Extended (546) | MMLU-Pro
ArmoRM-Llama3-8B-v0.1 54.23 33.3 33.3 31.3
Meta-Llama-3-8B-Instruct 61.66
Meta-Llama-3.1-8B-Instruct 67.24 32.8 48.3
Meta-Llama-3-70B-Instruct 78.01
Meta-Llama-3.1-70B-Instruct 82.97 46.7 66.4