Benchmarks | MMLU-Redux | GPQA | | | MMLU-Pro |
---|---|---|---|---|---|
Default Setting | 0-shot CoT JSON | 0-shot CoT | | | 5-shot CoT |
Models | | Main (448) | Diamond (198) | Extended (546) | |
ArmoRM-Llama3-8B-v0.1 | 54.23 | 33.3 | 33.3 | 31.3 | |
Meta-Llama-3-8B-Instruct | 61.66 | | | | |
Meta-Llama-3.1-8B-Instruct | 67.24 | 32.8 | | | 48.3 |
Meta-Llama-3-70B-Instruct | 78.01 | | | | |
Meta-Llama-3.1-70B-Instruct | 82.97 | 46.7 | | | 66.4 |