- dataset:
    id: Idavidrein/gpqa
    task_id: diamond
  value: 76.3
  source:
    url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
    name: Model Card