This is currently a STATIC quant, because the imatrix tool appears to be broken with Gemma 4 (it produces >100 perplexity). I will update with an imatrix quant once I can verify correctness.
5.05 bpw, a mixture of Q5_K and Q4_K
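For a rough sense of how tight the fit is, assuming the nominal 31B parameter count of the base model, the weights alone take

$$31 \times 10^9 \times 5.05\ \text{bits} \div 8 \approx 19.6\ \text{GB} \approx 18.3\ \text{GiB},$$

leaving only ~6 GiB of a 24 GiB card for the KV cache and compute buffers at 32k context.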
This is a VRAM hog that barely fits ~32k context on a 24 GiB GPU. I'm not willing to go lower on the quant and risk compromising capability, so for long-context agentic tasks I would instead recommend quantizing the K/V cache or offloading a couple of layers to system RAM. Otherwise, I'd use https://huggingface.co/Beinsezii/gemma-4-26B-A4B-it-GGUF-6.52BPW instead.
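As a concrete illustration of the K/V-cache route, here is a minimal sketch using llama-cpp-python. The model filename, 32k context, and q8_0 cache types are assumptions for illustration, not tested settings, and the exact `type_k`/`type_v` constants depend on your llama-cpp-python version.

```python
# Minimal sketch: load the GGUF with a quantized KV cache so ~32k context
# fits on a 24 GiB GPU. Assumes llama-cpp-python is installed and the
# model_path below points at the downloaded quant (hypothetical filename).
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-31B-it-5.05bpw.gguf",  # hypothetical local filename
    n_ctx=32768,          # ~32k context, near the limit on a 24 GiB card
    n_gpu_layers=-1,      # offload all layers; lower this by a couple to
                          #   push some layers into system RAM instead
    flash_attn=True,      # a quantized V cache generally requires flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # quantize K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # quantize V cache to q8_0
)

out = llm("Summarize the repository layout:", max_tokens=128)
print(out["choices"][0]["text"])
```

The equivalent trade-off applies either way: q8_0 caches roughly halve KV memory versus f16, while dropping a couple of layers to system RAM frees VRAM at some tokens-per-second cost.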
Base model: google/gemma-4-31B-it