This is currently a STATIC quant, because the imatrix tool appears to be broken with Gemma 4 (it produces >100 perplexity). I will update with an imatrix quant once I can verify correctness.
5.05 bpw, a mixture of Q5_K and Q4_K
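For a rough sense of how tight the fit is, assuming the nominal 31B parameter count of the base model, the weights alone take

$$31 \times 10^9 \times 5.05\ \text{bits} \div 8 \approx 19.6\ \text{GB} \approx 18.3\ \text{GiB},$$

leaving only ~6 GiB of a 24 GiB card for the KV cache and compute buffers at 32k context.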
This is a VRAM hog that barely fits ~32k context on a 24 GiB GPU. I'm not willing to go lower on the quant and risk compromising capability, so for long-context agentic tasks I would instead recommend quantizing the K/V cache or offloading a couple of layers to system RAM. Otherwise, I'd use https://huggingface.co/Beinsezii/gemma-4-26B-A4B-it-GGUF-6.52BPW instead.
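As a concrete illustration of the K/V-cache route, here is a minimal sketch using llama-cpp-python. The model filename, 32k context, and q8_0 cache types are assumptions for illustration, not tested settings, and the exact `type_k`/`type_v` constants depend on your llama-cpp-python version.

```python
# Minimal sketch: load the GGUF with a quantized KV cache so ~32k context
# fits on a 24 GiB GPU. Assumes llama-cpp-python is installed and the
# model_path below points at the downloaded quant (hypothetical filename).
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-31B-it-5.05bpw.gguf",  # hypothetical local filename
    n_ctx=32768,          # ~32k context, near the limit on a 24 GiB card
    n_gpu_layers=-1,      # offload all layers; lower this by a couple to
                          #   push some layers into system RAM instead
    flash_attn=True,      # a quantized V cache generally requires flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # quantize K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # quantize V cache to q8_0
)

out = llm("Summarize the repository layout:", max_tokens=128)
print(out["choices"][0]["text"])
```

The equivalent trade-off applies either way: q8_0 caches roughly halve KV memory versus f16, while dropping a couple of layers to system RAM frees VRAM at some tokens-per-second cost.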
Base model: google/gemma-4-31B-it