Gemma-4-31B-JANG_4M-CRACK-GGUF

GGUF quantizations of Gemma-4-31B-JANG_4M-CRACK for use with llama.cpp, LM Studio, Ollama, and other GGUF-compatible inference engines.

About the Model

  • Base model: google/gemma-4-31b-it
  • Architecture: Gemma 4 Dense Transformer (31B parameters, 60 layers)
  • Features: Hybrid Sliding/Global Attention, Vision + Audio multimodal
  • Modification: CRACK abliteration (refusal removal) + JANG v2 mixed-precision quantization

Why This Conversion?

The original model uses JANG v2 mixed-precision MLX quantization (attention 8-bit + MLP 4-bit), which is only compatible with vMLX. Standard tools (llama.cpp, LM Studio, oMLX, mlx-lm) cannot load this format due to mixed per-layer bit widths.

This repository provides standard GGUF quantizations that work everywhere.

Conversion Process

```
Original (JANG v2 MLX safetensors, ~18 GB)
    ↓ dequantize (attention 8-bit → f16, MLP 4-bit → f16)
Intermediate (float16 safetensors, ~60 GB)
    ↓ convert_hf_to_gguf.py + quantize
GGUF (various quantizations)
```
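The dequantize step above can be sketched in plain NumPy. Group size 64 and the affine scale/bias form are assumptions about the JANG v2 layout (they match common MLX-style quantization), not details taken from its spec:

```python
import numpy as np

# Minimal sketch of group-wise 4-bit affine (de)quantization, the kind of
# scheme the MLP layers use here. GROUP_SIZE = 64 is an assumption.
GROUP_SIZE = 64

def quantize_4bit(w):
    """Quantize a 1-D float array to 4-bit codes with per-group scale/bias."""
    w = w.reshape(-1, GROUP_SIZE)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0            # 4 bits -> 16 levels (0..15)
    scale[scale == 0] = 1.0             # guard constant groups
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, bias):
    """Reconstruct approximate f16 weights: w ≈ scale * q + bias."""
    return (scale * q + bias).astype(np.float16)

w = np.random.randn(4096).astype(np.float32)
q, scale, bias = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, bias).reshape(-1)

# The reconstruction error is bounded by half a quantization step per group,
# which is why the f16 intermediate is an approximation, not the original.
print(np.abs(w - w_hat.astype(np.float32)).max())
```

This is why the pipeline round-trips through a ~60 GB f16 intermediate: the codes must be expanded back to floats before llama.cpp's converter can re-quantize them.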

Note: Because the original was already quantized (avg 5.1 bits), the dequantized f16 intermediate is an approximation of the full-precision weights, not a recovery of them. Re-quantizing to GGUF adds little further quality loss, since the attention layers were preserved at 8-bit in the original.
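The "avg 5.1 bits" figure is consistent with a rough parameter split between attention and MLP weights. The 27%/73% split below is an assumed illustration, not a value read from the model config:

```python
# Back-of-envelope check of the "avg 5.1 bits" figure, assuming roughly
# 27% of weights in attention (kept at 8-bit) and 73% in MLP (4-bit).
attn_frac, mlp_frac = 0.27, 0.73
avg_bits = attn_frac * 8 + mlp_frac * 4
print(round(avg_bits, 2))  # ≈ 5.1 under this assumed split
```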

Available Quantizations

| File | Quant | Size | Quality | Notes |
|------|-------|------|---------|-------|
| gemma-4-31b-jang-crack-Q3_K_M.gguf | Q3_K_M | ~14 GB | Acceptable | Minimum viable quality |
| gemma-4-31b-jang-crack-Q4_K_M.gguf | Q4_K_M | ~18 GB | Good | Best size/quality balance |
| gemma-4-31b-jang-crack-Q5_K_M.gguf | Q5_K_M | ~21 GB | Better | Recommended if RAM allows |
| gemma-4-31b-jang-crack-Q6_K.gguf | Q6_K | ~25 GB | Very good | High quality |
| gemma-4-31b-jang-crack-Q8_0.gguf | Q8_0 | ~33 GB | Near lossless | Closest to the original |

System Requirements

| Quantization | Minimum RAM | Recommended RAM |
|--------------|-------------|-----------------|
| Q3_K_M | 20 GB | 24 GB |
| Q4_K_M | 24 GB | 32 GB |
| Q5_K_M | 28 GB | 36 GB |
| Q6_K | 32 GB | 40 GB |
| Q8_0 | 40 GB | 48 GB |
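The file sizes above follow directly from effective bits per weight. The per-type bit widths below are approximate community figures for llama.cpp K-quants (assumptions, not measured from these files), and the 20% overhead for KV cache and runtime buffers is a rough rule of thumb:

```python
# Rough RAM sizing sketch: file size = params * bits / 8, plus ~20% overhead
# for KV cache, activations, and runtime buffers. Bit widths are approximate.
PARAMS = 31e9
BITS = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

for name, bits in BITS.items():
    file_gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{file_gb:.0f} GB file, plan for ~{file_gb * 1.2:.0f} GB RAM")
```

Note that long contexts grow the KV cache well beyond this flat estimate, which is why the Recommended column leaves extra headroom.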

Usage

LM Studio

Download any .gguf file and open it in LM Studio.

llama.cpp

```
./llama-cli -m gemma-4-31b-jang-crack-Q4_K_M.gguf -p "Hello" -n 256
```

Ollama

```
echo 'FROM ./gemma-4-31b-jang-crack-Q4_K_M.gguf' > Modelfile
ollama create gemma4-crack -f Modelfile
ollama run gemma4-crack
```
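If you want Ollama to use a larger context window or fixed sampling defaults, the Modelfile can carry them. The parameter values below are illustrative placeholders, not tuned for this model:

```
FROM ./gemma-4-31b-jang-crack-Q4_K_M.gguf
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
```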

License

Gemma License

Disclaimer

This model has had safety guardrails removed. Use responsibly and in compliance with applicable laws.
