# Qwen3.5 4B GGUF (Quantized)
This repository provides a GGUF quantized version of the original Qwen3.5 4B model, optimized for efficient local inference using tools like llama.cpp, LM Studio, and similar runtimes.
## Base Model

This model is derived from the official base model:

https://huggingface.co/Qwen/Qwen3.5-4B (by Alibaba / Qwen Team)
Please refer to the original model for full details, training methodology, benchmarks, and licensing terms.
## Quantization Details
- Format: GGUF
- Quantization: Q4_K_XL
- Size: ~2.9 GB
- Architecture: Qwen3.5
This version is designed to balance output quality against memory footprint, making it suitable for local deployment on consumer hardware.
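As a rough sanity check on the numbers above, we can compute the effective bits per weight implied by the file size (a sketch, assuming ~4 billion parameters per the "4B" name and the ~2.9 GB size listed above):

```python
# Rough effective bits-per-weight check for the Q4_K_XL file.
# Assumptions: ~4e9 parameters (from the "4B" model name), ~2.9 GiB on disk.
params = 4e9
file_bytes = 2.9 * 1024**3

bits_per_weight = file_bytes * 8 / params
print(f"{bits_per_weight:.1f} bits/weight")  # → roughly 6.2 bits/weight
```

The result lands above a flat 4 bits because dynamic quantization schemes like this one keep selected tensors (e.g. embeddings and some attention layers) at higher precision, and the file also carries metadata overhead.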
## Quantization Source

This GGUF file is sourced from:

https://huggingface.co/unsloth/Qwen3.5-4B-GGUF
Specifically:
- `Qwen3.5-4B-UD-Q4_K_XL.gguf`
All credit for quantization goes to the original uploader (Unsloth).
## Usage

You can run this model locally using:

### llama.cpp

```bash
./main -m qwen3.5-4b-q4_k_xl.gguf -p "Explain SQL injection"
```

(In recent llama.cpp releases the `main` binary has been renamed `llama-cli`.)
### Other tools
- LM Studio
- KoboldCpp
- Ollama
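For Ollama, importing a local GGUF file takes a short Modelfile. A minimal sketch (the local file name and sampling values here are assumptions, not requirements):

```
# Hypothetical Modelfile: point FROM at your local GGUF download
FROM ./Qwen3.5-4B-UD-Q4_K_XL.gguf

# Optional sampling defaults
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Then register and run it with `ollama create qwen3.5-4b -f Modelfile` followed by `ollama run qwen3.5-4b`.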
## Example Use Cases
- General-purpose chat
- Coding assistance
- Technical explanations
- Integration into custom AI systems (e.g., agents, tools)
## Tested With
- Local inference (CPU/GPU hybrid)
- Integration with external tools (web search, reasoning pipelines)
## Disclaimer
- This is not an original model.
- Behavior and capabilities are inherited from the base Qwen3.5 model.
## License

This model inherits the license of the original Qwen3.5-4B model; please refer to the base model repository for the full terms.
## Acknowledgements

- Qwen Team (Alibaba): base model
- Unsloth: GGUF quantization
- llama.cpp: GGUF runtime support
## Related Project

This model is used in:

CyberGuard AI, a cybersecurity assistant system hosted on Hugging Face Spaces.