# Qwen3.5-35B-A3B Text-Only

Text-only weights extracted from Qwen/Qwen3.5-35B-A3B (a multimodal Mixture-of-Experts VLM) for use with vLLM's `Qwen3_5MoeForCausalLM` architecture.
## What this is
Qwen3.5 MoE models are natively multimodal (VLM). Their Hugging Face checkpoints use `Qwen3_5MoeForConditionalGeneration` with weights prefixed `model.language_model.*`. This repo provides the language-model backbone only, with:

- `architectures: ["Qwen3_5MoeForCausalLM"]`
- `model_type: "qwen3_5_moe_text"`
- Weight keys at `model.layers.*` (standard causal LM format, no `language_model.` prefix)
- Vision encoder and MTP weights removed
## Model structure
- Architecture: Hybrid GatedDeltaNet + Full Attention, Sparse Mixture-of-Experts
- Total parameters: ~35B (~3B active per token)
- Dtype: bfloat16
## How to use with vLLM
```python
from vllm import LLM

llm = LLM(
    model="codecho/Qwen3.5-35B-A3B-text-only",
    trust_remote_code=True,
    tensor_parallel_size=2,
)
```