# Qwen3.5-35B-A3B Text-Only

Text-only weights extracted from Qwen/Qwen3.5-35B-A3B (a multimodal Mixture-of-Experts VLM) for use with vLLM's `Qwen3_5MoeForCausalLM` architecture.
## What this is
Qwen3.5 MoE models are natively multimodal (VLM). Their Hugging Face checkpoints use `Qwen3_5MoeForConditionalGeneration` with weights prefixed `model.language_model.*`. This repo provides the language-model backbone only, with:

- `architectures: ["Qwen3_5MoeForCausalLM"]`
- `model_type: "qwen3_5_moe_text"`
- Weight keys at `model.layers.*` (standard causal LM format, no `language_model.` prefix)
- Vision encoder and MTP weights removed
## Model structure
- Architecture: Hybrid GatedDeltaNet + Full Attention, Sparse Mixture-of-Experts
- Total parameters: ~35B (~3B active per token)
- Dtype: bfloat16
## How to use with vLLM
```python
from vllm import LLM

llm = LLM(
    model="codecho/Qwen3.5-35B-A3B-text-only",
    trust_remote_code=True,
    tensor_parallel_size=2,
)
```