Qwen3.5-27B-abliterated-v2-MAX-NVFP4

Qwen3.5-27B-abliterated-v2-MAX-NVFP4 is an NVFP4-quantized variant of prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX. It uses a mix of F32 · BF16 · F8_E4M3 · U8 precision formats to reduce memory footprint and improve inference efficiency while maintaining strong output quality. This version preserves the original model’s character and introduces a more optimized abliteration rate, combining refined refusal direction analysis with enhanced training strategies to further minimize internal refusal behaviors while retaining strong reasoning and instruction-following capabilities. The result is a 27B-parameter language model optimized for highly detailed responses and strong instruction adherence, now with improved deployment efficiency.

This model is intended strictly for research and learning purposes. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.

Key Highlights

NVFP4 Compression: Utilizes mixed precision (F32 · BF16 · F8_E4M3 · U8) to reduce VRAM usage and accelerate inference.
Optimized Abliteration Rate (v2): Enhanced suppression of refusal directions with improved balance between openness, coherence, and stability.
Advanced Refusal Direction Analysis: Identifies and mitigates refusal-related activations within the model’s latent space.
Abliterated v2 Training Strategy: Further reduces refusal behaviors while maintaining response quality and consistency.
27B Parameter Architecture: Based on Qwen3.5-27B, delivering strong reasoning and knowledge capacity.
Efficient High-Capability Deployment: Designed for high-performance inference with reduced hardware requirements compared to full-precision models.
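
The refusal-direction highlights above can be illustrated with a toy sketch. The source does not describe the exact procedure used for this model, so this assumes the commonly described difference-of-means approach: estimate a direction from activations on two prompt sets, then subtract the component along it. All names (`refusal_direction`, `ablate`, `rate`) and the synthetic data are hypothetical, and `rate` is only one plausible reading of "abliteration rate".

```python
import math
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(vector, direction, rate=1.0):
    """Subtract `rate` times the component of `vector` along unit `direction`."""
    proj = sum(v * d for v, d in zip(vector, direction))
    return [v - rate * proj * d for v, d in zip(vector, direction)]

def component(vector, direction):
    """Scalar projection of `vector` onto unit `direction`."""
    return sum(v * d for v, d in zip(vector, direction))

# Synthetic stand-ins for hidden states (assumption: no real model is used).
random.seed(0)
hidden = 8
harmless = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(64)]
# Toy "harmful" activations: the same noise, shifted along coordinate 0.
harmful = [[random.gauss(0, 1) + (3.0 if i == 0 else 0.0) for i in range(hidden)]
           for _ in range(64)]

direction = refusal_direction(harmful, harmless)
sample = harmful[0]
ablated = ablate(sample, direction)

print(f"component before: {component(sample, direction):.3f}")
print(f"component after:  {component(ablated, direction):.3e}")
```

In weight-editing variants of this idea, the same projection is applied to the matrices that write into the residual stream rather than to individual activations; a `rate` below 1.0 removes only part of the component.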

Quick Start with Transformers

pip install transformers==5.4.0
# or
pip install git+https://github.com/huggingface/transformers.git

from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
import torch

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX-NVFP4",
    dtype="auto",  # recent Transformers releases use `dtype`; `torch_dtype` is the older name
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX-NVFP4"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain how transformer models work in simple terms."}
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

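# Move inputs to the model's device; with device_map="auto" this is
# not guaranteed to be the default "cuda" device.
inputs = processor(
    text=[text],
    padding=True,
    return_tensors="pt"
).to(model.device)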

generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# batch_decode returns one string per input; print the single response.
print(output_text[0])

Intended Use

Alignment & Refusal Research: Studying the effects of abliteration and reduced refusal mechanisms under compressed inference.
Red-Teaming Experiments: Evaluating robustness under adversarial or edge-case prompts.
Efficient Large Model Deployment: Running 27B-class models with reduced VRAM requirements.
Research Prototyping: Experimentation with compression-aware transformer behavior.
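
To put the reduced-VRAM use case in rough numbers, here is a back-of-the-envelope estimate of weight memory. It assumes the publicly documented NVFP4 layout (4-bit values with one 8-bit FP8 E4M3 scale per block of 16 values) and a nominal 27e9 parameters taken from the model name. It also assumes every weight is quantized, which this checkpoint's BF16 and F32 tensors suggest is not the case, so the NVFP4 figure is a lower bound. KV cache and activations are not counted. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to hold `num_params` weights, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

NUM_PARAMS = 27e9   # nominal size from the model name (assumption)
BLOCK_SIZE = 16     # NVFP4 shares one scale across 16 values

# 4 bits per value plus one 8-bit (FP8 E4M3) scale amortized over a block.
NVFP4_BITS = 4 + 8 / BLOCK_SIZE

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)
nvfp4_gib = weight_memory_gib(NUM_PARAMS, NVFP4_BITS)

print(f"BF16 weights:  {bf16_gib:.1f} GiB")
print(f"NVFP4 weights: {nvfp4_gib:.1f} GiB (lower bound)")
print(f"Reduction:     {bf16_gib / nvfp4_gib:.2f}x")
```

The actual footprint depends on which layers are left unquantized, so the checkpoint size on disk is the more reliable figure.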

Limitations & Risks

Important Note: This model intentionally minimizes built-in safety refusals.

High Risk of Sensitive Outputs: May generate unrestricted or controversial responses.
User Responsibility: The model must be used in a safe, ethical, and lawful manner.
Compression Trade-offs: NVFP4 may introduce minor degradation in precision or consistency in some edge cases.
Compute Considerations: Although its hardware requirements are reduced, the model still benefits from high-performance GPUs for optimal throughput.
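
The compression trade-off noted above can be made concrete with a small simulation of block-scaled 4-bit rounding. This is a simplified sketch, not the quantizer used to produce this checkpoint: it rounds each block of 16 values to the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), but keeps the per-block scale in full precision instead of rounding it to FP8 E4M3, and it omits the per-tensor scale. The function names and the Gaussian test weights are hypothetical.

```python
import math
import random

FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16

def quantize_block(block):
    """Round one block to a scaled FP4 grid and return the dequantized values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / 6.0  # map the largest magnitude to the top of the grid
    out = []
    for x in block:
        # Nearest representable magnitude, then restore sign and scale.
        mag = min(FP4_E2M1_MAGNITUDES, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out

def roundtrip(values):
    """Quantize and dequantize `values` block by block."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        out.extend(quantize_block(values[i:i + BLOCK_SIZE]))
    return out

# Synthetic weights (assumption: real layers may be distributed differently).
random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(1024)]
dequantized = roundtrip(weights)

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(weights, dequantized)))
ref = math.sqrt(sum(a * a for a in weights))
rel_err = err / ref
print(f"relative round-trip error: {rel_err:.3f}")
```

A small weight-level error does not translate directly into output quality, so task-level evaluation is still needed to judge the degradation in practice.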