Qwen3.5-27B-abliterated-v2-MAX-FP8
Qwen3.5-27B-abliterated-v2-MAX-FP8 is an FP8-compressed evolution of prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX. It stores weights in BF16 · FP8 (F8_E4M3) precision, significantly reducing memory footprint and improving inference efficiency. The variant preserves the character of the original model while applying a more optimized abliteration rate: refined refusal-direction analysis combined with enhanced training strategies further reduces internal refusal behaviors while retaining strong reasoning and instruction-following capabilities. The result is a 27B-parameter language model optimized for highly detailed responses and reliable instruction adherence.
This model is intended strictly for research and learning purposes. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.
Key Highlights
- FP8 Compression (F8_E4M3): Significantly reduces VRAM usage and improves inference throughput while retaining strong model quality.
- BF16 · FP8 Hybrid Precision: Balances numerical stability and performance across different layers of the model.
- Optimized Abliteration Rate (v2): Enhanced suppression of refusal directions with improved balance between openness, coherence, and stability.
- Advanced Refusal Direction Analysis: Uses targeted activation analysis to identify and mitigate refusal directions within the model’s latent space.
- Abliterated v2 Training Strategy: Further reduces refusal behaviors while maintaining response quality and consistency.
- 27B Parameter Architecture: Built on Qwen3.5-27B, delivering strong reasoning and knowledge capacity.
- Improved Instruction Adherence: Better handling of complex, multi-step, and nuanced prompts with minimal unnecessary refusals.
- Efficient High-Capability Deployment: Enables running large models with reduced hardware requirements compared to full-precision variants.
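To put the memory savings in perspective: F8_E4M3 (the OCP FP8 format with 4 exponent bits and 3 mantissa bits) uses one byte per weight versus two bytes for BF16. A back-of-envelope sketch of the weight footprint alone (these are rough figures, not measured numbers, and exclude KV cache, activations, and framework overhead):

```python
# Rough VRAM estimate for 27B parameters' weights alone.
# Illustrative arithmetic only; real usage adds KV cache and activations.
PARAMS = 27e9

bf16_gb = PARAMS * 2 / 1024**3   # 2 bytes per BF16 weight
fp8_gb = PARAMS * 1 / 1024**3    # 1 byte per FP8 (E4M3) weight

print(f"BF16 weights: ~{bf16_gb:.1f} GiB")  # ~50.3 GiB
print(f"FP8  weights: ~{fp8_gb:.1f} GiB")   # ~25.1 GiB
```

This is why an FP8 27B model fits on hardware where the BF16 variant would not, even before accounting for inference overhead.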
Quick Start with Transformers
```shell
pip install transformers==5.4.0
# or install from source
pip install git+https://github.com/huggingface/transformers.git
```
```python
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
import torch

# Load the model with automatic dtype selection and device placement
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX-FP8",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX-FP8"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain how transformer models work in simple terms."}
        ],
    }
]

# Render the chat template, then tokenize
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens so only newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
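The trimming step in the snippet above works because `generate` returns each prompt concatenated with its newly generated tokens, so slicing off the first `len(in_ids)` tokens per sequence leaves only the new ones. A toy illustration with plain Python lists (token IDs here are made up for demonstration):

```python
# Toy illustration of the prompt-trimming step: generate() returns
# prompt tokens + new tokens, so slice off the prompt per sequence.
input_ids = [
    [101, 7, 8, 9],  # prompt for sequence 0
    [101, 7, 8],     # prompt for sequence 1
]
generated_ids = [
    [101, 7, 8, 9, 42, 43],  # prompt + 2 new tokens
    [101, 7, 8, 55, 56, 57], # prompt + 3 new tokens
]

trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[42, 43], [55, 56, 57]]
```

Note that with `padding=True` the prompt length includes padding tokens, which is why the real code measures `len(in_ids)` per sequence rather than using a single fixed offset.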
Intended Use
- Alignment & Refusal Research: Studying aggressive abliteration and reduced refusal mechanisms under compressed precision.
- Red-Teaming Experiments: Evaluating robustness across adversarial or edge-case prompts.
- Efficient Large Model Deployment: Running 27B-class models with reduced VRAM requirements using FP8.
- Research Prototyping: Exploring trade-offs between compression, alignment, and reasoning quality.
Limitations & Risks
Important Note: This model intentionally minimizes built-in safety refusals.
- High Risk of Sensitive Outputs: May generate unrestricted, controversial, or explicit responses.
- User Responsibility: Must be used in a safe, ethical, and lawful manner.
- Precision Trade-offs: FP8 compression may introduce minor degradation in numerical stability or edge-case reasoning.
- Compute Requirements: Still requires capable GPUs, though significantly reduced compared to full-precision 27B models.
- Abliteration Trade-offs: Increased openness may sometimes affect safety alignment or consistency.
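To get an intuition for the precision trade-off, the sketch below (plain Python, not the actual quantization kernel) rounds a value to a 3-bit mantissa, mimicking E4M3's significand. With 3 explicit mantissa bits plus the implicit leading bit, the worst-case relative rounding error is about 2⁻⁴ ≈ 6.25%:

```python
import math

def quantize_m3(x: float) -> float:
    # Round x to the nearest value with a 3-bit mantissa (4 significant
    # bits including the implicit leading bit). Ignores exponent range
    # clamping; this is an illustration, not a faithful E4M3 encoder.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** 4
    return math.ldexp(round(m * scale) / scale, e)

x = 0.1
q = quantize_m3(x)
rel_err = abs(q - x) / abs(x)
print(q, rel_err)  # 0.1015625, ~1.6% relative error
```

Per-weight errors of this magnitude are usually benign for LLM inference, but they accumulate, which is why FP8 variants can show minor degradation on numerically sensitive or edge-case reasoning tasks.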
Model tree for prithivMLmods/Qwen3.5-27B-abliterated-v2-MAX-FP8
- Base model: Qwen/Qwen3.5-27B