QWEN 3.5 2B - Abliterated

⚠️ MANDATORY RESEARCH DISCLAIMER & WARNING

THIS MODEL IS UNFILTERED. This repository contains an abliterated version of Qwen 3.5 2B where all safety-alignment and refusal mechanisms have been surgically removed for academic research.

  1. Unfiltered Content: This model will generate content that is considered harmful, dangerous, or unethical. It does not reflect the views of the developer.
  2. Research Use Only: This is a research artifact for studying AI safety and systemic risk. It is NOT intended for general or commercial use.
  3. User Responsibility: The user assumes all legal and ethical responsibility for the outputs generated.

Overview

This repository hosts a modified version of the Qwen 3.5 2B model. This is an Abliterated variant, meaning the internal "refusal vectors" typically reinforced through RLHF (Reinforcement Learning from Human Feedback) have been identified and orthogonalized out of the weights.

This model is a primary artifact for research conducted in Bengaluru. It is designed to facilitate the study of Adversarial Robustness and Systemic Risk Modeling without the interference of top-level alignment filters that often censor technical or edge-case data.

Model Capabilities

  • Architecture: Hybrid Gated DeltaNet-Attention (24 Layers).
  • Context Window: Native 262,144 tokens (256k).
  • Thinking Mode: Supports a native <|thought|> block for multi-step reasoning.
  • Multimodal: Retains the native vision-text capabilities for analyzing UI, logs, and diagrams.
  • Agentic Power: Optimized for tool-calling via MCP (Model Context Protocol).

Research Objectives

The development of this model serves several critical academic functions:

  1. AI Safety Benchmarking: Testing the "base" capabilities of small-parameter models when safety guardrails are removed.
  2. Unfiltered Data Retrieval: Accessing raw technical and legal interpretations (e.g., analyzing the Digital Personal Data Protection Act or maritime law) without the model defaulting to "diplomatic" or "vague" responses.
  3. Systemic Risk Simulation: Modeling "Black Swan" events in global logistics and infrastructure where standard "safe" models refuse to simulate high-consequence failure scenarios.

Technical Methodology

The ablation process involved:

  • Identifying the high-dimensional directions in the residual stream associated with "refusal" responses.
  • Projecting the model's weights onto the null space of these vectors to prevent the model from triggering refusal states.
  • Retaining the original reasoning capabilities while removing the "preachiness" or moralizing layers of the base fine-tune.
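The first step above, locating the refusal direction, is commonly estimated (in the abliteration literature) by contrasting residual-stream activations on refusal-triggering versus benign prompts. A minimal sketch with synthetic activations standing in for values that would really be captured via forward hooks:

```python
import numpy as np

def refusal_direction(refusing_acts: np.ndarray,
                      benign_acts: np.ndarray) -> np.ndarray:
    """Estimate the refusal direction as the normalized difference of mean
    residual-stream activations. Both inputs have shape [n_prompts, d_model]."""
    diff = refusing_acts.mean(axis=0) - benign_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Synthetic stand-ins; real usage would record activations at a chosen layer.
rng = np.random.default_rng(0)
d_model = 64
r = refusal_direction(rng.normal(1.0, 1.0, (32, d_model)),
                      rng.normal(0.0, 1.0, (32, d_model)))
```

The resulting unit vector `r` is what the projection step below removes from the weight matrices.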

This model was modified using Projected Orthogonalization. Unlike traditional uncensored fine-tunes that can degrade reasoning, this method identifies the specific "refusal direction" $r$ in the model's residual stream and subtracts it from the output projections:

$W_{new} = W_{old} - (W_{old} \cdot r) \cdot r^T$

By removing these vectors, the model no longer triggers "As an AI language model..." refusals when encountering "sensitive" technical, legal, or systemic queries.
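The projection formula can be checked numerically. A minimal NumPy sketch with synthetic weights and a unit refusal vector (not the model's actual matrices):

```python
import numpy as np

def ablate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Apply W_new = W - (W @ r) outer r^T, removing the component of each
    row of W (shape [n, d]) along the unit direction r (shape [d])."""
    return W - np.outer(W @ r, r)

rng = np.random.default_rng(0)
n, d = 8, 16
W = rng.normal(size=(n, d))
r = rng.normal(size=d)
r /= np.linalg.norm(r)          # r must be a unit vector for the formula
W_new = ablate(W, r)
# Every row of W_new is now orthogonal to r, i.e. W_new @ r is ~0,
# so the layer can no longer write along the refusal direction.
```

Note the formula only zeroes the component along `r` when `r` has unit norm; for a non-unit vector you would divide by `r.T @ r`.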

Usage & Implementation

This model is best utilized for research in a controlled environment (e.g., Google Colab T4). Due to the removal of safety filters, it is recommended to use specific system prompts to maintain the desired research persona.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Amarthya11/QWEN352B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype=torch.float16,
    trust_remote_code=True
)