## Model Details

### Model Description

This model is a PEFT adapter for binary sequence classification built on top of the base model `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It is designed to classify source code or code-like input into one of two classes:

- `BENIGN`
- `MALWARE`

## Uses

This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:

- `MALWARE`
- `BENIGN`

Potential direct uses include:

- automated triage of code samples
- code security analysis pipelines
- research experiments on malicious vs. benign code classification
- assisting analysts in prioritizing suspicious samples

## Citation

```bibtex
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
```

## Results

![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/0abuxeE2E13ZeQDuG6LVR.png)

## Limitations

The model focuses on patterns related to Windows PE files.
## How to Get Started with the Model

Use the code below to load the tokenizer, base model, and PEFT adapter for inference:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tokenizer (always from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Base model with a 2-class classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2
)
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load the adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.to(device)
model.eval()

# Input sample (replace with the code you want to classify)
input_single = """#include #include #include ... your code ... """

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(device)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)
pred = outputs.logits.argmax(dim=1).item()
label = "MALWARE" if pred == 1 else "BENIGN"
print("Prediction:", label)
print("Probabilities (BENIGN, MALWARE):", probs.squeeze().tolist())
```
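For triage pipelines, it can be useful to compare the `MALWARE` probability against a score threshold instead of taking the plain argmax, so that only high-confidence detections are flagged. The sketch below shows just this post-processing step in isolation; the `MALWARE_THRESHOLD` value and the hard-coded example logits are illustrative assumptions, not part of the model card, and the logits would normally come from `outputs.logits` above.

```python
import math

# Hypothetical triage threshold: flag a sample for review only when
# the MALWARE probability exceeds this value (illustrative choice).
MALWARE_THRESHOLD = 0.8

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=MALWARE_THRESHOLD):
    """Return (label, malware_probability) for one sample's logits.

    `logits` is [benign_logit, malware_logit], matching the model's
    class order (index 0 = BENIGN, index 1 = MALWARE).
    """
    p_benign, p_malware = softmax(logits)
    label = "MALWARE" if p_malware >= threshold else "BENIGN"
    return label, p_malware

# Example logits standing in for outputs.logits of two samples
print(triage([2.0, -1.0]))   # low malware probability -> BENIGN
print(triage([-1.5, 3.0]))   # high malware probability -> MALWARE
```

Raising the threshold trades recall for precision: fewer samples are flagged, but each flag carries a higher malware probability, which matches the "prioritizing suspicious samples" use case listed above.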