## Model Details
### Model Description
This model is a PEFT adapter for binary sequence classification built on top of the base model `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It is designed to classify source code or code-like input into one of two classes:
- `BENIGN`
- `MALWARE`
## Uses
This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:
- `MALWARE`
- `BENIGN`
Potential direct uses include:
- automated triage of code samples
- code security analysis pipelines
- research experiments on malicious vs. benign code classification
- assisting analysts in prioritizing suspicious samples
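The triage use case above can be sketched in a few lines: rank incoming samples by their predicted malware probability so analysts see the most suspicious ones first. This is a minimal illustration only; `score_sample` is a hypothetical stand-in for a real model call, and the suspicious-API heuristic it uses is purely for demonstration.

```python
def score_sample(sample: str) -> float:
    """Hypothetical scorer: returns an illustrative P(MALWARE) for a sample.

    In practice this would be replaced by the model inference shown later
    in this card; the keyword heuristic here exists only to make the
    sketch self-contained.
    """
    suspicious = ("CreateRemoteThread", "VirtualAllocEx", "WriteProcessMemory")
    hits = sum(token in sample for token in suspicious)
    return min(1.0, 0.2 + 0.3 * hits)

def triage(samples: list[str]) -> list[tuple[float, str]]:
    """Return samples sorted by descending malware probability."""
    scored = [(score_sample(s), s) for s in samples]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

samples = [
    "int add(int a, int b) { return a + b; }",
    "HANDLE h = CreateRemoteThread(proc, 0, 0, addr, 0, 0, 0);",
]
ranked = triage(samples)
print(ranked[0][1])  # the remote-thread sample ranks first
```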
## Citation
```
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
```
## Results
![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/0abuxeE2E13ZeQDuG6LVR.png)
## Limitations
The model focuses on patterns related to Windows PE files.
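Because the training focus is Windows PE-related code, predictions on out-of-domain input should be treated with caution. One hedged way to operationalize this is a simple pre-filter that guesses whether a sample matches that domain; the marker strings below are illustrative assumptions, not part of the model itself.

```python
# Heuristic markers suggesting Windows / PE-related source code.
# This list is an illustrative assumption, not derived from the model.
WINDOWS_MARKERS = ("windows.h", "kernel32", "HANDLE", "DWORD", "WinMain")

def likely_in_scope(code: str) -> bool:
    """Rough guess at whether a sample matches the model's Windows/PE focus."""
    return any(marker in code for marker in WINDOWS_MARKERS)

print(likely_in_scope("#include <windows.h>\nint WinMain() { return 0; }"))  # True
print(likely_in_scope("def hello():\n    print('hi')"))  # False
```

Samples that fail such a check can still be classified, but their predictions warrant extra scrutiny.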
## How to Get Started with the Model
Use the code below to load the tokenizer, base model, and PEFT adapter for inference:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tokenizer (always from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Base model with a 2-class classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2
)
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load the PEFT adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.to(device)
model.eval()

# Input
input_single = """#include <iostream>
#include <string>
#include <cstdint>
... your code ...
"""

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(device)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)
pred = outputs.logits.argmax(dim=1).item()
label = "MALWARE" if pred == 1 else "BENIGN"
print("Prediction:", label)
```
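The final softmax/argmax step above maps two logits to a label. The same arithmetic can be checked by hand with plain Python, no model required; the class-index convention (index 1 is `MALWARE`) follows the snippet above, and the example logits are arbitrary.

```python
import math

def softmax2(logit_benign: float, logit_malware: float) -> tuple[float, float]:
    """Softmax over two class logits; subtract the max for numerical stability."""
    m = max(logit_benign, logit_malware)
    eb = math.exp(logit_benign - m)
    em = math.exp(logit_malware - m)
    total = eb + em
    return eb / total, em / total

# Arbitrary example logits for illustration.
p_benign, p_malware = softmax2(-1.2, 2.3)
label = "MALWARE" if p_malware > p_benign else "BENIGN"
print(label, round(p_malware, 3))  # MALWARE 0.971
```

The probabilities always sum to 1, so thresholding `p_malware` (rather than taking the plain argmax) is an easy way to trade precision against recall in a triage setting.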