| ## Model Details |
|
|
| ### Model Description |
|
|
This model is a PEFT adapter for binary sequence classification built on top of the base model `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It is designed to classify source code or code-like input into one of two classes:
|
|
| - `BENIGN` |
| - `MALWARE` |
|
|
| ## Uses |
| This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be: |
|
|
| - `MALWARE` |
| - `BENIGN` |
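
The decision rule behind these two labels is a softmax over the classifier's two logits followed by an argmax. The minimal sketch below assumes the index mapping 0 → `BENIGN`, 1 → `MALWARE` (verify this against the adapter's `id2label` config before relying on it):

```python
import math

LABELS = ["BENIGN", "MALWARE"]  # assumed index order; check the adapter's config

def predict_label(logits):
    """Map a pair of raw logits to a label and its softmax probability."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return LABELS[idx], probs[idx]

label, prob = predict_label([-1.2, 3.4])
print(label, round(prob, 3))  # MALWARE 0.99
```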
|
|
| Potential direct uses include: |
|
|
| - automated triage of code samples |
| - code security analysis pipelines |
| - research experiments on malicious vs. benign code classification |
| - assisting analysts in prioritizing suspicious samples |
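
For triage-style uses like those above, samples are typically classified in batches rather than one at a time. The chunking helper below is a hypothetical addition, not part of the released code; each yielded batch could then be passed to the tokenizer and model as shown in the Get Started section:

```python
def batches(samples, batch_size=8):
    """Yield successive fixed-size chunks of a list of code snippets."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Example: three snippets split into batches of two.
snippets = ["int main() {}", "print('hi')", "mov eax, 1"]
sizes = [len(b) for b in batches(snippets, batch_size=2)]
print(sizes)  # [2, 1]
```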
|
|
|
|
|
|
| ## Citation |
```bibtex
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
```
|
|
| ## Results |
|
|
|
|
| ## Limitations |
|
|
The model focuses on patterns related to Windows PE files, so performance on other platforms or file formats may be limited.
|
|
|
|
| ## How to Get Started with the Model |
|
|
| Use the code below to load the tokenizer, base model, and PEFT adapter for inference: |
|
|
| ```python |
| import torch |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| from peft import PeftModel |
| |
| BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI" |
| ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter" |
| |
| device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
| |
| # tokenizer (always from base model) |
| tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL) |
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
| |
# base model
base_model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)
base_model.config.pad_token_id = tokenizer.pad_token_id  # required when padding inputs
| |
| # load adapter from Hugging Face |
| model = PeftModel.from_pretrained(base_model, ADAPTER_PATH) |
| |
| model.to(device) |
| model.eval() |
| |
| # input |
| input_single = """#include <iostream> |
| #include <string> |
| #include <cstdint> |
| ... your code ... |
| """ |
| |
inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(device)
| |
| # inference |
| with torch.no_grad(): |
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred = outputs.logits.argmax(dim=1).item()
| |
| label = "MALWARE" if pred == 1 else "BENIGN" |
| |
| print("Prediction:", label) |
```