## Model Details

### Model Description

This model is a PEFT adapter for binary sequence classification built on top of the base model `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It is designed to classify source code or code-like input into one of two classes:

- `BENIGN`
- `MALWARE`

## Uses

This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:

- `MALWARE`
- `BENIGN`

Potential direct uses include:

- automated triage of code samples
- code security analysis pipelines
- research experiments on malicious vs. benign code classification
- assisting analysts in prioritizing suspicious samples

## Citation

```bibtex
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
```

## Results

![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/0abuxeE2E13ZeQDuG6LVR.png)

## Limitations

The model focuses on patterns related to Windows PE files.
## How to Get Started with the Model

Use the code below to load the tokenizer, base model, and PEFT adapter for inference:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tokenizer (always from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Base model with a 2-class classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2
)
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load the adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.to(device)
model.eval()

# Input sample (replace with the code you want to classify)
input_single = """#include #include #include ... your code ... """

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(device)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)
pred = outputs.logits.argmax(dim=1).item()
label = "MALWARE" if pred == 1 else "BENIGN"
print("Prediction:", label)
print("Probabilities (BENIGN, MALWARE):", probs.squeeze().tolist())
```
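For triage pipelines, it can be useful to compare the `MALWARE` probability against a score threshold instead of taking the plain argmax, so that only high-confidence detections are flagged. The sketch below shows just this post-processing step in isolation; the `MALWARE_THRESHOLD` value and the hard-coded example logits are illustrative assumptions, not part of the model card, and the logits would normally come from `outputs.logits` above.

```python
import math

# Hypothetical triage threshold: flag a sample for review only when
# the MALWARE probability exceeds this value (illustrative choice).
MALWARE_THRESHOLD = 0.8

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=MALWARE_THRESHOLD):
    """Return (label, malware_probability) for one sample's logits.

    `logits` is [benign_logit, malware_logit], matching the model's
    class order (index 0 = BENIGN, index 1 = MALWARE).
    """
    p_benign, p_malware = softmax(logits)
    label = "MALWARE" if p_malware >= threshold else "BENIGN"
    return label, p_malware

# Example logits standing in for outputs.logits of two samples
print(triage([2.0, -1.0]))   # low malware probability -> BENIGN
print(triage([-1.5, 3.0]))   # high malware probability -> MALWARE
```

Raising the threshold trades recall for precision: fewer samples are flagged, but each flag carries a higher malware probability, which matches the "prioritizing suspicious samples" use case listed above.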