| ## Model Details |
|
|
| ### Model Description |
|
|
This model is a PEFT adapter for binary sequence classification built on top of the base model `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It is designed to classify source code or code-like input into one of two classes:
|
|
| - `BENIGN` |
| - `MALWARE` |
|
|
| ## Uses |
| This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be: |
|
|
| - `MALWARE` |
| - `BENIGN` |
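
The decision rule behind these two labels is a softmax over the classifier's two logits followed by an argmax. The minimal sketch below assumes the index mapping 0 → `BENIGN`, 1 → `MALWARE` (verify this against the adapter's `id2label` config before relying on it):

```python
import math

LABELS = ["BENIGN", "MALWARE"]  # assumed index order; check the adapter's config

def predict_label(logits):
    """Map a pair of raw logits to a label and its softmax probability."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return LABELS[idx], probs[idx]

label, prob = predict_label([-1.2, 3.4])
print(label, round(prob, 3))  # MALWARE 0.99
```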
|
|
| Potential direct uses include: |
|
|
| - automated triage of code samples |
| - code security analysis pipelines |
| - research experiments on malicious vs. benign code classification |
| - assisting analysts in prioritizing suspicious samples |
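
For triage-style uses like those above, samples are typically classified in batches rather than one at a time. The chunking helper below is a hypothetical addition, not part of the released code; each yielded batch could then be passed to the tokenizer and model as shown in the Get Started section:

```python
def batches(samples, batch_size=8):
    """Yield successive fixed-size chunks of a list of code snippets."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Example: three snippets split into batches of two.
snippets = ["int main() {}", "print('hi')", "mov eax, 1"]
sizes = [len(b) for b in batches(snippets, batch_size=2)]
print(sizes)  # [2, 1]
```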
|
|
|
|
|
|
| ## Citation |
```bibtex
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
```
|
|
| ## Results |
|
|
|
|
| ## Limitations |
|
|
The model focuses on patterns related to Windows PE files, so performance on other platforms or file formats may be limited.
|
|
|
|
| ## How to Get Started with the Model |
|
|
| Use the code below to load the tokenizer, base model, and PEFT adapter for inference: |
|
|
| ```python |
| import torch |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| from peft import PeftModel |
| |
| BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI" |
| ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter" |
| |
| device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
| |
| # tokenizer (always from base model) |
| tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL) |
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
| |
# base model
base_model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)
base_model.config.pad_token_id = tokenizer.pad_token_id  # required when padding inputs
| |
| # load adapter from Hugging Face |
| model = PeftModel.from_pretrained(base_model, ADAPTER_PATH) |
| |
| model.to(device) |
| model.eval() |
| |
| # input |
| input_single = """#include <iostream> |
| #include <string> |
| #include <cstdint> |
| ... your code ... |
| """ |
| |
inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(device)
| |
| # inference |
| with torch.no_grad(): |
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    pred = outputs.logits.argmax(dim=1).item()
| |
| label = "MALWARE" if pred == 1 else "BENIGN" |
| |
| print("Prediction:", label) |
```