# Mistral-7B Sigma Rule Generator
This model has been fine-tuned on CAPEv2 behavioral analysis reports to generate Sigma detection rules for SIEM platforms.
## Model Description
- Base Model: mistralai/Mistral-7B-Instruct-v0.3
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Data: 5,000 CAPEv2 behavioral reports with corresponding Sigma rules
- Training Time: ~2.75 hours on Kaggle P100 GPU
- Eval Loss: 0.746
## Intended Use

Generate Sigma detection rules from CAPEv2 malware analysis reports for deployment in SIEM platforms such as:
- Splunk
- Elastic Security
- Microsoft Sentinel
- QRadar
- And other SIEM solutions
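For orientation, the model's output is a Sigma rule in YAML. The following hand-written rule is purely illustrative (it is not actual model output) and shows the shape of a rule covering registry-persistence behavior of the kind CAPEv2 reports flag:

```yaml
title: Suspicious Run Key Persistence
status: experimental
description: Illustrative example of the Sigma rule format, not model output
logsource:
  product: windows
  category: registry_set
detection:
  selection:
    TargetObject|contains: '\Software\Microsoft\Windows\CurrentVersion\Run'
  condition: selection
level: medium
```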
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "./mistral-sigma-full-model",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./mistral-sigma-full-model")

# Prepare input
system_prompt = "You are a cybersecurity expert specializing in creating Sigma detection rules from CAPEv2 behavioral analysis reports."
instruction = "Given this CAPEv2 behavioral report, produce Sigma rules that a SOC analyst could deploy in a SIEM."

# Your CAPEv2 report data
cape_report = {
    "signatures": ["process_injection", "registry_modification"],
    "processes": [...]
}

prompt = f'''<<SYS>>
{system_prompt}
<</SYS>>
{instruction}
Input Data:
{cape_report}
Response:'''

# Generate (do_sample=True is required for temperature to take effect)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt
sigma_rules = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(sigma_rules)
```
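Before shipping a generated rule to a SIEM, it helps to strip any prompt echo and sanity-check that the text at least resembles a Sigma rule. A minimal sketch (the helper names are our own, and the key check is deliberately shallow; real validation should parse the YAML):

```python
def extract_generated_rules(decoded: str) -> str:
    """Return only the text after the final 'Response:' marker,
    dropping any echoed prompt from the decoded output."""
    _, sep, tail = decoded.rpartition("Response:")
    return tail.strip() if sep else decoded.strip()

def looks_like_sigma(rule_text: str) -> bool:
    """Cheap sanity check: a Sigma rule should declare at least
    these top-level keys (full validation needs a YAML parser)."""
    required = ("title:", "logsource:", "detection:")
    return all(key in rule_text for key in required)

decoded = (
    "prompt text\n"
    "Response:\n"
    "title: Demo\n"
    "logsource:\n  product: windows\n"
    "detection:\n  condition: selection"
)
rules = extract_generated_rules(decoded)
print(looks_like_sigma(rules))  # True
```

Anything that fails the check should go back for regeneration or straight to manual review rather than into the SIEM.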
## Training Details
- Optimizer: AdamW
- Learning Rate: 2e-4
- Batch Size: 1 (with gradient accumulation of 8)
- Max Sequence Length: 256 tokens
- Epochs: 1
- LoRA Rank: 16
- LoRA Alpha: 32
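The hyperparameters above can be collected in one place; note that batch size 1 with 8 gradient-accumulation steps gives an effective batch size of 8 per optimizer update (the dict field names below mirror common peft/transformers naming and are an assumption, not this model's exact training script):

```python
# Hyperparameters from this card (field names are illustrative).
training_config = {
    "learning_rate": 2e-4,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_seq_length": 256,
    "epochs": 1,
    "lora_r": 16,
    "lora_alpha": 32,
}

# Effective batch size seen by the optimizer per update step.
effective_batch = (training_config["per_device_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # 8
```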
## Limitations
- Trained on a subset of data (5,000 examples) due to time constraints
- May require additional fine-tuning for specific malware families
- Output should be reviewed by security analysts before deployment
## License
Apache 2.0 (same as base model)