Mistral-7B Sigma Rule Generator

This model has been fine-tuned on CAPEv2 behavioral analysis reports to generate Sigma detection rules for SIEM platforms.

Model Description

  • Base Model: mistralai/Mistral-7B-Instruct-v0.3
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Data: 5,000 CAPEv2 behavioral reports with corresponding Sigma rules
  • Training Time: ~2.75 hours on Kaggle P100 GPU
  • Eval Loss: 0.746

Intended Use

Generate Sigma detection rules from CAPEv2 malware analysis reports for deployment in SIEM platforms like:

  • Splunk
  • Elastic Security
  • Microsoft Sentinel
  • QRadar
  • And other SIEM solutions
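For reference, the model's target output format is standard Sigma YAML. The rule below is only an illustrative sketch of that format — the title, log source, and field values are invented for this example and are not actual model output:

```yaml
title: Possible Process Injection via CreateRemoteThread
status: experimental
description: Detects a remote thread created in another process, a common injection technique.
logsource:
  product: windows
  category: create_remote_thread
detection:
  selection:
    TargetImage|endswith:
      - '\lsass.exe'
      - '\explorer.exe'
  condition: selection
falsepositives:
  - Debuggers and some security tools
level: high
```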

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "./mistral-sigma-full-model",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./mistral-sigma-full-model")

# Prepare input
system_prompt = "You are a cybersecurity expert specializing in creating Sigma detection rules from CAPEv2 behavioral analysis reports."
instruction = "Given this CAPEv2 behavioral report, produce Sigma rules that a SOC analyst could deploy in a SIEM."

# Your CAPEv2 report data
cape_report = {
    "signatures": ["process_injection", "registry_modification"],
    "processes": [...]
}

prompt = f'''<<SYS>>
{system_prompt}
<</SYS>>

{instruction}

Input Data:
{cape_report}

Response:'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
sigma_rules = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(sigma_rules)
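Depending on how the output is decoded, the generated text may still include prompt scaffolding before the rule itself. A minimal stdlib-only helper to isolate the text after the final `Response:` marker (the marker string matches the prompt template above; if it is absent, the text is returned unchanged):

```python
def extract_rules(generated: str, marker: str = "Response:") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is absent (e.g. when only new tokens were decoded),
    the input is returned unchanged, minus surrounding whitespace.
    """
    _, sep, tail = generated.rpartition(marker)
    return tail.strip() if sep else generated.strip()
```

For example, `extract_rules("...\nResponse:\ntitle: Foo")` returns `"title: Foo"`.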

Training Details

  • Optimizer: AdamW
  • Learning Rate: 2e-4
  • Batch Size: 1 (with gradient accumulation over 8 steps, for an effective batch size of 8)
  • Max Sequence Length: 256 tokens
  • Epochs: 1
  • LoRA Rank: 16
  • LoRA Alpha: 32

Limitations

  • Trained on a subset of data (5,000 examples) due to time constraints
  • May require additional fine-tuning for specific malware families
  • Output should be reviewed by security analysts before deployment

License

Apache 2.0 (same as base model)
