Mistral-7B Sigma Rule Generator

This model has been fine-tuned on CAPEv2 behavioral analysis reports to generate Sigma detection rules for SIEM platforms.

Model Description

  • Base Model: mistralai/Mistral-7B-Instruct-v0.3
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Data: 5,000 CAPEv2 behavioral reports with corresponding Sigma rules
  • Training Time: ~2.75 hours on Kaggle P100 GPU
  • Eval Loss: 0.746

Intended Use

Generate Sigma detection rules from CAPEv2 malware analysis reports for deployment in SIEM platforms like:

  • Splunk
  • Elastic Security
  • Microsoft Sentinel
  • QRadar
  • And other SIEM solutions
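For reference, the model's target output format is standard Sigma YAML. The rule below is only an illustrative sketch of that format — the title, log source, and field values are invented for this example and are not actual model output:

```yaml
title: Possible Process Injection via CreateRemoteThread
status: experimental
description: Detects a remote thread created in another process, a common injection technique.
logsource:
  product: windows
  category: create_remote_thread
detection:
  selection:
    TargetImage|endswith:
      - '\lsass.exe'
      - '\explorer.exe'
  condition: selection
falsepositives:
  - Debuggers and some security tools
level: high
```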

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "./mistral-sigma-full-model",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./mistral-sigma-full-model")

# Prepare input
system_prompt = "You are a cybersecurity expert specializing in creating Sigma detection rules from CAPEv2 behavioral analysis reports."
instruction = "Given this CAPEv2 behavioral report, produce Sigma rules that a SOC analyst could deploy in a SIEM."

# Your CAPEv2 report data
cape_report = {
    "signatures": ["process_injection", "registry_modification"],
    "processes": [...]
}

prompt = f'''<<SYS>>
{system_prompt}
<</SYS>>

{instruction}

Input Data:
{cape_report}

Response:'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
sigma_rules = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(sigma_rules)
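Depending on how the output is decoded, the generated text may still include prompt scaffolding before the rule itself. A minimal stdlib-only helper to isolate the text after the final `Response:` marker (the marker string matches the prompt template above; if it is absent, the text is returned unchanged):

```python
def extract_rules(generated: str, marker: str = "Response:") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is absent (e.g. when only new tokens were decoded),
    the input is returned unchanged, minus surrounding whitespace.
    """
    _, sep, tail = generated.rpartition(marker)
    return tail.strip() if sep else generated.strip()
```

For example, `extract_rules("...\nResponse:\ntitle: Foo")` returns `"title: Foo"`.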

Training Details

  • Optimizer: AdamW
  • Learning Rate: 2e-4
  • Batch Size: 1 (with gradient accumulation over 8 steps, for an effective batch size of 8)
  • Max Sequence Length: 256 tokens
  • Epochs: 1
  • LoRA Rank: 16
  • LoRA Alpha: 32

Limitations

  • Trained on a subset of data (5,000 examples) due to time constraints
  • May require additional fine-tuning for specific malware families
  • Output should be reviewed by security analysts before deployment

License

Apache 2.0 (same as base model)
