Darwin V6: Diagnostic-Guided Evolutionary Model Merging
Full Model Family
Introducing the Darwin model family.
The Darwin V6 engine diagnoses two AI models at the tensor level, then uses evolutionary algorithms to find optimal merge ratios and combines them into a single model. Currently 6 models are publicly available across Gemma 4 and Qwen 3.5 architectures, with 8 repositories including GGUF quantized versions.
Model Family
Darwin-35B-A3B-Opus (Qwen 3.5 MoE)
| | |
|---|---|
| Father | Qwen3.5-35B-A3B-it |
| Mother | Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled |
| Architecture | 35B total / 3B active (MoE) |
| GPQA Diamond | 90.0% (loglikelihood, full 198 questions) |
| ARC-Challenge | 85.08% |
| MMMLU | 85.0% |
| vs Father | GPQA +5.8 pp |
| Model | Darwin-35B-A3B-Opus |
Darwin-35B-A3B-Opus Q8 GGUF (Official Quantization)
8-bit quantized version. Compatible with llama.cpp, Ollama, and LM Studio.
Darwin-35B-A3B-Opus GGUF (bartowski Quantization)
Multiple quantization levels by bartowski (Q4_K_M, Q5_K_M, Q6_K, Q8_0, etc.). Community-standard quantization format.
bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF
Darwin-31B-Opus (Gemma 4 Dense)
| | |
|---|---|
| Father | google/gemma-4-31B-it |
| Mother | TeichAI/gemma-4-31B-it-Claude-Opus-Distill |
| Architecture | Dense 31B, 256K context, 140+ languages, Vision, Thinking mode |
| GPQA Diamond | 66.0% (generative thinking, greedy, 50Q) |
| Father (same condition) | 60.0% (+10% relative improvement) |
| ARC-Challenge | 82.89% |
| Model | Darwin-31B-Opus |
| Demo | Live Demo |
Darwin-9B-Opus (Qwen 3.5 Dense)
| | |
|---|---|
| Father | Qwen3.5-9B |
| Mother | Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled |
| Architecture | Dense 9B |
| Model | Darwin-9B-Opus |
| Demo | Live Demo |
Darwin-4B-Opus (Gemma 4 E4B)
| | |
|---|---|
| Father | google/gemma-4-E4B-it |
| Mother | arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled |
| Architecture | Effective 4B (total 11.4B), 128K context, text + image + audio |
| ARC-Challenge | 82.92% |
| Note | Can run in-browser via WebGPU after ONNX conversion |
| Model | Darwin-4B-Opus |
Model Diagnostic Scan (MDS)
Left: Father (gemma-4-E4B-it) — balanced generalist. Right: Mother (Claude-Opus-Distill) — reasoning concentration in late layers from Claude Opus distillation.
What Darwin V6 Does
Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors. Set ratio=0.5 and every tensor in the model blends at the same proportion, with no distinction between which tensors matter for reasoning versus coding.
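The uniform approach can be sketched in a few lines. This is a minimal illustration in NumPy; the state dicts and tensor names are toy stand-ins, not real model weights:

```python
import numpy as np

def uniform_merge(father: dict, mother: dict, ratio: float = 0.5) -> dict:
    """Blend every tensor at the same ratio -- no per-tensor distinction."""
    return {name: (1.0 - ratio) * father[name] + ratio * mother[name]
            for name in father}

# Toy state dicts standing in for real model weights.
father = {"ffn.w": np.ones((2, 2)), "attn.w": np.zeros((2, 2))}
mother = {"ffn.w": np.zeros((2, 2)), "attn.w": np.ones((2, 2))}
child = uniform_merge(father, mother, ratio=0.5)
```

Whether `ffn.w` carries reasoning and `attn.w` carries long-context handling is irrelevant here: both get the same 50/50 blend.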
Darwin V6 diagnoses both parent models at the tensor level before merging. This process is called MDS (Model Diagnostic Scan) and consists of two stages.
First, static tensor analysis. It measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy) for every tensor.
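The three static measures can be sketched as follows, with plain NumPy arrays standing in for model tensors (the 64-bin histogram for entropy is an illustrative assumption, not Darwin's actual binning):

```python
import numpy as np

def static_profile(tensor: np.ndarray, bins: int = 64) -> dict:
    """Static diagnostics for one tensor: Shannon entropy of its value
    histogram (information density), standard deviation (activation
    spread), and L2 norm (energy)."""
    hist, _ = np.histogram(tensor, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                                  # drop empty bins
    return {
        "entropy": float(-(p * np.log2(p)).sum()),
        "std": float(tensor.std()),
        "l2_norm": float(np.linalg.norm(tensor)),
    }
```

A constant tensor scores zero on all three measures; a tensor with a rich value distribution scores high entropy, signaling dense information.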
Second, functional probing. Five diagnostic prompts (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, measuring cosine distance when each layer is skipped. This determines each layer's functional importance.
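The layer-skip idea can be illustrated with a toy residual stack. The forward function, layer shapes, and single probe vector here are illustrative assumptions; the real engine runs actual domain prompts through the transformer:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layer_importance(layers, x: np.ndarray, skip: int) -> float:
    """Compare the full forward pass against one with layer `skip`
    removed; a large cosine distance means the layer matters."""
    def forward(skip_idx=None):
        h = x.copy()
        for i, w in enumerate(layers):
            if i == skip_idx:
                continue
            h = h + np.tanh(w @ h)                # toy residual block
        return h
    return cosine_distance(forward(), forward(skip))

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
x = rng.standard_normal(8)
scores = [layer_importance(layers, x, i) for i in range(4)]
```

Running this once per probe domain yields a per-layer, per-domain importance map, which is what the MDS heatmaps visualize.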
The two results are combined to produce per-tensor optimal ratios:
```
combined    = static(entropy, std, norm) * 0.4 + probe(cosine_distance) * 0.6
final_ratio = mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```
When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants that parent's tensor directly instead of interpolating, so no interpolation noise is introduced. The mri_trust parameter itself is optimized by a CMA-ES evolutionary algorithm, so the optimal transplant intensity is determined automatically for each model pair.
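Putting the two formulas and the transplant rule together gives a small sketch (the 0.4/0.6 weights and 0.15/0.85 thresholds follow the text; function names and the NumPy setting are illustrative):

```python
import numpy as np

def per_tensor_ratio(static_score, probe_score, genome_ratio, mri_trust):
    """Blend the MDS-derived ratio with the evolved genome ratio."""
    mri_ratio = 0.4 * static_score + 0.6 * probe_score
    return mri_trust * mri_ratio + (1.0 - mri_trust) * genome_ratio

def merge_tensor(father, mother, ratio, lo=0.15, hi=0.85):
    """Interpolate, except at extreme ratios, where the dominant
    parent's tensor is transplanted directly (no interpolation noise)."""
    if ratio < lo:
        return father.copy()
    if ratio > hi:
        return mother.copy()
    return (1.0 - ratio) * father + ratio * mother
```

At `ratio=0.10` the Father tensor is copied bit-for-bit; at `ratio=0.50` the two parents blend evenly.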
After merging, a Health Check compares the child model against both parents layer-by-layer, detecting interference or function loss.
The base merge operations (DARE-TIES, SLERP, Linear) are implemented directly in PyTorch. mergekit is not used. The core of Darwin is not the merge algorithm itself, but the per-tensor diagnostic system and evolutionary ratio optimization built on top of it.
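As one example of such a primitive, SLERP between two weight tensors might look like the following. This is a generic sketch of spherical interpolation in NumPy, not Darwin's exact PyTorch code:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two tensors, treated as
    flat vectors; falls back to plain lerp when nearly colinear."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_n = a_flat / (np.linalg.norm(a_flat) + eps)
    b_n = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(a_n @ b_n, -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < 1e-4:                      # nearly parallel: plain lerp
        return (1.0 - t) * a + t * b
    s = np.sin(theta)
    out = (np.sin((1.0 - t) * theta) / s) * a_flat \
        + (np.sin(t * theta) / s) * b_flat
    return out.reshape(a.shape)
```

Unlike linear interpolation, SLERP follows the arc between the two weight directions, which preserves vector magnitude better at intermediate `t`.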
Darwin V6 vs mergekit
| Capability | mergekit | Darwin V6 |
|---|---|---|
| Ratio selection | Uniform ratio across all tensors | Independent ratio per tensor |
| Pre-merge analysis | None | Static tensor profiling + 5-probe functional analysis |
| Post-merge validation | Benchmark score only | Layer-by-layer Health Check (interference + function loss) |
| Search method | Manual tuning | CMA-ES evolutionary search, 14-dimensional adaptive genome |
| Transplant | Not supported | Direct transplant when ratio is extreme, zero interpolation |
What the Evolutionary Algorithm Discovered
The optimal genome for Darwin-31B-Opus reveals a striking pattern.
ffn_ratio=0.93 — Mother (Claude Opus Distill) dominates FFN layers at 93%. The evolutionary algorithm independently discovered that the core of reasoning capability is stored in FFN weights.
block_5 (L50-L59)=0.86 — The final 10 layers out of 60 favor Mother at 86%. The reasoning core is concentrated in the latter half of the model.
attn_ratio=0.32 — Attention layers go the opposite direction, with Father (Gemma 4) at 68%. This preserves the original multimodal and long-context processing capabilities.
This pattern aligns precisely with the MDS heatmap showing Mother's functional distribution across layers. The evolutionary algorithm reached the same conclusion without directly seeing the MDS results.
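A toy (mu, lambda) evolution strategy illustrates how such a search converges on a pattern like this. The real engine uses CMA-ES with covariance adaptation; the fitness function, population sizes, and target vector here are illustrative stand-ins built from the ratios quoted above:

```python
import numpy as np

def toy_es(fitness, dim=14, pop=16, elite=4, sigma=0.15, gens=40, seed=0):
    """Minimal (mu, lambda) evolution strategy: sample a Gaussian cloud
    around the mean, keep the elite, recenter, shrink the step size.
    A stand-in for CMA-ES search over a 14-dimensional merge genome."""
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 0.5)                    # start at a 50/50 merge
    for _ in range(gens):
        cand = np.clip(mean + sigma * rng.standard_normal((pop, dim)), 0.0, 1.0)
        scores = np.array([fitness(g) for g in cand])
        mean = cand[np.argsort(scores)[-elite:]].mean(axis=0)
        sigma *= 0.95                           # crude step-size decay
    return mean

# Hypothetical fitness rewarding high FFN ratios and low attention
# ratios, echoing the 0.93 / 0.32 pattern the real search found.
target = np.array([0.93] * 7 + [0.32] * 7)
best = toy_es(lambda g: -np.abs(g - target).sum())
```

The search is never told where reasoning lives; it only sees fitness, yet the genome drifts toward the same FFN-heavy, attention-light pattern.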
Benchmark Summary
| Model | Benchmark | Score | Father | Improvement |
|---|---|---|---|---|
| Darwin-35B-A3B-Opus | GPQA Diamond (loglikelihood, 198Q) | 90.0% | 84.2% | +5.8 pp |
| Darwin-35B-A3B-Opus | MMMLU | 85.0% | - | - |
| Darwin-35B-A3B-Opus | ARC-Challenge | 85.08% | - | - |
| Darwin-31B-Opus | GPQA Diamond (generative, 50Q) | 66.0% | 60.0% | +10% relative |
| Darwin-31B-Opus | ARC-Challenge | 82.89% | - | - |
| Darwin-4B-Opus | ARC-Challenge | 82.92% | - | - |
All benchmarks were measured under identical conditions (same questions, same seed, same decoding settings) for each Darwin model and its Father. The Gemma 4 architecture's multimodal wrapper limits compatibility with lm-eval's loglikelihood method, so only generative evaluation produces valid results for Gemma 4 based models.
Try It
Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-35B-A3B-Opus")
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-35B-A3B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
GGUF (Ollama)
```shell
ollama run FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF
```
Live Demos
Run Darwin V6 Yourself
The Darwin V6 engine is available as a Space. If you have a compatible model pair, you can run diagnostic-guided merging yourself:
All Links
Models
| Model | Link |
|---|---|
| Darwin-35B-A3B-Opus | huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus |
| Darwin-31B-Opus | huggingface.co/FINAL-Bench/Darwin-31B-Opus |
| Darwin-9B-Opus | huggingface.co/FINAL-Bench/Darwin-9B-Opus |
| Darwin-4B-Opus | huggingface.co/FINAL-Bench/Darwin-4B-Opus |
GGUF
| Version | Link |
|---|---|
| Q8 Official | FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF |
| bartowski | bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF |
Demos
| Model | Link |
|---|---|
| 31B Demo | spaces/FINAL-Bench/Darwin-31B-Opus |
| 35B Demo | spaces/FINAL-Bench/Darwin-35B-A3B-Opus |
| 9B Demo | spaces/FINAL-Bench/Darwin-9B-Opus |
Benchmarks
| Leaderboard | Link |
|---|---|
| FINAL Bench | spaces/FINAL-Bench/Leaderboard |
| ALL Bench | spaces/FINAL-Bench/all-bench-leaderboard |
License & Credits
All Darwin models are Apache 2.0.
DARE-TIES algorithm: DARE (Yu et al., 2023) and TIES-Merging (Yadav et al., 2023), re-implemented directly rather than library-dependent.
Parent models by: Google DeepMind (Gemma 4), Alibaba (Qwen 3.5), TeichAI, Jackrong, arsovskidev (Claude Opus Distill).
Darwin V6 engine and models by [VIDRAFT]