Darwin V6: Diagnostic-Guided Evolutionary Model Merging

Community Article Published April 8, 2026


Full Model Family

Introducing the Darwin model family.

The Darwin V6 engine diagnoses two AI models at the tensor level, uses an evolutionary algorithm to find optimal per-tensor merge ratios, and combines the pair into a single model. Six models are currently public across the Gemma 4 and Qwen 3.5 architectures, spread over 8 repositories including GGUF quantized versions.


Model Family

Darwin-35B-A3B-Opus (Qwen 3.5 MoE)

Father Qwen3.5-35B-A3B-it
Mother Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
Architecture 35B total / 3B active (MoE)
GPQA Diamond 90.0% (loglikelihood, full 198 questions)
ARC-Challenge 85.08%
MMMLU 85.0%
vs Father GPQA +5.8%p
Model Darwin-35B-A3B-Opus

Darwin-35B-A3B-Opus Q8 GGUF (Official Quantization)

8-bit quantized version. Compatible with llama.cpp, Ollama, and LM Studio.

Darwin-35B-A3B-Opus-Q8-GGUF

Darwin-35B-A3B-Opus GGUF (bartowski Quantization)

Multiple quantization levels by bartowski (Q4_K_M, Q5_K_M, Q6_K, Q8_0, etc.). Community-standard quantization format.

bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF

Darwin-31B-Opus (Gemma 4 Dense)

Father google/gemma-4-31B-it
Mother TeichAI/gemma-4-31B-it-Claude-Opus-Distill
Architecture Dense 31B, 256K context, 140+ languages, Vision, Thinking mode
GPQA Diamond 66.0% (generative thinking, greedy, 50Q)
Father (same condition) 60.0% — +10% relative improvement
ARC-Challenge 82.89%
Model Darwin-31B-Opus
Demo Live Demo

Darwin-9B-Opus (Qwen 3.5 Dense)

Father Qwen3.5-9B
Mother Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled
Architecture Dense 9B
Model Darwin-9B-Opus
Demo Live Demo

Darwin-4B-Opus (Gemma 4 E4B)

Father google/gemma-4-E4B-it
Mother arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
Architecture Effective 4B (total 11.4B), 128K context, text + image + audio
ARC-Challenge 82.92%
Note Can run in-browser via WebGPU after ONNX conversion
Model Darwin-4B-Opus

Model Diagnostic Scan (MDS)

Left: Father (gemma-4-E4B-it) — balanced generalist. Right: Mother (Claude-Opus-Distill) — reasoning concentration in late layers from Claude Opus distillation.

Parent Comparison — Layer-wise Importance


What Darwin V6 Does

Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors. Set ratio=0.5 and every tensor in the model blends at the same proportion, with no distinction between which tensors matter for reasoning versus coding.

Darwin V6 diagnoses both parent models at the tensor level before merging. This process is called MDS (Model Diagnostic Scan) and consists of two stages.

First, static tensor analysis. It measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy) for every tensor.
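These three statistics are straightforward to compute per tensor. A minimal sketch (the exact binning and normalization Darwin V6 uses are not published, so the histogram-based entropy below is an assumption):

```python
import torch

def static_tensor_stats(t: torch.Tensor, bins: int = 256) -> dict:
    """Shannon entropy, standard deviation, and L2 norm of one weight tensor."""
    flat = t.detach().float().flatten()
    # Histogram-based Shannon entropy (information density), in bits.
    hist = torch.histc(flat, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * p.log2()).sum().item()
    return {
        "entropy": entropy,                # information density
        "std": flat.std().item(),          # activation spread
        "l2_norm": flat.norm(p=2).item(),  # energy
    }

stats = static_tensor_stats(torch.randn(512, 512))
```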

Second, functional probing. Five diagnostic prompts (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, measuring cosine distance when each layer is skipped. This determines each layer's functional importance.
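The probing idea can be illustrated on a toy stack of residual layers standing in for transformer blocks (the actual Darwin V6 probe prompts and hook mechanics are not published; this only shows the skip-and-compare measurement):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Toy stand-in for a transformer: a stack of residual "blocks".
layers = [torch.nn.Linear(64, 64) for _ in range(6)]

def forward(x, skip=None):
    """Run the stack, optionally skipping one layer entirely."""
    for i, layer in enumerate(layers):
        if i == skip:
            continue
        x = x + torch.tanh(layer(x))  # residual block
    return x

x = torch.randn(1, 64)  # stand-in for an embedded probe prompt
baseline = forward(x)

# Functional importance of layer i = cosine distance between the full
# output and the output with layer i skipped.
importance = [
    1.0 - F.cosine_similarity(baseline, forward(x, skip=i), dim=-1).item()
    for i in range(len(layers))
]
```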

The two results are combined into a per-tensor diagnostic ratio, which is then blended with the ratio proposed by the evolutionary genome:

mri_ratio    = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
final_ratio  = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)

When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants it directly without interpolation. Zero noise injection. The mri_trust parameter itself is optimized by a CMA-ES evolutionary algorithm, so the optimal transplant intensity is determined automatically for each model pair.
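Putting the two formulas and the transplant rule together, the per-tensor decision might look like the sketch below (names mirror the formulas above; the scores are assumed to be normalized to [0, 1], and this is an illustration, not the actual Darwin V6 implementation):

```python
import torch

def merge_tensor(father, mother, static_score, probe_score,
                 genome_ratio, mri_trust, lo=0.15, hi=0.85):
    """Blend one tensor pair using the MDS-derived ratio.

    The ratio is the weight given to Mother (1.0 = pure Mother,
    0.0 = pure Father).
    """
    mri_ratio = 0.4 * static_score + 0.6 * probe_score
    ratio = mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)
    # Direct transplant at extreme ratios: no interpolation, no noise.
    if ratio < lo:
        return father.clone()
    if ratio > hi:
        return mother.clone()
    return (1.0 - ratio) * father + ratio * mother

f, m = torch.zeros(4), torch.ones(4)
```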

After merging, a Health Check compares the child model against both parents layer-by-layer, detecting interference or function loss.
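One way to sketch such a check is to flag layers whose weights drifted far from BOTH parents (Darwin V6's actual interference metrics are not published; relative drift is an assumed proxy):

```python
import torch

def health_check(child, father, mother, max_drift=1.0):
    """Flag layers where the child drifted away from both parents.

    Each argument is a dict of layer_name -> weight tensor. A child
    tensor far from Father AND Mother suggests interference or
    function loss rather than a clean blend.
    """
    report = {}
    for name, c in child.items():
        d_f = ((c - father[name]).norm() / father[name].norm()).item()
        d_m = ((c - mother[name]).norm() / mother[name].norm()).item()
        report[name] = {
            "drift_father": d_f,
            "drift_mother": d_m,
            "suspect": d_f > max_drift and d_m > max_drift,
        }
    return report
```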

The base merge operations (DARE-TIES, SLERP, Linear) are implemented directly in PyTorch. mergekit is not used. The core of Darwin is not the merge algorithm itself, but the per-tensor diagnostic system and evolutionary ratio optimization built on top of it.
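Of the three base operations, SLERP is the least obvious to write directly in PyTorch. A minimal version, flattening each tensor and interpolating along the arc between the two weight vectors, with a linear fallback when they are near-parallel (a sketch, not Darwin's exact code):

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-7):
    """Spherical linear interpolation between two weight tensors."""
    a_f, b_f = a.flatten().float(), b.flatten().float()
    # Angle between the (normalized) weight vectors.
    a_n = a_f / (a_f.norm() + eps)
    b_n = b_f / (b_f.norm() + eps)
    cos_omega = torch.clamp(a_n @ b_n, -1.0, 1.0)
    omega = torch.acos(cos_omega)
    if omega.abs() < 1e-4:  # near-parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a_f + (torch.sin(t * omega) / so) * b_f
    return out.reshape(a.shape).to(a.dtype)
```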


Darwin V6 vs mergekit

| Capability | mergekit | Darwin V6 |
|---|---|---|
| Ratio selection | Uniform ratio across all tensors | Independent ratio per tensor |
| Pre-merge analysis | None | Static tensor profiling + 5-probe functional analysis |
| Post-merge validation | Benchmark score only | Layer-by-layer Health Check (interference + function loss) |
| Search method | Manual tuning | CMA-ES evolutionary search, 14-dimensional adaptive genome |
| Transplant | Not supported | Direct transplant when ratio is extreme, zero interpolation |

What the Evolutionary Algorithm Discovered

The optimal genome for Darwin-31B-Opus reveals a striking pattern.

ffn_ratio=0.93 — Mother (Claude Opus Distill) dominates FFN layers at 93%. The evolutionary algorithm independently discovered that the core of reasoning capability is stored in FFN weights.

block_5 (L50-L59)=0.86 — The final 10 layers out of 60 favor Mother at 86%. The reasoning core is concentrated in the latter half of the model.

attn_ratio=0.32 — Attention layers go the opposite direction, with Father (Gemma 4) at 68%. This preserves the original multimodal and long-context processing capabilities.

This pattern aligns precisely with the MDS heatmap showing Mother's functional distribution across layers. The evolutionary algorithm reached the same conclusion without directly seeing the MDS results.
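The genome search itself can be illustrated with a deliberately simplified evolution strategy. Darwin V6 uses CMA-ES, which additionally adapts a full covariance matrix each generation; the sketch below substitutes a basic elitist (1+λ) strategy and a fictional fitness function standing in for "merge with this genome, then benchmark the child" (the 14-entry genome layout is an assumption based on the ratios named above):

```python
import torch

torch.manual_seed(0)
GENOME_DIM = 14  # e.g. per-block ratios + ffn/attn ratios + mri_trust (assumed layout)

def fitness(genome: torch.Tensor) -> float:
    """Stand-in for: merge with this genome, benchmark the child.
    Here: negative distance to a fictional optimum, purely for illustration."""
    target = torch.full((GENOME_DIM,), 0.7)
    return -((genome - target) ** 2).sum().item()

# Elitist (1+lambda) evolution strategy; CMA-ES replaces the isotropic
# Gaussian below with an adapted covariance matrix.
genome = torch.full((GENOME_DIM,), 0.5)
sigma = 0.2
for gen in range(60):
    candidates = [torch.clamp(genome + sigma * torch.randn(GENOME_DIM), 0, 1)
                  for _ in range(16)]
    best = max(candidates, key=fitness)
    if fitness(best) > fitness(genome):  # keep the parent unless beaten
        genome = best
    sigma *= 0.97  # cool the step size over generations
```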


Benchmark Summary

| Model | Benchmark | Score | Father | Improvement |
|---|---|---|---|---|
| Darwin-35B-A3B-Opus | GPQA Diamond (loglikelihood, 198Q) | 90.0% | 84.2% | +5.8%p |
| Darwin-35B-A3B-Opus | MMMLU | 85.0% | - | - |
| Darwin-35B-A3B-Opus | ARC-Challenge | 85.08% | - | - |
| Darwin-31B-Opus | GPQA Diamond (generative, 50Q) | 66.0% | 60.0% | +10% relative |
| Darwin-31B-Opus | ARC-Challenge | 82.89% | - | - |
| Darwin-4B-Opus | ARC-Challenge | 82.92% | - | - |

All benchmarks were measured under identical conditions (same questions, same seed, same decoding settings) for the child and the Father model. Gemma 4's multimodal wrapper structure has limited compatibility with lm-eval's loglikelihood method, so only generative evaluation produces valid results for Gemma 4 based models.


Try It

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-35B-A3B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-35B-A3B-Opus")

messages = [{"role": "user", "content": "Explain model merging in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

GGUF (Ollama)

ollama run hf.co/FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF

Live Demos

Run Darwin V6 Yourself

The Darwin V6 engine is available as a Space. If you have a compatible model pair, you can run diagnostic-guided merging yourself:

Darwin V6 Engine


All Links

Models

GGUF

Demos

Benchmarks


License & Credits

All Darwin models are Apache 2.0.

DARE-TIES algorithm: TIES-Merging (Yadav et al., 2023) and DARE (Yu et al., 2023); re-implemented, not library-dependent.

Parent models by: Google DeepMind (Gemma 4), Alibaba (Qwen 3.5), TeichAI, Jackrong, arsovskidev (Claude Opus Distill).

Darwin V6 engine and models by VIDRAFT
