Outlier
Local AI for Mac, plus a ternary Mixture-of-Experts research track.
Outlier is two things in one org: a Mac-native desktop app that runs the best open-weights models offline, and a research effort training our own ternary MoE language models as overlay deltas on a frozen Qwen2.5 base. Both are Apache 2.0. Both ship here.
- App: one 8.8 MB DMG, five curated shipping tiers, no tokens, no cloud, no account. Free forever for v1.
- Research: four MoE scales (10B / 40B / 70B / 150B) built as {-1, 0, +1} overlays on a frozen Qwen base, plus the alpha-fix recovery primitive — 280 per-expert scalar gates in a 15 KB overlay that recovered +1.61pp MMLU on 70B where a 68M-parameter LoRA regressed.
Built solo in 19 days on a Mac Studio M1 Ultra plus spot B200 GPUs. Total compute spend under $1,200. Three U.S. provisional patents filed.
Website: outlier.host · Engine: github.com/Outlier-host/outlier · Contact: matt@outlier.host
What you can run today
The desktop app ships five curated tiers. Every tier is a Mac-optimized build of an open-weights base model, bundled in the one 8.8 MB installer.
| Tier | Base | Quant | RAM | Speed (M1 Ultra) | Use case |
|---|---|---|---|---|---|
| Nano | Qwen3 1.7B | MLX 4-bit | < 2 GB | bench pending | Fast drafts, low-battery |
| Lite | Qwen 2.5 7B | MLX 4-bit AWQ | 4.47 GB | 71.30 tok/s | Daily driver, chat, writing |
| Compact | Qwen 2.5 14B | MLX 4-bit AWQ | 8.24 GB | 37.26 tok/s | Reasoning, deeper context |
| Max | Qwen 2.5 32B | GGUF Q4 | ~18 GB | bench pending | Long-form, complex tasks |
| Code | Qwen3-Coder-30B-A3B | MLX 4-bit | ~16 GB | 55 tok/s | Agentic coding, repo-scale |
Lite / Compact speeds: [VERIFIED] — Mac Studio M1 Ultra 64 GB, mlx_lm, 5-prompt steady-state, 3-prompt warmup, temp 0.7, April 17, 2026. Source: bench_7b.json / bench_14b.json.
Code tier (Outlier-Coder 30B): Our Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team (Apache 2.0). We quantize to MLX 4-bit (16 GB on disk vs. 61 GB FP16), tune sampling defaults for Apple Silicon (top_p=0.8, top_k=20, rep_penalty=1.05), and bundle it into the installer. The base model is the Qwen team's work; credit there.
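The tuned sampling defaults (top_p=0.8, top_k=20, rep_penalty=1.05) amount to simple logit filtering before sampling. A minimal NumPy sketch of that filtering, not the app's actual sampling code; the function name and the toy vocabulary are hypothetical:

```python
import numpy as np

def filter_logits(logits, prev_tokens, top_k=20, top_p=0.8, rep_penalty=1.05):
    """Illustrative version of the Code-tier defaults
    (top_k=20, top_p=0.8, repetition_penalty=1.05)."""
    logits = logits.astype(np.float64).copy()
    # Repetition penalty: dampen tokens that already appeared.
    for t in set(prev_tokens):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # Top-k: keep only the k highest-scoring tokens.
    kth_best = np.sort(logits)[-min(top_k, len(logits))]
    logits[logits < kth_best] = -np.inf
    # Top-p (nucleus): keep the smallest prefix whose probability mass >= p.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    logits[order[cum > top_p][1:]] = -np.inf  # drop everything past the nucleus
    return logits

# Toy 10-token vocabulary with evenly spaced logits.
filtered = filter_logits(np.linspace(5.0, -4.0, 10), prev_tokens=[])
```

The surviving tokens are then sampled from at the configured temperature; in production this runs per decoding step, not once.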
Download the app: outlier.host · Apple Silicon · macOS 13+ · 8 GB RAM minimum.
Quickstart — MLX (Mac)
```python
# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Outlier-Ai/Outlier-Lite-7B-MLX-4bit")
prompt = "Explain mixture of experts in one paragraph."
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```
Quickstart — transformers (GPU / CUDA)
```python
# pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Outlier-Ai/Outlier-70B-V3.3"  # research-track MoE overlay
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True,
)
messages = [{"role": "user", "content": "Write a quicksort in Rust."}]
# add_generation_prompt=True appends the assistant turn header so the
# model completes a response rather than continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Research track — ternary MoE overlays
Our own MoE family is trained as overlay deltas on a frozen Qwen2.5 base. Routed experts are stored in {-1, 0, +1} ternary at ~1.6 bits per weight. A frozen full-precision Qwen base acts as the shared expert. Top-2 routing per MoE layer. This means our repos are not standalone checkpoints — they attach to an unmodified base model at load time.
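The ~1.6 bits per weight follows from base-3 packing: five ternary digits fit in one byte (3^5 = 243 ≤ 256), giving 8/5 = 1.6 bits per weight. A minimal sketch of that storage scheme and the attach-at-load step; the actual on-disk format and the per-expert scale value here are assumptions, not the published loader:

```python
import numpy as np

def pack_trits(t):
    """Pack ternary values {-1, 0, +1} five to a byte (3**5 = 243 <= 256)."""
    t = np.asarray(t, dtype=np.int64) + 1            # map {-1,0,1} -> {0,1,2}
    pad = (-len(t)) % 5
    padded = np.concatenate([t, np.zeros(pad, dtype=np.int64)])
    groups = padded.reshape(-1, 5)
    return (groups * 3 ** np.arange(5)).sum(axis=1).astype(np.uint8)

def unpack_trits(packed, n):
    """Recover the first n ternary values from the packed bytes."""
    digits = packed[:, None].astype(np.int64) // 3 ** np.arange(5) % 3
    return digits.reshape(-1)[:n] - 1

rng = np.random.default_rng(0)
delta = rng.integers(-1, 2, size=20)                 # one tiny "expert" delta
packed = pack_trits(delta)
bits_per_weight = packed.nbytes * 8 / len(delta)     # 4 bytes * 8 / 20 = 1.6

# Attach at load time: effective weight = frozen base + scale * ternary delta.
# The 0.02 scale is purely illustrative.
base = rng.normal(size=20).astype(np.float32)
w_eff = base + 0.02 * unpack_trits(packed, len(delta))
```

Because the base stays frozen and untouched on disk, the overlay repo only needs to ship the packed trits plus scales, which is why the repos attach to an unmodified Qwen checkpoint rather than replacing it.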
MMLU (primary)
Every number has provenance. Sample size n = 14,042 (full MMLU). Harness: lm-evaluation-harness. 5-shot, bfloat16.
| Model | MMLU | Stderr | Harness | Status |
|---|---|---|---|---|
| Outlier-150B V3.2 | 84.46% | 0.29% | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-70B V3.3 (alpha-fixed) | 83.10% | 0.30% | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-40B V3.3 | 77.80% | 0.33% | v0.4.11 | [VERIFIED] Day 12 |
| Outlier-10B V3.3 | 70.87% | ≈0.38%* | v0.4.9.1 | [VERIFIED] Day 13 |
*10B stderr calculated from binomial formula √(p(1-p)/n); matches pattern of reported values at other scales.
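The starred value can be checked directly from the accuracy and sample size given above:

```python
import math

p, n = 0.7087, 14042                  # 10B V3.3 accuracy, full MMLU sample
stderr = math.sqrt(p * (1 - p) / n)   # binomial standard error
print(f"{stderr:.2%}")                # 0.38%
```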
Harness caveat: v0.4.9.1 → v0.4.11 produced a 1.30pp delta on the same 150B weights. We've locked v0.4.9.1 as our reference harness and document both numbers in our ground-truth file so reviewers can reproduce either.
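A reproduction sketch against the pinned reference harness. Treat this as a template rather than a verified command: the exact task name and flag spelling can differ across harness releases, and the version pins are taken on trust from the numbers above.

```shell
pip install "lm_eval==0.4.9.1"   # the locked reference harness version
lm_eval --model hf \
  --model_args pretrained=Outlier-Ai/Outlier-70B-V3.3,dtype=bfloat16,trust_remote_code=True \
  --tasks mmlu --num_fewshot 5
```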
Secondary benchmarks (V3.3, verified)
| Model | HellaSwag | ARC-C | ARC-E | Winogrande | TruthfulQA |
|---|---|---|---|---|---|
| 150B | 77.00% | 68.50% | 90.00% | 85.50% | 69.19% |
| 70B | 85.95% | 73.46% | 91.62% | 81.29% | 67.12% |
| 40B | 84.64% | 73.12% | 91.29% | 80.98% | 67.49% |
| 10B | 78.30% | 62.88% | 85.98% | 73.80% | 62.11% |
All [VERIFIED] at n = 14,042, v0.4.9.1.
MMLU vs. base Qwen (radical honesty)
| Scale | Outlier V3.3 | Base Qwen FP16 | Delta |
|---|---|---|---|
| 10B | 70.87% | Qwen 7B: 74.2% | −3.33pp |
| 40B | 77.80% | Qwen 14B: 79.7% | −1.90pp |
| 70B | 83.10% | Qwen 32B: 83.3% | −0.20pp (effectively tied) |
| 150B | 84.46% | Qwen 72B: 86.1% | −1.64pp |
Our MoE overlays trail base Qwen by 0.2–3.3pp on raw MMLU. That's the honest number. The defensible story is MMLU per GB of RAM at the slot our 70B occupies (≈20 GB / 83% MMLU), not raw MMLU.
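Spelling out that per-GB framing with the card's own figures (the ~20 GB footprint is the card's estimate, so this is illustrative arithmetic only):

```python
# MMLU points per GB of resident RAM at the 70B slot,
# using the card's own figures (83.10% MMLU in ~20 GB).
mmlu, ram_gb = 83.10, 20.0
points_per_gb = mmlu / ram_gb   # ~4.2 MMLU points per GB
```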
For additional reference: Llama 3.1 70B full-sample MMLU is around 83.1%. Outlier-70B V3.3 alpha-fixed lands in that neighborhood, on a model family trained solo for under $1,200 of total compute.
Model naming — V3.3 convention
The old 10B / 40B / 70B / 150B labels counted routed-expert parameters and understated real model sizes. V3.3 moves to the industry-standard `<total>B-A<active>B` convention — total parameters, then active-per-token parameters — used by DeepSeek-, Mixtral-, and Llama 4-class MoE releases:
| Old name | V3.3 name | Total params | Active params |
|---|---|---|---|
| Outlier-10B | Outlier-13B-A7B | 13B | 7B |
| Outlier-40B | Outlier-30B-A14B | 30B | 14B |
| Outlier-70B | Outlier-68B-A32B | 68B | 32B |
| Outlier-150B | Outlier-150B-A70B | 150B | 70B |
V3.2 repos remain available as [SUPERSEDED], pointing to V3.3.
Engine
- Open-source at github.com/Outlier-host/outlier (Apache 2.0). Ternary MoE loader, three-tier paged cache, MPS + CPU backends, lm-eval compatible, alpha-overlay loader for post-training recovery.
- GPU-resident expert dequantization. A patched modeling file materializes ternary experts to bf16 at load time — ~56× speedup over the original CPU→GPU path on a single B200.
- Alpha-fix technique. 280 per-expert scalar gates trained in 18 minutes on one B200 recovered +1.61pp MMLU on 70B (81.49% → 83.10%). Overlay file is 15 KB — roughly 250,000× fewer trainable parameters than the LoRA approach it outperformed.
- Desktop app (v1.3.0). Tauri + FastAPI + mlx_lm. 8.8 MB DMG, Apple Silicon, macOS 13+. SHA-256: 1837df5739eda279a564a2ef8fc33a366d9e018900e98f391bfbbf6b9408448b. Ad-hoc signed (Apple Dev ID pending).
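Mechanically, the alpha-fix reduces to one learned scalar per routed expert, applied at mixture time. A minimal sketch under assumed shapes (the real loader, hidden size, and overlay file format are not reproduced here):

```python
import numpy as np

def alpha_fixed_mixture(expert_outputs, routing_weights, alpha):
    """Combine routed expert outputs with per-expert scalar gates.

    expert_outputs : (n_experts, d) per-expert outputs for one token
    routing_weights: (n_experts,) top-2 routing weights (zero elsewhere)
    alpha          : (n_experts,) alpha-fix gates -- the overlay's only
                     trainable parameters
    """
    return (routing_weights[:, None] * alpha[:, None] * expert_outputs).sum(axis=0)

rng = np.random.default_rng(0)
outs = rng.normal(size=(280, 64))     # 280 experts; hidden size 64 assumed
weights = np.zeros(280)
weights[[3, 17]] = [0.6, 0.4]         # top-2 routing
identity = alpha_fixed_mixture(outs, weights, np.ones(280))  # alpha=1: no-op
```

With all gates at 1.0 the layer reduces to the plain routed mixture, which is why the overlay can be trained quickly from an identity initialization; 280 fp32 scalars is about 1.1 KB, so the 15 KB overlay file presumably carries metadata as well (an assumption, the format is not published here).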
What we're not claiming
- We do not match Kimi K2.5, GLM-5, Claude Opus 4.6, Gemini 3 Pro, or GPT-5 on pure MMLU.
- We are not the first ternary MoE. Microsoft + Apple's MoTE (arXiv:2506.14435, June 2025) published a shared-FP + ternary-expert architecture for vision-language models. Our contribution is the text-LLM variant, the overlay-on-frozen-base deployment artifact, and the alpha-fix recovery primitive.
- We are not shipping models trained on trillions of tokens. Our distillation pipeline uses DeepSeek V3 as teacher and touches a fraction of the tokens a Llama-class pretrain does. The comparison we care about is quality per dollar of training, not parameter count or token count.
- Production-ready local inference lives in the shipping tiers (curated Qwen), not yet in the MoE research repos. Our 70B V3.3 runs on a Mac Studio via the engine, but the app ships Qwen tiers for now. MoE graduates to the app when a scale validates against a real Mac RAM tier.
Status — Day 19 (April 19, 2026)
- v1.3.0 desktop app: built and shipping (ad-hoc signed, Apple Dev ID in queue)
- V3.3 MoE models: four scales verified, uploaded
- Shipping tiers: Nano / Lite / Compact / Max / Code — AWQ builds live
- Outlier-Coder 30B: Mac build of Qwen3-Coder-30B-A3B shipped in app
- Website: outlier.host live
- Downloads: ~5,881 across 15 repos (CLAIM — live re-verify pending)
- Patents: 3 U.S. provisional filed (61 claims), non-provisional deadline April 3–9, 2027
Links
- Website: outlier.host
- Engine: github.com/Outlier-host/outlier
- Paper: outlier_ternary_moe_2026.pdf, v6 (landing with the launch)
- Responsible use reports: abuse@outlier.host
- Contact: matt@outlier.host
- Built by: Matt Kerr · Kerr & Company LLC · Grand Rapids, MI
License & attribution
- Engine & app: Apache 2.0
- MoE overlays: Apache 2.0, overlay on frozen Qwen2.5 bases by the Alibaba Qwen team (Apache 2.0)
- Outlier-Coder 30B: Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team, licensed Apache 2.0. Base weights are the Qwen team's work; modifications are ours.
- Shipping tiers (Nano / Lite / Compact / Max): Mac-optimized builds of Qwen 2.5 / Qwen 3 bases, all Apache 2.0.
Patents — defensive, not enforced against open-source implementations:
- #64/026,886 (April 3, 2026) — Ternary-Quantized MoE Language Model System
- #64/030,368 (April 6, 2026) — Systems and Methods for Ternary MoE Inference, Training, and Modular Deployment
- #64/034,028 (April 9, 2026) — Zero-Delta Expert Initialization, Residual Error Correction, and Adaptive Inference
Citation
```bibtex
@misc{kerr2026outlier,
  title        = {Outlier: Ternary Mixture-of-Experts Language Models on Consumer Hardware},
  author       = {Kerr, Matt},
  year         = {2026},
  howpublished = {\url{https://outlier.host}},
  note         = {Outlier-Ai org, Hugging Face}
}
```
Changelog
- April 19, 2026 (Day 19): Platform-first thesis locked. Org card rewritten to reflect dual-track (Platform + MoE research). Five shipping tiers canonical: Nano / Lite / Compact / Max / Code. Outlier-Coder 30B (Mac build of Qwen3-Coder-30B-A3B) added as new Code-tier flagship. Changelog catches up through Day 19.
- April 18, 2026 (Day 18): v1.3.0 desktop DMG built (8.8 MB, Tauri, ad-hoc signed). Free-forever-v1 pricing decision made — subscription deferred until 1,000+ free downloads + 30-day retention data. Strategic pivot from "best model company" to "Apple of local AI" platform.
- April 17, 2026 (Day 17): Mac AWQ shipping tiers verified. Lite (7B) hits 71.30 tok/s / 4.47 GB peak RSS; Compact (14B) hits 37.26 tok/s / 8.24 GB. Canonical Mac numbers. Exp 5 alpha validation loader fixed.
- April 16, 2026 (Day 16): 10B V3.3 secondary benchmarks verified. Real backend wired end-to-end through web UI. Website deployed at outlier.host.
- April 13, 2026 (Day 13): Five cluster wins. 70B V3.3 alpha-fixed at 83.10% MMLU (+1.61pp). 150B V3.2 re-measured at 84.46% MMLU on v0.4.9.1 (supersedes the earlier 83.16% value). YaRN 4x validated for 128K context on 70B / 150B. V4 HESTIA+LoRA killed after −1.17pp / −1.34pp regressions; fully recovered via 280-scalar alpha-fix on a 15 KB overlay.
- April 11, 2026 (Day 11): Removed an unverified four-row MMLU table that relied on a decommissioned cluster's unsaved source files. Forensic cleanup led directly to the Day 9 provenance rules that now govern every number on this card.
- April 9, 2026 (Day 9): Third provisional patent filed (#64/034,028). Provenance discipline rules 66–78 added to project canon after trust-laundering incident.
- April 6, 2026 (Day 6): Second provisional patent filed (#64/030,368).
- April 3, 2026 (Day 3): First provisional patent filed (#64/026,886). Outlier project formally begins.