Outlier
Local AI for Mac, plus a ternary Mixture-of-Experts research track.
Outlier is two things in one org: a Mac-native desktop app that runs the best open-weights models offline, and a research effort training our own ternary MoE language models as overlay deltas on a frozen Qwen2.5 base. Both are Apache 2.0. Both ship here.
- App: one 8.8 MB DMG, five curated shipping tiers, no tokens, no cloud, no account. Free forever for v1.
- Research: four MoE scales (10B / 40B / 70B / 150B) built as {-1, 0, +1} overlays on a frozen Qwen base, plus the alpha-fix recovery primitive — 280 per-expert scalar gates in a 15 KB overlay that recovered +1.61pp MMLU on 70B where a 68M-parameter LoRA regressed.
Built solo in 19 days on a Mac Studio M1 Ultra plus spot B200 GPUs. Total compute spend under $1,200. Three U.S. provisional patents filed.
Website: outlier.host · Engine: github.com/Outlier-host/outlier · Contact: matt@outlier.host
What you can run today
The desktop app ships five curated tiers. Every tier is a Mac-optimized build of an open-weights base model, bundled in the one 8.8 MB installer.
| Tier | Base | Quant | RAM | Speed (M1 Ultra) | Use case |
|---|---|---|---|---|---|
| Nano | Qwen3 1.7B | MLX 4-bit | < 2 GB | bench pending | Fast drafts, low-battery |
| Lite | Qwen 2.5 7B | MLX 4-bit AWQ | 4.47 GB | 71.30 tok/s | Daily driver, chat, writing |
| Compact | Qwen 2.5 14B | MLX 4-bit AWQ | 8.24 GB | 37.26 tok/s | Reasoning, deeper context |
| Max | Qwen 2.5 32B | GGUF Q4 | ~18 GB | bench pending | Long-form, complex tasks |
| Code | Qwen3-Coder-30B-A3B | MLX 4-bit | ~16 GB | 55 tok/s | Agentic coding, repo-scale |
Lite / Compact speeds: [VERIFIED] — Mac Studio M1 Ultra 64 GB, mlx_lm, 5-prompt steady-state, 3-prompt warmup, temp 0.7, April 17, 2026. Source: bench_7b.json / bench_14b.json.
Code tier (Outlier-Coder 30B): Our Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team (Apache 2.0). We quantize to MLX 4-bit (16 GB on disk vs. 61 GB FP16), tune sampling defaults for Apple Silicon (top_p=0.8, top_k=20, rep_penalty=1.05), and bundle it into the installer. The base model is the Qwen team's work; credit there.
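The tuned sampling defaults (top_p=0.8, top_k=20, rep_penalty=1.05) amount to simple logit filtering before sampling. A minimal NumPy sketch of that filtering, not the app's actual sampling code; the function name and the toy vocabulary are hypothetical:

```python
import numpy as np

def filter_logits(logits, prev_tokens, top_k=20, top_p=0.8, rep_penalty=1.05):
    """Illustrative version of the Code-tier defaults
    (top_k=20, top_p=0.8, repetition_penalty=1.05)."""
    logits = logits.astype(np.float64).copy()
    # Repetition penalty: dampen tokens that already appeared.
    for t in set(prev_tokens):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # Top-k: keep only the k highest-scoring tokens.
    kth_best = np.sort(logits)[-min(top_k, len(logits))]
    logits[logits < kth_best] = -np.inf
    # Top-p (nucleus): keep the smallest prefix whose probability mass >= p.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    logits[order[cum > top_p][1:]] = -np.inf  # drop everything past the nucleus
    return logits

# Toy 10-token vocabulary with evenly spaced logits.
filtered = filter_logits(np.linspace(5.0, -4.0, 10), prev_tokens=[])
```

The surviving tokens are then sampled from at the configured temperature; in production this runs per decoding step, not once.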
Download the app: outlier.host · Apple Silicon · macOS 13+ · 8 GB RAM minimum.
Quickstart — MLX (Mac)
```python
# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Outlier-Ai/Outlier-Lite-7B-MLX-4bit")
prompt = "Explain mixture of experts in one paragraph."
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```
Quickstart — transformers (GPU / CUDA)
```python
# pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Outlier-Ai/Outlier-70B-V3.3"  # research-track MoE overlay
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True,
)
messages = [{"role": "user", "content": "Write a quicksort in Rust."}]
# add_generation_prompt=True appends the assistant turn header so the
# model completes a response rather than continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Research track — ternary MoE overlays
Our own MoE family is trained as overlay deltas on a frozen Qwen2.5 base. Routed experts are stored in {-1, 0, +1} ternary at ~1.6 bits per weight. A frozen full-precision Qwen base acts as the shared expert. Top-2 routing per MoE layer. This means our repos are not standalone checkpoints — they attach to an unmodified base model at load time.
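The ~1.6 bits per weight follows from base-3 packing: five ternary digits fit in one byte (3^5 = 243 ≤ 256), giving 8/5 = 1.6 bits per weight. A minimal sketch of that storage scheme and the attach-at-load step; the actual on-disk format and the per-expert scale value here are assumptions, not the published loader:

```python
import numpy as np

def pack_trits(t):
    """Pack ternary values {-1, 0, +1} five to a byte (3**5 = 243 <= 256)."""
    t = np.asarray(t, dtype=np.int64) + 1            # map {-1,0,1} -> {0,1,2}
    pad = (-len(t)) % 5
    padded = np.concatenate([t, np.zeros(pad, dtype=np.int64)])
    groups = padded.reshape(-1, 5)
    return (groups * 3 ** np.arange(5)).sum(axis=1).astype(np.uint8)

def unpack_trits(packed, n):
    """Recover the first n ternary values from the packed bytes."""
    digits = packed[:, None].astype(np.int64) // 3 ** np.arange(5) % 3
    return digits.reshape(-1)[:n] - 1

rng = np.random.default_rng(0)
delta = rng.integers(-1, 2, size=20)                 # one tiny "expert" delta
packed = pack_trits(delta)
bits_per_weight = packed.nbytes * 8 / len(delta)     # 4 bytes * 8 / 20 = 1.6

# Attach at load time: effective weight = frozen base + scale * ternary delta.
# The 0.02 scale is purely illustrative.
base = rng.normal(size=20).astype(np.float32)
w_eff = base + 0.02 * unpack_trits(packed, len(delta))
```

Because the base stays frozen and untouched on disk, the overlay repo only needs to ship the packed trits plus scales, which is why the repos attach to an unmodified Qwen checkpoint rather than replacing it.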
MMLU (primary)
Every number has provenance. Sample size n = 14,042 (full MMLU). Harness: lm-evaluation-harness. 5-shot, bfloat16.
| Model | MMLU | Stderr | Harness | Status |
|---|---|---|---|---|
| Outlier-150B V3.2 | 84.46% | 0.29% | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-70B V3.3 (alpha-fixed) | 83.10% | 0.30% | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-40B V3.3 | 77.80% | 0.33% | v0.4.11 | [VERIFIED] Day 12 |
| Outlier-10B V3.3 | 70.87% | ≈0.38%* | v0.4.9.1 | [VERIFIED] Day 13 |
*10B stderr calculated from binomial formula √(p(1-p)/n); matches pattern of reported values at other scales.
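The starred value can be checked directly from the accuracy and sample size given above:

```python
import math

p, n = 0.7087, 14042                  # 10B V3.3 accuracy, full MMLU sample
stderr = math.sqrt(p * (1 - p) / n)   # binomial standard error
print(f"{stderr:.2%}")                # 0.38%
```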
Harness caveat: v0.4.9.1 → v0.4.11 produced a 1.30pp delta on the same 150B weights. We've locked v0.4.9.1 as our reference harness and document both numbers in our ground-truth file so reviewers can reproduce either.
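A reproduction sketch against the pinned reference harness. Treat this as a template rather than a verified command: the exact task name and flag spelling can differ across harness releases, and the version pins are taken on trust from the numbers above.

```shell
pip install "lm_eval==0.4.9.1"   # the locked reference harness version
lm_eval --model hf \
  --model_args pretrained=Outlier-Ai/Outlier-70B-V3.3,dtype=bfloat16,trust_remote_code=True \
  --tasks mmlu --num_fewshot 5
```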
Secondary benchmarks (V3.3, verified)
| Model | HellaSwag | ARC-C | ARC-E | Winogrande | TruthfulQA |
|---|---|---|---|---|---|
| 150B | 77.00% | 68.50% | 90.00% | 85.50% | 69.19% |
| 70B | 85.95% | 73.46% | 91.62% | 81.29% | 67.12% |
| 40B | 84.64% | 73.12% | 91.29% | 80.98% | 67.49% |
| 10B | 78.30% | 62.88% | 85.98% | 73.80% | 62.11% |
All [VERIFIED] at n = 14,042, v0.4.9.1.
MMLU vs. base Qwen (radical honesty)
| Scale | Outlier V3.3 | Base Qwen FP16 | Delta |
|---|---|---|---|
| 10B | 70.87% | Qwen 7B: 74.2% | −3.33pp |
| 40B | 77.80% | Qwen 14B: 79.7% | −1.90pp |
| 70B | 83.10% | Qwen 32B: 83.3% | −0.20pp (effectively tied) |
| 150B | 84.46% | Qwen 72B: 86.1% | −1.64pp |
Our MoE overlays trail base Qwen by 0.2–3.3pp on raw MMLU. That's the honest number. The defensible story is MMLU per GB of RAM at the slot our 70B occupies (≈20 GB / 83% MMLU), not raw MMLU.
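Spelling out that per-GB framing with the card's own figures (the ~20 GB footprint is the card's estimate, so this is illustrative arithmetic only):

```python
# MMLU points per GB of resident RAM at the 70B slot,
# using the card's own figures (83.10% MMLU in ~20 GB).
mmlu, ram_gb = 83.10, 20.0
points_per_gb = mmlu / ram_gb   # ~4.2 MMLU points per GB
```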
For additional reference: Llama 3.1 70B full-sample MMLU is around 83.1%. Outlier-70B V3.3 alpha-fixed lands in that neighborhood, on a model family trained solo for under $1,200 of total compute.
Model naming — V3.3 convention
The old 10B / 40B / 70B / 150B labels counted routed-expert parameters and understated real model sizes. V3.3 moves to the industry-standard `<total>B-A<active>B` convention — total parameters, then active-per-token parameters — used by DeepSeek-, Mixtral-, and Llama 4-class MoE releases:
| Old name | V3.3 name | Total params | Active params |
|---|---|---|---|
| Outlier-10B | Outlier-13B-A7B | 13B | 7B |
| Outlier-40B | Outlier-30B-A14B | 30B | 14B |
| Outlier-70B | Outlier-68B-A32B | 68B | 32B |
| Outlier-150B | Outlier-150B-A70B | 150B | 70B |
V3.2 repos remain available as [SUPERSEDED], pointing to V3.3.
Engine
- Open-source at github.com/Outlier-host/outlier (Apache 2.0). Ternary MoE loader, three-tier paged cache, MPS + CPU backends, lm-eval compatible, alpha-overlay loader for post-training recovery.
- GPU-resident expert dequantization. A patched modeling file materializes ternary experts to bf16 at load time — ~56× speedup over the original CPU→GPU path on a single B200.
- Alpha-fix technique. 280 per-expert scalar gates trained in 18 minutes on one B200 recovered +1.61pp MMLU on 70B (81.49% → 83.10%). Overlay file is 15 KB — roughly 250,000× fewer trainable parameters than the LoRA approach it outperformed.
- Desktop app (v1.3.0). Tauri + FastAPI + mlx_lm. 8.8 MB DMG, Apple Silicon, macOS 13+. SHA-256: 1837df5739eda279a564a2ef8fc33a366d9e018900e98f391bfbbf6b9408448b. Ad-hoc signed (Apple Dev ID pending).
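Mechanically, the alpha-fix reduces to one learned scalar per routed expert, applied at mixture time. A minimal sketch under assumed shapes (the real loader, hidden size, and overlay file format are not reproduced here):

```python
import numpy as np

def alpha_fixed_mixture(expert_outputs, routing_weights, alpha):
    """Combine routed expert outputs with per-expert scalar gates.

    expert_outputs : (n_experts, d) per-expert outputs for one token
    routing_weights: (n_experts,) top-2 routing weights (zero elsewhere)
    alpha          : (n_experts,) alpha-fix gates -- the overlay's only
                     trainable parameters
    """
    return (routing_weights[:, None] * alpha[:, None] * expert_outputs).sum(axis=0)

rng = np.random.default_rng(0)
outs = rng.normal(size=(280, 64))     # 280 experts; hidden size 64 assumed
weights = np.zeros(280)
weights[[3, 17]] = [0.6, 0.4]         # top-2 routing
identity = alpha_fixed_mixture(outs, weights, np.ones(280))  # alpha=1: no-op
```

With all gates at 1.0 the layer reduces to the plain routed mixture, which is why the overlay can be trained quickly from an identity initialization; 280 fp32 scalars is about 1.1 KB, so the 15 KB overlay file presumably carries metadata as well (an assumption, the format is not published here).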
What we're not claiming
- We do not match Kimi K2.5, GLM-5, Claude Opus 4.6, Gemini 3 Pro, or GPT-5 on pure MMLU.
- We are not the first ternary MoE. Microsoft + Apple's MoTE (arXiv:2506.14435, June 2025) published a shared-FP + ternary-expert architecture for vision-language models. Our contribution is the text-LLM variant, the overlay-on-frozen-base deployment artifact, and the alpha-fix recovery primitive.
- We are not shipping models trained on trillions of tokens. Our distillation pipeline uses DeepSeek V3 as teacher and touches a fraction of the tokens a Llama-class pretrain does. The comparison we care about is quality per dollar of training, not parameter count or token count.
- Production-ready local inference lives in the shipping tiers (curated Qwen), not yet in the MoE research repos. Our 70B V3.3 runs on a Mac Studio via the engine, but the app ships Qwen tiers for now. MoE graduates to the app when a scale validates against a real Mac RAM tier.
Status — Day 19 (April 19, 2026)
- v1.3.0 desktop app: built and shipping (ad-hoc signed, Apple Dev ID in queue)
- V3.3 MoE models: four scales verified, uploaded
- Shipping tiers: Nano / Lite / Compact / Max / Code — AWQ builds live
- Outlier-Coder 30B: Mac build of Qwen3-Coder-30B-A3B shipped in app
- Website: outlier.host live
- Downloads: ~5,881 across 15 repos (CLAIM — live re-verify pending)
- Patents: 3 U.S. provisional filed (61 claims), non-provisional deadline April 3–9, 2027
Links
- Website: outlier.host
- Engine: github.com/Outlier-host/outlier
- Paper: outlier_ternary_moe_2026.pdf, v6 (landing with the launch)
- Responsible use reports: abuse@outlier.host
- Contact: matt@outlier.host
- Built by: Matt Kerr · Kerr & Company LLC · Grand Rapids, MI
License & attribution
- Engine & app: Apache 2.0
- MoE overlays: Apache 2.0, overlay on frozen Qwen2.5 bases by the Alibaba Qwen team (Apache 2.0)
- Outlier-Coder 30B: Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team, licensed Apache 2.0. Base weights are the Qwen team's work; modifications are ours.
- Shipping tiers (Nano / Lite / Compact / Max): Mac-optimized builds of Qwen 2.5 / Qwen 3 bases, all Apache 2.0.
Patents — defensive, not enforced against open-source implementations:
- #64/026,886 (April 3, 2026) — Ternary-Quantized MoE Language Model System
- #64/030,368 (April 6, 2026) — Systems and Methods for Ternary MoE Inference, Training, and Modular Deployment
- #64/034,028 (April 9, 2026) — Zero-Delta Expert Initialization, Residual Error Correction, and Adaptive Inference
Citation
```bibtex
@misc{kerr2026outlier,
  title        = {Outlier: Ternary Mixture-of-Experts Language Models on Consumer Hardware},
  author       = {Kerr, Matt},
  year         = {2026},
  howpublished = {\url{https://outlier.host}},
  note         = {Outlier-Ai org, Hugging Face}
}
```
Changelog
- April 19, 2026 (Day 19): Platform-first thesis locked. Org card rewritten to reflect dual-track (Platform + MoE research). Five shipping tiers canonical: Nano / Lite / Compact / Max / Code. Outlier-Coder 30B (Mac build of Qwen3-Coder-30B-A3B) added as new Code-tier flagship. Changelog catches up through Day 19.
- April 18, 2026 (Day 18): v1.3.0 desktop DMG built (8.8 MB, Tauri, ad-hoc signed). Free-forever-v1 pricing decision made — subscription deferred until 1,000+ free downloads + 30-day retention data. Strategic pivot from "best model company" to "Apple of local AI" platform.
- April 17, 2026 (Day 17): Mac AWQ shipping tiers verified. Lite (7B) hits 71.30 tok/s / 4.47 GB peak RSS; Compact (14B) hits 37.26 tok/s / 8.24 GB. Canonical Mac numbers. Exp 5 alpha validation loader fixed.
- April 16, 2026 (Day 16): 10B V3.3 secondary benchmarks verified. Real backend wired end-to-end through web UI. Website deployed at outlier.host.
- April 13, 2026 (Day 13): Five cluster wins. 70B V3.3 alpha-fixed at 83.10% MMLU (+1.61pp). 150B V3.2 re-measured at 84.46% MMLU on v0.4.9.1 (supersedes the earlier 83.16% value). YaRN 4x validated for 128K context on 70B / 150B. V4 HESTIA+LoRA killed after −1.17pp / −1.34pp regressions; fully recovered via 280-scalar alpha-fix on a 15 KB overlay.
- April 11, 2026 (Day 11): Removed an unverified four-row MMLU table that relied on a decommissioned cluster's unsaved source files. Forensic cleanup led directly to the Day 9 provenance rules that now govern every number on this card.
- April 9, 2026 (Day 9): Third provisional patent filed (#64/034,028). Provenance discipline rules 66–78 added to project canon after trust-laundering incident.
- April 6, 2026 (Day 6): Second provisional patent filed (#64/030,368).
- April 3, 2026 (Day 3): First provisional patent filed (#64/026,886). Outlier project formally begins.