
Outlier

Local AI for Mac, plus a ternary Mixture-of-Experts research track.

Outlier is two things in one org: a Mac-native desktop app that runs the best open-weights models offline, and a research effort training our own ternary MoE language models as overlay deltas on a frozen Qwen2.5 base. Both are Apache 2.0. Both ship here.

  • App: one 8.8 MB DMG, five curated shipping tiers, no tokens, no cloud, no account. Free forever for v1.
  • Research: four MoE scales (10B / 40B / 70B / 150B) built as {-1, 0, +1} overlays on a frozen Qwen base, plus the alpha-fix recovery primitive — 280 per-expert scalar gates in a 15 KB overlay that recovered +1.61pp MMLU on 70B where a 68M-parameter LoRA regressed.

Built solo in 19 days on a Mac Studio M1 Ultra plus spot B200 GPUs. Total compute spend under $1,200. Three U.S. provisional patents filed.

Website: outlier.host · Engine: github.com/Outlier-host/outlier · Contact: matt@outlier.host


What you can run today

The desktop app ships five curated tiers. Every tier is a Mac-optimized build of an open-weights base model, bundled in the one 8.8 MB installer.

| Tier    | Base                | Quant         | RAM     | Speed (M1 Ultra) | Use case                    |
|---------|---------------------|---------------|---------|------------------|-----------------------------|
| Nano    | Qwen3 1.7B          | MLX 4-bit     | < 2 GB  | bench pending    | Fast drafts, low-battery    |
| Lite    | Qwen 2.5 7B         | MLX 4-bit AWQ | 4.47 GB | 71.30 tok/s      | Daily driver, chat, writing |
| Compact | Qwen 2.5 14B        | MLX 4-bit AWQ | 8.24 GB | 37.26 tok/s      | Reasoning, deeper context   |
| Max     | Qwen 2.5 32B        | GGUF Q4       | ~18 GB  | bench pending    | Long-form, complex tasks    |
| Code    | Qwen3-Coder-30B-A3B | MLX 4-bit     | ~16 GB  | 55 tok/s         | Agentic coding, repo-scale  |

Lite / Compact speeds: [VERIFIED] — Mac Studio M1 Ultra 64 GB, mlx_lm, 5-prompt steady-state, 3-prompt warmup, temp 0.7, April 17, 2026. Source: bench_7b.json / bench_14b.json.

Code tier (Outlier-Coder 30B): Our Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team (Apache 2.0). We quantize to MLX 4-bit (16 GB on disk vs. 61 GB FP16), tune sampling defaults for Apple Silicon (top_p=0.8, top_k=20, rep_penalty=1.05), and bundle it into the installer. The base model is the Qwen team's work; credit belongs to them.
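Those sampling defaults (top_p=0.8, top_k=20) can be illustrated with a small framework-free sketch. The function name is hypothetical, the repetition penalty is omitted, and the app applies the real equivalents inside mlx_lm:

```python
import numpy as np

def sample_next_token(logits, top_p=0.8, top_k=20, rng=None):
    """Sample one token id: restrict to the top_k highest logits, then
    to the smallest nucleus whose cumulative probability reaches top_p."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top_idx = np.argsort(logits)[-top_k:]               # top-k filter
    probs = np.exp(logits[top_idx] - logits[top_idx].max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                     # high -> low
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]     # nucleus (top-p)
    kept = probs[keep] / probs[keep].sum()
    return int(top_idx[keep[rng.choice(len(keep), p=kept)]])
```

Lower top_p and a modest top_k keep code generation on-distribution, which is why the Code tier ships tighter defaults than a chat tier would.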

Download the app: outlier.host · Apple Silicon · macOS 13+ · 8 GB RAM minimum.


Quickstart — MLX (Mac)

```python
# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Outlier-Ai/Outlier-Lite-7B-MLX-4bit")
prompt = "Explain mixture of experts in one paragraph."
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```

Quickstart — transformers (GPU / CUDA)

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Outlier-Ai/Outlier-70B-V3.3"  # research-track MoE overlay
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a quicksort in Rust."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Research track — ternary MoE overlays

Our own MoE family is trained as overlay deltas on a frozen Qwen2.5 base. Routed experts are stored in {-1, 0, +1} ternary at ~1.6 bits per weight. A frozen full-precision Qwen base acts as the shared expert. Top-2 routing per MoE layer. This means our repos are not standalone checkpoints — they attach to an unmodified base model at load time.
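The load-time composition described above can be sketched in a few lines of numpy. This is a minimal illustration under assumed shapes, not the engine's actual loader API; `moe_overlay_forward`, `experts`, and `alphas` are hypothetical names, with `alphas` standing in for the per-expert recovery scalars of the alpha-fix:

```python
import numpy as np

def moe_overlay_forward(x, base_ffn, experts, router_w, alphas, top_k=2):
    """One MoE layer: the frozen base's FFN acts as the shared expert,
    plus top-2 ternary routed experts, each scaled by an alpha gate.

    experts[i]["w"] holds {-1, 0, +1} entries; "scale" is that expert's
    dequantization scale; alphas are the per-expert recovery scalars.
    """
    scores = x @ router_w                       # router logits, one per expert
    top = np.argsort(scores)[-top_k:]           # top-2 routing
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                        # softmax over selected experts
    out = base_ffn(x)                           # shared expert: frozen base
    for g, i in zip(gates, top):
        e = experts[i]
        dense = e["w"].astype(np.float32) * e["scale"]  # ternary -> float
        out = out + alphas[i] * g * (x @ dense)
    return out
```

Here `base_ffn` is the frozen base's FFN at that layer; only `experts`, `router_w`, and `alphas` come from the overlay repo, which is why the repos are not standalone checkpoints.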

MMLU (primary)

Every number has provenance. Sample size n = 14,042 (full MMLU). Harness: lm-evaluation-harness. 5-shot, bfloat16.

| Model                          | MMLU   | Stderr  | Harness  | Status            |
|--------------------------------|--------|---------|----------|-------------------|
| Outlier-150B V3.2              | 84.46% | 0.29%   | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-70B V3.3 (alpha-fixed) | 83.10% | 0.30%   | v0.4.9.1 | [VERIFIED] Day 13 |
| Outlier-40B V3.3               | 77.80% | 0.33%   | v0.4.11  | [VERIFIED] Day 12 |
| Outlier-10B V3.3               | 70.87% | ≈0.38%* | v0.4.9.1 | [VERIFIED] Day 13 |

*10B stderr calculated from the binomial formula √(p(1-p)/n); consistent with the stderrs reported at the other scales.
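The footnoted standard error is straightforward to reproduce:

```python
import math

def mmlu_stderr(p, n=14042):
    """Binomial standard error sqrt(p * (1 - p) / n) for an accuracy p
    measured over n questions (full MMLU: n = 14,042)."""
    return math.sqrt(p * (1 - p) / n)

print(f"{mmlu_stderr(0.7087):.2%}")  # 10B accuracy 70.87% -> 0.38%
```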

Harness caveat: v0.4.9.1 → v0.4.11 produced a 1.30pp delta on the same 150B weights. We've locked v0.4.9.1 as our reference harness and document both numbers in our ground-truth file so reviewers can reproduce either.

Secondary benchmarks (V3.3, verified)

| Model | HellaSwag | ARC-C  | ARC-E  | Winogrande | TruthfulQA |
|-------|-----------|--------|--------|------------|------------|
| 150B  | 77.00%    | 68.50% | 90.00% | 85.50%     | 69.19%     |
| 70B   | 85.95%    | 73.46% | 91.62% | 81.29%     | 67.12%     |
| 40B   | 84.64%    | 73.12% | 91.29% | 80.98%     | 67.49%     |
| 10B   | 78.30%    | 62.88% | 85.98% | 73.80%     | 62.11%     |

All [VERIFIED] at n = 14,042, v0.4.9.1.

MMLU vs. base Qwen (radical honesty)

| Scale | Outlier V3.3 | Base Qwen FP16  | Delta          |
|-------|--------------|-----------------|----------------|
| 10B   | 70.87%       | Qwen 7B: 74.2%  | −3.33pp        |
| 40B   | 77.80%       | Qwen 14B: 79.7% | −1.88pp        |
| 70B   | 83.10%       | Qwen 32B: 83.3% | −0.20pp (tied) |
| 150B  | 84.46%       | Qwen 72B: 86.1% | −1.64pp        |

Our MoE overlays trail base Qwen by 0.2–3.3pp on raw MMLU. That's the honest number. The defensible story is MMLU per GB of RAM at the slot our 70B occupies (≈20 GB / 83% MMLU), not raw MMLU.

For additional reference: Llama 3.1 70B full-sample MMLU is around 83.1%. Outlier-70B V3.3 alpha-fixed lands in that neighborhood, on a model family trained solo for under $1,200 of total compute.
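The quality-per-RAM framing can be made concrete with back-of-envelope arithmetic. The 144 GB figure below is an assumption (72B params × 2 bytes at FP16), not a measured number:

```python
# MMLU points per GB of resident weights at the ~20 GB Mac slot.
ram_gb = {"Outlier-70B V3.3 (ternary overlay)": 20,    # ≈20 GB, from above
          "Qwen 2.5 72B (FP16, est.)": 144}            # 72e9 * 2 bytes, estimate
mmlu = {"Outlier-70B V3.3 (ternary overlay)": 83.10,
        "Qwen 2.5 72B (FP16, est.)": 86.10}
for name in ram_gb:
    print(f"{name}: {mmlu[name] / ram_gb[name]:.2f} MMLU pts/GB")
```

At these numbers the overlay delivers roughly 7× the MMLU-per-GB of the estimated FP16 base footprint.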

Model naming — V3.3 convention

The old 10B / 40B / 70B / 150B labels counted routed-expert parameters and understated real model sizes. V3.3 moves to the industry-standard TotalB-AyyB convention (DeepSeek, Mixtral, Llama 4):

| Old name     | V3.3 name         | Total params | Active params |
|--------------|-------------------|--------------|---------------|
| Outlier-10B  | Outlier-13B-A7B   | 13B          | 7B            |
| Outlier-40B  | Outlier-30B-A14B  | 30B          | 14B           |
| Outlier-70B  | Outlier-68B-A32B  | 68B          | 32B           |
| Outlier-150B | Outlier-150B-A70B | 150B         | 70B           |

V3.2 repos remain available as [SUPERSEDED], pointing to V3.3.
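The convention is purely mechanical; a throwaway helper (hypothetical, not part of any repo) shows the mapping:

```python
def v33_name(total_b, active_b):
    """Format a model name in the V3.3 TotalB-AyyB convention:
    total parameters first, then 'A' + active parameters per token."""
    return f"Outlier-{total_b}B-A{active_b}B"

print(v33_name(150, 70))  # Outlier-150B-A70B
```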


Engine

  • Open-source at github.com/Outlier-host/outlier (Apache 2.0). Ternary MoE loader, three-tier paged cache, MPS + CPU backends, lm-eval compatible, alpha-overlay loader for post-training recovery.
  • GPU-resident expert dequantization. A patched modeling file materializes ternary experts to bf16 at load time — ~56× speedup over the original CPU→GPU path on a single B200.
  • Alpha-fix technique. 280 per-expert scalar gates trained in 18 minutes on one B200 recovered +1.61pp MMLU on 70B (81.49% → 83.10%). Overlay file is 15 KB — roughly 250,000× fewer trainable parameters than the LoRA approach it outperformed.
  • Desktop app (v1.3.0). Tauri + FastAPI + mlx_lm. 8.8 MB DMG, Apple Silicon, macOS 13+. SHA-256: 1837df5739eda279a564a2ef8fc33a366d9e018900e98f391bfbbf6b9408448b. Ad-hoc signed (Apple Dev ID pending).
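A rough sketch of the dequantization step, using a hypothetical four-codes-per-byte packing (the engine's real layout and bf16 path differ): the point is that unpacking happens once at load, so the inference hot path only ever reads the dense tensor.

```python
import numpy as np

CODE_TO_VAL = np.array([-1.0, 0.0, 1.0], dtype=np.float32)  # 2-bit code -> ternary

def unpack_ternary(packed, n, scale):
    """Unpack n ternary weights stored four-per-byte (2 bits each) and
    apply the per-expert scale. Done once at load; the result stays
    resident on the GPU in the real engine (bf16 there, fp32 here)."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11          # (bytes, 4) 2-bit codes
    return CODE_TO_VAL[codes.reshape(-1)[:n]] * scale

def pack_ternary(vals):
    """Inverse helper for the demo: pack {-1, 0, +1} values 4 per byte."""
    codes = (np.asarray(vals) + 1).astype(np.uint8)      # -1,0,1 -> 0,1,2
    pad = (-len(codes)) % 4
    codes = np.concatenate([codes, np.zeros(pad, np.uint8)]).reshape(-1, 4)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return (codes << shifts).sum(axis=1).astype(np.uint8)
```

Materializing once on the GPU is what eliminates the per-token CPU→GPU transfer that the ~56× speedup figure refers to.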

What we're not claiming

  • We do not match Kimi K2.5, GLM-5, Claude Opus 4.6, Gemini 3 Pro, or GPT-5 on pure MMLU.
  • We are not the first ternary MoE. Microsoft + Apple's MoTE (arXiv:2506.14435, June 2025) published a shared-FP + ternary-expert architecture for vision-language models. Our contribution is the text-LLM variant, the overlay-on-frozen-base deployment artifact, and the alpha-fix recovery primitive.
  • We are not shipping models trained on trillions of tokens. Our distillation pipeline uses DeepSeek V3 as teacher and touches a fraction of the tokens a Llama-class pretrain does. The comparison we care about is quality per dollar of training, not parameter count or token count.
  • Production-ready local inference lives in the shipping tiers (curated Qwen), not yet in the MoE research repos. Our 70B V3.3 runs on a Mac Studio via the engine, but the app ships Qwen tiers for now. MoE graduates to the app when a scale validates against a real Mac RAM tier.

Status — Day 19 (April 19, 2026)

  • v1.3.0 desktop app: built and shipping (ad-hoc signed, Apple Dev ID in queue)
  • V3.3 MoE models: four scales verified, uploaded
  • Shipping tiers: Nano / Lite / Compact / Max / Code — AWQ builds live
  • Outlier-Coder 30B: Mac build of Qwen3-Coder-30B-A3B shipped in app
  • Website: outlier.host live
  • Downloads: ~5,881 across 15 repos (CLAIM — live re-verify pending)
  • Patents: 3 U.S. provisional filed (61 claims), non-provisional deadline April 3–9, 2027

Links

  • Website: outlier.host
  • App download: outlier.host (Apple Silicon, macOS 13+)
  • Engine: github.com/Outlier-host/outlier
  • Contact: matt@outlier.host

License & attribution

  • Engine & app: Apache 2.0
  • MoE overlays: Apache 2.0, overlay on frozen Qwen2.5 bases by the Alibaba Qwen team (Apache 2.0)
  • Outlier-Coder 30B: Mac-optimized build of Qwen3-Coder-30B-A3B-Instruct by the Alibaba Qwen team, licensed Apache 2.0. Base weights are the Qwen team's work; modifications are ours.
  • Shipping tiers (Nano / Lite / Compact / Max): Mac-optimized builds of Qwen 2.5 / Qwen 3 bases, all Apache 2.0.

Patents — defensive, not enforced against open-source implementations:

  • #64/026,886 (April 3, 2026) — Ternary-Quantized MoE Language Model System
  • #64/030,368 (April 6, 2026) — Systems and Methods for Ternary MoE Inference, Training, and Modular Deployment
  • #64/034,028 (April 9, 2026) — Zero-Delta Expert Initialization, Residual Error Correction, and Adaptive Inference

Citation

```bibtex
@misc{kerr2026outlier,
  title        = {Outlier: Ternary Mixture-of-Experts Language Models on Consumer Hardware},
  author       = {Kerr, Matt},
  year         = {2026},
  howpublished = {\url{https://outlier.host}},
  note         = {Outlier-Ai org, Hugging Face}
}
```

Changelog

  • April 19, 2026 (Day 19): Platform-first thesis locked. Org card rewritten to reflect dual-track (Platform + MoE research). Five shipping tiers canonical: Nano / Lite / Compact / Max / Code. Outlier-Coder 30B (Mac build of Qwen3-Coder-30B-A3B) added as new Code-tier flagship. Changelog catches up through Day 19.
  • April 18, 2026 (Day 18): v1.3.0 desktop DMG built (8.8 MB, Tauri, ad-hoc signed). Free-forever-v1 pricing decision made — subscription deferred until 1,000+ free downloads + 30-day retention data. Strategic pivot from "best model company" to "Apple of local AI" platform.
  • April 17, 2026 (Day 17): Mac AWQ shipping tiers verified. Lite (7B) hits 71.30 tok/s / 4.47 GB peak RSS; Compact (14B) hits 37.26 tok/s / 8.24 GB. Canonical Mac numbers. Exp 5 alpha validation loader fixed.
  • April 16, 2026 (Day 16): 10B V3.3 secondary benchmarks verified. Real backend wired end-to-end through web UI. Website deployed at outlier.host.
  • April 13, 2026 (Day 13): Five cluster wins. 70B V3.3 alpha-fixed at 83.10% MMLU (+1.61pp). 150B V3.2 re-measured at 84.46% MMLU on v0.4.9.1 (supersedes the earlier 83.16% value). YaRN 4x validated for 128K context on 70B / 150B. V4 HESTIA+LoRA killed after −1.17pp / −1.34pp regressions; fully recovered via 280-scalar alpha-fix on a 15 KB overlay.
  • April 11, 2026 (Day 11): Removed an unverified four-row MMLU table that relied on a decommissioned cluster's unsaved source files. Forensic cleanup led directly to the Day 9 provenance rules that now govern every number on this card.
  • April 9, 2026 (Day 9): Third provisional patent filed (#64/034,028). Provenance discipline rules 66–78 added to project canon after trust-laundering incident.
  • April 6, 2026 (Day 6): Second provisional patent filed (#64/030,368).
  • April 3, 2026 (Day 3): First provisional patent filed (#64/026,886). Outlier project formally begins.
