This quant is optimized for quality and speed on a Strix Halo system with 128 GiB of unified memory. It may also be beneficial on DGX Spark and similar systems.
TL;DR: this quant achieves both better quality and higher speed than a homogeneous Q6_K quant.
See the GLM version for more details on theory and comparisons.
Base model
mistralai/Mistral-Small-4-119B-2603