LFM 2.5 1.2B Instruct - Core ML (ANE)
This is an experimental Core ML export of LiquidAI/LFM2.5-1.2B-Instruct, optimized and structured for the Apple Neural Engine (ANE) using Core ML's stateful model API (MLState).
Model Details
- Architecture: Liquid Foundation Model (LFM) - LIV Convolution + Full Attention Hybrid
- Size: 1.2B Parameters
- Quantization: 4-bit Linear Symmetric (INT4 weights)
- Target Runtime: Core ML / Apple Neural Engine (iOS 18+ / macOS 15+)
- Cache Handling: Native MLState (Stateful Core ML) with fixed sequence length bounds.
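As a rough back-of-the-envelope check on why 4-bit weights matter at this parameter count, the sketch below estimates raw weight storage at a few precisions. It assumes exactly 1.2 billion parameters, all stored at the same bit width, and ignores quantization scales, any layers kept in higher precision, and runtime buffers, so real file sizes will differ. The helper name weight_bytes is illustrative and not part of any export tooling.

```python
NUM_PARAMS = 1_200_000_000  # "1.2B" from the model details, rounded (assumption)


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Raw storage for the weights alone, ignoring scales and metadata."""
    return num_params * bits_per_weight // 8


# Compare the unquantized FP16 footprint with 8-bit and 4-bit storage.
for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    gib = weight_bytes(NUM_PARAMS, bits) / 2**30
    print(f"{label}: {gib:.2f} GiB")
```

Under these assumptions the INT4 weights take a quarter of the FP16 footprint, which is the headroom the quantization step is meant to buy.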
Integration & Export Details
This model has been adapted from its original PyTorch implementation because the native LIV Convolution state management concatenates cache tensors dynamically over time, an operation that is incompatible with the ANE's static memory requirements.
To solve this, the export pipeline applied the following transformations:
- Static Buffer Allocation: The rolling conv_cache and standard attention key_value caches are allocated to fixed bounds (e.g. MAX_SEQ_LEN = 512) at initialization.
- In-Place Updates: Dynamic slice concatenation was monkey-patched to use in-place slice assignment (tensor[:] = ... and tensor[:, :, cache_position, :] = ...).
- Core ML State Mapping: These buffers are registered as ct.StateType states during coremltools conversion so the Swift runtime can handle them efficiently as MLState opaque handles.
- INT4 Quantization: The linear layers have been quantized to 4-bit to fit within strict iOS Jetsam limits on 8GB devices.
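The first two transformations can be illustrated with a minimal, framework-free sketch. It is not the actual export code: plain Python lists of floats stand in for tensors, and the class name StaticCaches, the CONV_WINDOW constant, and the small MAX_SEQ_LEN value are all assumptions chosen for readability. The point it shows is that both caches are allocated once and then only written in place, so their shapes never change.

```python
MAX_SEQ_LEN = 8   # the card gives 512 as an example bound; kept small here
CONV_WINDOW = 3   # hypothetical rolling-window length for the conv cache


class StaticCaches:
    """Fixed-size stand-ins for the key_value and conv_cache buffers."""

    def __init__(self) -> None:
        # Allocated once at initialization; never grown or reallocated.
        self.key_value = [0.0] * MAX_SEQ_LEN
        self.conv_cache = [0.0] * CONV_WINDOW

    def write_kv(self, cache_position: int, value: float) -> None:
        # Analogue of tensor[:, :, cache_position, :] = ...
        if not 0 <= cache_position < MAX_SEQ_LEN:
            raise IndexError("cache_position is outside the fixed bound")
        self.key_value[cache_position] = value

    def roll_conv(self, value: float) -> None:
        # Analogue of tensor[:] = ...: drop the oldest entry, append the
        # newest, and write the result back into the same buffer.
        self.conv_cache[:] = self.conv_cache[1:] + [value]


caches = StaticCaches()
kv_buffer, conv_buffer = caches.key_value, caches.conv_cache

for position, value in enumerate([1.0, 2.0, 3.0, 4.0]):
    caches.write_kv(position, value)
    caches.roll_conv(value)

# Same objects, same lengths: nothing was concatenated into a new buffer.
print(caches.key_value is kv_buffer, len(caches.key_value))
print(caches.conv_cache is conv_buffer, caches.conv_cache)
```

A concatenation-based cache would instead produce a new, longer buffer on every step, which is the behavior the patch removes.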
Usage in Swift
This model must be invoked using MLState instead of passing the caches explicitly:
import CoreML
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU // or .all, though ANE compile success may vary by iOS patch
let model = try await LFM2_5_1_2B_Stateful(configuration: config)
let state = model.makeState()
// Token generation loop
let input = LFM2_5_1_2B_StatefulInput(
    input_ids: currentTokenArray,
    cache_position: cachePositionArray,
    attention_mask: attentionMaskArray
)
let output = try await model.prediction(input: input, using: state)
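Because the caches have a fixed length, each decode step has to say which slot it writes and which slots are valid. The sketch below shows one plausible way to derive the three inputs for a single step. The exact shapes, dtypes, and mask convention this export expects are not stated in this card, so the 0/1 mask over the full window, the single-token input_ids, and the helper name step_inputs are all assumptions to check against the model's actual input description.

```python
MAX_SEQ_LEN = 512  # example bound mentioned in the export details


def step_inputs(token_id: int, position: int, max_seq_len: int = MAX_SEQ_LEN) -> dict:
    """Build hypothetical per-step inputs for a fixed-length stateful cache."""
    if not 0 <= position < max_seq_len:
        # The static cache cannot grow, so generation must stop at the bound.
        raise ValueError("position exceeds the fixed sequence length")
    # Assumed convention: 1 marks slots written so far (including this one).
    attention_mask = [1 if i <= position else 0 for i in range(max_seq_len)]
    return {
        "input_ids": [token_id],
        "cache_position": [position],
        "attention_mask": attention_mask,
    }


# Third token of a sequence, shown with a tiny window for readability.
inputs = step_inputs(token_id=42, position=2, max_seq_len=4)
print(inputs["cache_position"], inputs["attention_mask"])
```

On the Swift side the same values would be packed into the arrays passed to the input struct, with the position advanced by one after every prediction.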
Intended Use
This repository was compiled for use inside iMLX (an experimental local inference chat app for iOS). It includes the original Hugging Face tokenizer.json and a specific model_config.json designed for the app's ModelDownloadService.