LFM 2.5 1.2B Instruct - Core ML (ANE)

This is an experimental Core ML export of LiquidAI/LFM2.5-1.2B-Instruct, structured for the Apple Neural Engine (ANE) using Core ML's stateful model API (MLState, available from iOS 18 / macOS 15).

Model Details

Architecture: Liquid Foundation Model (LFM) - LIV Convolution + Full Attention Hybrid
Size: 1.2B Parameters
Quantization: 4-bit Linear Symmetric (INT4 weights)
Target Runtime: Core ML / Apple Neural Engine (iOS 18+ / macOS 15+)
Cache Handling: Native MLState (Stateful Core ML) with fixed sequence length bounds.
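
As a rough check on the size and quantization figures above, the following sketch estimates weight storage at 16-bit and 4-bit precision. It is back-of-the-envelope arithmetic only: it assumes every parameter is stored at the stated bit width and ignores quantization scales, any layers left at higher precision, activations, and the caches, so the real on-disk and in-memory sizes will differ. The helper name weight_footprint_gb is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every parameter uses bits_per_weight bits; scales,
    metadata, and unquantized layers are ignored.
    """
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 1.2e9  # "1.2B Parameters" from the model details

fp16_gb = weight_footprint_gb(NUM_PARAMS, 16)
int4_gb = weight_footprint_gb(NUM_PARAMS, 4)

print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB")
# prints "FP16: 2.4 GB, INT4: 0.6 GB"
```

Under these assumptions, 4-bit weights take about a quarter of the space of 16-bit weights, which is the margin that matters on memory-constrained devices.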

Integration & Export Details

This model has been adapted from its original PyTorch format because the native LIV Convolution state management concatenates cache tensors dynamically at each step, so their shapes grow over time. That operation is incompatible with the ANE's static memory requirements.

To solve this, the export pipeline applied the following transformations:

Static Buffer Allocation: The rolling conv_cache and the standard attention key_value caches are preallocated at a fixed size (e.g. MAX_SEQ_LEN = 512) at initialization.
In-Place Updates: Dynamic slice concatenation was monkey-patched to use in-place slice assignment (tensor[:] = ... and tensor[:, :, cache_position, :] = ...).
Core ML State Mapping: These buffers are declared as ct.StateType states (passed through the states argument of ct.convert) during coremltools conversion, so the Swift runtime can handle them as opaque MLState handles.
INT4 Quantization: The linear layers have been quantized to 4-bit to fit within strict iOS Jetsam limits on 8GB devices.
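
The first two transformations above can be illustrated with a minimal pure-Python sketch, in which plain lists stand in for PyTorch tensors. This is not the export pipeline's code: the class StaticCaches, the CONV_WINDOW length, and the shift-left behaviour of the rolling conv cache are assumptions made for illustration, and MAX_SEQ_LEN is kept small for readability.

```python
MAX_SEQ_LEN = 8   # the text uses 512 as an example; kept small here
CONV_WINDOW = 3   # hypothetical rolling-window length for the conv cache


class StaticCaches:
    """Fixed-size caches that are only ever written in place."""

    def __init__(self) -> None:
        # Allocated once; the lengths never change afterwards.
        self.kv = [0.0] * MAX_SEQ_LEN
        self.conv = [0.0] * CONV_WINDOW

    def update_kv(self, cache_position: int, value: float) -> None:
        # Analogue of tensor[:, :, cache_position, :] = new_value
        if not 0 <= cache_position < MAX_SEQ_LEN:
            raise IndexError("cache_position exceeds the fixed sequence bound")
        self.kv[cache_position] = value

    def update_conv(self, value: float) -> None:
        # Analogue of tensor[:] = ...: drop the oldest entry, append the
        # newest, and write the result back into the same buffer.
        self.conv[:] = self.conv[1:] + [value]


caches = StaticCaches()
kv_id, conv_id = id(caches.kv), id(caches.conv)

for pos, tok in enumerate([1.0, 2.0, 3.0, 4.0]):
    caches.update_kv(pos, tok)
    caches.update_conv(tok)

print(caches.kv)    # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(caches.conv)  # [2.0, 3.0, 4.0]
```

Because both buffers keep their identity and length across steps, every step sees the same shapes, which is the property a statically allocated runtime needs. A concatenation-based cache would instead produce a new, longer tensor on every step.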

Usage in Swift

This model must be invoked using MLState instead of passing the caches explicitly:

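import CoreML

let config = MLModelConfiguration()
// .cpuAndGPU bypasses the ANE; use .cpuAndNeuralEngine or .all to target it,
// though ANE compilation success may vary by iOS release.
config.computeUnits = .cpuAndGPU

// Load asynchronously through the generated class's load(configuration:) method.
let model = try await LFM2_5_1_2B_Stateful.load(configuration: config)

// Create the state once and reuse it for every step of a generation,
// so the caches persist between predictions.
let state = model.makeState()

// One step of the token generation loop; repeat per generated token,
// advancing cache_position each time.
let input = LFM2_5_1_2B_StatefulInput(
    input_ids: currentTokenArray,
    cache_position: cachePositionArray,
    attention_mask: attentionMaskArray
)

let output = try await model.prediction(input: input, using: state)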

Intended Use

This repository was packaged for use in iMLX (an experimental local inference chat app for iOS). It includes the original Hugging Face tokenizer.json and a model_config.json tailored to the app's ModelDownloadService.
