Commit 5a60a5b by gafiatulin · verified · 1 Parent(s): d275e51

Upload README.md with huggingface_hub

  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ license: mit
+ tags:
+ - coreml
+ - tts
+ - vibevoice
+ - apple-silicon
+ - semantic-encoder
+ ---
+
+ # VibeVoice Semantic Encoder (CoreML)
+
+ Streaming semantic encoder for [VibeVoice](https://huggingface.co/microsoft/VibeVoice-1.5B) TTS, exported as a stateful CoreML MLPackage.
+
+ Shared between 1.5B and 7B models (identical encoder weights, 128-dim output).
+
+ ## Usage
+
+ Auto-downloaded by [vibevoice-mlx](https://github.com/gafiatulin/vibevoice-mlx) when CoreML is available:
+
+ ```bash
+ pip install mlx coremltools soundfile transformers huggingface_hub safetensors
+ git clone https://github.com/gafiatulin/vibevoice-mlx && cd vibevoice-mlx
+
+ # CoreML semantic encoder is auto-downloaded on first use
+ python run/e2e_pipeline.py --model microsoft/VibeVoice-1.5B --text "Hello!" --output hello.wav
+ ```
+
+ Without CoreML (Linux, or no coremltools), the pipeline falls back to a pure MLX semantic encoder.
+
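The fallback described above could be decided with a check along these lines. This is a minimal sketch, not the vibevoice-mlx implementation: the helper name `pick_semantic_backend` and its return labels are hypothetical, and it assumes CoreML is usable only on macOS with `coremltools` importable.

```python
import importlib.util
import platform


def pick_semantic_backend(system=None, has_coremltools=None):
    """Return "coreml" when CoreML looks usable, otherwise "mlx".

    Hypothetical helper for illustration; the real pipeline may decide differently.
    """
    if system is None:
        system = platform.system()  # "Darwin" on macOS, "Linux" on Linux
    if has_coremltools is None:
        # find_spec returns None when the package is not installed
        has_coremltools = importlib.util.find_spec("coremltools") is not None
    if system == "Darwin" and has_coremltools:
        return "coreml"
    return "mlx"
```

Passing `system` and `has_coremltools` explicitly makes the decision easy to exercise on any machine; calling it with no arguments probes the current environment.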
+ ## Architecture
+
+ - **Type**: Causal σ-VAE encoder with streaming conv caches
+ - **Input**: 3200 audio samples (one speech frame at 24 kHz)
+ - **Output**: 128-dim semantic features
+ - **State**: 34 conv cache buffers (ct.StateType, requires iOS 18+ / macOS 15+)
+ - **Compute units**: CPU_AND_GPU (ANE not supported for stateful models)
+ - **Size**: 657 MB (fp16 weights)
+
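The input size above implies a fixed frame duration: 3200 samples at 24 kHz is about 133.3 ms, or 7.5 frames per second. The sketch below shows one way audio could be cut into such frames before being fed to the encoder one frame at a time. The helper `split_into_frames` is hypothetical, and zero-padding the final short frame is an assumption, not something the README specifies.

```python
SAMPLE_RATE = 24_000   # Hz, from the Architecture list
FRAME_SAMPLES = 3_200  # samples per speech frame, from the Architecture list


def split_into_frames(samples, frame_samples=FRAME_SAMPLES):
    """Split a sequence of samples into fixed-size frames.

    The last frame is zero-padded to full length (an assumption for illustration).
    """
    frames = []
    for start in range(0, len(samples), frame_samples):
        frame = list(samples[start:start + frame_samples])
        frame += [0.0] * (frame_samples - len(frame))  # pad a short final frame
        frames.append(frame)
    return frames


frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE    # duration of one frame in ms
frames_per_second = SAMPLE_RATE / FRAME_SAMPLES  # frames needed per second of audio

one_second = [0.0] * SAMPLE_RATE                 # one second of silence
frames = split_into_frames(one_second)
print(len(frames), frame_ms, frames_per_second)
```

One second of audio does not divide evenly into 3200-sample frames, so the last of the resulting frames carries 1600 real samples plus padding.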
+ ## Performance
+
+ | Backend | Latency | Pipeline RTF (1.5B INT8) |
+ |---------|---------|--------------------------|
+ | CoreML | 4.8 ms/frame | 3.1x |
+ | Pure MLX | 11.5 ms/frame | 2.6x |
+
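To put the per-frame latencies in context, they can be compared with the duration of the audio each frame covers. The sketch below assumes the frame size from the Architecture section (3200 samples at 24 kHz) and uses only the latency values from the table; the names `budget_share`, `shares`, and `speedup` are illustrative.

```python
FRAME_MS = 1000 * 3200 / 24_000  # about 133.3 ms of audio per frame

latency_ms = {"CoreML": 4.8, "Pure MLX": 11.5}  # per-frame latencies from the table


def budget_share(latency, frame_ms=FRAME_MS):
    """Fraction of one frame's duration spent in the semantic encoder."""
    return latency / frame_ms


shares = {name: budget_share(ms) for name, ms in latency_ms.items()}
speedup = latency_ms["Pure MLX"] / latency_ms["CoreML"]

for name, share in shares.items():
    print(f"{name}: {share:.1%} of one frame's duration")
print(f"CoreML encoder speedup over pure MLX: {speedup:.2f}x")
```

Under these assumptions the encoder accounts for a small share of each frame's duration on either backend, which is consistent with the pipeline RTF improving less than the encoder latency does.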
+ ## Source
+
+ Built from [microsoft/VibeVoice-1.5B](https://huggingface.co/microsoft/VibeVoice-1.5B) using [vibevoice-coreml](https://github.com/gafiatulin/vibevoice-coreml) conversion scripts.