# talkie on Mac
Running talkie-1930-13b-it — a 13B vintage language model trained on pre-1931 English text — locally on Apple Silicon.
## Hardware
- Mac with M4 Max chip, 48 GB unified memory
- ~200 GB free disk space
- macOS (darwin)
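To confirm the chip and memory on a given machine, two stock macOS commands are enough (output wording varies by model):

```bash
# Chip name as reported by the kernel, e.g. "Apple M4 Max"
sysctl -n machdep.cpu.brand_string
# Unified memory in bytes (48 GB = 51539607552)
sysctl -n hw.memsize
```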
## Stack
- omlx — provides the Python/MLX runtime (bundled at `/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python`)
- talkie-mlx — custom inference runtime for talkie's non-standard architecture
- talkie-1930-13b-it-mlx-q4 — 4-bit MLX quantization (~7.4 GB on disk)
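Before running anything, it's worth checking that the pieces are where the commands below expect them (these paths are the ones used throughout this setup):

```bash
# q4 weights should be roughly 7.4 GB on disk
du -sh ~/models/talkie-1930-13b-it-mlx-q4
# the custom runtime's entry point
ls ~/models/talkie-mlx/run.py
```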
## Quick Start
```bash
# Interactive chat
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py chat \
    --model ~/models/talkie-1930-13b-it-mlx-q4

# Single generation
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py generate \
    --model ~/models/talkie-1930-13b-it-mlx-q4 \
    --prompt "What were the causes of the Great War?" \
    --max-tokens 400 --temperature 0.7
```
Or set up an alias:
```bash
alias talkie='/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py'
talkie chat --model ~/models/talkie-1930-13b-it-mlx-q4
```
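To make the alias survive new terminal sessions, append it to your shell profile (assuming zsh, the macOS default):

```bash
echo "alias talkie='/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py'" >> ~/.zshrc
source ~/.zshrc
```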
## Performance
- Model load: ~0.3 s (warm; the first load is slower)
- Decode: ~26 tok/s on M4 Max
- Memory footprint: ~7.8 GB
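A rough way to reproduce the decode figure, using only the alias and flags documented above (the prompt text is arbitrary, and the estimate ignores prompt processing):

```bash
# tok/s ≈ 200 / (elapsed wall time - ~0.3 s warm load)
time talkie generate --model ~/models/talkie-1930-13b-it-mlx-q4 \
    --prompt "Describe the wireless telegraph." \
    --max-tokens 200 --temperature 0.7
```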
## Why not `omlx serve`?
omlx discovers the model but cannot load it: talkie uses a custom architecture (`"architecture": "talkie"` in `config.json`) that omlx has no loader for, so it crashes with `KeyError: 'model_type'` at inference time. The talkie-mlx runtime handles the custom architecture directly.
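The mismatch is visible without starting a server, by inspecting the model's config (path as used throughout this setup):

```bash
# prints the "architecture": "talkie" line that omlx has no loader for
grep '"architecture"' ~/models/talkie-1930-13b-it-mlx-q4/config.json
```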
## See Also
- Setup.md — step-by-step instructions to reproduce this setup
- talkie blog post
- talkie GitHub