2026-05-12 16:06:02 -07:00

talkie on Mac

Running talkie-1930-13b-it, a 13B-parameter language model trained on pre-1931 English text, locally on Apple Silicon.

Hardware

  • Mac M4, 48 GB unified memory
  • ~200 GB free disk space
  • macOS (darwin)

Stack

  • omlx — provides the Python/MLX runtime (bundled at /opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python)
  • talkie-mlx — custom inference runtime for talkie's non-standard architecture
  • talkie-1930-13b-it-mlx-q4 — 4-bit MLX quantization (~7.4 GB on disk)
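As a back-of-envelope check, the ~7.4 GB on-disk size is about what 4-bit quantization predicts for 13B parameters. The 4.5 effective bits/weight figure below is an assumption (4-bit values plus per-group scale/bias metadata that MLX-style quantization stores), not a measured number:

```python
# Rough sanity check of the q4 model's on-disk size.
# Assumption: ~4.5 effective bits per weight (4-bit weights plus
# per-group quantization metadata).
params = 13e9            # talkie-1930-13b-it parameter count
bits_per_weight = 4.5
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # in the ballpark of the ~7.4 GB observed
```

The remainder of the gap is plausibly embeddings or other tensors kept at higher precision.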

Quick Start

# Interactive chat
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py chat \
  --model ~/models/talkie-1930-13b-it-mlx-q4

# Single generation
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py generate \
  --model ~/models/talkie-1930-13b-it-mlx-q4 \
  --prompt "What were the causes of the Great War?" \
  --max-tokens 400 --temperature 0.7

Or set up an alias:

alias talkie='/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py'
talkie chat --model ~/models/talkie-1930-13b-it-mlx-q4

Performance

  • Model load: ~0.3s (after first load)
  • Decode: ~26 tok/s on M4 Max
  • Memory footprint: ~7.8 GB
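These numbers give a rough latency budget for the Quick Start generate call. This sketch assumes the decode rate stays flat over the whole reply and ignores prompt-processing time:

```python
# Estimate end-to-end time for the Quick Start generate example.
decode_tps = 26.0    # measured decode rate (tok/s)
load_s = 0.3         # warm model load
max_tokens = 400     # --max-tokens from the Quick Start example

total_s = load_s + max_tokens / decode_tps
print(f"~{total_s:.1f}s for a full {max_tokens}-token reply")
```

So a full-length reply lands in the 15-16 second range on this machine.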

Why not omlx serve?

omlx discovers the model fine but can't actually load it: talkie uses a custom architecture ("architecture": "talkie" in config.json) that omlx has no loader for, so it crashes with KeyError: 'model_type' at inference time. The talkie-mlx runtime handles the custom architecture directly.
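A minimal sketch of that failure mode. The config dict and the lookup here are illustrative, not omlx's actual code; the point is that generic MLX loaders typically dispatch on a model_type key, which talkie's config.json doesn't have:

```python
# Illustrative only: why a generic loader blows up on talkie's config.
config = {"architecture": "talkie"}    # sketch of the relevant config.json field

try:
    loader_key = config["model_type"]  # generic dispatch step; key is absent
except KeyError as err:
    print(f"KeyError: {err}")          # matches the crash omlx reports
```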
