# talkie on Mac
Running talkie-1930-13b-it — a 13B vintage language model trained on pre-1931 English text — locally on Apple Silicon.
## Hardware
- Mac with M4 Max chip, 48 GB unified memory
- ~200 GB free disk space
- macOS (darwin)
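To confirm the chip and memory on a given machine, two stock macOS commands are enough (output wording varies by model):

```bash
# Chip name as reported by the kernel, e.g. "Apple M4 Max"
sysctl -n machdep.cpu.brand_string
# Unified memory in bytes (48 GB = 51539607552)
sysctl -n hw.memsize
```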
## Stack
- omlx — provides the Python/MLX runtime (bundled at `/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python`)
- talkie-mlx — custom inference runtime for talkie's non-standard architecture
- talkie-1930-13b-it-mlx-q4 — 4-bit MLX quantization (~7.4 GB on disk)
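Before running anything, it's worth checking that the pieces are where the commands below expect them (these paths are the ones used throughout this setup):

```bash
# q4 weights should be roughly 7.4 GB on disk
du -sh ~/models/talkie-1930-13b-it-mlx-q4
# the custom runtime's entry point
ls ~/models/talkie-mlx/run.py
```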
## Quick Start
```bash
# Interactive chat
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py chat \
    --model ~/models/talkie-1930-13b-it-mlx-q4

# Single generation
/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py generate \
    --model ~/models/talkie-1930-13b-it-mlx-q4 \
    --prompt "What were the causes of the Great War?" \
    --max-tokens 400 --temperature 0.7
```
Or set up an alias:
```bash
alias talkie='/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py'
talkie chat --model ~/models/talkie-1930-13b-it-mlx-q4
```
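To make the alias survive new terminal sessions, append it to your shell profile (assuming zsh, the macOS default):

```bash
echo "alias talkie='/opt/homebrew/Cellar/omlx/0.3.8/libexec/bin/python ~/models/talkie-mlx/run.py'" >> ~/.zshrc
source ~/.zshrc
```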
## Performance
- Model load: ~0.3 s (warm; the first load is slower)
- Decode: ~26 tok/s on M4 Max
- Memory footprint: ~7.8 GB
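A rough way to reproduce the decode figure, using only the alias and flags documented above (the prompt text is arbitrary, and the estimate ignores prompt processing):

```bash
# tok/s ≈ 200 / (elapsed wall time - ~0.3 s warm load)
time talkie generate --model ~/models/talkie-1930-13b-it-mlx-q4 \
    --prompt "Describe the wireless telegraph." \
    --max-tokens 200 --temperature 0.7
```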
## Why not `omlx serve`?
omlx discovers the model but cannot load it: talkie uses a custom architecture (`"architecture": "talkie"` in `config.json`) that omlx has no loader for, so it crashes with `KeyError: 'model_type'` at inference time. The talkie-mlx runtime handles the custom architecture directly.
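The mismatch is visible without starting a server, by inspecting the model's config (path as used throughout this setup):

```bash
# prints the "architecture": "talkie" line that omlx has no loader for
grep '"architecture"' ~/models/talkie-1930-13b-it-mlx-q4/config.json
```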
## See Also
- Setup.md — step-by-step instructions to reproduce this setup
- talkie blog post
- talkie GitHub