A CLI that turns long audio recordings into clean, readable prose markdown. Uses Qwen3-ASR served through vLLM, with optional word-level timestamps via the matching forced aligner.
It can also use pyannote diarization to split the text by speaker, enabled with the --diarize flag.
Built around a single-GPU workstation, in my case one with a 24 GB RTX 3090. On that hardware it transcribes at roughly 150–250× realtime: a 51-minute recording finishes in about 16 seconds once the model is loaded.
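
For reference, the realtime factor quoted above is just audio duration divided by wall-clock transcription time, excluding model load. A minimal sketch of that arithmetic follows; `realtime_factor` is a hypothetical helper written for illustration and is not part of this CLI.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# The example figures: 51 minutes of audio transcribed in about 16 seconds.
factor = realtime_factor(51 * 60, 16)
print(f"{factor:.0f}x realtime")  # prints "191x realtime"
```

That lands inside the 150–250× range; actual numbers will likely vary with the recording and the hardware.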

