Buzz Review: Running Whisper Locally for Audio Transcription, Is It Worth It?

Buzz is an open-source desktop app powered by OpenAI Whisper for offline audio transcription and translation. After a few weeks of using it for meetings and podcasts, here's my honest take.

I used to pay for a cloud transcription service to caption my podcast episodes. Aside from the monthly fee, every audio file had to be uploaded to someone else’s servers — and for interview content with sensitive material, that always made me uncomfortable. Then I stumbled across Buzz on GitHub, which packages Whisper as a desktop app that runs fully offline. I was sold immediately.

What It Actually Is

Buzz is a Python-based open-source desktop application by Chidi Williams, sitting at 18,943 stars on GitHub. At its core, it wraps OpenAI’s Whisper model in a graphical interface so non-technical users can transcribe audio without touching the command line.

It supports macOS, Windows, and Linux, ships under the MIT license (commercial use is fine), and the last commit was in late April 2026, so it’s actively maintained.

What Genuinely Impressed Me

Fully offline, fully private. I record client interviews with sensitive content, and I’d never been comfortable uploading those to cloud transcription tools. Buzz runs the model locally, so audio never leaves my laptop. That alone would be worth the price of admission, and Buzz is free.

Pick your model size. Whisper comes in tiny, base, small, medium, and large, and Buzz supports them all. On my M2 MacBook Air, I run “small” for speech-to-text transcription of Mandarin podcasts, and accuracy lands around 92%. Bumping up to large-v3 gets you noticeably better results at the cost of speed; a 30-minute clip takes about 5 minutes to process on my machine.

Real-time transcription works. Buzz doesn’t just chew through pre-recorded files; it can also hook into your system mic and transcribe live. I’ve used this in meetings a few times. It’s not perfect (the occasional misheard word), but it’s serviceable, and exporting to SRT for subtitle work is dead simple.
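
If you’re curious what an SRT export has to produce, the format is simple enough to sketch. This is a minimal illustration, not Buzz’s actual export code: it assumes each transcript segment is a (start, end, text) tuple with times in seconds, and the function names are my own.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SRT text.

    The tuple layout is an assumption made for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: a 1-based counter, a time range, then the caption text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Calling segments_to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]) gives two numbered cues, the first running from 00:00:00,000 to 00:00:02,500.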

Multilingual translation. Whisper recognizes speech in roughly 99 languages and can translate it into English, and Buzz exposes that. I once threw a Japanese podcast at it and asked for English text, and the output was surprisingly clean. Faster than running it through Google Translate and proofreading manually.

Getting It Running

The path of least resistance is grabbing a release binary from GitHub. Mac and Windows both have ready-to-go installers. The interface is utilitarian — pick an audio file, choose a model size, set the output language, hit start. That’s it.

If you’d rather run from source:

git clone https://github.com/chidiwilliams/buzz
cd buzz
pip install poetry
poetry install
poetry run python -m buzz

First launch downloads your selected Whisper model. If your connection is sketchy, you can pre-download the model and drop it into ~/.cache/whisper. The large-v3 download took me 20+ minutes the first time; after that it’s cached.
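
To check what is already cached before you go offline, a rough sketch like the following can help. It assumes the cache follows the reference openai-whisper layout, with one .pt file per model (small.pt, large-v3.pt) under ~/.cache/whisper; your Buzz version or backend may keep models somewhere else, and cached_models is a name I made up.

```python
from pathlib import Path


def cached_models(cache_dir=None) -> list[str]:
    """List model names found as *.pt files in a Whisper cache directory.

    Assumption: one checkpoint file per model, named <model>.pt, as the
    reference openai-whisper package stores them. Buzz may differ.
    """
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "whisper"
    if not root.is_dir():
        return []
    # "large-v3.pt" -> "large-v3"
    return sorted(path.stem for path in root.glob("*.pt"))
```

An empty list means either nothing has been downloaded yet or the models live in a different directory on your system.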

A Few Things You Should Know Before Diving In

GPU support is fiddly. If you want to use an NVIDIA card, you’ll need to install CUDA and a matching PyTorch build yourself. Mac users get Metal acceleration with no setup. Linux users have it the worst — a friend of mine spent half a day getting his RTX 4060 to cooperate.

Long audio sometimes glitches. For files over two hours, the model occasionally gets stuck repeating the same line (the community calls this “hallucination”). The workaround is splitting with ffmpeg into 30-minute chunks before transcribing. This is a Whisper-level issue, not a Buzz problem.
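
The chunking step is easy to script. Here is a minimal sketch, assuming ffmpeg is on your PATH; the helper names and the _partNNN output naming are my own choices. It uses ffmpeg’s segment muxer with stream copy, so nothing is re-encoded and cuts may land slightly off the 30-minute mark.

```python
import subprocess
from pathlib import Path


def build_split_command(source: str, chunk_minutes: int = 30) -> list[str]:
    """Build an ffmpeg command that splits `source` into fixed-length chunks.

    Output files sit next to the input, e.g. interview_part000.mp3,
    interview_part001.mp3 (a naming scheme chosen for this sketch).
    """
    src = Path(source)
    pattern = str(src.with_name(f"{src.stem}_part%03d{src.suffix}"))
    return [
        "ffmpeg",
        "-i", str(src),
        "-f", "segment",                           # segment muxer
        "-segment_time", str(chunk_minutes * 60),  # chunk length in seconds
        "-c", "copy",                              # copy streams, no re-encode
        pattern,
    ]


def split_audio(source: str, chunk_minutes: int = 30) -> None:
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(source, chunk_minutes), check=True)
```

Run split_audio("interview.mp3") once, then queue the resulting parts in Buzz like any other files.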

The UI feels dated. This is an open-source side project, and the interface looks early-2010s. All the features are there, but the layout is cluttered — first-time users may struggle to find the live transcription option. A UI overhaul has been requested forever, but the project is essentially a one-person show.

Chinese punctuation drops. Whisper’s handling of Chinese routinely loses punctuation, requiring manual cleanup. Not Buzz’s fault, but worth knowing.

How It Compares

I’ve also tried MacWhisper (Mac-only, paid), Aiko (Mac-only, free, basic), and Whisper.cpp (CLI).

Buzz wins on being cross-platform, full-featured, and fully free. MacWhisper has a slicker UI but costs $30+. Aiko is free but only does basic transcription. Whisper.cpp is fast but not friendly for non-technical users.

OpenAI’s Whisper API is also an option — best accuracy, no hardware demands — but it costs $0.006 per minute, and you have to trust OpenAI not to peek at your audio.
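
To put that per-minute price in perspective, here is a quick back-of-the-envelope sketch. The $0.006 figure is the one quoted above and may change; api_cost is just a helper name for illustration.

```python
from decimal import Decimal

# Per-minute API price as quoted in this review; check current pricing.
API_PRICE_PER_MINUTE = Decimal("0.006")


def api_cost(minutes) -> Decimal:
    """Estimated API cost in USD for a given number of audio minutes."""
    return API_PRICE_PER_MINUTE * Decimal(str(minutes))


if __name__ == "__main__":
    print(f"30-minute episode: ${api_cost(30):.2f}")
    print(f"10 hours of audio: ${api_cost(10 * 60):.2f}")
```

A 30-minute episode works out to $0.18, and ten hours of audio to $3.60. At that rate, the case for running locally rests more on privacy than on cost.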

Who Should Use It

For podcasters, content creators, journalists, students transcribing interviews, or whoever ends up taking the meeting minutes, Buzz can save real money and time. For sensitive content especially, the local-first approach beats any cloud service on privacy.

If you only need to transcribe the occasional voice note, cloud services may be more convenient. Downloading multi-GB models for sporadic use isn’t worth the friction for everyone.

Bottom Line

Buzz isn’t a flashy AI product. It just takes Whisper — an excellent open model — and puts it on your desktop in the most direct way possible. The 18.9k stars are earned: it solves a real problem with “good enough, easy to install, free.” It has replaced my paid transcription service entirely, and I’ve recommended it to several content-creator friends with positive feedback.

One tip: don’t stick with base or small if your machine can handle medium. The accuracy jump is significant.

GitHub: https://github.com/chidiwilliams/buzz


About the Author

Liudingyu is a full-stack developer and heavy GitHub user who has starred 900+ repos over the past 3 years. This site only covers tools I’ve actually used or deeply researched.

