Transcribing With Whisper For People Afraid of the Command Line

Last updated August 2024

I’m afraid of the command line. I can usually craft a command that does what I want – but I don’t need to work at that level often, so the skills quickly atrophy while my Python installs grow old.

And I’m always afraid that a typo will do something Very Bad.

But sometimes there’s no other way. We do a ton of user research, and often need to transcribe those recordings for analysis. There are plenty of good & affordable third-party options for that, but I don’t like sending recordings of participants to someone else’s servers.

OpenAI’s Whisper is a great alternative that does all the work locally, with really impressive results. But… it runs at the command line. Which still intimidates me. What to do?

I’m Learning

I only kind of know what I’m doing with these tools, and this isn’t meant to be an exhaustive overview of how to use Whisper. These two methods have worked for me:

Option 1: StoryToolKitAI

StoryToolKitAI wraps a GUI around Whisper! It’s primarily designed to integrate transcriptions into video-editing tools, but you can ignore that part entirely.

On my very low-spec laptop, it transcribes 1 hour of audio in about 1 hour of actual time. All done locally.

Option 2: Google Colab

Google’s Colab gives you a hosted machine to play with code. The free tier will let you mess around and do some trial transcriptions, without bogging down your local CPU.

This obviously requires letting Google access your audio & transcription, but it’s also a dead-simple way to see how it all works.

Or maybe you just want transcriptions of your favorite podcast, or something else without privacy implications. Colab is also fast – I transcribed 1 hour of audio in about 20 minutes.
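To get a feel for what those speeds mean for a batch of recordings, here is a back-of-the-envelope sketch in Python. The two rates are just the rough figures from this post (about 60 minutes per hour of audio on a low-spec laptop, about 20 on Colab), and the session lengths are made up for illustration; your own numbers will almost certainly differ.

```python
# Rough speeds from this post: minutes of processing per hour of audio.
# Treat these as placeholders and swap in your own measurements.
MINUTES_PER_AUDIO_HOUR = {
    "laptop": 60,  # about 1 hour of processing per hour of audio
    "colab": 20,   # about 20 minutes of processing per hour of audio
}


def estimate_minutes(audio_minutes, setup):
    """Estimate total processing time, in minutes, for a list of recordings."""
    total_audio = sum(audio_minutes)
    return round(total_audio * MINUTES_PER_AUDIO_HOUR[setup] / 60)


sessions = [45, 60, 30]  # hypothetical interview lengths, in minutes
for setup in MINUTES_PER_AUDIO_HOUR:
    print(setup, estimate_minutes(sessions, setup), "minutes")
```

This only scales a single measured rate linearly, so it ignores things like model download time or differences between recordings.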

I wrote an example Colab that walks you through the Whisper transcription process: Easy Transcripts With Whisper.
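For a sense of how little code is involved, here is a minimal sketch of a Whisper transcription in Python. It is not the notebook itself. It assumes the open-source openai-whisper package is installed (in Colab, typically with `!pip install openai-whisper` in its own cell) and that ffmpeg is available. The helper names, the "base" model size, and the file name in the usage note are placeholders chosen for illustration.

```python
def format_timestamp(seconds):
    """Turn a number of seconds into H:MM:SS for a readable transcript."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Build one '[start time] text' line per transcribed segment."""
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(audio_path, model_name="base"):
    """Run Whisper on one audio file and return a timestamped transcript."""
    # Imported here so the helpers above still work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return format_segments(result["segments"])
```

Calling `print(transcribe_file("interview.mp3"))` with your own file in place of the hypothetical `interview.mp3` should print the transcript one timestamped line at a time. Larger models such as "small" or "medium" are generally more accurate but slower.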