what this is

a free, in-browser tool that pulls the audio track out of a video file and gives you back a standalone mp3. it works with mp4, mov, webm, mkv, m4v, avi, flv — any container with an audio stream that ffmpeg can read.

the audio quality is set to a variable-bitrate mp3 around 165 kbps, which is the sweet spot for transcription source audio: high enough to preserve speech detail, low enough that file sizes stay manageable. (if you want a different format or bitrate, the rest of the format-converter tools handle that.)
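
a minimal sketch of how that quality setting could be written as ffmpeg arguments. it assumes the ~165 kbps target corresponds to LAME's VBR quality level 4 (`-q:a 4`), which ffmpeg's mp3 encoding guide lists as averaging about 165 kbps. this page doesn't state which flags the tool actually passes, and `buildExtractArgs` is a hypothetical helper name.

```typescript
// hypothetical helper: builds ffmpeg arguments for a VBR mp3 extraction.
// assumption: the ~165 kbps target maps to LAME quality level 4 (-q:a 4).
function buildExtractArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName,      // source video
    "-vn",                // drop the video stream, keep audio only
    "-c:a", "libmp3lame", // encode the audio with LAME
    "-q:a", "4",          // VBR quality 4, roughly 140-185 kbps
    outputName,
  ];
}

console.log(buildExtractArgs("input.mp4", "output.mp3").join(" "));
// prints "-i input.mp4 -vn -c:a libmp3lame -q:a 4 output.mp3"
```

a lower `-q:a` number means higher quality and larger files, so a different target would only change that one value.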

nothing uploads

most "extract audio from video" tools are server-side services. you upload your video, their server processes it, you download the audio. for most files that's fine — but for sensitive material (interviews under embargo, client-confidential video, off-the-record recordings, audio you'd rather not have copies of on a third-party server), it isn't.

this tool runs ffmpeg.wasm directly in your browser. the video file you drop is read into the browser's memory, decoded locally, and the audio is re-encoded locally. apart from the one-time ffmpeg.wasm download described below, no network request happens at any point during the conversion. you can verify this in your browser's network tab, and we wrote a five-minute walkthrough on doing exactly that audit.
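
the local flow described above can be sketched roughly like this. it is not the tool's actual source. the `FfmpegLike` interface is a stand-in that mirrors three methods of the `FFmpeg` class in @ffmpeg/ffmpeg 0.12.x (`writeFile`, `exec`, `readFile`) so the sketch runs without the package installed; `extractAudio` is a hypothetical name, and the encoder flags are an assumption.

```typescript
// stand-in for the FFmpeg class from @ffmpeg/ffmpeg 0.12.x (only the methods used here).
interface FfmpegLike {
  writeFile(path: string, data: Uint8Array): Promise<unknown>;
  exec(args: string[]): Promise<number>;
  readFile(path: string): Promise<Uint8Array | string>;
}

// hypothetical helper: every step touches only ffmpeg.wasm's in-memory filesystem,
// so nothing in here issues a network request.
async function extractAudio(
  ffmpeg: FfmpegLike,
  inputName: string,
  video: Uint8Array,
): Promise<Uint8Array> {
  // copy the dropped file's bytes into the virtual filesystem
  await ffmpeg.writeFile(inputName, video);

  // decode and re-encode locally (flags are an assumption, see lead-in)
  const code = await ffmpeg.exec([
    "-i", inputName, "-vn", "-c:a", "libmp3lame", "-q:a", "4", "output.mp3",
  ]);
  if (code !== 0) throw new Error(`ffmpeg exited with code ${code}`);

  // read the encoded mp3 back out of the virtual filesystem
  const out = await ffmpeg.readFile("output.mp3");
  if (typeof out === "string") throw new Error("expected binary output");
  return out;
}
```

in a browser you would pass a loaded `FFmpeg` instance and the file's bytes, for example `new Uint8Array(await file.arrayBuffer())`, then wrap the returned bytes in a `Blob` to offer the download.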

how to use it

drop your video file into the target above (or click to choose).
the first time you use any tool on this page, ffmpeg.wasm loads from a CDN: about 30 MB, cached afterwards. it's a one-time cost; on later visits it loads from your browser cache without a fresh download.
the tool extracts the audio and offers an mp3 download. the file lives on your machine; nothing reaches us.

limits

1 GB file cap. ffmpeg.wasm holds the whole file in browser memory, and very large files exhaust the address space available to it. for 4K video longer than 2 hours, you'll want to trim or downsample first.
desktop browsers only. current versions of chrome, edge, firefox, safari, and arc all work. mobile browsers don't reliably support SharedArrayBuffer, which the WASM build depends on.
processing time scales with file length. expect roughly 1 to 2 seconds of processing per minute of video on a modern laptop, so a 30-minute video takes 30-60 seconds. older or low-power machines run slower.
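
the three limits above could be checked before a conversion starts, along these lines. this is a sketch under assumptions: `preflight` and its result shape are hypothetical, 1 GB is treated as 2^30 bytes, and the time estimate is derived from the "30-minute video takes 30-60 seconds" figure (1 to 2 seconds per minute of video).

```typescript
const MAX_BYTES = 1024 * 1024 * 1024; // 1 GB cap, assumed to mean 2^30 bytes

interface Preflight {
  ok: boolean;
  reason?: string;
  estimateSeconds?: [number, number]; // [fast, slow] on a modern laptop
}

// hypothetical helper: rejects unsupported environments and oversized files,
// otherwise returns a rough processing-time range.
function preflight(
  sizeBytes: number,
  durationSeconds: number,
  hasSharedArrayBuffer: boolean,
): Preflight {
  if (!hasSharedArrayBuffer) {
    return { ok: false, reason: "SharedArrayBuffer is unavailable in this browser" };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file exceeds the 1 GB cap" };
  }
  const minutes = durationSeconds / 60;
  return { ok: true, estimateSeconds: [minutes * 1, minutes * 2] };
}

// a 400 MB, 30-minute video in a browser that has SharedArrayBuffer
const result = preflight(400 * 1024 * 1024, 30 * 60, true);
console.log(result.ok, result.estimateSeconds); // ok is true, estimate is [30, 60]
```

in a real page the third argument would come from a feature check such as `"SharedArrayBuffer" in globalThis` rather than a hard-coded `true`.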

what to do with the audio

depends on the job. common next steps:

transcribe it. audiohighlight's transcription workspace is the obvious follow-on. drop the mp3 in for a transcript with bulk speaker-fix and click-word-to-replay-audio.
edit it in audacity, garageband, or descript. the mp3 imports cleanly into any DAW or editing tool.
archive it. a stripped-out audio track is much smaller than the source video and easier to back up.
publish it as a podcast episode. if your video is a recorded conversation, the audio track is often a publishable podcast as-is.

about the larger chain

this tool is one link in a chain we're building: ffmpeg → LLM → time-aligned transcript → remotion. ffmpeg gives us local file processing and structured output (silence regions, loudness curves, scene cuts). an LLM layer adds semantic judgment to that output. the transcript ties the audio to time-anchored words. and remotion turns the whole thing into rendered video output (captioned clips, highlight reels, audiograms), all without uploading anything.
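
as one illustration of what "structured output" can mean for silence regions, here is a rough sketch that turns ffmpeg log lines into typed records. it assumes the regions come from ffmpeg's `silencedetect` audio filter, which logs `silence_start` and `silence_end` lines; this page doesn't say how the chain actually produces them, and `parseSilence` and `SilenceRegion` are hypothetical names. the sample log lines are illustrative, not captured from a real run.

```typescript
interface SilenceRegion {
  start: number; // seconds
  end: number;   // seconds
}

// pairs each "silence_start: X" line with the next "silence_end: Y" line.
function parseSilence(log: string): SilenceRegion[] {
  const regions: SilenceRegion[] = [];
  let start: number | null = null;
  for (const line of log.split("\n")) {
    const s = /silence_start:\s*(-?[\d.]+)/.exec(line);
    const e = /silence_end:\s*(-?[\d.]+)/.exec(line);
    if (s) start = parseFloat(s[1]);
    if (e && start !== null) {
      regions.push({ start, end: parseFloat(e[1]) });
      start = null; // wait for the next silence_start
    }
  }
  return regions;
}

// illustrative log lines in the shape silencedetect writes
const log = [
  "[silencedetect @ 0x1] silence_start: 1.5",
  "[silencedetect @ 0x1] silence_end: 3.25 | silence_duration: 1.75",
].join("\n");

console.log(JSON.stringify(parseSilence(log)));
// prints [{"start":1.5,"end":3.25}]
```

records like these are small enough to hand to an LLM layer as plain JSON alongside the transcript.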

extract-audio-from-video is the first link, the simplest one, and the one almost everyone needs at some point. the rest of the chain ships in the coming weeks. join the waitlist if you want the working build the day each piece is ready.