What This Is.
a free, in-browser voice activity detection tool. it analyzes an audio file and shows you a visual timeline of speech vs. silence, with exact timestamps for every segment.
it uses ffmpeg's silencedetect filter (threshold: -30 dB, minimum duration: 0.5 s) to identify silence: any stretch that stays below -30 dB for at least half a second. everything between those stretches is treated as speech. the result is a visual map of your recording's structure.
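a minimal sketch of the invert step, for anyone who wants to reproduce it outside the browser. this is not the tool's actual code. it assumes you ran something like `ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -`, captured the log text, and already know the file's total duration in seconds. the function names `parse_silences` and `invert_to_speech` are made up for illustration.

```python
import re

# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_start: 4.25
#   [silencedetect @ 0x...] silence_end: 6 | silence_duration: 1.75
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text, total_duration):
    """Return a list of (start, end) silence intervals, in seconds."""
    silences = []
    start = None
    for line in log_text.splitlines():
        match = SILENCE_START.search(line)
        if match:
            # clamp in case a start is reported slightly below zero
            start = max(0.0, float(match.group(1)))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    # assumption: a silence that never logged an end runs to the end of the file
    if start is not None:
        silences.append((start, total_duration))
    return silences


def invert_to_speech(silences, total_duration):
    """Return the gaps between silence intervals as (start, end) speech regions."""
    speech = []
    cursor = 0.0
    for start, end in sorted(silences):
        if start > cursor:
            speech.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        speech.append((cursor, total_duration))
    return speech
```

call `parse_silences(log_text, total_duration)` first, then pass its result to `invert_to_speech` with the same duration. together the two lists cover the whole recording.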
Nothing Uploads.
the analysis runs entirely in your browser via ffmpeg.wasm. the file you drop never leaves your machine. no server involved, no network request during analysis.
How To Use It.
- drop your audio file into the target above.
- hit "detect speech." the tool analyzes the file and builds a timeline.
- read the visual bar — dark regions are speech, light regions are silence. hover for exact timestamps.
- expand "all segments" for a full table with start, end, and duration for every speech and silence segment.
What This Is Useful For.
- interview QC. spot long gaps in a recording that might indicate a dropped connection or a question the mic didn't pick up.
- pre-trim scouting. see the silence structure before deciding where to cut. pair with the trim tool or auto-trim once you know what to remove.
- episode planning. understand how much of a raw recording is actual content vs. dead air, before committing to a full edit pass.
- transcription prep. know the speech-to-silence ratio before running a transcription job — useful for estimating cost on per-minute services.
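two of the uses above come down to simple arithmetic on the segment table, sketched here. the function names, the 5-second gap cutoff, and the per-minute rate are all hypothetical. whether a transcription service bills for silent stretches varies, so the sketch reports both figures.

```python
def long_gaps(silences, min_gap=5.0):
    """Return silence intervals at least min_gap seconds long (interview QC)."""
    return [(start, end) for start, end in silences if end - start >= min_gap]


def speech_summary(speech, total_duration, rate_per_minute):
    """Summarize speech share and a rough cost estimate (transcription prep)."""
    speech_seconds = sum(end - start for start, end in speech)
    ratio = speech_seconds / total_duration if total_duration else 0.0
    return {
        "speech_seconds": speech_seconds,
        "speech_ratio": ratio,
        # cost if the service bills the whole file
        "full_cost": total_duration / 60 * rate_per_minute,
        # cost if only the speech regions were submitted
        "speech_only_cost": speech_seconds / 60 * rate_per_minute,
    }
```

both functions take segments as (start, end) pairs in seconds, the same values the "all segments" table lists.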
Limits.
- 1 GB file cap. ffmpeg.wasm runs in browser memory, so the whole file has to fit there.
- desktop browsers only. chrome, edge, firefox, safari, arc.
- amplitude-based detection. this is not ML-based speaker diarization. it detects silence by volume threshold, not by recognizing voices. background noise above the threshold reads as "speech." for noisy recordings, the results will overcount speech.
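to see why noise trips the detector, it helps to convert the -30 dB threshold into a linear amplitude. the sketch below is a simplified model, not ffmpeg's implementation. it assumes samples are normalized so that full scale is 1.0, and the function names are made up for illustration.

```python
import math

THRESHOLD_DB = -30.0  # same threshold the tool passes to silencedetect


def db_to_amplitude(db):
    """Convert decibels relative to full scale into a linear amplitude."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude):
    """Convert a linear amplitude (full scale = 1.0) into decibels."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def reads_as_speech(peak_amplitude, threshold_db=THRESHOLD_DB):
    """True if a signal at this level is louder than the silence threshold."""
    return amplitude_to_db(peak_amplitude) > threshold_db
```

-30 dB works out to a linear amplitude of about 0.032. a room with fan noise peaking around 0.05 (roughly -26 dB) sits above that, so it is never marked silent, while a noise floor around 0.01 (-40 dB) stays below it.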