What This Is.
drop an audio file and get a transcript with word-level timestamps and speaker labels. once the transcript loads, click any word to jump to that moment in the audio. the current word highlights as the audio plays — a read-along view for reviewing recordings.
unlike the other tools on this site, this one does upload your audio. the file is sent to our server, forwarded to a third-party transcription service for processing, then deleted. the transcript itself stays in your browser and is not stored on our side.
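a rough sketch of how the read-along highlight could be driven, not the site's actual code. it assumes each word carries a start and end time in seconds and that words are sorted and non-overlapping; the `Word` shape and the `findActiveWord` name are made up for illustration.

```typescript
// hypothetical shape of one transcribed word; times are in seconds
interface Word {
  text: string;
  start: number;
  end: number;
}

// returns the index of the word being spoken at `time`, or -1 when the
// playhead is in a gap (before the first word, between words, after the last).
// assumes `words` is sorted by start time and words do not overlap.
function findActiveWord(words: Word[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (time < w.start) {
      hi = mid - 1; // playhead is before this word
    } else if (time >= w.end) {
      lo = mid + 1; // playhead is past this word
    } else {
      return mid; // start <= time < end
    }
  }
  return -1;
}

// illustrative data only
const words: Word[] = [
  { text: "hello", start: 0.0, end: 0.4 },
  { text: "there", start: 0.5, end: 0.9 },
  { text: "world", start: 1.2, end: 1.6 },
];
```

in a browser, something like this would typically run on the audio element's `timeupdate` event with `audio.currentTime` as the time, and clicking a word would set `audio.currentTime` to that word's `start`. binary search keeps the lookup cheap even for long transcripts.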
How It Works.
- drop or choose an audio file. most formats work: mp3, wav, m4a, webm, ogg, flac, mp4.
- the file uploads to our server, which forwards it to the transcription engine. you'll see a progress bar during upload, then a "transcribing" indicator while it processes.
- transcription takes roughly as long as the audio itself. a 10-minute recording takes about 10 minutes.
- when it's done, you get the full transcript with a waveform player. click any word to seek. press space to play/pause. the active word highlights yellow as you listen.
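the playback controls can be modeled as a small pure function over player state. this is a sketch under assumptions, not the real implementation: it uses the shortcuts listed under What You Get, and it guesses that left/right are the ±5s keys, that j goes back and k goes forward, and that speed changes in 0.25x steps. none of those details are confirmed by this page.

```typescript
// hypothetical player state; times in seconds
interface PlayerState {
  playing: boolean;
  time: number;
  duration: number;
  rate: number; // playback speed multiplier
}

const MIN_RATE = 0.5;
const MAX_RATE = 3;
const RATE_STEP = 0.25; // assumed step size, not stated on this page

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

// maps a KeyboardEvent.key value to the next player state.
// unknown keys return the same state object unchanged.
function applyKey(state: PlayerState, key: string): PlayerState {
  const seek = (delta: number): PlayerState => ({
    ...state,
    time: clamp(state.time + delta, 0, state.duration),
  });
  const speed = (delta: number): PlayerState => ({
    ...state,
    rate: clamp(state.rate + delta, MIN_RATE, MAX_RATE),
  });

  switch (key) {
    case " ":
      return { ...state, playing: !state.playing };
    case "ArrowLeft":
      return seek(-5);
    case "ArrowRight":
      return seek(5);
    case "j": // assumed: j is back 10s
      return seek(-10);
    case "k": // assumed: k is forward 10s
      return seek(10);
    case "ArrowUp":
      return speed(RATE_STEP);
    case "ArrowDown":
      return speed(-RATE_STEP);
    default:
      return state;
  }
}
```

keeping this free of DOM calls makes it easy to test; a `keydown` listener would call `applyKey` with `event.key` and then copy `time`, `rate`, and `playing` onto the audio element.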
What You Get.
- word-level timestamps. every word has a start and end time. click to seek.
- speaker detection. the transcript groups words by speaker. each speaker turn is labeled and color-coded in the waveform.
- confidence underlines. words the engine is less sure about get an orange underline — a signal to verify against the audio.
- keyboard shortcuts. space for play/pause, left/right arrows for ±5s, j/k for ±10s, up/down arrows for speed (0.5x–3x).
- download as .txt. export the transcript with speaker labels.
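to make the speaker grouping and the .txt export concrete, here is a minimal sketch. the field names, the 0.6 confidence cutoff, and the "speaker: text" line format are all assumptions for illustration; the actual engine output and export layout may differ.

```typescript
// hypothetical word shape; the real engine output may use other field names
interface TranscriptWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
  speaker: string;
  confidence: number; // assumed to be in the range 0..1
}

// a run of consecutive words from the same speaker
interface Turn {
  speaker: string;
  start: number;
  end: number;
  words: TranscriptWord[];
}

const LOW_CONFIDENCE = 0.6; // assumed cutoff for the orange underline

// true for words that would get the "verify this" underline
function isLowConfidence(w: TranscriptWord): boolean {
  return w.confidence < LOW_CONFIDENCE;
}

// walks the words in order and starts a new turn whenever the speaker changes
function groupBySpeaker(words: TranscriptWord[]): Turn[] {
  const turns: Turn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.words.push(w);
      last.end = w.end;
    } else {
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, words: [w] });
    }
  }
  return turns;
}

// one line per speaker turn, e.g. "Speaker 1: hello there"
function toTxt(words: TranscriptWord[]): string {
  return groupBySpeaker(words)
    .map((t) => `${t.speaker}: ${t.words.map((w) => w.text).join(" ")}`)
    .join("\n");
}
```

each turn's `start` and `end` are what a waveform view would need to color the matching region, and the string from `toTxt` is what a download button would save.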
Privacy.
the audio file is uploaded to our server and forwarded to a third-party transcription service for processing. it is not stored after transcription completes. the resulting transcript exists only in your browser tab — closing the tab deletes it.
Limits.
- max file size: 2 GB.
- transcription time scales with audio length: roughly real-time (a 30-minute file takes about 30 minutes).
- english works best. other languages may work but accuracy varies.
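the first two limits are easy to check before uploading. the sketch below is illustrative only: it assumes "2 GB" means 2 × 1024³ bytes (the page does not say whether that or 2 × 10⁹ is meant), and the function names are hypothetical.

```typescript
// assumption: 2 GB is read as 2 GiB; the real cutoff may be 2,000,000,000 bytes
const MAX_FILE_BYTES = 2 * 1024 * 1024 * 1024;

// returns an error message if the file is too large, otherwise null
function checkSize(sizeBytes: number): string | null {
  if (sizeBytes > MAX_FILE_BYTES) {
    return "file is larger than the 2 GB limit";
  }
  return null;
}

// transcription is roughly real-time, so the wait estimate is simply the
// audio length, rounded up to whole minutes
function estimateWaitMinutes(audioSeconds: number): number {
  return Math.ceil(audioSeconds / 60);
}
```

in the browser, `file.size` from the dropped `File` object would supply the byte count, and the audio element's `duration` (once metadata has loaded) would supply the length in seconds.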