how to transcribe a Zoom recording — without a bot in your meeting

two ways Zoom recordings exist

when you record a Zoom meeting, the file lands in one of two places, and the transcription workflow differs slightly:

cloud recording. saved to Zoom's servers. you download it from the "recordings" section of your Zoom account. it arrives as an mp4 (video + audio) and an m4a (audio-only). either file works for transcription; if you only want the transcript, the audio-only m4a is smaller and faster.
local recording. saved to the host's machine. zoom drops the audio into a folder named with the meeting timestamp. you'll find it as audio_only.m4a alongside the video file.

either kind of recording is just an audio file once it exists. drop it in, transcribe it, edit, export.
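for local recordings, finding the right file can be scripted. the sketch below is an illustration only, not part of audiohighlight: it assumes each meeting gets its own folder under one zoom recordings directory, and the function name `find_recordings` and the extension lists are hypothetical choices. it picks one file per meeting folder, preferring an audio-only file over the video because it is smaller.

```python
from pathlib import Path

# Assumed extension groups for this sketch; audio-only files are preferred
# because they are smaller than the video and carry the same speech.
AUDIO_EXTS = {".m4a", ".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".webm"}


def find_recordings(zoom_dir: Path) -> list[Path]:
    """Return one file per meeting folder, preferring audio-only over video."""
    picks: list[Path] = []
    if not zoom_dir.is_dir():
        return picks
    # Walk meeting folders in name order so the result is deterministic.
    for meeting in sorted(p for p in zoom_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in meeting.iterdir() if f.is_file())
        audio = [f for f in files if f.suffix.lower() in AUDIO_EXTS]
        video = [f for f in files if f.suffix.lower() in VIDEO_EXTS]
        if audio or video:
            picks.append((audio or video)[0])
    return picks
```

calling `find_recordings(Path.home() / "Documents" / "Zoom")` would list one candidate per meeting, assuming the recordings folder is in that location on your machine. folders with no audio or video file are skipped.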

the workflow

locate the recording. for cloud recordings: zoom.us → recordings → download. for local recordings: look in the per-meeting folder under ~/Documents/Zoom/ on mac or %USERPROFILE%\Documents\Zoom\ on windows. those are the defaults; the location can be changed in zoom's settings.
drop the file into audiohighlight. mp4, m4a, mp3, wav, webm — anything Zoom exports works. video files have audio extracted automatically.
transcription runs. on a 60-minute zoom recording, the first pass is ready in 1–3 minutes in cloud mode, or in roughly real time (about an hour) in on-device private mode, the option for sensitive meetings.
fix the speaker labels. zoom doesn't pass speaker identity through to the file, so the diarization is what we infer from voice patterns. relabel "speaker 1" to the actual person's name once, and the change propagates through every row.
verify quotes against the recording. click any word in the transcript, hear that second of audio. for any meeting whose transcript becomes evidence — performance reviews, candidate interviews, product decisions, customer-success cases — this is the verification step.
export. .docx for the meeting notes that go to the team. .srt or .vtt for adding captions to the recording before sharing. plain text for paste-into-doc workflows.
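the relabel and caption-export steps above can be sketched in a few lines. this is a minimal illustration under stated assumptions, not audiohighlight's internal format: the `Row` type and the `relabel`, `srt_time`, and `to_srt` functions are hypothetical names invented for the example. it shows why one rename reaches every row (rows carry a label, and the rename is applied wherever the label matches) and how .srt timestamps are laid out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    """One transcript row (hypothetical shape for this sketch)."""
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def relabel(rows, old, new):
    """Rename a speaker once; every row carrying that label is updated."""
    return [replace(r, speaker=new) if r.speaker == old else r for r in rows]


def srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(rows):
    """Render rows as SRT cues: index, time range, text, blank separator."""
    cues = []
    for i, r in enumerate(rows, start=1):
        cues.append(
            f"{i}\n{srt_time(r.start)} --> {srt_time(r.end)}\n"
            f"{r.speaker}: {r.text}\n"
        )
    return "\n".join(cues)
```

.vtt output differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds, so the same row structure would serve both formats.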

why no bot

the dominant workflow for zoom transcription in 2026 is a bot — otter, fireflies, fathom, granola — that joins your meeting as a participant and transcribes live. for many internal team meetings, that's a perfectly good choice. for a meaningful subset of meetings, it isn't:

candidate interviews, where some candidates decline if a bot is present and two-party consent rules in some jurisdictions complicate whether the transcript can be used
medical and therapy consultations where HIPAA-bound audio shouldn't pass through a third-party bot
legal consultations and witness preparation where attorney-client privilege is at risk if a bot is in the room
journalism interviews with sources who agreed to talk on the condition of no third-party recording
internal investigations and HR consultations where audio handling has legal implications
m&a, board, and strategy meetings where the audio's existence on a third-party server is itself a leak risk

for any of those, the workflow that works is: you record the call yourself (zoom's local recording does this), and you transcribe the file after, using a tool that doesn't need to be in the meeting.

private mode for sensitive Zoom recordings

for the meetings above (medical, legal, journalism, investigation, board), even uploading the recording to a cloud transcription tool after the fact can be a problem. the audio sits on the vendor's servers; it's reachable through legal process the way any vendor-held document is reachable.

private mode runs the speech-recognition model in your browser using WebGPU. you drop the recording into the editor and the model transcribes locally: your audio is never sent over the network, never reaches our servers, and never sits in any third-party storage. for the structural argument and the audit instructions, see private transcription.

handling Zoom's quirks

voice cuts and crosstalk. zoom's audio compression occasionally clips voices when speakers overlap. the diarization can read this as a third speaker; flag those rows during cleanup.
screen-sharing audio. if someone shared a video or audio clip during the call, it appears in the transcript as additional speakers. the editor lets you mark those passages or trim them before export.
echo / feedback rooms. participants without headphones can create echo that the model transcribes as repeated phrases. usually fixable in the cleanup pass; for systematic echo, the recording is difficult to transcribe accurately on any tool.
multiple speakers per machine. two people sharing one camera often get diarized as a single speaker. relabel manually after the first pass.
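two of the quirks above leave patterns that a simple pass can surface for review. the sketch below is a hypothetical heuristic, not how audiohighlight's editor works: `flag_rows`, the `(speaker, text)` row shape, and the `min_rows` threshold are all assumptions made for the example. it flags a row whose text repeats the previous row (possible echo) and rows from a speaker label that appears only a handful of times (possibly a phantom speaker created by crosstalk or a shared clip).

```python
from collections import Counter


def flag_rows(rows, min_rows=3):
    """Return {row_index: reason} for rows worth a second listen.

    rows is a list of (speaker, text) tuples in transcript order.
    """
    flags = {}
    counts = Counter(speaker for speaker, _ in rows)
    prev_text = None
    for i, (speaker, text) in enumerate(rows):
        # Normalize case and whitespace so near-identical repeats match.
        norm = " ".join(text.lower().split())
        if norm and norm == prev_text:
            flags[i] = "repeated phrase (possible echo)"
        elif counts[speaker] < min_rows:
            flags[i] = "rare speaker (possible crosstalk artifact)"
        prev_text = norm
    return flags
```

a flag is only a prompt to listen to that row. a quiet participant will also show up as a rare speaker, and two people sharing one machine will not be caught by this pass at all, so manual relabeling still applies there.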

pricing for Zoom recordings

$0.25 per minute. a 30-minute zoom call is $7.50. a 60-minute team meeting is $15. private mode and cloud mode are the same price. no subscription, no minimum. for teams with steady weekly meeting volume, batch pricing arrives after launch.
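the per-minute arithmetic is easy to check for any meeting length. a minimal sketch, assuming whole minutes and no rounding rules beyond cents; how partial minutes are billed is not stated here, and the name `cost` is hypothetical.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.25")  # the per-minute rate quoted in this section


def cost(minutes):
    """Cost in dollars for a recording of the given whole-minute length."""
    return (Decimal(minutes) * RATE_PER_MINUTE).quantize(Decimal("0.01"))
```

at this rate `cost(30)` is 7.50 and `cost(60)` is 15.00. `Decimal` is used instead of floats so the cents stay exact.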

transcribe a Zoom recording without putting a bot in your meeting.
