how to transcribe a podcast — show notes, captions, citations

three jobs, one transcript

a podcast transcript has to do three jobs simultaneously, and most generic tools make at least two of them painful:

show notes. published on the episode page. SEO-indexed. has to be clean enough to read — no "uh"s and "um"s, no transcript artifacts. paragraph breaks at logical thought boundaries, not every sentence.
accessibility captions. .srt or .vtt for the embedded player. timed to the audio with second-level accuracy. complete (every spoken line, including filler in some workflows) or clean (filler removed) depending on your editorial style.
citation-ready pull-out quotes. for promotional graphics, twitter, the show's substack. often the most important of the three, because a great pull quote drives the episode's discoverability. verifying that the quote is accurate before posting is non-negotiable.

the same transcript should serve all three. the editor should let you switch between them — clean or verbatim, paragraphed or row-per-turn, full transcript or just the highlighted quotes — without re-transcribing or re-formatting from scratch.

the workflow

upload the episode. mp3, m4a, wav, anything. for video podcasts the audio is extracted automatically. up to 5 GB per file (about 8 hours of single-channel audio).
transcription runs. on a 60-minute episode, the first pass is ready in 1–3 minutes in cloud mode, or in roughly real time (about an hour) in on-device private mode. private mode is meant for sensitive episodes: embargoed announcements, off-the-record passages, privacy-conscious guests.
fix labels in bulk. "speaker 1" becomes "host" and "speaker 2" becomes the guest's name, once. propagates through every row in the transcript. proper nouns (company names, product names, people mentioned) fixed once and remembered across future episodes.
edit for show notes. optionally remove filler in one pass — the editor flags "uh," "um," "you know" patterns and lets you accept or reject in batch. paragraph breaks land on logical boundaries by default; adjust where editorial taste calls for it.
highlight pull-quotes. mark the lines that will become tweets or graphics. each highlighted quote exports with a timestamp link back to the audio so you can verify before posting. the editor's click-word-to-replay-audio feature is the verification tool.
export everything at once. show-notes .docx (or markdown for substack), .vtt for the episode player, .srt for the youtube version, json for the website's transcript page, plus a separate "highlights" file with the pull-quotes and their timestamps.
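
the label and filler steps above can be sketched in a few lines. this is a minimal illustration, not the editor's actual implementation: the row shape, the function names, and the three-item filler list are all assumptions made for the example.

```python
import re

# hypothetical row shape: one dict per speaker turn
rows = [
    {"speaker": "speaker 1", "start": 0.0, "text": "So, um, welcome back to the show."},
    {"speaker": "speaker 2", "start": 4.2, "text": "Thanks, uh, you know, it's great to be here."},
]

def relabel(rows, names):
    """Return new rows with each speaker label replaced where a mapping exists."""
    return [{**r, "speaker": names.get(r["speaker"], r["speaker"])} for r in rows]

# only the three patterns named in the workflow; a real list would be longer
FILLER = re.compile(r"\b(?:uh|um|you know)\b,?\s*", re.IGNORECASE)

def strip_filler(text):
    """Naively delete filler plus its trailing comma and whitespace.

    "you know" is sometimes meaningful ("do you know him?"), which is why
    a review step that accepts or rejects each match is safer than
    deleting blindly as this sketch does.
    """
    return FILLER.sub("", text).strip()

named = relabel(rows, {"speaker 1": "host", "speaker 2": "guest name"})
cleaned = [{**r, "text": strip_filler(r["text"])} for r in named]

for r in cleaned:
    print(f'{r["speaker"]}: {r["text"]}')
```

the mapping is applied once and touches every row, which is the point of doing labels in bulk rather than turn by turn.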

the privacy case for podcasters

most podcast transcription is fine on a cloud tool. some isn't:

embargoed announcements. a guest mentions a product launch under embargo. you transcribe today; the embargo lifts next week. uploading the audio to a vendor before the embargo is a leak risk you can avoid.
off-the-record passages. a guest says "this part is off the record" mid-interview. you trim it before publishing, but you have to transcribe the whole thing first to find the passage, so a cloud tool receives the off-the-record audio anyway.
privacy-conscious guests. sources who agreed to the show on the condition the audio doesn't go to additional third parties.

for these episodes, run the file in private mode. the audio stays on your laptop, the transcript stays on your laptop, the export is local. no vendor in the chain.

show-notes formatting that gets indexed

search engines index podcast show notes, and clean structure generally fares better than a wall of text. our show-notes export uses paragraph breaks at logical thought boundaries (typically 30–90 seconds of audio per paragraph), bolded speaker turns where structurally meaningful, and a heading hierarchy you can edit in place. the .docx imports cleanly into most CMSs and publishing platforms (wordpress, ghost, substack, podbean) without paragraph-style corruption.
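
one way to approximate "logical thought boundaries" is a simple heuristic over timed segments. the sketch below is an assumption about how such grouping could work, not a description of the export itself: it starts a new paragraph on a speaker change, at a sentence end once 30 seconds have passed, or unconditionally at 90 seconds. the segment shape and function name are hypothetical.

```python
def paragraphs(segments, min_s=30.0, max_s=90.0):
    """Group timed segments into paragraphs of roughly min_s to max_s seconds."""
    out = []
    cur = None
    for seg in segments:
        if cur is not None:
            elapsed = seg["start"] - cur["start"]
            ends_sentence = cur["text"].rstrip().endswith((".", "?", "!"))
            if (
                seg["speaker"] != cur["speaker"]         # new speaker turn
                or elapsed >= max_s                      # hard cap on length
                or (elapsed >= min_s and ends_sentence)  # soft break at a sentence end
            ):
                out.append(cur)
                cur = None
        if cur is None:
            cur = {"speaker": seg["speaker"], "start": seg["start"], "text": seg["text"]}
        else:
            cur["text"] += " " + seg["text"]
    if cur is not None:
        out.append(cur)
    return out

# hypothetical segments: speaker, start time in seconds, text
segments = [
    {"speaker": "host", "start": 0.0, "text": "Welcome back."},
    {"speaker": "host", "start": 12.0, "text": "Today we talk about captions."},
    {"speaker": "host", "start": 35.0, "text": "First, some background."},
    {"speaker": "guest", "start": 50.0, "text": "Thanks for having me."},
]

for p in paragraphs(segments):
    print(p["start"], p["speaker"], p["text"])
```

a heuristic like this only gets the defaults close; the breaks still need an editorial pass.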

the timestamps stay embedded. listeners reading along can click any timestamp to jump to that point in the embedded player.
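
as an illustration of what an embedded timestamp carries, here is a small sketch that turns a position in seconds into a readable label and a jump link. the `#t=` suffix follows the media-fragment convention for seeking in audio and video URLs; whether a given episode page or player honors it is an assumption, and the URL is a placeholder.

```python
def timestamp_label(seconds):
    """Format seconds as mm:ss, or h:mm:ss once the episode passes an hour."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

def timestamp_link(audio_url, seconds):
    """Append a media-fragment style offset (whole seconds) to a URL."""
    return f"{audio_url}#t={int(seconds)}"

# placeholder URL, for illustration only
url = "https://example.com/episodes/42.mp3"
print(timestamp_label(125.4), timestamp_link(url, 125.4))
print(timestamp_label(3725), timestamp_link(url, 3725))
```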

captions: complete vs clean

accessibility captions ideally include every spoken word, which is the conservative reading of the WCAG captioning criteria: captions should convey all the speech needed to understand the content. some podcasts publish "clean" captions (filler removed) for editorial reasons; others publish complete. our caption export supports both modes: .srt-complete (verbatim) and .srt-clean (filler removed, paragraph-level). pick one or export both.
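
for readers who have not looked inside a caption file, the sketch below writes the same cues as .srt and .vtt in either mode. the cue shape (with separate "verbatim" and "clean" text) and the function names are assumptions for the example; the file layouts themselves are the standard ones, where .srt numbers each cue and uses a comma before the milliseconds, and .vtt starts with a WEBVTT header and uses a dot.

```python
def cue_time(seconds, sep=","):
    """Format seconds as HH:MM:SS<sep>mmm (comma for .srt, dot for .vtt)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def to_srt(cues, mode="verbatim"):
    """Render numbered SubRip cues; mode selects which text field to use."""
    blocks = [
        f"{i}\n{cue_time(c['start'])} --> {cue_time(c['end'])}\n{c[mode]}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)

def to_vtt(cues, mode="verbatim"):
    """Render WebVTT cues; same content, different header and separator."""
    blocks = [
        f"{cue_time(c['start'], '.')} --> {cue_time(c['end'], '.')}\n{c[mode]}\n"
        for c in cues
    ]
    return "WEBVTT\n\n" + "\n".join(blocks)

# hypothetical cues carrying both versions of the text
cues = [
    {"start": 0.0, "end": 3.5, "verbatim": "So, um, welcome back.", "clean": "So, welcome back."},
    {"start": 3.5, "end": 7.25, "verbatim": "Thanks, uh, great to be here.", "clean": "Thanks, great to be here."},
]

print(to_srt(cues, mode="clean"))
print(to_vtt(cues, mode="verbatim"))
```

keeping both text versions on each cue means the timing is computed once and the complete and clean files can never drift apart.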

pricing for podcasters

$0.25 per minute. a 30-minute episode is $7.50. a 60-minute episode is $15. private mode and cloud mode are the same price. no subscription, no minimum. for podcasts with steady weekly volume, batch pricing arrives after launch. write to hello@audiohighlight.com and tell us your shape.
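
the per-minute rate makes episode cost a single multiplication. a small sketch, assuming billing is strictly linear in duration; how partial minutes are rounded is not stated here, so the sketch only rounds the result to the cent.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.25")  # the stated rate, in dollars

def episode_cost(minutes):
    """Cost in dollars for an episode of the given length, rounded to the cent."""
    return (RATE_PER_MINUTE * Decimal(minutes)).quantize(Decimal("0.01"))

for minutes in (30, 60):
    print(f"{minutes} min: ${episode_cost(minutes)}")
```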
