about the product
what's the largest file you can transcribe?
5 GB at launch. that's roughly 16 hours of single-channel WAV at 16-bit/44.1kHz (about 8 hours in stereo) or close to 90 hours of MP3 at 128kbps. files larger than that need to be split. even under the limit, we don't recommend very long single files: long files compound model errors and the editor gets sluggish past about 4 hours of audio.
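if you want to estimate how many hours of audio fit in a given file size, the arithmetic is short. a minimal sketch in python, assuming uncompressed PCM WAV and constant-bitrate MP3, and ignoring headers and metadata; the function names are illustrative, not part of the product:

```python
def wav_hours(size_bytes, sample_rate_hz=44_100, bit_depth=16, channels=1):
    """Hours of uncompressed PCM audio that fit in size_bytes (header ignored)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return size_bytes / bytes_per_second / 3600


def mp3_hours(size_bytes, bitrate_kbps=128):
    """Hours of constant-bitrate MP3 that fit in size_bytes (tags ignored)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return size_bytes / bytes_per_second / 3600


# Assumption: 1 GB means 10**9 bytes.
one_gb = 10**9
print(f"1 GB mono 16-bit/44.1kHz WAV: {wav_hours(one_gb):.1f} h")
print(f"1 GB MP3 at 128kbps: {mp3_hours(one_gb):.1f} h")
```

higher sample rates, bit depths, or channel counts shrink the WAV figure proportionally, so check how your recorder is configured before relying on an estimate.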
what languages does it support?
english at full quality at launch. cloud mode also handles spanish, french, german, mandarin, japanese, portuguese, italian, dutch, polish, and russian — at quality lower than english but usable. private mode (on-device) is english-first and degrades faster on other languages because the on-device model is smaller.
if your audio is multilingual or code-switches mid-sentence, you'll get usable but not great results. we improve with every model release; for now, monolingual files are the safe bet.
how does it handle accents?
standard non-native and regional english accents (Indian, Scottish, southern US, Australian, west African) work fine. heavy accents combined with technical vocabulary or poor audio quality compound, and the cleanup tax goes up. we publish accent-specific cleanup numbers in the benchmark so you can calibrate.
can it handle multiple speakers?
two speakers reliably. three speakers usually. four-plus speakers: depends heavily on whether their voices are distinguishable and whether there's much crosstalk. for a focus group with five participants, expect to fix 20–30% of speaker labels manually. the bulk-fix workflow makes that fast; the model itself isn't perfect.
what file formats can I upload?
mp3, m4a, mp4 (audio extracted), wav, flac, opus, ogg, webm, aac, wma. video files have their audio extracted and processed. if ffmpeg can read it, we can read it.
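for a sense of what "audio extracted" means, here is a sketch of pulling the audio track out of a video with ffmpeg from python. the 16 kHz mono target is an assumption (it is the input format whisper-family models expect), and the helper names are illustrative; this is not a description of the actual pipeline:

```python
import shutil
import subprocess


def build_extract_args(src, dst, sample_rate_hz=16_000):
    """Build an ffmpeg command that drops video and writes mono audio."""
    return [
        "ffmpeg",
        "-i", src,                   # input file (any container ffmpeg can read)
        "-vn",                       # discard the video stream
        "-ac", "1",                  # downmix to one channel
        "-ar", str(sample_rate_hz),  # resample to the target rate
        dst,
    ]


def extract_audio(src, dst):
    """Run the extraction; raises if ffmpeg is missing or the command fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_args(src, dst), check=True)


print(" ".join(build_extract_args("interview.mp4", "interview.wav")))
```

if the command fails on your file locally, that's a reasonable hint the upload would fail too.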
can I edit the transcript after it's delivered?
yes. the editor is browser-based: bulk speaker relabeling, word-by-word audio playback for verification, custom vocabulary that learns from your corrections, and full text editing with undo/redo. exports update live as you edit.
about pricing
why $0.25 per minute?
it undercuts temi (the only other no-subscription pay-per-file tool) by 20% at every file size, beats sonix on uneven usage patterns, and covers our cost-of-delivery on cloud mode with margin. it's one flat rate across cloud and private mode. we may adjust at launch based on benchmark results, but $0.25 is the target.
is there a subscription?
no. pay per file. no monthly minimum. see pricing for the math.
refunds?
unusable transcripts (corrupted, wrong language, garbled past cleanup) are refunded within 14 days, no return form, no retention attempt. usable transcripts that needed cleanup are not refunded — that's true of every tool. the benchmark publishes our typical cleanup numbers so you can calibrate before you buy.
do you have a free tier?
your first transcription up to 5 minutes is on us at launch. after that, $0.25/minute. there's no recurring free tier — freemium structures bias the product toward upgrade-pressure features and we'd rather build for the buyer who pays.
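to see what a file would cost, here is a short sketch of the per-minute math. it assumes the launch offer means the first 5 minutes of your first file are free and that partial minutes are billed pro rata; both are assumptions for illustration, as are the names:

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.25")        # launch target price from this page
FREE_MINUTES_FIRST_FILE = Decimal("5")   # assumed to apply to the first file only


def transcription_cost(minutes, is_first_file=False):
    """Return the price in dollars for one file of the given length."""
    billable = Decimal(str(minutes))
    if is_first_file:
        # Assumption: the free minutes are subtracted, never refunded as credit.
        billable = max(Decimal("0"), billable - FREE_MINUTES_FIRST_FILE)
    return (billable * RATE_PER_MINUTE).quantize(Decimal("0.01"))


print(transcription_cost(60))                      # a one-hour interview
print(transcription_cost(60, is_first_file=True))  # same file as a first upload
```

under those assumptions a one-hour file costs $15.00, or $13.75 if it is your first upload.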
about privacy and security
what's the difference between cloud mode and private mode?
cloud mode uploads your audio to our servers, transcribes it there, returns the result. retention defaults to 30 days unless you delete sooner; you can shorten to "delete on completion" in settings. encrypted in transit and at rest. it's the standard cloud-transcription posture, just with the retention defaults set to the friendlier value.
private mode runs the model in your browser using WebGPU. your audio never makes a network request. there's no server-side storage, no log, no third party in the chain. the editor and exports work identically to cloud mode. see private transcription for the structural argument and this post for how to verify the claim.
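for cloud mode, the retention rule can be sketched in a few lines. this assumes the 30-day clock starts when the transcription completes, which this page doesn't specify; the function and parameter names are illustrative:

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30


def deletion_time(completed_at, delete_on_completion=False,
                  retention_days=DEFAULT_RETENTION_DAYS):
    """Latest time cloud-mode audio is kept, absent a manual delete."""
    if delete_on_completion:
        return completed_at
    return completed_at + timedelta(days=retention_days)


done = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(deletion_time(done))
print(deletion_time(done, delete_on_completion=True))
```

private mode has no equivalent, since nothing is stored on our side to expire.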
what's the hardware floor for private mode?
chrome, edge, or arc on an apple silicon mac (M1 or later) or a current windows laptop with WebGPU support. firefox and safari are on the roadmap. you'll need 4GB of free RAM during transcription. on a fresh machine, the model file downloads once (~200MB) and is cached for future visits.
on older or unsupported hardware, fall back to cloud mode at the same price. private mode isn't gated to a specific tier; it's a setting.
is this HIPAA-compliant?
private mode removes the third-party-processor question — there is no third party, so no business-associate agreement is relevant. the rest of HIPAA (physical safeguards, access controls, retention policies) is the buyer's responsibility, as it would be with any tool. for cloud mode, we sign BAAs on request for licensed clinicians; write hello@audiohighlight.com.
we are not your lawyers; this isn't legal advice. for the full positioning, see for therapists.
can a court subpoena my transcripts?
if your transcript is on your local device, the same legal process that would reach any document on your device reaches it. we change one specific surface: cloud mode transcripts held on our servers are subpoena-reachable from us; private mode transcripts aren't, because they aren't on our servers. for the legal version of this argument see for lawyers.
do you train models on customer audio?
no. we don't train on customer data, period. cloud mode audio is processed for transcription and discarded per the retention policy. private mode audio doesn't reach us at all. our base models come from public sources (whisper), and we fine-tune them on public corpora.
about the company
who are you?
a small team that has spent years inside the transcription workflow — academic interview coding, journalism, podcast production. we know which corner-cases break which tools because we've broken them on real work. see about for the longer version.
when does the product ship?
when we can hit our cleanup-time target on the benchmark corpus. we're not putting a date on it because the date depends on a milestone we don't fully control. the waitlist gets the working build the day it can run a real file end-to-end.
how often do you email people on the waitlist?
when there's something real to send. one email when there's a working private-mode build. one when the editor is in beta. one when we're charging real money for the first time. no drip campaigns, no monthly newsletter, no "we miss you" mail.
still have a question?
write hello@audiohighlight.com. real human, single-day reply.