OpenAI Whisper.
A neural speech-to-text model from OpenAI, trained on 680,000 hours of multilingual audio. Strong out-of-the-box accuracy on clear English audio and decent accuracy across many other languages.
How Screendog uses it
After upload, Screendog extracts a 16 kHz mono Opus audio track with ffmpeg (≈5 MB per hour of recording), so even multi-hour recordings fit comfortably under our 300 MB transcription request cap. The extracted audio is then sent to the `gpt-4o-mini-transcribe` endpoint.
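For the curious, here is a minimal sketch of that pipeline, assuming ffmpeg is on the PATH and the official `openai` Python SDK is installed; the file paths, bitrate, and helper names are illustrative, not Screendog's production code.

```python
import subprocess
from pathlib import Path

from openai import OpenAI  # assumes the official openai SDK (>= 1.0) and OPENAI_API_KEY set


def extract_audio(video_path: Path, audio_path: Path) -> None:
    """Extract a 16 kHz mono Opus track from a screen recording with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(video_path),
            "-vn",              # drop the video stream
            "-ac", "1",         # mono
            "-ar", "16000",     # 16 kHz sample rate
            "-c:a", "libopus",  # Opus codec
            "-b:a", "12k",      # low bitrate keeps long recordings small (~5 MB/hr)
            str(audio_path),    # e.g. recording.opus
        ],
        check=True,
    )


def transcribe(audio_path: Path) -> str:
    """Send the extracted audio to the transcription endpoint and return the text."""
    client = OpenAI()
    with audio_path.open("rb") as f:
        result = client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=f,
        )
    return result.text
```

The low Opus bitrate is the design choice that matters here: speech stays intelligible while a multi-hour track lands around 5 MB per hour, well under the request cap.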
When transcription fails honestly
If the audio is too quiet or the recording is muted, the extracted track exceeds the request cap, or extraction fails, the transcript status is set to `no_speech`, `too_large`, or `failed` respectively; nothing is ever silently dropped.
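As a rough sketch of how those outcomes might be assigned (the function, its arguments, and the `complete` status are hypothetical; only the three failure statuses come from the behavior described above):

```python
from enum import Enum


class TranscriptStatus(str, Enum):
    COMPLETE = "complete"     # hypothetical success status
    NO_SPEECH = "no_speech"
    TOO_LARGE = "too_large"
    FAILED = "failed"


def classify_outcome(
    extraction_ok: bool,
    audio_bytes: int,
    speech_detected: bool,
    max_bytes: int = 300 * 1024 * 1024,  # 300 MB request cap
) -> TranscriptStatus:
    """Map each failure case onto an explicit status instead of dropping it silently."""
    if not extraction_ok:
        return TranscriptStatus.FAILED
    if audio_bytes > max_bytes:
        return TranscriptStatus.TOO_LARGE
    if not speech_detected:
        return TranscriptStatus.NO_SPEECH
    return TranscriptStatus.COMPLETE
```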
Frequently asked
Is the audio sent to OpenAI?
Yes. Audio is sent to OpenAI's transcription endpoint via the Replit AI Integrations proxy, and Screendog retains no audio beyond the lifetime of the request.
Try Screendog free.
5 recordings on the free trial. Real Linear, GitHub, Notion, Slack, and Jira filing. No credit card.
Start a workspace — free