OpenAI Whisper.
A neural speech-to-text model from OpenAI, trained on 680,000 hours of multilingual audio. Strong out-of-the-box accuracy on clear English audio and decent accuracy across many other languages.
How Screendog uses it
After upload, Screendog extracts a 16 kHz mono Opus audio track with ffmpeg (≈5 MB per hour of recording), so even multi-hour recordings fit comfortably under our 300 MB transcription request cap. The extracted audio is then sent to the `gpt-4o-mini-transcribe` endpoint.
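For the curious, here is a minimal sketch of that pipeline, assuming ffmpeg is on the PATH and the official `openai` Python SDK is installed; the file paths, bitrate, and helper names are illustrative, not Screendog's production code.

```python
import subprocess
from pathlib import Path

from openai import OpenAI  # assumes the official openai SDK (>= 1.0) and OPENAI_API_KEY set


def extract_audio(video_path: Path, audio_path: Path) -> None:
    """Extract a 16 kHz mono Opus track from a screen recording with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(video_path),
            "-vn",              # drop the video stream
            "-ac", "1",         # mono
            "-ar", "16000",     # 16 kHz sample rate
            "-c:a", "libopus",  # Opus codec
            "-b:a", "12k",      # low bitrate keeps long recordings small (~5 MB/hr)
            str(audio_path),    # e.g. recording.opus
        ],
        check=True,
    )


def transcribe(audio_path: Path) -> str:
    """Send the extracted audio to the transcription endpoint and return the text."""
    client = OpenAI()
    with audio_path.open("rb") as f:
        result = client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=f,
        )
    return result.text
```

The low Opus bitrate is the design choice that matters here: speech stays intelligible while a multi-hour track lands around 5 MB per hour, well under the request cap.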
When transcription fails honestly
If the audio is too quiet or the recording is muted, the extracted track exceeds the request cap, or extraction fails, the transcript status is set to `no_speech`, `too_large`, or `failed` respectively; nothing is ever silently dropped.
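As a rough sketch of how those outcomes might be assigned (the function, its arguments, and the `complete` status are hypothetical; only the three failure statuses come from the behavior described above):

```python
from enum import Enum


class TranscriptStatus(str, Enum):
    COMPLETE = "complete"     # hypothetical success status
    NO_SPEECH = "no_speech"
    TOO_LARGE = "too_large"
    FAILED = "failed"


def classify_outcome(
    extraction_ok: bool,
    audio_bytes: int,
    speech_detected: bool,
    max_bytes: int = 300 * 1024 * 1024,  # 300 MB request cap
) -> TranscriptStatus:
    """Map each failure case onto an explicit status instead of dropping it silently."""
    if not extraction_ok:
        return TranscriptStatus.FAILED
    if audio_bytes > max_bytes:
        return TranscriptStatus.TOO_LARGE
    if not speech_detected:
        return TranscriptStatus.NO_SPEECH
    return TranscriptStatus.COMPLETE
```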
Frequently asked
Is the audio sent to OpenAI?
Yes. Audio is sent to OpenAI's transcription endpoint via the Replit AI Integrations proxy, and Screendog retains no audio beyond the lifetime of the request.
Try Screendog free.
5 recordings on the free trial. Real Linear, GitHub, Notion, Slack, and Jira filing. No credit card.
Start a workspace — free