Video to Text
Extracts the audio and runs OpenAI Whisper in your browser to transcribe the speech, with timestamps.
What this does
Video to Text extracts the audio from your video, then runs OpenAI's Whisper (base) model on it through Transformers.js — all in your browser, with no upload.
The first time you click Transcribe, the model (~80MB) downloads from the Hugging Face CDN and is cached, so later runs skip the download. Transcription runs in a background worker, so the page stays responsive. Links to the model and libraries are below.
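As a rough illustration of the audio-preparation step: Whisper models take 16 kHz mono audio, so decoded audio usually has to be downmixed and resampled before transcription. The sketch below shows one simple way to do that. The function names `downmixToMono` and `resampleLinear` are hypothetical, and plain linear interpolation is an assumption chosen for brevity, not necessarily what this tool does internally.

```typescript
// Hypothetical helpers: a minimal sketch, not the tool's actual code.

// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring samples.
// Assumption: no anti-aliasing filter is applied, which keeps the sketch
// short but is lower quality than a proper resampler.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Usage: stereo 48 kHz samples in, 16 kHz mono samples out.
const WHISPER_SAMPLE_RATE = 16000;
const stereo = [new Float32Array(48000), new Float32Array(48000)];
const prepared = resampleLinear(downmixToMono(stereo), 48000, WHISPER_SAMPLE_RATE);
console.log(prepared.length);
```

In a browser, the platform's own audio APIs can also perform the resampling; the manual version here is only meant to make the step concrete.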
How it works
1. Drop a video (MP4, MOV, WEBM, MKV…).
2. Click Transcribe. The audio is extracted and, on first run, the Whisper model downloads once (~80MB) and is cached.
3. Read the transcript, then copy it or download it as .txt or .srt subtitles.
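The .srt export in the last step can be sketched as a small pure function that turns timestamped transcript chunks into SubRip cues. The `Chunk` shape (text plus a `[start, end]` pair in seconds) and the names `toSrtTime` and `chunksToSrt` are assumptions for illustration, as is the 2-second fallback for a chunk with no end time; the tool's real export code may differ.

```typescript
// Hypothetical chunk shape: transcript text plus [start, end] in seconds.
// The end time may be missing (null) for a final, cut-off chunk.
interface Chunk {
  text: string;
  timestamp: [number, number | null];
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build the SRT body: numbered cues separated by blank lines.
function chunksToSrt(chunks: Chunk[]): string {
  return chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      // Assumption: show a cue for 2 seconds when no end time is known.
      const stop = end === null ? start + 2 : end;
      return `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(stop)}\n${chunk.text.trim()}\n`;
    })
    .join("\n");
}

// Usage with two made-up chunks.
const srt = chunksToSrt([
  { text: " Hello there.", timestamp: [0, 1.25] },
  { text: " Welcome back.", timestamp: [1.25, null] },
]);
console.log(srt);
```

Keeping the formatter free of DOM and model dependencies makes it easy to check on its own; the same chunk list could feed a plain .txt export by joining the trimmed text.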
Built with open source
- Transformers.js — Hugging Face's library for running ONNX machine-learning models in the browser, on WebGPU or WebAssembly. The model weights download from the Hugging Face CDN on first use and are cached. · Apache-2.0
- Whisper base (OpenAI) — OpenAI's multilingual speech-recognition model, transcribing audio to text. · MIT
- Mediabunny — Converts and edits video and audio in the browser via WebCodecs. Add-on encoders cover MP3, AAC, and FLAC. · MPL-2.0
