Voice generation FAQ
Voice generation is completely free. Because it runs in your browser, there's no server cost: no usage caps, no time limits, and commercial use is welcome.
Nothing is uploaded to AudioBuff or to Resemble. Your reference audio, input text, and generated audio are all processed in the browser.
Reference audio should be at least 10 seconds long, with 15–20 seconds recommended. Below 10 seconds the voice characteristics are extracted unreliably, and quality plateaus past 20 seconds. Record in a quiet space at a natural pace for best results.
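The length guidance above can be expressed as a small pre-flight check. This is only a sketch: the thresholds come from the answer above, while the function name and verdict labels are hypothetical and not part of AudioBuff.

```typescript
// Thresholds taken from the FAQ answer; all names here are illustrative assumptions.
const MIN_SECONDS = 10;
const RECOMMENDED_MIN_SECONDS = 15;
const RECOMMENDED_MAX_SECONDS = 20;

type DurationVerdict = "too-short" | "usable" | "recommended" | "no-extra-benefit";

// Classify a reference clip by its duration in seconds.
function classifyReferenceDuration(seconds: number): DurationVerdict {
  // Treat NaN or infinite durations as unusable input.
  if (!Number.isFinite(seconds) || seconds < MIN_SECONDS) return "too-short";
  if (seconds < RECOMMENDED_MIN_SECONDS) return "usable";
  if (seconds <= RECOMMENDED_MAX_SECONDS) return "recommended";
  // Longer clips still work; quality is just not expected to improve further.
  return "no-extra-benefit";
}
```

In a browser, the duration could come from the `duration` property of an `AudioBuffer` decoded with the Web Audio API.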
English only for now (Beta). We're tracking Resemble's Chatterbox Multilingual (23 languages, released December 2025) and will adopt it once the Transformers.js port lands.
Every clip carries the Resemble PerTh neural watermark, which is inaudible but recoverable after MP3 compression and resampling, so any clip can be technically confirmed as AI-generated. The watermark is always on and cannot be disabled in AudioBuff.
Safari is not currently supported. Both iOS Safari and macOS Safari are affected by a known WebKit JSEP bug that crashes the runtime. Use a recent Chrome, Edge, or Firefox instead. WebGPU is preferred, but a WebAssembly fallback covers supported browsers that lack WebGPU.
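The browser rules above amount to a simple decision order: rule out Safari, prefer WebGPU, otherwise fall back to WebAssembly. The sketch below illustrates that order only. The type names, the `pickBackend` function, and the user-agent heuristic are assumptions for illustration, not AudioBuff's actual detection code.

```typescript
type Backend = "webgpu" | "wasm" | "unsupported";

// Hypothetical snapshot of the facts needed to choose a backend.
interface EnvironmentInfo {
  userAgent: string;
  hasWebGPU: boolean;
  hasWebAssembly: boolean;
}

// Rough heuristic (an assumption): Safari advertises "Safari/" in its user agent
// without any of the tokens used by Chrome, Edge, or the iOS builds of other browsers.
function isSafari(userAgent: string): boolean {
  const mentionsSafari = /Safari\//.test(userAgent);
  const isOtherBrowser = /(Chrome|Chromium|CriOS|FxiOS|EdgiOS|Edg)\//.test(userAgent);
  return mentionsSafari && !isOtherBrowser;
}

// Apply the decision order from the FAQ answer.
function pickBackend(env: EnvironmentInfo): Backend {
  if (isSafari(env.userAgent)) return "unsupported";
  if (env.hasWebGPU) return "webgpu";
  if (env.hasWebAssembly) return "wasm";
  return "unsupported";
}
```

In a page, `hasWebGPU` could be filled from `"gpu" in navigator` and `hasWebAssembly` from `typeof WebAssembly === "object"`. The presence of `navigator.gpu` does not guarantee a usable adapter, so a real loader would likely still need to handle a failed WebGPU initialization.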
The model download is about 1.5 GB on first load. It is then cached in your browser's IndexedDB, so subsequent runs work offline. You can clear the cache from the model loader at any time.
After generation, click "Continue in audio editor" to hand the result off to the editor page for EQ, loudness normalization, and MP3 export. Cloning, finishing, and final delivery all happen in the same browser tab.
Clone only your own voice or audio you have explicit permission to use. Cloning another person's voice without consent may violate laws such as Tennessee's ELVIS Act, and the responsibility falls on you. If you publish in the EU, AI-synthesis disclosure is also required.