VoiceBox

Voice · Premium tool

Quick Facts

Category: Voice
Pricing: Premium
Listed: Jun 2026
Updated: Jun 2026
Website: voicebox.sh

About VoiceBox

Voicebox is an open-source desktop app for voice cloning and text-to-speech on macOS, Windows, and Linux. It clones voices from short samples (around 3 seconds) and generates speech across multiple TTS engines, accepting WAV, MP3, FLAC, and WEBM uploads as well as microphone and system-audio capture.

A timeline-based multi-voice editor supports arranging tracks, trimming clips, mixing conversations, and applying audio effects (pitch shift, reverb, delay, compression) with live preview and per-profile presets.

Local inference runs on Metal, CUDA, ROCm, Intel Arc, and DirectML, which keeps workflows fully local and offline; remote GPUs can be used instead via one-click server setup. Speech-to-text uses Whisper models in multiple sizes and covers 99 languages, with optional local LLM transcript refinement for punctuation and disfluency removal.

Generation supports long outputs (up to 50,000 characters per request) with automatic chunking and seamless crossfades between segments. Developer-focused MCP integration exposes a voicebox.speak API for agents, scripts, and toolchains, making the tool suitable for creators, podcasters, voice artists, accessibility users, writers, and developers.
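The chunk-and-crossfade idea mentioned above can be illustrated with a minimal sketch. This is not Voicebox's actual implementation: the function names `chunk_text` and `crossfade`, the sentence-based splitting rule, the default chunk size, and the linear fade are all assumptions chosen for illustration.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: the real chunk size and split rule are not documented here.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, linearly blending the end of `a` into the start of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(tail[i] * (1 - w) + b[i] * w)
    return head + blended + b[overlap:]
```

In this sketch each text chunk would be synthesized separately and the resulting audio segments joined pairwise with `crossfade`, so the joined output is shorter than the sum of the segments by one overlap per join.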

Key Features

Voice cloning from short audio samples (≈3 seconds)
Multi-engine TTS accepting WAV/MP3/FLAC/WEBM uploads plus microphone and system-audio capture
Timeline-based multi-voice editor for arranging tracks, trimming, mixing, and applying audio effects (pitch shift, reverb, delay, compression) with live preview and per-profile presets
Local inference on Metal, CUDA, ROCm, Intel Arc, and DirectML for local/offline workflows, with optional remote GPU support via one-click server setup
Speech-to-text using Whisper models (multiple sizes, 99 languages) with optional local LLM transcript refinement for punctuation and disfluency removal

Use Cases

Produce professional audiobooks and long-form narrated content using Voicebox by cloning a narrator’s voice from a short sample, generating natural-sounding long-form TTS locally or via remote GPU inference, and exporting finished files as WAV/MP3/FLAC for publishers
Assemble multi-voice podcasts, audio dramas, and marketing voiceovers in Voicebox’s timeline editor: capture mic takes, clone guest voices from short samples, apply effects across multiple tracks, then mix and export episodes locally, keeping audio off cloud services for privacy
Integrate Voicebox’s TTS and Whisper STT into apps and workflows via its API to build multilingual voice assistants, automated transcription pipelines, or data-sensitive IVR systems that run offline or on remote GPUs while supporting mic capture and common audio formats
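For the integration use case, the description names a voicebox.speak entry point exposed over MCP. A hedged sketch of what a client request might look like follows. It assumes voicebox.speak is registered as an MCP tool and invoked through the protocol's standard JSON-RPC `tools/call` method; the `text` argument name, the extra `profile` argument, and the helper `build_speak_request` are hypothetical, so check the tool's published schema before relying on them.

```python
import json


def build_speak_request(text: str, request_id: int = 1, **arguments) -> str:
    """Build a JSON-RPC 2.0 `tools/call` message for a tool named voicebox.speak.

    The argument names are assumptions; the actual tool schema may differ.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "voicebox.speak",
            "arguments": {"text": text, **arguments},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # `profile` is a made-up argument used only to show how extras are passed.
    print(build_speak_request("Hello from an agent", profile="narrator"))
```

The message would be sent over whichever MCP transport the server offers (for example stdio); transport setup and the initialization handshake are omitted here.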

Who is it for?

Content creators
Voice artists
Writers
Developers
Accessibility enthusiasts

Published by Ai Directory Platform

Last Updated 21 Jun 2026

Our team independently researches AI tools, verifies official sources, and publishes user reviews. Ratings reflect real user feedback. We may earn affiliate commissions — this does not affect our editorial ratings.
