Text to Speech Multilingual v2 helps you create usable drafts quickly, then refine quality, timing, and style with small, predictable edits. Use it for creator content, localization, podcasts, games, and production workflows where speed and consistency matter.
Create natural speech across languages with multilingual voice support.
The text to convert to speech. Max length: 5000 characters.
The voice to use for speech generation
Voice stability (0-1)
Similarity boost (0-1)
Style exaggeration (0-1)
Speech speed (0.7-1.2). Values below 1.0 slow the speech down; values above 1.0 speed it up. Values near either end of the range may reduce quality.
Whether to return timestamps for each word in the generated speech
The text that comes before the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence continuity within the current generation. Max length: 5000 characters.
The text that comes after the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence continuity within the current generation. Max length: 5000 characters.
Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. Other models, including Multilingual v2, return an error if a language code is provided. Max length: 500
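As a rough sketch of how the settings above fit together, the Python snippet below assembles a request payload and checks each value against the limits listed on this page. The field names (`text`, `voice`, `stability`, `similarity_boost`, `style`, `speed`, `timestamps`, `previous_text`, `next_text`) and the helper `build_tts_request` are assumptions for illustration. This page describes the parameters but does not name them, so check them against the API you are calling. The snippet makes no network call.

```python
from typing import Any, Dict, Optional

MAX_TEXT_CHARS = 5000          # limit stated for the text and context fields
SPEED_RANGE = (0.7, 1.2)       # speed range stated on this page


def _check_range(name: str, value: float, low: float, high: float) -> float:
    """Raise ValueError if value falls outside the documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def _check_length(name: str, value: str) -> str:
    """Raise ValueError if a text field exceeds the documented maximum."""
    if len(value) > MAX_TEXT_CHARS:
        raise ValueError(f"{name} exceeds {MAX_TEXT_CHARS} characters")
    return value


def build_tts_request(
    text: str,
    voice: str,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    speed: Optional[float] = None,
    timestamps: Optional[bool] = None,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a payload dict; field names are hypothetical, not confirmed."""
    payload: Dict[str, Any] = {
        "text": _check_length("text", text),
        "voice": voice,
    }
    # Settings documented as 0-1 values; included only when set explicitly.
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        if value is not None:
            payload[name] = _check_range(name, value, 0.0, 1.0)
    if speed is not None:
        payload["speed"] = _check_range("speed", speed, *SPEED_RANGE)
    if timestamps is not None:
        payload["timestamps"] = timestamps
    # Context fields share the 5000-character limit.
    for name, value in (("previous_text", previous_text), ("next_text", next_text)):
        if value is not None:
            payload[name] = _check_length(name, value)
    # No language code is sent: per this page, only Turbo v2.5 and
    # Flash v2.5 support language enforcement.
    return payload
```

Unset options are left out of the payload so the service can apply its own defaults, which this page does not list.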
Built for real creator workflows where speed and control matter. Start with a clear goal, then iterate with small edits so improvements are easy to compare and keep.
Turn scripts into speech that sounds human, with controllable pacing and tone for narration, ads, support flows, and product demos.
Localize content across languages while keeping a consistent voice direction, then fine-tune speed and emphasis for each locale.
Adjust stability, similarity, and style so revisions stay predictable and easy to compare across iterations.
Draft quickly, review pronunciation and pacing, then rerender final takes once the direction is approved.
Designed to reduce rework by making outputs easier to predict, compare, and refine across iterations. Move from concept to usable drafts faster with practical control over quality and consistency.
Everyday capabilities for ideation and production: guided generation, controllable settings, and practical outputs that help you iterate quickly and ship usable results.
Choose a voice that matches your brand or character, then reuse it for consistent narration.
Control delivery by adjusting stability, similarity, and style exaggeration.
Tune speaking speed for accessibility, pacing, and platform requirements.
Enable timing output when available to support subtitles and editing workflows.
Use a repeatable structure so results are easier to compare across iterations.
Generate 3–5 variants, pick the best, then refine with small edits.
Create outputs you can share for review before finalizing.
Keep prompts and outputs together so you can reproduce your best results later.
Common questions about quality, workflow, credits, privacy, and best practices.
Create a first draft quickly, then refine quality and timing with fast iterations.