
OpenAI has just launched a suite of new audio models and tools designed to make AI agents more reliable, accurate, and flexible. The announcement includes two new speech-to-text models, a new steerable text-to-speech model, and updates to the Agents SDK to streamline the creation of voice-enabled applications.
State-of-the-Art Speech-to-Text Models
The new GPT-4o Transcribe and GPT-4o Mini Transcribe models outperform OpenAI's earlier Whisper models, with lower word error rates across a wide range of languages. Both were trained on large, diverse audio datasets and build on recent advances in model architecture and training methods.
- GPT-4o Transcribe: The higher-accuracy option, delivering state-of-the-art transcription quality across numerous languages.
- GPT-4o Mini Transcribe: A smaller, more efficient model that retains excellent transcription capabilities while offering faster processing and lower costs.
Both models are available via the API, with GPT-4o Transcribe priced at roughly $0.006 per minute of audio and GPT-4o Mini Transcribe at roughly $0.003 per minute. OpenAI is also enhancing its speech-to-text APIs with streaming transcription, noise cancellation, and a semantic voice activity detector that waits until a speaker has finished a thought before cutting off the audio.
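For a concrete sense of the API, here is a minimal sketch of transcribing a local file with the official openai Python SDK; the file name meeting.mp3 is a placeholder, and the snippet assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with the new speech-to-text model.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
        file=audio_file,
    )

print(transcript.text)
```

Swapping the model string to "gpt-4o-mini-transcribe" is the only change needed to use the cheaper, faster model.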
Expressive Text-to-Speech with GPT-4o Mini TTS
The new GPT-4o Mini TTS model allows developers to control not just what the model says but how it says it. This model supports various voices and can be prompted to deliver text with specific tones, emotions, and styles.
The model is available in the API at an estimated $0.015 per minute. Developers can experiment with it on openai.fm, where they can choose from preset prompts or write their own.
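As an illustration, the snippet below generates steered speech with the openai Python SDK and streams it to a file; the voice name, instructions text, and output path are arbitrary examples:

```python
from openai import OpenAI

client = OpenAI()

# Generate speech and stream the audio straight to a file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the built-in voices
    input="Thank you for calling. How can I help you today?",
    instructions="Speak in a warm, patient customer-service tone.",
) as response:
    response.stream_to_file("greeting.mp3")
```

Changing only the instructions string (for example, to a dramatic or whispering delivery) changes how the same input text is spoken.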
Streamlined Voice Agent Development with Agents SDK
OpenAI is also releasing an update to its Agents SDK that simplifies converting text-based agents into voice agents. The SDK encapsulates best practices for building reliable agents, including guardrails and function calling, and with just a few lines of code developers can add speech-to-text and text-to-speech capabilities to their existing agents.
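The sketch below illustrates the pattern, loosely following the Agents SDK voice quickstart; the class and event names (VoicePipeline, SingleAgentVoiceWorkflow, AudioInput, voice_stream_event_audio) reflect one version of the openai-agents package and may differ in yours, and the silent buffer stands in for real microphone audio:

```python
import asyncio

import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# An ordinary text agent; the voice pipeline wraps it without changing its logic.
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant. Keep spoken answers brief.",
)

async def main() -> None:
    # Speech-to-text -> agent -> text-to-speech, handled by the pipeline.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Placeholder input: three seconds of silence at 24 kHz.
    # A real app would pass recorded microphone audio here.
    audio_input = AudioInput(buffer=np.zeros(24_000 * 3, dtype=np.int16))

    result = await pipeline.run(audio_input)
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            pass  # feed event.data (PCM chunks) to an audio playback device

asyncio.run(main())
```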
How to Get Involved
To accompany the launch, OpenAI has published resources and demos for developers, who can explore the new models through OpenAI's API documentation and the Agents SDK guide.
References
- https://indianexpress.com/article/technology/artificial-intelligence/openai-unveils-new-audio-models-to-redefine-voice-ai-with-real-time-speech-capabilities-9897908/
- https://platform.openai.com/docs/guides/voice-agents
- https://www.youtube.com/watch?v=YYgnDO4prdA
- https://community.openai.com/t/new-audio-models-in-the-api-tools-for-voice-agents/1148339
- https://techcrunch.com/2025/03/20/openai-upgrades-its-transcription-and-voice-generating-ai-models/
- https://note.com/npaka/n/n1332bece07e7
- https://venturebeat.com/ai/openais-new-voice-ai-models-gpt-4o-transcribe-let-you-add-speech-to-your-existing-text-apps-in-seconds/
- https://platform.openai.com/docs/guides/agents