
OpenAI Unleashes New Voice Models: GPT-4o Transcribe, GPT-4o Mini TTS, and an Enhanced Agents SDK

OpenAI has launched a suite of new audio models and tools designed to make voice agents more reliable, accurate, and flexible. The announcement includes two new speech-to-text models, a new text-to-speech model, and an update to the Agents SDK that streamlines the creation of voice-enabled applications.

State-of-the-Art Speech-to-Text Models

The new GPT-4o Transcribe and GPT-4o Mini Transcribe models are positioned as more accurate successors to earlier models such as Whisper. OpenAI reports a lower word error rate and better language recognition than Whisper, which it attributes to training on large, audio-focused datasets and to improvements in model architecture.

  • GPT-4o Transcribe: Designed for high accuracy; OpenAI reports that it outperforms Whisper on transcription quality across many languages.
  • GPT-4o Mini Transcribe: A smaller, more efficient model that retains excellent transcription capabilities while offering faster processing and lower costs.

Both models are available via the API, with GPT-4o Transcribe priced at about $0.006 per minute and GPT-4o Mini Transcribe at about $0.003 per minute. OpenAI is also adding streaming support, noise cancellation, and a semantic voice activity detector to its speech-to-text APIs.
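
A minimal sketch of what a transcription call might look like from Python. It assumes the official openai package, its client.audio.transcriptions.create call, and the model identifiers gpt-4o-transcribe and gpt-4o-mini-transcribe. The helper name transcribe_file and the file path are hypothetical, so check the API documentation for current parameters before relying on it.

```python
from typing import Any, BinaryIO

# Model identifiers as assumed here; confirm them in the API documentation.
ACCURATE_MODEL = "gpt-4o-transcribe"
FAST_MODEL = "gpt-4o-mini-transcribe"


def transcribe_file(client: Any, audio: BinaryIO, model: str = FAST_MODEL) -> str:
    """Send one audio file to the speech-to-text endpoint and return its text.

    `client` is expected to be an `openai.OpenAI()` instance; this helper
    itself is hypothetical and only wraps a single API call.
    """
    result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text


# Usage (needs `pip install openai` and OPENAI_API_KEY in the environment):
#
#   from openai import OpenAI
#   with open("meeting.wav", "rb") as audio:   # hypothetical file
#       print(transcribe_file(OpenAI(), audio, model=ACCURATE_MODEL))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, and switching between the larger and smaller model is a one-argument change.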

Expressive Text-to-Speech with GPT-4o Mini TTS

The new GPT-4o Mini TTS model allows developers to control not just what the model says but how it says it. This model supports various voices and can be prompted to deliver text with specific tones, emotions, and styles.

The model is available in the API at about $0.015 per minute. Developers can try it at openai.fm, choosing from preset prompts or writing their own.
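
The split between what is said and how it is said can be sketched as two separate request fields. The example below assumes the openai package's client.audio.speech.create call, the model identifier gpt-4o-mini-tts, an instructions parameter for tone and style, and a built-in voice named coral. The helper names are hypothetical.

```python
from typing import Any, Dict


def build_speech_request(text: str, style: str, voice: str = "coral") -> Dict[str, str]:
    """Assemble keyword arguments for a text-to-speech request.

    `input` carries what the model says; `instructions` describes how it
    should say it (tone, emotion, pacing).
    """
    return {
        "model": "gpt-4o-mini-tts",  # assumed model identifier
        "voice": voice,              # assumed built-in voice name
        "input": text,
        "instructions": style,
    }


def synthesize(client: Any, text: str, style: str) -> bytes:
    """Generate speech with an `openai.OpenAI()` client and return the audio bytes."""
    response = client.audio.speech.create(**build_speech_request(text, style))
    return response.content


# Usage (needs `pip install openai` and OPENAI_API_KEY in the environment):
#
#   from openai import OpenAI
#   audio = synthesize(OpenAI(), "Your order has shipped.",
#                      "Speak like a warm, upbeat support agent.")
#   with open("reply.mp3", "wb") as f:   # hypothetical output path
#       f.write(audio)
```

Keeping the style prompt separate from the spoken text means the same sentence can be rendered in different tones without editing the text itself.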

Streamlined Voice Agent Development with Agents SDK

OpenAI is also releasing an update to its Agents SDK that simplifies the process of converting text-based agents into voice agents. The SDK encapsulates best practices for building reliable agents, including guardrails and function calls. With just a few lines of code, developers can integrate speech-to-text and text-to-speech capabilities into their existing agents.
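
A rough sketch of the "few lines of code" idea, under stated assumptions: it supposes the openai-agents package exposes VoicePipeline, SingleAgentVoiceWorkflow, and AudioInput in agents.voice, and that audio events carry the type voice_stream_event_audio. These names should be verified against the Agents SDK guide. The helper functions are hypothetical.

```python
import asyncio
from typing import Any, AsyncIterator, List


async def collect_audio(events: AsyncIterator[Any]) -> List[Any]:
    """Keep only the audio chunks from a stream of voice pipeline events."""
    chunks = []
    async for event in events:
        if event.type == "voice_stream_event_audio":  # assumed event type name
            chunks.append(event.data)
    return chunks


async def run_voice_turn(agent: Any, audio_buffer: Any) -> List[Any]:
    """Wrap an existing text agent in a speech-in, speech-out pipeline.

    The SDK import is local so `collect_audio` works without the SDK installed.
    Assumes `pip install "openai-agents[voice]"` and a recorded audio buffer
    (for example a NumPy array of samples).
    """
    from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

    # Speech-to-text -> existing agent -> text-to-speech.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    result = await pipeline.run(AudioInput(buffer=audio_buffer))
    return await collect_audio(result.stream())


def run_voice_turn_sync(agent: Any, audio_buffer: Any) -> List[Any]:
    """Blocking convenience wrapper for scripts without an event loop."""
    return asyncio.run(run_voice_turn(agent, audio_buffer))
```

In this arrangement the existing agent, including its guardrails and function calls, is passed in unchanged. Only the input and output are swapped from text to audio.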

How to Get Involved

OpenAI has published resources and demos alongside the launch. Developers can explore the new models through OpenAI’s API documentation and the Agents SDK guide, and try the text-to-speech model interactively on openai.fm.

