Voxtral Transcribes at the Speed of Sound: Introducing Voxtral Transcribe 2
Today, we're releasing Voxtral Transcribe 2, our next generation of speech-to-text models. The release includes two models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live, low-latency applications. Voxtral Realtime is released as open weights under the Apache 2.0 license, so you can run and adapt it on your own infrastructure.
We've also launched an interactive audio playground in Mistral Studio (https://console.mistral.ai/build/audio/speech-to-text) that allows you to test transcription instantly, complete with diarization and timestamps, powered by Voxtral Transcribe 2.
Here's a breakdown of the key features:
Voxtral Mini Transcribe V2
- State-of-the-art Transcription: Achieves industry-leading accuracy with speaker diarization, context biasing, and word-level timestamps in 13 languages.
- Lowest Word Error Rate: Voxtral Mini Transcribe V2 delivers the lowest word error rate at the lowest price point.
- Multilingual Support: Supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
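Word error rate (WER), the metric cited above, is conventionally defined as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below implements that standard definition. It is not the evaluation code behind the figures in this post, and the lowercasing and whitespace tokenization are simplifying assumptions (real evaluations usually apply more careful text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: lowercase + whitespace split stands in for proper text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on a mat" gives one substitution over six reference words, a WER of 1/6.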
Voxtral Realtime
- Real-Time Transcription: Purpose-built for low-latency use, with sub-200ms transcription latency suited to voice agents and other live applications.
- Multilingual Excellence: Delivers strong transcription performance in 13 languages.
- Edge Deployment: Deployable on edge devices for privacy-first applications.
Best-in-Class Efficiency
- Industry-Leading Accuracy: Delivers industry-leading transcription quality at a fraction of the cost of competing services.
- Lowest Price Point: At $0.003 per minute, Voxtral Mini Transcribe V2 offers the best price-performance ratio on the market.
Transforming Voice Applications
Voxtral Transcribe 2 supports voice applications across a range of industries:
- Meeting Intelligence: Transcribes multilingual recordings with speaker diarization, so meeting content can be attributed and annotated accurately.
- Voice Agents and Virtual Assistants: Enables conversational AI with sub-200ms transcription latency for natural voice interfaces.
- Contact Center Automation: Real-time transcription for sentiment analysis, response suggestions, and CRM field population.
- Media and Broadcast: Live multilingual subtitle generation with minimal latency and context biasing for technical terms.
- Compliance and Documentation: Regulatory compliance monitoring and transcription with clear speaker attribution and precise audit trails.
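To make the speaker-attribution use cases above more concrete, here is a minimal sketch of turning diarized, timestamped segments into a readable transcript. The `Segment` class and its fields are hypothetical, defined only for this illustration; they are not the response schema of the Voxtral API, which is described in the documentation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape for a diarized segment; not the actual API schema.
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def merge_speaker_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive segments from the same speaker into a single turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker, last.start, max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def format_transcript(turns: list[Segment]) -> str:
    """Render one line per turn with its time range and speaker label."""
    return "\n".join(
        f"[{t.start:.1f}s-{t.end:.1f}s] {t.speaker}: {t.text}" for t in turns
    )
```

Merging by speaker keeps each line of the transcript tied to one speaker and one time range, which is the property meeting notes and audit trails rely on.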
Get Started
Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
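As a rough guide to what these rates mean for a workload, the sketch below estimates API cost from audio duration. The per-minute prices come from this post; the dictionary keys are labels chosen for this example rather than confirmed API model identifiers, and the calculation assumes billing is strictly proportional to audio length with no rounding or minimum charge.

```python
from decimal import Decimal

# Per-minute prices as stated in this post. The keys are illustrative labels,
# not confirmed API model identifiers.
PRICE_PER_MINUTE_USD = {
    "voxtral-mini-transcribe-v2": Decimal("0.003"),
    "voxtral-realtime": Decimal("0.006"),
}


def estimate_cost_usd(model: str, audio_seconds: float) -> Decimal:
    """Estimate cost in USD, assuming purely proportional per-minute billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return PRICE_PER_MINUTE_USD[model] * minutes
```

Under these assumptions, one hour of audio comes to $0.18 with Voxtral Mini Transcribe V2 and $0.36 with Voxtral Realtime.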
Explore the full capabilities of Voxtral Transcribe 2 and Mistral's audio transcription features in our documentation: https://docs.mistral.ai/capabilities/audio_transcription
Join Our Team: We're hiring passionate individuals to build world-class speech AI. Apply now: https://mistral.ai/careers