Overview
Designed and built a real-time voice-to-voice translation system that enables live multilingual conversations during telehealth consultations. Language barriers significantly affect patient outcomes: patients who cannot communicate clearly with their providers receive worse care, miss important instructions, and are less likely to seek help when needed. This system addresses that barrier with a streaming pipeline optimized for the latency demands of real clinical dialogue, allowing clinicians and patients to speak naturally in their own languages while voice translation bridges the gap in near real time.
The Challenge
Existing translation services introduce round-trip latencies of 2–5 seconds or more, which breaks the natural conversational flow of a medical consultation. When a patient answers a doctor's question, waits several seconds for the translation, and then hears the translated reply several seconds after that, the interaction becomes so disjointed that it is clinically unusable. Hospital environments add further challenges: noise from medical equipment, diverse accents, and dense medical terminology that generic translation models handle poorly. The system also needed to handle conversational turn-taking without cross-talk artifacts, which occur when one speaker's microphone picks up audio while the translated response is still playing.
Technical Approach
- Streaming speech-to-text with partial transcript processing — translation begins before the speaker finishes, rather than waiting for silence detection
- Real-time translation pipeline integrated mid-stream, producing translated segments as source speech arrives
- Text-to-speech synthesis with natural voice output tuned for intelligibility in medical contexts
- LiveKit Agents framework for WebRTC session orchestration, enabling reliable low-latency audio transport between participants
- LiveKit Meet for session management, participant coordination, and integration with existing telehealth workflows
- Conversational turn-taking logic to prevent cross-talk artifacts and manage the natural rhythm of bidirectional dialogue
- Optimized for clinical vocabulary including medical terminology, drug names, anatomy, and procedural language
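Two of the ideas above, translating from partial transcripts and gating the microphone during translated playback, can be illustrated with a minimal Python sketch. It is not the project's actual implementation and does not use the LiveKit API: the class names StablePrefixSegmenter and TurnGate, the 0.3-second tail, and the "commit words that two consecutive partial transcripts agree on" rule are all assumptions chosen for illustration.

```python
from typing import List


class StablePrefixSegmenter:
    """Hypothetical helper: releases words for translation as soon as two
    consecutive partial transcripts agree on them, instead of waiting for
    the final transcript."""

    def __init__(self) -> None:
        self._prev: List[str] = []   # words of the previous partial
        self._committed = 0          # number of words already released

    def feed(self, partial: str, is_final: bool = False) -> str:
        words = partial.split()
        if is_final:
            stable = len(words)      # everything is stable once final
        else:
            stable = 0               # length of the shared word prefix
            for a, b in zip(self._prev, words):
                if a != b:
                    break
                stable += 1
        new_words = words[self._committed:stable]
        self._committed = max(self._committed, stable)
        if is_final:
            self._prev, self._committed = [], 0   # reset for next utterance
        else:
            self._prev = words
        return " ".join(new_words)


class TurnGate:
    """Hypothetical helper: rejects microphone audio while translated speech
    is playing, plus a short tail afterwards, to avoid cross-talk."""

    def __init__(self, tail_s: float = 0.3) -> None:
        self.tail_s = tail_s         # assumed value, not from the project
        self._playing = False
        self._ended_at = float("-inf")

    def playback_started(self) -> None:
        self._playing = True

    def playback_ended(self, now: float) -> None:
        self._playing = False
        self._ended_at = now

    def accept_mic(self, now: float) -> bool:
        return not self._playing and (now - self._ended_at) >= self.tail_s


seg = StablePrefixSegmenter()
chunks = [
    seg.feed("where does"),
    seg.feed("where does it"),
    seg.feed("where does it hurt"),
    seg.feed("where does it hurt today", is_final=True),
]
# Each non-empty chunk could be handed to the translation step right away.
print([c for c in chunks if c])
```

In this sketch a word is released one partial later than it first appears, which trades a small delay for fewer retranslations when the recognizer revises its hypothesis. Words that are revised after being released are not corrected; a production pipeline would need a policy for that case.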