Real-Time Transcription and Subtitles


Project information

  • Category: Application
  • Project date: April 2024
  • Languages/Libraries: Python (OpenAI, OpenCV, Sockets, Threading)

Summary: An application designed to enhance communication accessibility, particularly for people with hearing impairments. At its core is a real-time transcription and subtitle system that identifies who is speaking during a live conversation and transcribes their words as subtitles.

The system uses OpenAI's Whisper API for accurate speech-to-text. To pair the transcribed text with the correct speaker, we integrated OpenCV for facial recognition and movement tracking: the system distinguishes who is speaking by following each speaker's mouth movements and syncs the transcriptions to the active speaker.
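A minimal sketch of this pipeline is shown below. It assumes audio is captured to short WAV chunks and that the openai and opencv-python packages are installed; the Haar-cascade face detector and the mouth-motion heuristic are simplified stand-ins for the project's tracking logic, and all function names are illustrative rather than the project's actual code.

# Sketch: transcribe an audio chunk with Whisper and pick the likely active
# speaker by comparing mouth-region motion between consecutive video frames.
# Assumes short WAV chunks on disk and OPENAI_API_KEY in the environment.
import cv2
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def transcribe_chunk(wav_path: str) -> str:
    """Send a short recorded audio chunk to the Whisper API and return its text."""
    with open(wav_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text


def detect_faces(frame):
    """Return face bounding boxes (x, y, w, h) found in a BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)


def active_speaker(prev_frame, frame, faces):
    """Pick the face whose mouth region (lower third of the box) changed the most."""
    best_idx, best_motion = -1, 0.0
    for i, (x, y, w, h) in enumerate(faces):
        mouth_prev = prev_frame[y + 2 * h // 3 : y + h, x : x + w]
        mouth_now = frame[y + 2 * h // 3 : y + h, x : x + w]
        motion = float(np.mean(cv2.absdiff(mouth_prev, mouth_now)))
        if motion > best_motion:
            best_idx, best_motion = i, motion
    return best_idx  # index into `faces`, or -1 if no face was detected

In practice, the audio capture, Whisper calls, and video loop would run on separate threads (with sockets carrying audio between processes) so that transcription latency never stalls the video feed.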

In developing this tool, we focused on a seamless user experience: the transcribed text is displayed directly above the speaker's face in the video feed, so the conversation stays readable without any additional hardware.
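A sketch of the overlay step, under the same assumptions as above; the font, colors, and padding are illustrative choices rather than the project's exact styling.

# Sketch: draw the latest transcription on a strip just above the detected
# speaker's face so the caption follows them around the frame.
import cv2


def draw_subtitle(frame, face_box, text):
    """Render `text` on a filled strip directly above the face bounding box."""
    x, y, w, h = face_box
    font, scale, thickness = cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2
    (tw, th), baseline = cv2.getTextSize(text, font, scale, thickness)
    top = max(y - th - baseline - 10, 0)  # keep the strip inside the frame
    # A dark background strip keeps the text readable over a busy video feed.
    cv2.rectangle(frame, (x, top), (x + max(tw, w), top + th + baseline + 6),
                  (0, 0, 0), cv2.FILLED)
    cv2.putText(frame, text, (x + 3, top + th + 3), font, scale,
                (255, 255, 255), thickness, cv2.LINE_AA)
    return frame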

Throughout development, we refined the prototype through user feedback, focusing on improving readability and minimizing visual obstruction. Planned enhancements include more robust speaker detection, expanded language support, and integration with meeting platforms such as Zoom.