Introducing Moshi: The New AI Chatbot with Advanced Voice Recognition

07 July 2024

Zaker Adham

Summary

Kyutai, a French AI company, has unveiled its latest innovation, an AI-powered chatbot named Moshi. The chatbot offers features akin to the delayed 'Advanced Voice Mode' of ChatGPT’s GPT-4o. Notably, Moshi can recognize and interpret the user's tone of voice, and it can operate offline.

Revolutionary Features of Moshi

Moshi is built on a robust 7B parameter large language model (LLM) called Helium. It is currently available to the public and can mimic various accents and 70 different emotional speaking styles. One of its standout capabilities is handling two audio streams simultaneously, allowing it to listen and speak at the same time.
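To give a rough sense of what a 7B parameter model means for offline use, the sketch below estimates the size of the weights alone at a few storage precisions. The precisions listed are illustrative assumptions; the article does not say which format Moshi or Helium actually uses, and the estimate ignores activations, caches, and the audio components.

```python
# Parameter count quoted in the article ("7B").
PARAMS = 7_000_000_000

# Assumed storage precisions in bytes per parameter.
# These are illustrative only; the article does not state Moshi's format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1, "int4": 0.5}


def weight_size_gb(n_params: int, bytes_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name:>8}: ~{weight_size_gb(PARAMS, width):.1f} GB of weights")
```

Under these assumptions, halving the precision halves the weight footprint, which is one reason smaller models are easier to run on a local machine.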

Named after the Japanese phone greeting, Moshi responds in just 200 milliseconds, faster than the 232 to 320 millisecond response time of GPT-4o’s Advanced Voice Mode. Kyutai has trained Moshi to understand the subtleties of human conversation.
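The size of that gap is easy to work out from the quoted figures. The short sketch below computes the absolute and relative difference against both ends of the GPT-4o range; the function name is hypothetical, and the numbers are only those reported in this article, not independent measurements.

```python
# Response times quoted in the article, in milliseconds.
MOSHI_MS = 200
GPT4O_RANGE_MS = (232, 320)


def latency_gap(ours_ms: float, theirs_ms: float) -> tuple[float, float]:
    """Return (gap in ms, gap as a fraction of the slower response time)."""
    gap = theirs_ms - ours_ms
    return gap, gap / theirs_ms


if __name__ == "__main__":
    for other_ms in GPT4O_RANGE_MS:
        gap, fraction = latency_gap(MOSHI_MS, other_ms)
        print(f"vs {other_ms} ms: {gap} ms faster ({fraction:.1%})")
```

On the quoted numbers, the advantage ranges from 32 ms at the low end of the GPT-4o range to 120 ms at the high end.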

The company even collaborated with a professional voice artist to enhance voice quality. Developed by a team of eight researchers in just six months, Moshi is a comparatively small model and was trained on 100,000 synthetic dialogues using Text-to-Speech technology.

Future Prospects and Open Source Goals

Kyutai aims to make Moshi an open-source project so that users can use the chatbot without privacy concerns. Although faster than GPT-4o, Moshi is currently a research prototype showcasing rapid response times and the ability to replicate tones and voices. Kyutai is also developing an AI-powered audio identification, watermarking, and signature tracking system to integrate with Moshi. While it may not yet rival ChatGPT, Moshi represents significant progress in the development of offline, open-source AI models.