AI
First Impressions of Gemini Live: A Step Up from Siri, But Not Quite There Yet
14 August 2024
|
Zaker Adham
During the recent Made by Google event, Google introduced Gemini Live, a new feature that enables semi-natural spoken conversations with an AI chatbot powered by Google's latest large language model. TechCrunch had the opportunity to test it firsthand.
Gemini Live is Google's response to OpenAI's Advanced Voice Mode, a similar feature currently in limited alpha testing. Although OpenAI demonstrated their feature first, Google is the first to release a finalized version.
In my experience, these low-latency, verbal interactions feel more natural than texting with ChatGPT or conversing with Siri or Alexa. Gemini Live responds to questions in under two seconds and can quickly adapt when interrupted. While not perfect, it offers the best hands-free phone experience I've encountered.
How Gemini Live Works
Before using Gemini Live, you can choose from 10 different voices, compared to OpenAI's three. Google collaborated with voice actors to create these voices, each sounding remarkably human.
For example, a Google product manager asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds. This is a more complex query than I'd typically ask Siri or even Google Search, but Gemini successfully recommended Cooper-Garrod Vineyards in Saratoga.
However, Gemini Live has its flaws. It incorrectly identified a nearby playground as Henry Elementary School Playground, which is actually over two hours away. There is a Henry Ford Elementary School in Redwood City, but it's 30 minutes away.
Google showcased how users can interrupt Gemini Live mid-sentence, allowing for conversational control. In practice, this feature isn't flawless. Sometimes, the AI and users talked over each other, and the AI didn't always catch what was said.
Notably, Google restricts Gemini Live from singing or mimicking voices outside the provided 10, likely to avoid copyright issues. Additionally, Google isn't focusing on emotional intonation in user voices, unlike OpenAI's demo.
Overall, Gemini Live offers a more natural way to explore topics than a simple Google Search. Google mentions that Gemini Live is a step towards Project Astra, a fully multimodal AI model introduced during Google I/O. Currently, Gemini Live supports voice conversations, with plans to add real-time video understanding in the future.