
Gemini’s Data Analysis Abilities Overstated by Google

30 June 2024 | Paikan Begzad

Summary

Google has touted its generative AI models, Gemini 1.5 Pro and 1.5 Flash, for their purported ability to handle and analyze vast amounts of data. The tech giant has highlighted their "long context" capabilities in press briefings and demonstrations, showcasing tasks like summarizing extensive documents or searching through film footage.

However, recent research casts doubt on these claims. Two separate studies evaluated how effectively Google’s Gemini models and others process large datasets, such as lengthy texts comparable to "War and Peace." The findings reveal that Gemini 1.5 Pro and 1.5 Flash often falter, answering correctly only 40% to 50% of the time in document-based tests.

"While Gemini 1.5 Pro can technically handle long contexts, many instances show that the models don’t genuinely ‘understand’ the content," stated Marzena Karpinska, a postdoctoral researcher at UMass Amherst and co-author of one of the studies, in an interview with TechCrunch. The context, or context window, refers to the input data the model considers before generating output. This can range from simple questions to movie scripts or audio clips. As these context windows expand, so does the volume of documents they encompass.

The latest Gemini versions can take in up to 2 million tokens of context, equivalent to about 1.4 million words, two hours of video, or 22 hours of audio, the largest context window of any commercially available model.
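For a sense of scale, Google's figures imply roughly 0.7 words per token (1.4 million words for 2 million tokens). The short Python sketch below applies that back-of-the-envelope conversion; the exact ratio varies by tokenizer and by text, so treat it as an approximation rather than a property of Gemini's own tokenizer.

    # Rough words-per-token ratio implied by Google's stated figures:
    # 2,000,000 tokens ~ 1,400,000 words -> about 0.7 words per token.
    WORDS_PER_TOKEN = 1_400_000 / 2_000_000  # approximation; real tokenizers vary

    def estimate_words(num_tokens: int) -> int:
        """Estimate how many English words fit in a given token budget."""
        return round(num_tokens * WORDS_PER_TOKEN)

    # Gemini 1.5's advertised 2-million-token context window
    print(estimate_words(2_000_000))  # -> 1400000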

Earlier this year, Google showcased several pre-recorded demos to illustrate Gemini’s long-context capabilities. One demo featured Gemini 1.5 Pro searching the Apollo 11 moon landing transcript — roughly 402 pages — for humorous quotes and then identifying a scene in the telecast resembling a pencil sketch.