Google DeepMind has released Gemini Ultra 2.0, a substantially upgraded version of its flagship AI model that introduces real-time video understanding, enhanced code generation, and what the company describes as “world-class” performance across science and mathematics benchmarks. The company announced the model at its annual I/O developer conference.
Real-Time Video Understanding
The headline feature of Gemini Ultra 2.0 is its ability to process and understand live video streams in real time. Users can point their phone camera at an object, a piece of code on a monitor, or a document, and ask questions that the model answers based on what it sees, with no perceptible latency.
Google demonstrated the feature during the keynote by using it to tutor a student through a calculus problem written on a whiteboard, identify and explain a rare plant species from a live camera feed, and debug code by scanning a laptop screen. “This is AI that sees the world the way you see it,” said Google CEO Sundar Pichai.
Performance Benchmarks
On third-party evaluations, Gemini Ultra 2.0 scores above human expert level on MMLU (Massive Multitask Language Understanding), outperforms previous state-of-the-art models on HumanEval coding benchmarks, and achieves a new high score on math competition problems. DeepMind says the model particularly excels at multi-step reasoning tasks that require maintaining context across long chains of thought.
Integration Across Google Products
The model will power Google Search’s AI Overviews, Google Workspace’s Gemini assistant, and Android’s new on-device AI features. Google is also making Ultra 2.0 available through its Vertex AI cloud platform for enterprise customers, with a preview available immediately and general availability expected next quarter.
Developer Access
Developers can access Gemini Ultra 2.0 through Google AI Studio and the Gemini API. Google introduced a new pricing tier specifically for startups building on Gemini, with generous free-tier quotas intended to compete with OpenAI’s developer ecosystem. A new context caching feature dramatically reduces API costs for applications that repeatedly query the same documents.
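To see why context caching matters for cost, consider an application that attaches the same large document to every request. The sketch below estimates the savings; all token counts and per-million-token prices are hypothetical placeholders chosen for illustration, not Google’s actual rates, and the discount structure (a reduced rate for cached input tokens) is an assumption about how such a feature is typically priced.

```python
# Illustrative cost comparison for repeated queries over one large document.
# Prices and token counts are hypothetical, not Google's published rates.

def cost_without_caching(doc_tokens, query_tokens, requests, price_per_m):
    """Every request re-sends the full document plus a short query."""
    total_tokens = requests * (doc_tokens + query_tokens)
    return total_tokens / 1_000_000 * price_per_m

def cost_with_caching(doc_tokens, query_tokens, requests,
                      price_per_m, cached_price_per_m):
    """The document is sent at full price once; subsequent requests pay a
    reduced per-token rate for the cached document and full price for the
    query tokens only."""
    first = (doc_tokens + query_tokens) / 1_000_000 * price_per_m
    rest = (requests - 1) * (
        doc_tokens / 1_000_000 * cached_price_per_m
        + query_tokens / 1_000_000 * price_per_m
    )
    return first + rest

if __name__ == "__main__":
    # Hypothetical workload: a 500k-token document, 200-token queries,
    # 1,000 requests, $5.00 per million input tokens, $1.25 per million
    # cached tokens.
    base = cost_without_caching(500_000, 200, 1000, 5.00)
    cached = cost_with_caching(500_000, 200, 1000, 5.00, 1.25)
    print(f"without caching: ${base:.2f}")   # → without caching: $2501.00
    print(f"with caching:    ${cached:.2f}")  # → with caching:    $627.88
```

Under these made-up numbers, caching cuts the bill by roughly a factor of four; the larger the shared document relative to each query, the bigger the saving.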