A 2-million token context window — large enough to ingest eight full-length novels simultaneously — shipped as a standard feature of Google's Gemini 3.1 Ultra when the model launched in April 2026, doubling the largest context previously available in a production model and setting a new operational standard for enterprise AI deployments. No competing model at general availability matches it.
Context windows are the working memory of a language model during a session: every additional token allows the system to hold more information active at once. Gemini 2.0 Ultra, released in late 2025, carried a 1-million token context. OpenAI's GPT-5, which launched in March 2026, offers 256,000 tokens as its standard consumer tier. The doubling to 2 million is not primarily a consumer feature — most individuals will never write a 2-million token prompt — but for enterprise use cases including legal discovery, medical record synthesis, software code audits, and long-form financial analysis, the expanded capacity is practically significant and changes what tasks can be completed in a single session without truncation.
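A back-of-the-envelope sketch makes the enterprise framing concrete. The function names and the 4-characters-per-token ratio below are illustrative assumptions (a common rule of thumb for English text), not any vendor's tokenizer; production systems should count with the model's own tokenizer API.

```python
CHARS_PER_TOKEN = 4  # rough heuristic average for English prose

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_window(documents: list[str], window: int = 2_000_000) -> bool:
    """True if the combined documents fit in one session without chunking."""
    total = sum(estimate_tokens(d) for d in documents)
    return total <= window

# A 200-page contract at ~3,000 characters per page is roughly
# 150,000 tokens — well inside a 2-million token window, but far
# beyond a 256,000-token window once several such documents are
# loaded together for cross-document review.
contract = ["x" * 3000] * 200
print(fits_in_window(contract))
```

The point of the arithmetic: a single large document was already feasible at 1 million tokens; the doubling matters when a workflow needs many such documents held in memory at once.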
Gemini 3.1 Ultra processes text, image, audio, and video in a single native architecture — a design choice that differs from prior versions, which handled each modality as a separate processing stream. Google cited internal benchmarks showing the model scores 89.3 on the MMLU Pro test for professional knowledge reasoning, compared to GPT-5's reported 87.1 and Anthropic's Claude 4.6 at 88.5; all three figures come from the respective companies' own evaluations released between March and April 2026 and should be interpreted accordingly.
Google's open-weights companion model, Gemma 4, ranked first on the LMSYS Chatbot Arena among open-source models as of April 10, 2026, with an Elo rating of 1,412 — 28 points ahead of Meta's Llama 4 Scout at 1,384, per the publicly maintained LMSYS leaderboard. For enterprises that cannot send data to a third-party API for compliance or latency reasons, Gemma 4 represents the strongest self-hosted option currently available.
The underlying commercial driver is agentic AI — systems that don't merely answer questions but autonomously plan and execute multi-step tasks across software environments without human confirmation at each step. Bloomberg Intelligence projected in March 2026 that global enterprise AI software spending will reach $297 billion in 2026, up 41 percent from 2025. Google Cloud's AI revenue grew 52 percent year-over-year in Q4 2025, reaching $12.3 billion for the quarter, per Alphabet's February earnings — still behind Microsoft Azure's AI services division, which posted $18.7 billion for the same period. Gemini 3.1 Ultra and Gemma 4 are explicitly designed to close that gap in the agentic tier, which both Google and Microsoft have identified as the category that will define enterprise AI adoption in 2026 and 2027.
Google's DeepMind team described the target capability in an April 2026 paper published in Nature as "proactive autonomy at the workflow level" — distinguishing true agentic systems from chatbots by their ability to operate browsers, execute code, retrieve live data, and delegate to subordinate agents without interrupting the user for approval at each step. Gemini 3.1 Ultra's tool-use APIs were built with this architecture in mind. The 2-million token context window makes it possible to maintain coherent state across the kind of long-horizon task chains — multi-day research projects, iterative code refactors, cross-document legal reviews — that previous models had to chunk into separate sessions with inherent context loss.
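The control loop behind that description can be sketched in a few lines. Everything here — the `call_model` stand-in, the `TOOLS` registry, the string-based action format — is a hypothetical illustration of the general agentic pattern, not Google's actual tool-use API: the model chooses an action, the result is appended to a shared context, and the loop continues without pausing for user approval.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list[str] = field(default_factory=list)  # persistent task context

def call_model(state: AgentState) -> str:
    """Stand-in for a model call. A real agent would send state.history
    (bounded by the context window) to the model and parse its reply;
    here we hard-code one tool call followed by completion."""
    return "done" if state.history else "search: contract terms"

# Illustrative tool registry; real agents expose browsers, code
# execution, retrieval, and subordinate agents here.
TOOLS = {
    "search": lambda query: f"results for {query!r}",
}

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    """Plan-act loop: pick a tool, fold the result back into context,
    repeat until the model signals completion or the step budget runs out."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = call_model(state)
        if action == "done":
            break
        name, _, arg = action.partition(": ")
        state.history.append(TOOLS[name](arg))
    return state.history
```

The context window sets the ceiling on `state.history`: the longer the window, the longer a task chain the agent can sustain before it must summarize or drop earlier steps — which is precisely the chunking loss the article describes.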
For OpenAI and Anthropic, the launch resets competitive benchmarks. GPT-5, released in March, had briefly given OpenAI the leading position on most enterprise evaluations; the context window gap and the unified multimodal architecture now shift the points of comparison. The realistic near-term consequence is not mass customer migration — enterprise AI contracts carry real switching costs — but that Google Cloud now functions as a credible first-consideration option for new deployments rather than a secondary evaluation after the other two. That change in the buying process matters at scale.
The risk buried in the agentic narrative is governance. Models that autonomously execute workflows across live software environments introduce failure modes that advisory chatbots do not. A legal discovery agent that misclassifies a privileged document, or a financial system that executes a flawed transaction because its context window misread contract terms across a 200-page PDF, creates liability chains that current regulatory frameworks do not clearly assign. The European Union's AI Act, fully in force since August 2025, classifies certain agentic deployments as high-risk systems requiring conformity assessments — but enforcement mechanisms remain nascent, and auditing an autonomous multi-step workflow is substantially harder than reviewing a single AI output. Enterprise legal teams are beginning to flag this gap.
The next threshold to watch is Google I/O, scheduled for May 20, 2026, where the company is expected to confirm whether Gemini 3.1 Ultra will be integrated into consumer-facing products including Search, Workspace, and the Gemini app. That rollout, if announced, would put the 2-million token context window in front of hundreds of millions of users and set the practical definition of what mainstream AI looks like heading into 2027.