Google Just Split Its AI Chip in Two. TPU 8t Trains Models in Weeks. TPU 8i Runs Millions of Agents at Once. Here Is What Changed.

Google Cloud held its annual Next conference in Las Vegas on April 22, 2026, and the headline that most technology reporters led with was accurate but incomplete: Google launched two new AI chips to compete with Nvidia. The fuller story is more interesting than the headline because it explains why two chips, not one, and what the split architecture signals about where AI compute is actually going.
Google Cloud announced that its eighth generation of custom‑built AI chips, or tensor processing units (TPUs), will be split in two. One chip, the TPU 8t, is geared toward model training; the other, the TPU 8i, is aimed at inference, the ongoing use of trained models to respond to user prompts.
The performance specifications for each chip are specific enough to be meaningful:
TPU 8t (training):
- Packs 9,600 chips in a single superpod to provide 121 exaflops of compute and two petabytes of shared memory connected through high‑speed inter‑chip interconnects.
- Delivers nearly 3x higher compute performance than the previous generation, compressing frontier model training from months to weeks.
- Delivers 124% more performance per watt than the preceding generation.
- Can connect one million‑plus TPU chips in a single cluster.
TPU 8i (inference and agentic AI):
- Triples on‑chip SRAM to 384 MB and increases high‑bandwidth memory to 288 GB, so massive KV caches can be hosted entirely on silicon.
- Doubles inter‑chip interconnect (ICI) bandwidth to 19.2 Tb/s and introduces a dedicated Collectives Acceleration Engine that cuts on‑chip latency by up to 5x to minimize lag during high‑concurrency requests.
- Delivers a performance gain of 117% per watt compared to the preceding generation.
- Connects 1,152 TPUs in a single pod.
Both chips will be available to Google Cloud customers later this year, alongside Google's continued portfolio of Nvidia GPU instances.
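A back-of-envelope calculation suggests why those memory figures matter for inference. The sketch below is illustrative only: the model shape used (80 layers, 8 KV heads of dimension 128, fp16 values, 128K-token context) is an assumption for the sake of arithmetic, not the spec of any real model or of TPU 8i.

```python
# Back-of-envelope KV cache sizing. All model parameters below are
# illustrative assumptions, not specs of any real model or chip.

def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the key/value cache for ONE sequence:
    2 tensors (K and V) per layer, each [kv_heads, context_len, head_dim]."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical large model: 80 layers, 8 KV heads of dim 128, fp16 values.
per_seq = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_len=131_072)
print(f"KV cache per 128K-token sequence: {per_seq / 2**30:.0f} GiB")  # -> 40 GiB

# With 288 GB of HBM (and ignoring model weights), only a handful of such
# sequences fit per chip -- which is why HBM capacity, and keeping hot cache
# lines in on-chip SRAM, matter so much for high-concurrency serving.
hbm = 288e9
print(f"Max concurrent 128K sequences per chip: {hbm // per_seq:.0f}")  # -> 6
```

Under these assumptions, a single long-context conversation pins tens of gigabytes of cache, so the generation-over-generation jumps in SRAM and HBM translate directly into how many users one chip can serve at once.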
Why the Split Matters
A single chip designed to do everything well is a harder engineering problem than two chips each optimized for different requirements. Training and inference make different demands on hardware in ways that compound at scale. Training requires enormous compute throughput, large shared memory pools, and high‑bandwidth interconnects between chips in a cluster. Inference requires low latency, fast response to individual requests, and the ability to serve thousands of concurrent users without each waiting on the others.
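The "months to weeks" framing can be sanity-checked with simple arithmetic. In the sketch below, the 121-exaflop figure comes from the announcement; the training-run budget (1e26 FLOPs) and 40% sustained utilization are hypothetical assumptions chosen only to illustrate the scale:

```python
# Illustrative: how superpod-scale compute compresses training time.
# The 121 exaflops figure is from the TPU 8t announcement; the training-run
# size and utilization below are hypothetical assumptions, not Google data.

SUPERPOD_FLOPS = 121e18      # 121 exaflops, per the announcement
TRAINING_RUN_FLOPS = 1e26    # assumed frontier-scale training budget
UTILIZATION = 0.40           # assumed sustained hardware utilization

seconds = TRAINING_RUN_FLOPS / (SUPERPOD_FLOPS * UTILIZATION)
print(f"~{seconds / 86_400:.0f} days")  # -> ~24 days
```

At these assumed numbers, a frontier-scale run completes in roughly three and a half weeks on one superpod; at a third of that compute, the same run would stretch past two months.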
Google's vice president of compute and AI infrastructure, Mark Lohmeyer, described the design philosophy: "It's about how you deliver the lowest possible latency of the response at the lowest possible cost per transaction. The number of transactions is going way up, and the cost per transaction needs to go way down for it to scale."
The AI agent use case is the specific context for the TPU 8i design. An AI agent handling a task does not run a single inference operation. It runs dozens, each responding to the outputs of previous steps, each requiring the kind of low latency that makes the interaction feel responsive rather than slow. TPU 8i connects 1,152 TPUs in a single pod and triples on‑chip SRAM to cut latency, delivering the throughput needed to run millions of agents concurrently and cost‑effectively.
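A toy sketch makes the compounding concrete. Nothing here is a Google API; the function names and latency figures are hypothetical. The point is only that an agent chaining N sequential model calls pays N times the per-call latency, so per-call lag dominates end-to-end task time:

```python
# Toy illustration of why per-call latency compounds in agent workloads.
# `call_model` is a stand-in for any hosted-model request; the latency
# numbers below are hypothetical, not measurements of any real system.

def call_model(prompt: str) -> str:
    # Real code would block on a network request; here we just tag the step.
    return f"result({prompt[:24]}...)"

def run_agent(task: str, steps: int, latency_ms: float) -> float:
    """Chain `steps` sequential model calls, each feeding on the previous
    output; return total time spent waiting on the model, in ms."""
    context, waited = task, 0.0
    for _ in range(steps):
        context = call_model(context)
        waited += latency_ms   # sequential calls: latency adds up linearly
    return waited

# A 30-step agent at 500 ms/call waits 15 s on the model alone;
# cutting per-call latency 5x drops that to 3 s.
print(run_agent("migrate the billing service", steps=30, latency_ms=500))  # -> 15000.0
print(run_agent("migrate the billing service", steps=30, latency_ms=100))  # -> 3000.0
```

Multiply that per-agent wait across millions of concurrent agents and the economics Lohmeyer describes follow: cost per transaction has to fall as fast as transaction volume rises.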
The $750 Million Partner Fund
Alongside the chip announcements, Google unveiled a $750 million fund to help boost corporate AI adoption and showed off tools for building AI agents.
The initiative is designed to strengthen Google's partner ecosystem by offering financial support, technical resources, and dedicated engineering expertise. The funding will be available to global consulting firms, systems integrators, software vendors, and channel partners working to deploy AI solutions at scale.
Google will embed forward‑deployed engineers alongside firms such as Accenture, Capgemini, Cognizant, Deloitte, Devoteam, HCLTech, and TCS. Select consulting partners including Accenture, Bain & Company, BCG, Deloitte, and McKinsey will also receive early access to Gemini models.
Enterprise‑ready agents built on the Gemini Enterprise Agent Platform from vendors including Adobe, Atlassian, Oracle, Palo Alto Networks, Salesforce, ServiceNow, and Workday form the ecosystem foundation.
What Sundar Pichai Said About Google's Own AI Usage
Sundar Pichai disclosed that 75% of all new code at Google is now AI‑generated and approved by engineers, up from 50% last fall. He also described a code migration completed by agents and engineers working together that was finished six times faster than was possible with engineers alone. Google's Security Operations Center agents automatically triage tens of thousands of unstructured threat reports monthly, reducing threat mitigation time by more than 90%.
The competitor context that makes all of this commercially significant: Google has already announced an expanded deal with Anthropic to provide multiple gigawatts of next‑generation TPU capacity to the AI lab. Google is also working to provide Anthropic rival OpenAI with TPU capacity. And in February, Meta signed its own multiyear, multibillion‑dollar deal for access to Google's TPUs.
Google also says it has agreed to work with Nvidia to engineer computer networking that allows Nvidia‑based systems to perform even more efficiently in its cloud. That collaboration, rather than pure competition with Nvidia, reflects the reality that most enterprise customers want both and will continue to need both for the foreseeable future.
More at cloud.google.com/next
