Proof
Apple's decade-long bet on local inference finds its moment.
September 12, 2017. The first event ever held in Apple Park's Steve Jobs Theater. Twelve hundred journalists settle into hand-stitched leather seats. Phil Schiller takes the stage to introduce the A11 Bionic, the chip powering the new iPhone X. Midway through the specs, he mentions a new component: a Neural Engine. It can perform 600 billion operations per second. It powers Face ID and something called Animoji.
Nobody writes it down.
The press writes about the notch. They write about the $999 price tag. They write about Craig Federighi's Face ID demo failing on stage. Ben Bajarin, a Creative Strategies analyst, will later say the Neural Engine was "potentially the most significant announcement of the event." He is nearly alone in this view. Most reviewers treat it as marketing jargon — another spec sandwiched between clock speeds and core counts.
Nine years later, that marketing jargon is in two billion devices.
The News
This week, The Information reported that Apple has "complete access to the Gemini model in its own data center facilities." But Apple isn't just running Google's AI. It's distilling it — training smaller models that can "imitate the internal computations Gemini uses to arrive at its answers." Not just the outputs. The reasoning. Models small enough to run on your iPhone. Without an internet connection.
There's friction. Gemini was tuned for chatbot conversations and code generation — Google's priorities, not Apple's. Apple wants an assistant that scans your photos, books your flights, remembers your conversations. "Apple's Siri objectives don't always align with Gemini's core specialties," the report notes.
Meanwhile, Apple's Foundation Models team continues building independent AI models. Quietly. In parallel.
The $278 Million Living Room
If that architecture sounds familiar — learn from a partner, then build your own — it should.
In April 2008, Apple paid $278 million for a 150-person chip company called PA Semi. The negotiations happened in Steve Jobs' living room. PA Semi made low-power processors for the Pentagon — so sensitive that the Department of Defense considered intervening in the deal. Analysts were baffled. Engadget's headline: "Apple buys P.A. Semi chip designer, Intel says wha?"
What the analysts missed: Apple wasn't buying a product. It was buying the ability to design its own silicon. Two years later, the A4 chip powered the first iPad. By 2012, Apple had designed a fully custom CPU core — no longer licensing anyone's blueprints. By 2020, Apple Silicon had replaced Intel in every Mac. Bill Gates, as Brent Schlender recounts in Becoming Steve Jobs, couldn't believe the earlier PowerPC-to-Intel migration had gone so smoothly. Changing a computer's processor "and not losing a beat," he said, "sounds impossible."
A $278 million check written in a living room became the most important competitive advantage in technology.
The Quiet Engine
The Neural Engine followed the same playbook. Ship it early. Ship it quietly. Wait.
After 2017, Apple kept putting Neural Engines in everything. Every iPhone. Every iPad. Every Mac after 2020. Every Apple Watch. The throughput climbed: from 600 billion operations per second to 5 trillion, then 11, then 35. By 2025, the M5 was roughly 80 times more powerful than the original A11.
Two billion devices. All waiting for models small enough to run on them.
Distillation is the missing piece. Take a massive model — Gemini, trained on Google's TPU farms at a cost of hundreds of millions — and compress its knowledge into something that fits on a phone. The student learns the teacher's reasoning, not just its answers.
Apple doesn't need Gemini to be a perfect Siri. It needs Gemini to be a good enough teacher. The student — running on a Neural Engine Apple has been refining for nine years — doesn't need to match the teacher. It needs to work independently. On your device. Without sending your data anywhere.
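The teacher-student mechanics described above can be sketched in a few lines of plain Python. This is a generic illustration of classic soft-target distillation (temperature-softened softmax plus a KL-divergence loss), not Apple's pipeline — the report describes imitating Gemini's *internal computations*, which implies matching intermediate activations as well, and none of those details are public. Function names and the temperature value here are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened probabilities. A higher temperature flattens the
    # distribution, exposing the teacher's full ranking over answers
    # ("dark knowledge"), not just its single top prediction.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from the teacher's softened distribution to the
    # student's. Minimizing this trains the student to reproduce the
    # teacher's relative confidences across all outputs, not merely to
    # copy its argmax answer.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical raw scores over three candidate answers.
teacher = [4.0, 1.0, 0.2]   # large model (the "teacher")
student = [3.5, 1.2, 0.1]   # small on-device model (the "student")
loss = distillation_loss(student, teacher)  # small but nonzero: close agreement
```

In practice the student is trained by gradient descent on this loss (usually blended with an ordinary task loss), and production systems compute it in a framework like PyTorch rather than by hand; the point of the sketch is only that the training signal is the teacher's full output distribution.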
The Component Bet
There's a pattern here that goes beyond chips. As Adam Lashinsky documented in Inside Apple, when Apple planned to ditch hard drives for flash memory in iPods and the MacBook Air, Tim Cook's team made "billion-dollar forward purchases of flash memory." The move locked in supply and prices — while starving rivals' access to the same components. Patrick McGee's Apple in China tells a similar story with Samsung: engineers worked on-site at Infinite Loop, Apple learned everything it needed about chip fabrication, and then the relationship fractured into global lawsuits and a push toward full silicon independence.
The component bet as moat. Except now the component isn't a Toshiba drive or a Samsung fab relationship. It's the ability to run AI inference locally, on hardware nobody else has, deployed at a scale nobody else can match.
Apple is already building the next generation. A custom inference chip codenamed Baltra — designed with Broadcom, fabricated by TSMC at 3nm — enters mass production in the second half of this year. It's Apple's first bespoke server chip, purpose-built for the AI inference that Private Cloud Compute handles today on repurposed M-series silicon. The on-device Neural Engine handles what it can. Baltra handles the rest — still on Apple hardware, still under Apple's privacy architecture.
Six Hundred Billion
The "friction" between Siri's goals and Gemini's strengths? That's not a problem. It's a countdown. Apple has never been comfortable depending on someone else's core technology. The Foundation Models team working in parallel isn't a hedge. It's the plan. Gemini is the teacher Apple will eventually stop needing.
On September 12, 2017, Phil Schiller stood in a theater named after the man who'd bought PA Semi in his living room nine years earlier. He said the words "Neural Engine." The audience was thinking about Face ID.
In 2026, Apple is distilling one of the most powerful AI models ever built into something that runs in your pocket. The hardware was ready. It had been ready for years.
Six hundred billion operations per second. That was the proof.
Also This Week
Apple's Next Design Era Takes Shape. Bloomberg profiled John Ternus as Apple's "heir apparent," but the quieter news may matter more: three new executives joined Apple's leadership page this month — Jennifer Newstead (General Counsel, ex-Meta), Molly Anderson (VP Industrial Design), and Steve Lemay (VP Human Interface Design, at Apple since 1999). Apple's entire design leadership has turned over since Jony Ive's departure. The next Apple will be shaped by engineers, not designers.
Siri Gets Its Own App. Mark Gurman reports Apple is building a standalone Siri application for iOS 27 — not the ambient overlay, but something users actively open. Conversation memory. Proactive suggestions. An "Ask Siri" button. If distillation is the engine, this is the dashboard being rebuilt to match it.
The App Store's AI Slop Problem. Forbes reports the App Store is being flooded with AI-generated low-quality apps. Apple has always sold curation as the walled garden's value proposition. The fart-app gold rush of 2009 was human-speed chaos. This time the volume is machine-generated — and the review team wasn't built for it.
From the Library
This newsletter draws on 29 books about Apple's history. Today's issue featured stories from:
Inside Apple by Adam Lashinsky — The flash-memory supply-chain ambush that starved rivals
Apple in China by Patrick McGee — Samsung's on-site chip engineers and the pivot to Apple Silicon
Becoming Steve Jobs by Brent Schlender — Bill Gates's disbelief at the "impossible" processor switch