Jonathan Ross, Nvidia’s chief software architect and founder of Groq, joined John Yetimoglu, CIO of Infinitum and a Groq board director, for a 4:20 PM session at the May 12, 2026 Sohn New York Conference titled “The Inference Revolution: Groq, Nvidia and the Future of AI.” The conversation, introduced by KPMG partner Seth Blackman, veered sharply from architectural theory to an equity pitch on datacenter infrastructure suppliers. Yetimoglu spent most of the slot arguing that the hardware layer supplying AI clusters is repricing faster than the market has noticed, citing revenue growth above 50 percent, gross margins 300 to 500 basis points above historical norms, and multi-year order visibility. Ross framed the backdrop before ceding the floor to Yetimoglu’s numbers: compute demand scaling without end, bottlenecks shifting continuously, and models becoming sentient not in the binary sense but as a civilization-wide feedback loop.
The pitch rested on a structural mismatch: chip efficiency might double or triple year over year, Yetimoglu said, but total compute demand is growing twelvefold. “The chips basically get better 100 percent, maybe 200 percent,” he said. “Your total compute demand is 12X.” That gap forces every layer of the stack to be rebuilt for workloads that the prior generation of x86 servers and cooling systems never anticipated. Network speeds illustrated the pace: “It used to be 1.8, and then seven years later you might be reading 10. You’re at 400k, next to 800k, next year is 1.6 terabit, the next year is 3.2 terabit.” The suppliers shipping at those speeds, he argued, are seeing pricing power that lasts through at least 2027.
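The arithmetic behind that gap can be sketched in a few lines. This is only an illustration, and it rests on two assumptions not stated in the session: that “100 percent better” means each chip does twice the work (and “200 percent” three times), and that the 12X demand figure covers the same period as the chip gain. The function name `residual_capacity_multiple` is hypothetical.

```python
def residual_capacity_multiple(demand_multiple: float, chip_gain_percent: float) -> float:
    """Return how much the rest of the stack must grow once chip gains are netted out.

    Assumption: a gain of N percent means each chip delivers (1 + N/100) times
    the work, so the remaining growth must come from more chips, power,
    cooling, and networking.
    """
    chip_multiple = 1 + chip_gain_percent / 100
    return demand_multiple / chip_multiple


# Figures quoted in the session: 12X demand, chips 100 to 200 percent better.
at_100_percent = residual_capacity_multiple(12, 100)  # 12 / 2
at_200_percent = residual_capacity_multiple(12, 200)  # 12 / 3

print(f"Chips 100% better: rest of stack must grow {at_100_percent:.0f}x")
print(f"Chips 200% better: rest of stack must grow {at_200_percent:.0f}x")
```

Under those assumptions, better chips absorb only part of the demand, leaving a fourfold to sixfold increase to be met by additional hardware and infrastructure, which is the layer Yetimoglu was pitching.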

