As chip industry chases AI, U.S. national labs look to newcomers for supercomputers
By Stephen Nellis
ALBUQUERQUE, New Mexico, May 18 (Reuters) – In a nondescript building on Kirtland Air Force Base on the high desert of New Mexico, liquid-cooled supercomputers gurgle and hum their way through some of the most complex math problems the U.S. government seeks to solve: simulating how hypersonic nuclear weapons would move through the earth’s atmosphere, or what would happen if one nuclear warhead detonated near another.
For more than a decade, the chips handling this secretive and demanding work came from mainstream semiconductor firms like Nvidia or Advanced Micro Devices.
But those companies are increasingly designing their chips for artificial intelligence and facing supply shortages. As a result, the managers in charge of the systems at Sandia National Laboratories are increasingly unsure how they will find computing power for high-precision scientific work like theirs. Sandia operates the machines at Kirtland and is one of three U.S. labs tasked with developing and maintaining the nation’s nuclear weapons arsenal.
“The pressure we’re feeling right now is on the computing front and also from the supply chain,” said Steve Monk, the manager of Sandia’s high-performance computing team, explaining the challenge of getting chips that meet his needs. “Looking to the future, it’s a bit stressful in terms of our ability to deliver to the mission.”
NEW ENTRANTS INTO CHIP MARKET
The lab’s predicament shows how the race for better AI chips is having the unintended consequence of opening markets once dominated by the big firms to smaller players such as NextSilicon, an Israeli startup whose chips are being tested by a program at Sandia. It also shows the role that Sandia, which worked with Nvidia extensively as the company rose to prominence in supercomputing and is still collaborating with Nvidia on new memory technology, plays in incubating and shaping new computing technologies.
One major concern for officials at Sandia is what is known as double-precision floating point computation, a technical term for being able to compute both very large and very small numbers without losing accuracy to rounding errors. For years, Nvidia and AMD pursued leadership in speeding up that kind of computing, landing supercomputing contracts with universities and government labs.
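A rough illustration of the rounding problem, not drawn from Sandia’s work: the Python sketch below adds a small number to a large one in double precision (64-bit) and again in single precision (32-bit). The helper name `to_single` and the sample values are assumptions chosen for the example.

```python
import struct

def to_single(x: float) -> float:
    """Round a double-precision Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical sample values: one large quantity and one small one.
big = 1.0e8
small = 1.0

# Python floats are double precision, so the small value is preserved.
double_sum = big + small

# Emulate single precision by rounding the operands and the result to 32 bits.
single_sum = to_single(to_single(big) + to_single(small))

print("double precision:", double_sum)
print("single precision:", single_sum)
print("small value lost in single precision:", single_sum == big)
```

In single precision the small value disappears entirely, because 32-bit numbers near 100 million are spaced 8 apart. Physics simulations that combine very large and very small quantities over many steps can accumulate such losses, which is one reason double precision matters for them.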
But AI work does not benefit from double-precision computing in the same way as simulating physics problems. While AMD is releasing a version of its chips aimed at scientific computing, the double-precision performance of Nvidia’s forthcoming Rubin chips has declined by some measures, worrying many scientists in the high-performance computing industry, said Ian Cutress, chief analyst at More Than Moore, a chip consulting firm.
Daniel Ernst, senior director of supercomputing products at Nvidia, said the company remains committed to scientific computing, aiming to create a balanced chip that can run real-world scientific applications alongside AI work.
But the shifting chip market has prompted officials at Sandia to test products from newcomers such as NextSilicon, whose chip uses a completely different computing approach than graphics processing units (GPUs) or central processing units (CPUs) from Nvidia and AMD.
NUCLEAR SECURITY WORK
On Monday, Sandia, NextSilicon and Penguin Solutions, the firm that helped weave NextSilicon’s chips into a supercomputer, said the systems have passed a key technical milestone using a battery of general supercomputing tests that put the chips in the running for use in government systems.
That sets up NextSilicon’s chips for a decision this fall on whether to start testing the chips with more demanding computing problems that closely resemble the kind of nuclear security work they would eventually have to handle.
The NextSilicon chips can perform double-precision computing and are also designed to reprogram themselves on the fly to run more efficiently. NextSilicon’s chips save electricity by using what is known as a data flow architecture that spends less time and energy shuffling data back and forth to the computing system’s memory.
Sandia’s work with chip firms often helps technology become widespread. Liquid cooling systems for chips were an exotic idea when Sandia started urging Intel, AMD and Nvidia to work on the technology more than a decade ago, and now they are common.
James Laros, a senior scientist at Sandia who oversees a program to test new computing architectures, said the work with smaller players like NextSilicon is aimed at ensuring Sandia can always procure the chips it needs, even if major chip firms shift focus.
“We have to keep available options to complete our mission, because the mission is not optional,” Laros said.
(Reporting by Stephen Nellis in Albuquerque, New Mexico; editing by Peter Henderson and Nick Zieminski)

