🔫 Return Fire: Industry Teams Up Against Nvidia

And can mushrooms make data center construction more sustainable?

Here’s what you should know today:

  • COOLING, TECH, AND POWER: New consortium looks to take on Nvidia’s NVLink, DC operators ask for hotter chips

  • CHECK THIS OUT: 🎧 How Mycocycle uses mushrooms to make data center construction sustainable

  • BIG DEALS: Chinese DC operators are skirting CHIPS sanctions, Prologis and CyrusOne file for new facilities, and Microsoft, OpenAI, and Oracle expand their partnership.

- Cooling, tech, and power -

Return Fire: Industry Teams Up Against Nvidia

Nvidia's dominance in the chip interconnect space is being challenged by an impressive coalition of tech giants. AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft have teamed up to form the Ultra Accelerator Link (UALink) Promoter Group.

Their goal is to create a high-speed interconnect technology to rival Nvidia's NVLink, which currently allows Nvidia processors to share data at extremely high speeds.

The UALink initiative aims to establish an open industry standard for AI accelerator communication, facilitating easier integration, greater flexibility, and scalability in AI-connected data centers.
The move is critical in an infrastructure environment that has come to be driven by AI and high-performance computing.

Nvidia has a head start thanks to its $6.9 billion acquisition of Mellanox in 2019, which helped it develop NVLink. But now, with companies like Microsoft, Meta, and Google (which are also building custom processors for their cloud services) on board, UALink plans to catch up quickly.

The first major step for UALink is to develop a specification for a high-speed, low-latency interconnect. This 1.0 specification will support the connection of up to 1,024 accelerators within an AI computing pod, enabling efficient data transfer between GPUs and other accelerators in the pod.

UALink Scale-up Pod connecting GPUs from multiple servers combined into one computational domain. Source: UALink Consortium

Forrest Norrod, AMD's EVP and GM of the data center solutions group, says, “For the most advanced models, many accelerators need to work together in concert for either inference or training. And being able to scale those accelerators is going to be critically important for driving the efficiency and the performance and the economics of large-scale systems going out into the future.”

UALink members are also part of the Ultra Ethernet Consortium (UEC), which focuses on improving Ethernet networks to meet AI's high-performance needs. Formed last year by companies like AMD, Intel, and Microsoft, the UEC includes over 50 vendors and plans to release specifications for scalable Ethernet improvements later this year.

J Metz, chair of the UEC, sees great potential in collaborating with UALink, saying, “At UEC, we believe that UALink’s scale-up approach to solving pod cluster issues complements our own scale-out protocol, and we are looking forward to collaborating together on creating an open, ecosystem-friendly, industry-wide solution that addresses both kinds of needs in the future.”

_____________________________________

Data Center Operators Ask: “Hotter Chips, Please”

Nvidia recently launched its first liquid-cooled compute system and is already hinting at next-gen chips that will surpass its 1,000-watt Blackwell GPUs.

My Truong, field CTO at Equinix, pointed out that silicon designers are clinging to outdated ideas about ideal data center temperatures. They're demanding increasingly lower water temperatures to manage the thermal challenges of high-power chips. 
Truong noted that this trend is problematic, as lower temperatures require more power and effort from cooling infrastructures.

Equinix is also trying to recycle waste heat by supplying it as hot water to local communities. However, the demand for lower cooling temperatures complicates this effort, as it reduces the heat available for recycling.

Truong suggested that if Equinix could use 122°F (50°C) water for cooling, it would significantly reduce the need for chiller plants, and the exit water would be hot enough for recycling. He emphasized the need for ongoing conversations with the industry to find a balance.

Nvidia's Dave Salvator acknowledged the challenge of keeping up with the cooling demands of their chips and mentioned that they're exploring various solutions, including immersion cooling.

“We will look at any and/or all solutions that we think make sense to be able to deliver more AI capabilities into data centers,” Salvator stated.

Truong hopes that as the market evolves and alternatives from companies like AMD and Intel emerge, the entire ecosystem will move towards more efficient cooling solutions.
He believes that once a viable alternative catches on, it will steer the industry in the right direction.

_____________________________________

More in Cooling, Tech, and Power

1. Low-cost, Grid-scale, Water-based Battery Developed in UK. Scientists from the University of Southampton say they have developed a low-cost, grid-scale battery with a water-based electrolyte which could help incorporate more renewable power into the energy grid.

🎧 Listen to This

In the latest Zero Downtime podcast, Joanne Rodriguez of Mycocycle talks about how the company uses mushrooms to recycle data center construction waste. Fascinating technology!

The link directs to the episode’s homepage.

- Deals and Developments -

Chinese Data Centers Skirt US CHIPS Sanctions

Chinese companies, unable to get their hands on advanced AI chips due to US sanctions, have found a clever workaround: buying or renting access to them on US soil.

According to The Information, ByteDance, the Chinese owner of TikTok, has been renting Nvidia’s top-notch chips from Oracle for AI computing. Sneaky, right?
China Telecom is also trying to secure a similar deal with other cloud providers. Meanwhile, Alibaba and Tencent are reportedly chatting with Nvidia about setting up data centers in the US to tap into these prized chips.

In a plot twist, two smaller US cloud companies turned down the chance to rent Nvidia H100 chips because it "seemed to go against the spirit" of US sanctions.

Nvidia’s spokesperson said they support “new AI data centers in the US,” but dodged questions about deals with Chinese firms or ByteDance’s chip access through Oracle.

Despite the sanctions, Chinese institutions have still snagged high-end Nvidia chips through resellers. They've even started repurposing Nvidia gaming chips for AI hardware.

Adding to the drama, the Department of Justice is launching an antitrust investigation into Nvidia’s alleged chip order favoritism and its cozy relationships with certain businesses, particularly those that are not developing competing chips of their own.

_____________________________________

More Big Deals

  • CyrusOne files for 9-building campus outside Chicago: The project will encompass 228 acres in Yorkville, IL, and will be developed over 10-20 years, according to the filing this week by CyrusOne. The land was acquired last year by project partner Yorkville Nexus LLC, which is attempting to rezone and develop two 100+-acre parcels in the area for the same purpose.
    Chicago is one of the most important data center markets in the US. While data center inventory has grown over 35% in 5 years, availability remains at 1.6%.
    CyrusOne joins Microsoft, which is in pre-construction on a 500-acre campus down the street in Plano, IL.

  • Oracle, Microsoft and OpenAI expand partnership, bring Azure AI platform to Oracle Cloud Infrastructure. According to DataCenterDynamics, “The OCI Supercluster, for training AI models, can scale 64k Nvidia Blackwell GPUs or GB200 Grace Blackwell Superchips connected by ultra-low-latency RDMA cluster networking and a choice of HPC storage.”

    Sam Altman of OpenAI says the move will “extend Azure’s platform and enable OpenAI to continue to scale.”
    Oracle has also revealed that it is developing “a very large data center” for OpenAI.

  • Prologis wants to redevelop warehouses for data centers in Data Center Alley. The developer has filed for exceptions for extra density allocations at its 10-acre site in Sterling, VA. The site currently houses two commercial industrial buildings. This is Prologis’ second application of this type in the area.

Thank you for reading.

You can support this newsletter by telling me how we did or sharing it with a friend or colleague.

- Taylor