Meta sets a new tone for data center development 🖥️

Plus: Cisco makes bold claims, Google's got a new supercomputer, and AMD has even more chips

Good morning, this is Cold Isle Insights.
The five-minute morning newsletter that all your smartest, best-looking coworkers are reading.

Here’s what we’re looking at:

🔑 Key Points:

Featured: Meta unveils updated data center plans that have disrupted its development trajectory 🍿

Power + Cooling: Cisco makes bold claims about its servers’ ability to reduce power costs 🔋

Under the Hood: Google unveils its new A3 supercomputer, and even more chips are coming from AMD 🚗

Daily Dall-E 🎨:

Est. read time: 5 mins, 1 sec

- Featured -

Meta pulls back the curtain on new data center strategy

Late last year, DCD broke the story that data behemoth Meta was pausing development of all its current and planned data centers pending a redesign centered on AI.

On Friday, those plans were largely unveiled. Here’s a rundown:

Gigantic scale

Currently, Meta operates 21 data centers around the world (including the world’s largest), which together represent an investment north of $16 billion.

But it’s not enough. According to engineering director Alan Duong, Meta expects to double its current data center footprint over the next five years.

And it’s not just the physical space. Meta also expects the computing power of each chip to increase significantly with the custom AI chips it has announced.

New chips

With AI as the new driver of its global platform, Meta revealed Friday that its future data center strategy will hinge on two proprietary chips it is developing.

New MTIA chip from Meta. Datacenterfrontier.com 

  1. Meta’s custom AI chip, MTIA (the Meta Training and Inference Accelerator), will handle specific AI workloads and has been designed and optimized for Meta’s particular needs. The semiconductor will be fabricated by TSMC.

  2. The MSVP (Meta Scalable Video Processor) is a video processing chip that is faster (9x faster than traditional encoders) and more energy-efficient than what’s currently available. Uniquely, the MSVP can compute a quality score for every video it encodes.

Keeping it all cool

Meta, foreseeing that its future AI chips will use 5x the power of current CPUs, is planning a two-phase shift to liquid cooling.

The first phase, Air-Assisted Liquid Cooling (AALC), offers direct-to-chip liquid cooling and will be deployed by early 2025 in existing data centers.
The next-generation data center, expected in late 2025, will incorporate AALC along with custom liquid cooling for AI model training, supported by a cooling infrastructure that can adapt as hardware requirements evolve. The transition to liquid cooling will be gradual.

Here’s an example of the AALC technology:

These changes mark a landmark pivot for Meta’s data center future. They also send a strong signal to the market: not only will the pace of development stay rapid, it will accelerate to the point that one of the industry’s biggest players is completely rethinking the future of its digital infrastructure.

For a more in-depth rundown, read DCFrontier’s piece on the announcement HERE.

- Under the Hood -

Another revelation: Google’s A3 supercomputer

Google Cloud has announced the launch of the A3 GPU supercomputer, as part of its strategy to offer purpose-built infrastructure for demanding AI and machine learning models.
The A3 supercomputers are designed specifically to train and support complex AI models for generative AI and large language models. These machines integrate NVIDIA H100 Tensor Core GPUs and Google's advanced networking capabilities, catering to customers of various sizes.

NVIDIA H100. nvidia.com

If you’re interested, Google describes their new supercomputer in depth HERE.

Even more chips coming from AMD

AMD is broadening its data center chip range with the introduction of the Instinct MI300, an APU boasting a staggering 146 billion transistors.

AMD Instinct MI300. Notebookcheck.com

Built from 13 compact chiplets, the MI300 uses a 3D configuration that lets the modules be mixed and matched to create unique processors. The chip’s base layer consists of four chiplets that handle auxiliary tasks like memory and I/O; stacked above it are nine CPU and GPU chiplets based on AMD’s Zen 4 cores and the CDNA 3 architecture that powers the company’s data center graphics cards.

Find out more about how these are made and what they do HERE.

- Power and Cooling -

Cisco makes big claims about UCS X servers

Cisco's Intersight infrastructure management platform and Unified Computing System (UCS) X-Series servers could decrease data center energy usage by up to 52%, says Jeremy Foster, SVP of Cisco’s compute business.

The caveat, of course, is that you have to be upgrading from equipment two or three generations old. But Foster told SDX Central that the new servers can increase server workload efficiency by 66%, reduce maintenance costs by 39%, and cut the amount of software required by 27%.

Beyond the physical infrastructure, upgrading from older processors to newer ones (like moving from a Windows 10 to a Windows 11 environment) and adding GPUs can lead to even more drastic consolidation. For instance, the same UCS X server can now deliver 8x the number of desktops it could a few years ago.

Cisco is making a big push toward efficiency with these modular UCS X-Series servers, and making big claims about what they can do.

Daily Dall-E

One of the earliest data center techs known to history:

Thanks a lot for reading. Please let me know how we’re doing by replying directly to this email. And if you want to help grow this, please share with a coworker!

- Taylor