MWC Barcelona 2026: A Deep Dive into Huawei’s AI Infrastructure Stack

By Jonas Nordin, Aixia AB
March 2026

Introduction

In early March, I spent four days at MWC Barcelona as a guest of Huawei Enterprise, alongside colleagues from Arrow ECS. The trip was packed with sessions, speeches, roundtables, and – most valuably – one-on-one deep dives with Huawei engineers where we could get behind the slides and into the actual architecture, measured performance numbers, and engineering trade-offs.

This post is my attempt to document what I learned about Huawei’s AI infrastructure stack. Not as a product review, but as a technical walkthrough of a full-stack approach that I think deserves more attention than it currently gets in the Nordics and Europe.

The Compute Layer: Atlas 950 SuperPoD & Ascend NPUs

The Ascend NPU Lineup

At the heart of Huawei’s AI compute strategy are the Ascend NPUs (Neural Processing Units). The current lineup includes the Ascend 910C, which powers the Atlas 900 A3 SuperPoD, and the upcoming Ascend 950DT. Looking further ahead, Huawei has publicly committed to a three-series roadmap: the Ascend 950 series (including the 950PR for inference and 950DT for training), the Ascend 960 series, and the Ascend 970 series.

Unlike NVIDIA’s GPU architecture, which evolved from graphics workloads into AI, Huawei’s Ascend chips were designed from the ground up as neural network accelerators. The Da Vinci architecture at the core of each NPU is built around a matrix computation unit optimized for tensor operations, and doesn’t carry the legacy of a graphics pipeline.

Atlas 950 SuperPoD

The Atlas 950 SuperPoD was shown outside China for the first time at MWC Barcelona 2026. It connects up to 8,192 Ascend 950DT NPUs via Huawei’s proprietary UnifiedBus interconnect protocol. In its full configuration, the system spans 160 cabinets (128 compute, 32 communications) deployed across roughly 1,000 square meters, all linked by all-optical interconnect.

The key concept here is that a SuperPoD operates as a single logical machine. Rather than stacking independent servers and relying on network fabric to connect them, UnifiedBus provides unified memory addressing and high-bandwidth, low-latency interconnection between all NPUs. This means the system can learn, reason, and process as one unit, which fundamentally changes the scaling characteristics compared to traditional horizontal scale-out.

UnifiedBus 2.0: An Open Ecosystem Play

One of the more surprising moves from Huawei is the decision to open-source the UnifiedBus 2.0 technical specifications. Where NVIDIA’s NVLink remains proprietary and tightly coupled to NVIDIA hardware, Huawei is inviting industry partners to adopt UnifiedBus and develop compatible products and components. The goal is to build an open ecosystem around the interconnect protocol.

This is a strategic bet. By opening the protocol, Huawei trades short-term hardware lock-in for ecosystem growth. Their stated monetization strategy is focused on hardware (chip sales), not software licensing.

UnifiedBus Under the Hood

UnifiedBus is not just another networking protocol. It is fundamentally a memory fabric. The protocol integrates directly into the processor via a dedicated Unified Bus Memory Management Unit (UBMMU), which allows remote memory to be mapped into local address space. When a processor executes a load instruction against a remote address, the UBMMU translates it into a UB memory operation and sends it over the optical interconnect to the remote node. The remote side validates the access and returns the data. This happens transparently to the application layer.
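The translation step described above can be modeled in a few lines. This is a conceptual sketch only, not Huawei code; the class names (`UBMMU`, `RemoteNode`) and the dictionary-backed "memory" are illustrative stand-ins for hardware.

```python
# Conceptual model (not Huawei code) of UBMMU-style address translation:
# windows of the local address space are mapped onto remote nodes, and a
# local load inside such a window becomes a remote memory operation.

class RemoteNode:
    """Stands in for another node's memory, reached over the fabric."""
    def __init__(self):
        self.memory = {}

    def read(self, offset):
        # In hardware this would be a UB memory operation over the optical
        # interconnect; here it is just a dictionary lookup.
        return self.memory.get(offset, 0)

class UBMMU:
    """Maps windows of the local address space onto remote nodes."""
    def __init__(self):
        self.mappings = []  # list of (base, size, node)

    def map_remote(self, base, size, node):
        self.mappings.append((base, size, node))

    def load(self, address):
        # Translate a local load into a remote read if the address falls
        # inside a mapped window; the application never sees the difference.
        for base, size, node in self.mappings:
            if base <= address < base + size:
                return node.read(address - base)
        raise ValueError("unmapped address")

# Node B's memory appears at local addresses 0x1000-0x1FFF.
node_b = RemoteNode()
node_b.memory[0x10] = 42
mmu = UBMMU()
mmu.map_remote(0x1000, 0x1000, node_b)
value = mmu.load(0x1010)  # transparently fetched from node B
```

The point of the sketch is the transparency: the caller issues an ordinary load against a local address and the translation to a remote operation happens underneath it.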

This is closer to how a traditional shared-memory multiprocessor works than to how a typical networked cluster operates. The practical implication is that the 8,192 NPUs in a SuperPoD can share a unified memory pool, which is essential when training models with hundreds of billions or trillions of parameters.

At the interconnect level, the Atlas 950 SuperPoD achieves 16 PB/s of total interconnect bandwidth. The optical interconnect supports connections over 200 meters within a data center and is claimed to be 100 times more reliable than conventional approaches. Huawei claims 95% compute efficiency on an 8,192-NPU SuperPoD.

How This Compares to NVIDIA

NVIDIA’s approach with NVLink and the NVL72/NVL144 platforms focuses on maximizing per-chip performance and connecting relatively smaller numbers of very powerful GPUs. The upcoming NVL144 system connects 144 Blackwell Ultra GPUs.

Huawei’s approach is fundamentally different: compensate for any per-chip performance gap by connecting vastly more NPUs into a single logical system. The Atlas 950 SuperPoD connects roughly 57 times as many NPUs as the NVL144 has GPUs (8,192 vs. 144). Whether this brute-force scaling approach delivers better real-world performance depends heavily on the workload, but the architectural ambition is clear.

The Software Stack: CANN, MindSpore & PyTorch

Hardware is only half the story. For anyone coming from the NVIDIA ecosystem, the software stack is where the real questions lie.

CANN: The Foundation

CANN (Compute Architecture for Neural Networks) is Huawei’s equivalent to CUDA. The current version, CANN 8.0, represents a significant step forward, though the developer experience is not yet on par with CUDA. CUDA has had over 15 years of ecosystem development; CANN is closing the gap, but from behind.

PyTorch Support via torch_npu

Huawei’s most important software effort is torch_npu – a PyTorch backend plugin that enables PyTorch models to run on Ascend NPUs through PyTorch’s PrivateUse1 mechanism. This means you can take existing PyTorch code and run it on Ascend hardware with relatively minimal changes.
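In practice, the migration path looks like ordinary PyTorch code with one extra import and a different device string. The sketch below assumes torch and torch_npu are installed (and hedges on the exact `torch.npu.is_available()` helper, which is the documented pattern at the time of writing); it falls back to CPU so the snippet runs anywhere.

```python
# Sketch of a CUDA-to-Ascend migration via torch_npu. The only changes to
# typical PyTorch code are the extra import and the "npu" device string.
# Falls back to CPU so the snippet runs without Ascend hardware installed.
try:
    import torch
    import torch_npu  # noqa: F401 -- registers the "npu" device (PrivateUse1)
    device = "npu:0" if torch.npu.is_available() else "cpu"
except ImportError:
    torch = None
    device = "cpu"

if torch is not None:
    x = torch.randn(4, 4).to(device)  # same .to() call you would use for "cuda:0"
    y = (x @ x).sum()

print(f"running on {device}")
```

The design choice behind PrivateUse1 is that the model code itself stays vendor-neutral: the backend plugin registers the device at import time, so the diff against a CUDA script is often just these two lines.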

The CUDA Comparison

Let’s be direct: CUDA is still ahead. The ecosystem maturity, breadth of optimized libraries, and seamless integration with tools like PyTorch, TensorFlow, and JAX give NVIDIA a moat that cannot be crossed overnight. But the gap is narrowing.

The Network Layer: Xinghe AI Fabric 2.0

Xinghe AI Fabric 2.0 is Huawei’s answer to data center networking in the AI era. It is built on a three-layer architecture: AI Brain (intelligent network management), AI Connectivity (the fabric itself), and AI Network Elements (switches and interfaces).

NVIDIA’s networking strategy centers around InfiniBand, while Huawei’s approach is Ethernet-based. Ethernet is more widely deployed and better understood by enterprise teams; InfiniBand offers lower latency at the extremes.

The Storage Layer: OceanStor A Series & Dorado

Storage is where Huawei surprised me most. Huawei has clearly invested heavily in making storage a first-class citizen in the AI infrastructure stack.

OceanStor A800 is the high-end AI storage system, designed for large-scale training and inference. Each 8U controller supports 64 NVMe SSDs with raw capacity from 245 TB to 983 TB. Huawei claims 500 GB/s bandwidth and 24 million IOPS per enclosure.

OceanStor A600 is built around AI-native data paradigms: vectors, tensors, and KV cache. Huawei claims a 78% reduction in Time To First Token (TTFT) and a 60% improvement in inference throughput.

In MLPerf Storage v2.0 benchmarks, the OceanStor A series ranked first worldwide. An 8U A800 system sustained 698 GiB/s of bandwidth, enough to meet the I/O requirements of 255 simulated H100 GPUs (MLPerf Storage measures how many simulated accelerators a storage system can keep fed).

UCM & Multi-Level KV Cache

UCM (Unified Cache Manager) intelligently manages KV cache data across tiers:

  • L1: GPU HBM – Hot data
  • L2: Host DRAM – Warm data
  • L3: SSD/NVMe – Cold data

Huawei claims this reduces TTFT by up to 90% and improves inference throughput by over 10x in long-sequence scenarios.
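The tiering logic can be illustrated with a small sketch. This is my own toy model, not UCM code: capacities are arbitrary, and real KV-cache placement would weigh sequence length and access patterns, but the hot/warm/cold demotion and promotion-on-hit mechanics are the same idea.

```python
# Toy model (not UCM code) of three-tier KV-cache placement: hot entries in
# HBM, warm in DRAM, cold on SSD. Overflowing a tier demotes its least
# recently used entry; a hit in a lower tier promotes the entry back to HBM.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, capacities=(2, 4, 8)):
        # Tier 0 = HBM (hot), 1 = DRAM (warm), 2 = SSD (cold). Toy sizes.
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = capacities

    def put(self, key, value):
        self._insert(0, key, value)

    def _insert(self, tier, key, value):
        if tier >= len(self.tiers):
            return  # fell off the last tier; would be recomputed on demand
        t = self.tiers[tier]
        t[key] = value
        t.move_to_end(key)
        if len(t) > self.capacities[tier]:
            # Demote the least recently used entry to the next tier down.
            old_key, old_value = t.popitem(last=False)
            self._insert(tier + 1, old_key, old_value)

    def get(self, key):
        for tier, t in enumerate(self.tiers):
            if key in t:
                value = t.pop(key)
                self._insert(0, key, value)  # promote back to HBM on a hit
                return tier, value
        return None, None

cache = TieredKVCache()
for i in range(5):
    cache.put(f"seq{i}", f"kv{i}")
tier, value = cache.get("seq0")  # the oldest entries were demoted to DRAM
```

The claimed TTFT gains come from exactly this pattern: a long prefix whose KV cache was demoted can be fetched from DRAM or SSD instead of being recomputed from scratch.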

Reflections & Takeaways

The Full-Stack Advantage: Huawei is one of very few companies that can deliver AI infrastructure from chip to network to storage to platform. The vertical integration enables optimizations difficult to achieve with multi-vendor setups.

A Real Alternative: For years, AI infrastructure conversations have been almost exclusively about NVIDIA. Huawei’s portfolio demonstrates that the NVIDIA-only assumption is increasingly outdated: the hardware is competitive, and the software stack is maturing.

What This Means for Aixia: Understanding the full range of available technologies is essential for advising our customers effectively. This trip gave us much deeper understanding of where Huawei fits in the broader AI infrastructure ecosystem.


A big thank you to Huawei Enterprise for a very well organized week, and to Arrow ECS for the continued partnership.

Have questions or want to discuss AI infrastructure options? Contact us at Aixia.
