Scalable, Low-Latency Networks for AI and HPC

Reduce Job Completion Time for AI Workloads

The AI revolution is underway

The rapid growth of AI and machine learning demands HPC infrastructure that handles complex workloads with low latency and high throughput. UfiSpace addresses these challenges with 800G Broadcom-powered Distributed Disaggregated Chassis (DDC) switches and Ethernet platforms that deliver high capacity, high port radix, and network efficiency. These scalable solutions optimize network flow, enabling efficient expansion and low-latency performance for AI data centers at reduced cost.


Unleashing the Potential of Networks in AI Advancements

 

Networking Architecture

Ethernet architectures efficiently handle the massive data volumes and inter-node communication of AI workloads using 400GbE or 800GbE switches. Their simplicity, scalability, and affordability make them a strong alternative to proprietary technologies such as InfiniBand, enabling seamless growth for AI data centers.

[Image: Ethernet architecture]

Ethernet Fabric

The leaf-spine structure of an Ethernet fabric supports ultra-high-speed data transfer with minimal latency, making it ideal for AI model training. Its scalability allows data centers to add nodes or processing units without compromising performance.
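As a rough illustration of how a leaf-spine fabric scales, the Python sketch below estimates the server-facing port count and leaf-to-spine bandwidth of a non-blocking two-tier fabric. The 64-port, 800G switch profile and the 1:1 oversubscription ratio are illustrative assumptions, not UfiSpace product specifications.

# Back-of-the-envelope sizing for a two-tier (leaf-spine) Ethernet fabric.
# All port counts, speeds, and ratios below are illustrative assumptions.

def leaf_spine_capacity(leaf_ports, spine_ports, port_gbps, oversubscription=1.0):
    """Estimate the size of a two-tier leaf-spine fabric.

    leaf_ports       -- total ports per leaf switch
    spine_ports      -- total ports per spine switch
    port_gbps        -- speed of each port in Gbps
    oversubscription -- downlink-to-uplink bandwidth ratio (1.0 = non-blocking)
    """
    # Split each leaf between uplinks (to spines) and downlinks (to servers/GPUs).
    uplinks_per_leaf = int(leaf_ports / (1 + oversubscription))
    downlinks_per_leaf = leaf_ports - uplinks_per_leaf

    # Each leaf runs one uplink to every spine, and each spine gives one port
    # to every leaf, so spine and leaf counts fall out of the port counts.
    spines = uplinks_per_leaf
    leaves = spine_ports
    server_ports = leaves * downlinks_per_leaf
    uplink_tbps = leaves * uplinks_per_leaf * port_gbps / 1000
    return leaves, spines, server_ports, uplink_tbps

# Example: 64-port 800G leaves and spines (a 51.2 Tbps-class switch), non-blocking.
leaves, spines, server_ports, uplink_tbps = leaf_spine_capacity(64, 64, 800)
print(f"{leaves} leaves + {spines} spines -> {server_ports} x 800G server ports, "
      f"{uplink_tbps:.1f} Tbps leaf-to-spine bandwidth")

Growing the cluster then means adding leaves (more server ports) or spines (more cross-sectional bandwidth) rather than replacing the fabric.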

 

DDC (Distributed Disaggregated Chassis)

Originally designed for telecom networks, DDC systems help AI data centers optimize and scale their network infrastructure. The architecture enables horizontal scaling and dynamic resource allocation for cost-efficient, compute-intensive AI workloads.

 

Ultra Ethernet Trend

Ultra Ethernet, being developed by the Ultra Ethernet Consortium (UEC), combines high throughput with modern congestion control and lossless data transfer to support data-intensive AI operations. With ample bandwidth and loss avoidance, it keeps performance smooth as AI demands grow.
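To see why lossless transfer matters at this scale, the toy model below estimates how quickly effective goodput collapses when packet loss forces go-back-N style retransmission of large in-flight windows. The loss rates, window size, and recovery model are illustrative assumptions, not Ultra Ethernet specification behavior.

# Toy model: useful throughput (goodput) when a lost packet forces the sender
# to resend much of the in-flight window (go-back-N style recovery).
# Loss rates, window size, and the recovery model are illustrative assumptions.

def go_back_n_goodput(loss_rate, window_packets):
    """Rough fraction of link capacity delivered as useful data."""
    # Assume each loss wastes, on average, about half a window of packets
    # that were already sent and must be sent again.
    wasted_per_packet = loss_rate * window_packets / 2
    return 1 / (1 + wasted_per_packet)

for loss in (0.0, 1e-4, 1e-3, 1e-2):
    goodput = go_back_n_goodput(loss, window_packets=1000)
    print(f"packet loss {loss:.0e}: ~{goodput * 100:.0f}% of link capacity is useful data")

Even a 0.1% loss rate wastes roughly a third of the link in this model, which is why lossless, congestion-managed transport is central to AI fabrics.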


The Networking Backbone for Accelerated Computing: Ethernet Switches

  • Performance Enhancements: Ethernet switches with RDMA over Converged Ethernet (RoCE) reduce CPU overhead and packet drops while sustaining billions of packets per second across 400GbE interfaces.
  • Low-Latency Data Transfers: Ethernet switches achieve microsecond-scale latency with cut-through switching and RDMA, ensuring efficient GPU-to-GPU communication during training (a back-of-the-envelope comparison follows this list).
  • High-Bandwidth Capacity for AI and HPC: With up to 51.2 Tbps bandwidth, Ethernet switches support dense GPU clusters and large datasets without bottlenecks.
  • Scalability and Flexibility for Evolving Infrastructure: Leaf-spine topologies and widely supported protocols allow Ethernet switches to grow with AI and HPC clusters while maintaining performance and cost efficiency.
  • Cost-Effectiveness Compared to Other Solutions: Ethernet switches provide a cost-effective alternative to InfiniBand, offering high performance with open standards and lower costs.
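
For a rough sense of why cut-through switching matters for GPU-to-GPU traffic, the sketch below compares per-hop latency when a switch must receive a full packet before forwarding (store-and-forward) versus forwarding as soon as the header is read (cut-through). The 4 KB packet, 64-byte header trigger, and 600 ns pipeline delay are illustrative assumptions.

# Per-hop latency: store-and-forward vs cut-through switching.
# Packet size, header size, and pipeline delay are illustrative assumptions.

def serialization_ns(num_bytes, link_gbps):
    """Time to clock num_bytes onto a link of link_gbps, in nanoseconds."""
    return num_bytes * 8 / link_gbps  # bits / Gbps == nanoseconds

PACKET_BYTES = 4096   # assumed RDMA message segment size
HEADER_BYTES = 64     # bytes a cut-through switch inspects before forwarding
PIPELINE_NS = 600     # assumed fixed switch processing delay per hop

for link_gbps in (100, 400, 800):
    store_and_forward = PIPELINE_NS + serialization_ns(PACKET_BYTES, link_gbps)
    cut_through = PIPELINE_NS + serialization_ns(HEADER_BYTES, link_gbps)
    print(f"{link_gbps}GbE: store-and-forward ~{store_and_forward:.0f} ns/hop, "
          f"cut-through ~{cut_through:.0f} ns/hop")

Cut-through removes the per-hop wait for full-packet serialization, which compounds across the multiple hops of a leaf-spine fabric.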


Optimizing AI Cluster Interconnect with Dual Platform Selection

UfiSpace's dual-platform strategy combines two fixed-form-factor solutions, the S9300 series and the S9700 series, to enable efficient, scalable AI cluster interconnection. The S9300 series leverages a leaf-spine architecture to deliver a high-performance, large-scale networking solution. With Broadcom Trident 4 silicon, the S9300 provides 400G ports at 12.8 Tbps, and it scales up to 800G links with Tomahawk 5 (TH5) silicon, pushing aggregate bandwidth to 51.2 Tbps. TH5 also enhances load balancing and congestion control, ensuring seamless GPU-to-GPU traffic even as workloads intensify.
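Dividing the quoted switching capacities by port speed gives the per-generation port budget. The sketch below does that arithmetic, assuming every port runs at line rate with no breakout or uplink reservation.

# Port budget implied by the switching capacities quoted above
# (12.8 Tbps for Trident 4, 51.2 Tbps for Tomahawk 5), all ports at line rate.

def max_ports(capacity_tbps, port_gbps):
    """Number of full-rate ports a given switching capacity can serve."""
    return int(capacity_tbps * 1000 // port_gbps)

for silicon, capacity_tbps in (("Trident 4", 12.8), ("Tomahawk 5", 51.2)):
    print(f"{silicon} ({capacity_tbps} Tbps): "
          f"{max_ports(capacity_tbps, 400)} x 400G or {max_ports(capacity_tbps, 800)} x 800G")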

The S9700 series adopts a different approach, utilizing a distributed disaggregated chassis (DDC) architecture. This architecture interconnects Jericho series line-card routers via Ramon series fabric routers to build a robust, large-scale routing system. Starting with Jericho 2-based line cards for 400G connectivity, the S9700 scales up to 800G links with Jericho 3 silicon and the Ramon 3 fabric. The J3 platform achieves a remarkable switching capacity of 921 Tbps, significantly surpassing the J2 platform's 192 Tbps. Deep buffers and granular traffic management absorb the bursty traffic typical of AI training, while the open, non-blocking architecture ensures scalability without requiring a network redesign.
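For a feel of how a DDC cluster grows by adding boxes rather than replacing a chassis, the sketch below models a cluster in which every line-card router connects to every fabric router. The per-box port counts and interface speed are illustrative assumptions, not S9700 specifications.

# Sketch of DDC scaling: capacity grows by adding line-card and fabric routers.
# Per-box port counts and interface speed are illustrative assumptions.

def ddc_cluster(ifaces_per_linecard, fabric_links_per_linecard, ports_per_fabric, iface_gbps):
    """Estimate cluster size when every line-card box links to every fabric box."""
    # Each line-card router runs one link to every fabric router, and each
    # fabric router gives one port to every line-card router.
    fabric_routers = fabric_links_per_linecard
    linecard_routers = ports_per_fabric
    total_ifaces = linecard_routers * ifaces_per_linecard
    total_tbps = total_ifaces * iface_gbps / 1000
    return linecard_routers, fabric_routers, total_ifaces, total_tbps

# Fabric links per line card slightly exceed user interfaces so the cell-based
# fabric stays non-blocking (illustrative values).
lcs, fabs, ifaces, tbps = ddc_cluster(ifaces_per_linecard=18,
                                      fabric_links_per_linecard=20,
                                      ports_per_fabric=48,
                                      iface_gbps=400)
print(f"{lcs} line-card routers + {fabs} fabric routers -> {ifaces} x 400G ({tbps:.1f} Tbps)")

Scaling up then means adding line-card routers for more interfaces or fabric routers for more cross-fabric bandwidth, which is the horizontal-scaling property the DDC design is built around.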

 
