First Ethernet-Based AI Memory Fabric System Boosts LLM Efficiency
Enfabrica has introduced its Elastic Memory Fabric System (EMFASYS), an Ethernet-based AI memory fabric designed to cut AI inference cost per token, per user, by up to 50%. The combined hardware and software solution is built on Enfabrica’s proprietary SuperNIC network-interconnect silicon and aims to optimize AI memory management and improve the efficiency of large language model (LLM) inference.
Addressing AI-Inference Bottlenecks
LLM inference demands massive data movement between memory and processing units. Modern workloads require 10 to 100 times more compute time per query than earlier models. Without efficient memory access, CPUs, GPUs, and TPUs can sit idle for much of each inference step, waiting for data, as the back-of-envelope sketch below illustrates.
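To see why, here is a rough back-of-envelope model in Python. Every number in it is an illustrative assumption (a hypothetical 70-billion-parameter model on an H100-class accelerator), not an Enfabrica figure; the point is only that at batch size one, streaming weights from memory dominates each decoding step and leaves the compute units nearly idle.

```python
# Back-of-envelope model of memory-bound LLM decoding.
# All figures are illustrative assumptions, not vendor specifications.
PARAMS = 70e9            # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2      # fp16/bf16 weights
HBM_BW = 3.35e12         # ~3.35 TB/s HBM bandwidth (H100-class, assumed)
PEAK_FLOPS = 1e15        # ~1 PFLOP/s dense fp16 throughput (assumed)

# Each autoregressive step streams the full weight set once (batch = 1).
bytes_per_token = PARAMS * BYTES_PER_PARAM
t_memory = bytes_per_token / HBM_BW       # time spent moving data
t_compute = 2 * PARAMS / PEAK_FLOPS       # ~2 FLOPs per parameter per token

print(f"memory-bound time/token:  {t_memory * 1e3:.1f} ms")   # ~41.8 ms
print(f"compute-bound time/token: {t_compute * 1e3:.2f} ms")  # ~0.14 ms
print(f"compute idle fraction:    {1 - t_compute / t_memory:.1%}")
```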
The EMFASYS rack-mount system tackles this challenge with a scalable, high-performance virtual memory system: multiple processors across different racks can share memory seamlessly, reducing delays and improving throughput.
How EMFASYS Works: A Hub-and-Spoke Model
At the core of EMFASYS is Enfabrica’s 3.2 Terabits/second Accelerated Compute Fabric SuperNIC (ACF-S). It bridges up to 18 memory channels, served by 144 CXL lanes, to 800/400 Gigabit Ethernet ports, all linked through high-speed RDMA over Ethernet.
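A quick unit check helps put those figures side by side. The sketch below converts the headline numbers to common units; the per-channel DDR5 bandwidth is our assumption, not an Enfabrica specification.

```python
# Unit sanity check on the ACF-S figures (per-channel DDR5 bandwidth assumed).
FABRIC_TBPS = 3.2                        # ACF-S aggregate, Terabits/s
fabric_gb_s = FABRIC_TBPS * 1000 / 8     # -> 400 GB/s of Ethernet bandwidth

CHANNELS = 18
DDR5_CHANNEL_GB_S = 44.8                 # DDR5-5600: 5600 MT/s * 8 B (assumed)
memory_gb_s = CHANNELS * DDR5_CHANNEL_GB_S

print(f"Ethernet side: {fabric_gb_s:.0f} GB/s")
print(f"Memory side:   {memory_gb_s:.0f} GB/s across {CHANNELS} DDR5 channels")
# The memory side exceeds the Ethernet side, so in this configuration the
# network links, not the DRAM channels, would be the binding constraint.
```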
This setup speeds memory access for complex multi-GPU AI inference tasks, cutting cost per token, per user, by up to 50% and supporting efficient infrastructure scaling. Each node can present shared memory targets of up to 18 Terabytes of CXL DDR5 DRAM, offloading data that would otherwise occupy scarce GPU HBM.
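One plausible way host software could exploit such a pool is to tier the LLM’s KV cache: hot pages stay in local HBM, cold pages spill to fabric-attached DDR5 and are fetched back over RDMA on demand. The sketch below is hypothetical; `TieredKVCache` and its methods are illustrative names, not an Enfabrica API.

```python
# Hypothetical tiered KV-cache placement: hot pages in local HBM,
# cold pages spilled to fabric-attached CXL DDR5 reached over RDMA.
# All names here are illustrative, not an Enfabrica API.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity_pages: int):
        self.hbm_capacity = hbm_capacity_pages
        self.hbm = OrderedDict()   # page_id -> bytes, kept in LRU order
        self.fabric = {}           # page_id -> bytes in remote CXL DDR5

    def put(self, page_id, data: bytes) -> None:
        self.hbm[page_id] = data
        self.hbm.move_to_end(page_id)
        while len(self.hbm) > self.hbm_capacity:
            cold_id, cold = self.hbm.popitem(last=False)
            self.fabric[cold_id] = cold      # stand-in for an RDMA write

    def get(self, page_id) -> bytes:
        if page_id in self.hbm:              # hot path: local HBM hit
            self.hbm.move_to_end(page_id)
            return self.hbm[page_id]
        data = self.fabric.pop(page_id)      # stand-in for an RDMA read
        self.put(page_id, data)              # promote back into HBM
        return data

cache = TieredKVCache(hbm_capacity_pages=2)
for i in range(4):
    cache.put(i, f"kv-page-{i}".encode())
assert cache.get(0) == b"kv-page-0"          # page 0 came back from the fabric
```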
The system adds about one kilowatt of power draw to a rack already consuming tens of kilowatts. Memory bandwidth is aggregated by striping transactions across multiple channels and Ethernet ports, and the cost of the fabric-attached memory stays below $20 per gigabyte.
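The striping idea resembles RAID 0 applied to memory traffic: consecutive chunks of one transfer are dealt round-robin across lanes, so the lanes’ bandwidths add. Below is a toy Python model of the placement logic only; in the real system the striping is done in NIC hardware.

```python
# Toy model of transaction striping across memory channels / Ethernet ports.
def stripe(payload: bytes, n_lanes: int, stripe_size: int = 4096):
    """Deal fixed-size stripes of `payload` to lanes round-robin."""
    lanes = [bytearray() for _ in range(n_lanes)]
    for i in range(0, len(payload), stripe_size):
        lanes[(i // stripe_size) % n_lanes] += payload[i:i + stripe_size]
    return lanes

payload = bytes(64 * 1024)                # a 64 KiB transfer
lanes = stripe(payload, n_lanes=4)
print([len(lane) for lane in lanes])      # 16 KiB each: 4 lanes work in parallel

# Cost check from the article's figures: at under $20/GB, an 18 TB pool
# of fabric-attached DDR5 comes to less than 18_000 * 20 = $360,000 per node.
```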
Enfabrica’s CEO, Rochan Sankar, compares EMFASYS to an airline hub-and-spoke system: large data payloads can be offloaded and distributed efficiently to different processors, similar to how passengers are transferred from jumbo jets to regional flights without congestion or delays.
Flexible, Architecture-Agnostic Memory Interaction
Current LLM accelerators, including GPUs and TPUs, share similar memory limitations because they depend on capacity-constrained integrated high-bandwidth memory (HBM). EMFASYS breaks these barriers by enabling high-performance memory access regardless of the underlying hardware.
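The gap is one of capacity rather than bandwidth. Using typical public HBM figures (assumed here, not supplied by Enfabrica), a quick comparison shows how much larger the fabric-attached pool is than a node’s local HBM.

```python
# Capacity comparison with assumed, typical HBM figures.
HBM_PER_GPU_GB = 192       # e.g. a current high-end accelerator (assumed)
GPUS_PER_NODE = 8          # a common node configuration (assumed)
FABRIC_POOL_GB = 18_000    # 18 TB of CXL DDR5 per node (from the article)

local_gb = HBM_PER_GPU_GB * GPUS_PER_NODE
print(f"local HBM per node:   {local_gb} GB")
print(f"fabric pool per node: {FABRIC_POOL_GB} GB "
      f"(~{FABRIC_POOL_GB / local_gb:.0f}x the local HBM)")
```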
According to Sankar, the system delivers a peak of 3.2 Terabits/second per accelerator while managing congestion, enabling a composable system: a mix of Nvidia GPUs, AMD GPUs, memory, and storage can be connected with consistent data flow.
EMFASYS Now Available for Sampling
Enfabrica is currently sampling EMFASYS with several customers, providing complete systems and evaluation platforms. Proof-of-concept setups hosted at the company’s data center demonstrate the system’s modularity, scalability, and room for customer expansion.
This solution presents a practical approach to improving AI inference efficiency and reducing infrastructure costs, especially as demand for large-scale AI models continues to grow.