DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism (EP), providing high-throughput and low-latency all-to-all GPU communication primitives for large-scale training and inference.
Provides high-throughput all-to-all GPU kernels for efficient Mixture-of-Experts dispatch and combine operations.
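The dispatch/combine pattern can be illustrated with a toy single-process sketch in plain Python. This is not DeepEP's API or kernel logic; the expert transform, shapes, and routing here are invented for illustration only:

```python
import random

random.seed(0)
num_tokens, hidden, num_experts, top_k = 6, 4, 3, 2

# Toy token activations and top-k routing decisions with normalized gate weights.
x = [[random.random() for _ in range(hidden)] for _ in range(num_tokens)]
topk_idx = [random.sample(range(num_experts), top_k) for _ in range(num_tokens)]
raw_w = [[random.random() for _ in range(top_k)] for _ in range(num_tokens)]
topk_w = [[w / sum(ws) for w in ws] for ws in raw_w]

# Dispatch: send a copy of each token to every expert it is routed to.
recv = {e: [] for e in range(num_experts)}  # expert -> list of (src_token, vector)
for t in range(num_tokens):
    for e in topk_idx[t]:
        recv[e].append((t, x[t]))

# Each expert applies its own transform (a simple per-expert scale here).
expert_out = {e: [(t, [(e + 1) * v for v in vec]) for t, vec in pairs]
              for e, pairs in recv.items()}

# Combine: weighted-sum each token's expert outputs back into source order.
y = [[0.0] * hidden for _ in range(num_tokens)]
for e, pairs in expert_out.items():
    for t, vec in pairs:
        w = topk_w[t][topk_idx[t].index(e)]
        for j in range(hidden):
            y[t][j] += w * vec[j]
```

In the real library these two steps become all-to-all exchanges across GPUs, with the receive buckets living on whichever rank owns each expert.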
Includes specialized low-latency kernels for latency-sensitive inference decoding, minimizing delays through pure RDMA transfers.
Supports FP8 and other low-precision formats to reduce communication volume and increase efficiency with minimal loss of model quality.
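The core idea behind low-precision communication is to transmit small quantized values plus a scale, then dequantize on the receiver. A minimal sketch, using per-row symmetric int8-style quantization as a stand-in (this is not DeepEP's FP8 format or layout):

```python
def quantize(row, bits=8):
    """Per-row symmetric quantization: pick a scale so the largest
    magnitude in the row maps to the largest representable integer."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in row) or 1.0
    scale = amax / qmax
    q = [round(v / scale) for v in row]   # small integers, ~1/4 the bytes of fp32
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values on the receiving side."""
    return [v * scale for v in q]

row = [0.5, -1.25, 3.0, 0.0]
q, scale = quantize(row)
restored = dequantize(q, scale)
```

Sending `q` plus one scale per row cuts bytes on the wire substantially, and the rounding error is bounded by half a quantization step.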
Introduces a hook-based method for overlapping communication and computation that occupies no Streaming Multiprocessor (SM) resources.
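The hook pattern can be sketched in a few lines of Python: start an asynchronous transfer, get back a hook, do useful computation, and call the hook only when the data is actually needed. A background thread stands in for the transfer engine here; DeepEP implements this on the GPU without consuming SMs, so everything below is illustrative:

```python
import threading
import time

def async_dispatch(payload, mailbox):
    """Start a background transfer (a stand-in for an RDMA engine that
    runs without compute cores) and return a hook that waits for it."""
    done = threading.Event()

    def transfer():
        time.sleep(0.05)          # simulated network latency
        mailbox.append(payload)
        done.set()

    threading.Thread(target=transfer, daemon=True).start()
    return done.wait              # the hook: call it when the data is needed

mailbox = []
hook = async_dispatch({"tokens": [1, 2, 3]}, mailbox)

# Overlap: do useful computation while the transfer is in flight.
partial = sum(i * i for i in range(1000))

hook()                            # block only at the point the data is required
received = mailbox[0]
```

The design point is that the caller, not the library, chooses where to place the synchronization, so compute can fill the entire communication window.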
Supports configurations combining NVLink for intranode communication with RDMA for internode communication, enabling deployment across a wide range of cluster topologies.