China may have found a promising way to work around restrictions on NVIDIA's AI accelerators, thanks to DeepSeek's latest project, which reportedly pushes the cut-down Hopper H800 to roughly eight times standard performance levels.
## How DeepSeek’s FlashMLA Could Revolutionize China’s AI Scene with NVIDIA’s Modified Hopper GPUs
China is clearly not sitting back when it comes to advancing its tech capabilities. Firms like DeepSeek, in particular, are leaning on software to extract the most from the hardware they already have, and the results are proving impressive. DeepSeek's latest work has caught many off guard: the company has reportedly optimized NVIDIA's cut-down Hopper H800 GPUs to deliver substantial performance gains, achieved by improving memory management and intelligently distributing resources across inference requests.
DeepSeek recently kicked off its "Open Source" week, during which it is releasing technologies and assets to everyone via public GitHub repositories. The week's inaugural highlight was the launch of FlashMLA, a specialized decoding kernel for NVIDIA Hopper GPUs. Before delving into how FlashMLA works, let's look at the improvements it has reportedly brought to the table, which seem nothing short of remarkable.
According to DeepSeek, FlashMLA enables the Hopper H800 to hit 580 TFLOPS for BF16 matrix calculations, a figure the company says exceeds typical industry results by roughly eightfold. With aggressive memory optimizations in place, FlashMLA also reportedly reaches memory throughput of up to 3000 GB/s, approaching the H800's theoretical bandwidth limit. Remarkably, these gains stem purely from clever software, without any hardware modifications.
By incorporating "low-rank key-value compression," DeepSeek's FlashMLA condenses cached data into smaller representations, enabling faster processing and cutting memory demands by a reported 40%-60%. Another notable feature is its block-based paging system, which allocates memory dynamically based on each task's demands rather than reserving a fixed amount up front. This flexibility lets models handle sequences of varying lengths far more efficiently.
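To make the two ideas above concrete, here is a minimal, hypothetical sketch (not DeepSeek's actual code) of what they look like in principle: keys and values are projected into a smaller latent space before caching, and cache memory is allocated in fixed-size blocks per sequence. The dimensions, the projection matrices `W_down`/`W_up`, and the block size of 64 tokens are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Low-rank key-value compression (illustrative) ---------------------
# Instead of caching full-width key/value vectors, project them into a
# much smaller latent space and store only the latent vectors.
d_model, d_latent = 1024, 256            # assumed sizes for illustration
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

kv = rng.standard_normal((16, d_model))  # 16 cached token states
latent = kv @ W_down                     # what actually gets cached
restored = latent @ W_up                 # expanded again when needed

savings = 1 - latent.nbytes / kv.nbytes
print(f"cache memory saved: {savings:.0%}")  # 75% with these sizes

# --- Block-based paging (illustrative) ---------------------------------
# Allocate the cache in fixed-size blocks per sequence, so a short
# request never reserves memory sized for the longest possible sequence.
BLOCK = 64  # assumed tokens per block

def blocks_needed(seq_len: int) -> int:
    """Number of fixed-size cache blocks a sequence of seq_len needs."""
    return -(-seq_len // BLOCK)  # ceiling division

for n in (10, 64, 1000):
    print(n, "tokens ->", blocks_needed(n), "block(s)")
```

With a 4x down-projection the cached footprint shrinks by 75%; the 40%-60% figure DeepSeek cites would correspond to a milder compression ratio, since aggressive compression trades memory for reconstruction fidelity.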
DeepSeek's work is a testament to how much of AI computing progress now happens in software, and FlashMLA is a striking example. For now, the tool is built exclusively for Hopper GPUs, which raises the question of what gains it might unlock on the more powerful H100 in the future.