For most of us, the biggest GPU launches are gaming cards: NVIDIA RTX, AMD RX, or even Intel Arc. For the growing number of companies invested in AI, however, NVIDIA's and AMD's accelerators are the primary focus, and large enterprises will buy whichever GPU delivers the best performance at a reasonable price. Just days after NVIDIA's own benchmarks showed it crushing AMD's best offering, the red team has reclaimed the lead with AI-optimized software: the MI300X running vLLM now outperforms the H100 running TensorRT-LLM.
ChatGPT was arguably the catalyst that ignited the current wave of interest in AI. Given that it launched in November 2022, it's astounding how much has changed in a single year. Companies have redirected their focus toward artificial intelligence, now regarded as one of the most important markets in tech, and NVIDIA in particular has struck gold, multiplying its revenue on the back of its lightning-fast AI GPUs.
Refusing to accept defeat, AMD now claims the MI300X with vLLM is 30% faster than the H100 with TensorRT-LLM. Whoever offers the best AI performance right now will attract server and data center clients, along with anyone training AI models on their own hardware. NVIDIA had seized that high ground, boasting that its H100 was twice as fast as AMD's flagship MI300X AI GPU.
NVIDIA's numbers showed it annihilating AMD in inference performance, portraying the MI300X as uncompetitive and even accusing AMD of skewing the tests. AMD has now struck back with optimized AI software. In tests with Meta's Llama 2 70B model, the MI300X platform delivers more than double the performance of the NVIDIA H100 HGX when both run vLLM. With TensorRT-LLM on the NVIDIA side and vLLM on the AMD side, the red team still wins, this time by 30%. Latency, measured in seconds, is also slightly lower on AMD's hardware.
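AMD has not published its exact benchmark harness, but a minimal sketch of this kind of vLLM throughput measurement might look like the following. Everything here is an assumption for illustration, not AMD's setup: the 8-GPU tensor-parallel node (matching the 8-GPU MI300X platform and H100 HGX systems), the public meta-llama/Llama-2-70b-hf checkpoint, and the arbitrary prompt batch.

```python
import time
from vllm import LLM, SamplingParams

# Hypothetical prompt batch; the vendor tests used Meta's Llama 2 70B.
prompts = ["Summarize the history of GPU computing."] * 32
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# tensor_parallel_size=8 assumes an 8-GPU node, like the MI300X
# platform and H100 HGX systems compared in the vendor benchmarks.
llm = LLM(
    model="meta-llama/Llama-2-70b-hf",
    tensor_parallel_size=8,
    dtype="float16",
)

start = time.perf_counter()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

# Throughput in generated tokens per second, the metric both vendors cite.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} tokens/s over {len(prompts)} prompts")
```

Because vLLM runs on both ROCm and CUDA builds, the same script can be executed unchanged on MI300X and H100 hardware, which is what makes a software-matched comparison possible in the first place.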
AMD accuses NVIDIA of an unfair AI performance comparison. Just as NVIDIA accused AMD of cheating, AMD now says the figures NVIDIA published are misleading for several reasons. NVIDIA ran TensorRT-LLM on the H100 against vLLM on the MI300X, handing itself a substantial software advantage with a stack that AMD hardware cannot even run. AMD also argues the tests were not comparable on precision: NVIDIA used FP16 on the Instinct MI300X but FP8 on the H100.
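To see why mixing precisions skews results: FP8 halves the bit-width, which on the H100 roughly doubles peak tensor-core throughput relative to FP16, but with far coarser numerics. A quick PyTorch check (assuming PyTorch 2.1+, where the torch.float8_e4m3fn dtype exists) illustrates the gap:

```python
import torch

# Compare the numeric envelope of the two precisions at issue:
# FP16 (used on the MI300X in NVIDIA's test) vs FP8 E4M3 (used on the H100).
for dtype in (torch.float16, torch.float8_e4m3fn):
    info = torch.finfo(dtype)
    print(f"{str(dtype):24} bits={info.bits:2} max={info.max} eps={info.eps}")
```

FP16 reports a maximum of 65504 with eps ≈ 0.00098, while FP8 E4M3 tops out at 448 with eps = 0.125, which is why AMD insists on like-for-like precision before comparing throughput numbers.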
AMD also accuses NVIDIA of inverting its published relative latency data. The benchmarks AMD has now released are therefore meant to offer a "fairer" comparison with its rival, and they show that even TensorRT-LLM on NVIDIA GPUs is not enough to beat AMD. The two companies will undoubtedly keep competing for the title of AI king, with enormous sales at stake. Meanwhile, NVIDIA will release its next-gen H200 GPU in 2024, a Hopper-based refresh with faster HBM3e memory that promises even higher performance.