China is advancing rapidly in technology, well aware that it is behind the West. Although the US has set the country back 10 years in chips, technological progress does not stop, and Chinese companies will compete with whatever silicon they have available, as Loongson is showing once again. Taking advantage of the official presentation of the 3A6000, the company also revealed something very interesting: its GPGPU, the Loongson LG200.
The presentation was brief and showed few details, but it demonstrates that China will not rely solely on Huawei designs, and that its companies need more advanced and comprehensive hardware to run specific software. So, much as NVIDIA or AMD have done, the Chinese company slipped in information about its latest offering for AI.
The Loongson LG200 is a first-generation GPGPU built around a second-generation GPU. The first-generation designs were a preview of what will arrive in 2024. A GPGPU is, by definition, a general-purpose graphics processing unit, and it can take several forms; Loongson has chosen a combined CPU+GPU design so that computation is not held back by the absence of a traditional processor.
What we have is a fairly limited CPU that will perform on par with the previous-generation 3A5000. This CPU will be paired with the second-generation LG200 in what Loongson has called the 2K3000 (CPCPU). The theoretical concept is akin to AMD’s MI300 with a touch of NVIDIA’s Grace Superchip.
Fortunately, the company has provided some key data. The figures are not spectacular, but for a first GPGPU they pose a minor problem for AMD and NVIDIA in the entry-level segment of such devices. They do not affect Intel, which canceled its GPGPU to focus exclusively on GPUs for AI.
Up to 1 TFLOPS of performance per node
What we see is the support diagram for the LG200 itself, which represents Loongson’s second generation. It is designed to work in three specific scenarios:
1. Graphics acceleration.
2. Scientific computation acceleration.
3. AI acceleration.
On the AI side, then, this part of the GPGPU acts as an accelerator for deep learning (DL) and large language models (LLMs). Support is not the best yet, but it will initially work with OpenGL 4.0 as the graphics API and with OpenCL 3.0 for general-purpose computing.
Additionally, it will support INT8 thanks to units resembling Tensor Cores, which the company has not explicitly named. The block diagram is surprisingly similar to NVIDIA's SM. In any case, Loongson claims that the LG200 will achieve between 256 GFLOPS and 1 TFLOPS per node, with the ability to interconnect multiple nodes, although it did not specify how many or how this will be done.
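To put the per-node range in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 256 GFLOPS and 1 TFLOPS figures come from Loongson; the node counts, the `efficiency` parameter, and the function name `aggregate_gflops` are assumptions made purely for illustration, since the company has not said how many nodes can be linked or how well they will scale.

```python
# Per-node figures quoted by Loongson for the LG200, in GFLOPS.
NODE_GFLOPS_MIN = 256.0
NODE_GFLOPS_MAX = 1000.0  # 1 TFLOPS, taken here as 1000 GFLOPS


def aggregate_gflops(per_node_gflops: float, nodes: int, efficiency: float = 1.0) -> float:
    """Estimate total throughput for several interconnected nodes.

    `nodes` and `efficiency` are hypothetical inputs: Loongson has not
    disclosed node counts or interconnect scaling behavior.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return per_node_gflops * nodes * efficiency


if __name__ == "__main__":
    # Hypothetical node counts, assuming ideal (100%) scaling.
    for nodes in (1, 2, 4, 8):
        low = aggregate_gflops(NODE_GFLOPS_MIN, nodes)
        high = aggregate_gflops(NODE_GFLOPS_MAX, nodes)
        print(f"{nodes} node(s): {low / 1000:.2f} to {high / 1000:.2f} TFLOPS")
```

Real multi-node setups rarely scale perfectly, so passing an `efficiency` below 1.0 gives a more conservative estimate.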
The CPU side of the CPCPU is based on the 3C6000, the new-generation CPU platform we saw last month. We therefore expect it to run at a low frequency, prioritizing total core count over clock speed, which would explain the statement that single-core performance matches the 3A5000 rather than the 3A6000. Finally, Loongson said that the LG200 will arrive in the first quarter of 2024.
With that, the introduction of the Loongson LG200, the Chinese GPGPU meant to end dependence on NVIDIA in AI, is complete.