nabla9 8 hours ago

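The new DeepSeek V3.1 is trained using the UE8M0 FP8 scale data format, i.e. scale factors stored as unsigned 8-bit exponents with no mantissa bits, so every scale is a power of two. That is close in spirit to logarithmic number systems (LNS): compared to FP32 and FP8, LNS-Madam reduces energy consumption by over 90% and 55%, respectively.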
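To make the format concrete, here is a rough sketch of what a power-of-two scale byte could look like. It assumes the E8M0 convention from the OCP Microscaling spec (bias 127, 0xFF reserved for NaN); whether DeepSeek's UE8M0 matches this exactly is an assumption on my part, and the function names `encode_ue8m0` and `decode_ue8m0` are made up for illustration. Rounding the scale up rather than to nearest is also just one plausible choice.

```python
import math

E8M0_BIAS = 127   # assumed bias, as in the OCP MX E8M0 scale format
E8M0_NAN = 0xFF   # assumed NaN encoding, as in the OCP MX E8M0 scale format


def encode_ue8m0(scale: float) -> int:
    """Round a positive scale UP to a power of two and return its exponent byte."""
    if math.isnan(scale):
        return E8M0_NAN
    if scale <= 0 or math.isinf(scale):
        raise ValueError("scale must be a positive finite number")
    # frexp gives scale = m * 2**e with 0.5 <= m < 1, so this is exact.
    m, e = math.frexp(scale)
    exp = e - 1 if m == 0.5 else e   # ceil(log2(scale)) without float error
    # Clamp to the representable exponent range [0, 254].
    return max(0, min(254, exp + E8M0_BIAS))


def decode_ue8m0(byte: int) -> float:
    """Turn an exponent byte back into the power-of-two scale it represents."""
    if byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, byte - E8M0_BIAS)


if __name__ == "__main__":
    for s in (0.5, 1.0, 3.0, 8.0):
        b = encode_ue8m0(s)
        print(s, hex(b), decode_ue8m0(b))
```

Since the scale has no mantissa, applying it is just an exponent add, which is part of why log-style formats are cheap in hardware.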