LNS-Madam: Low-Precision Training in Log Using Multiplicative Weight Update arxiv.org 2 points by nabla9 8 hours ago
nabla9 8 hours ago The new DeepSeek v3.1 is trained using the UE8M0 FP8 scale data format. Compared to FP32 and FP8, LNS-Madam reduces energy consumption by over 90% and 55%, respectively.
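The core idea behind LNS-Madam is that if weights are stored as logarithms, a multiplicative weight update becomes a simple addition on the stored exponent, which is far cheaper in hardware than a floating-point multiply-accumulate. A minimal sketch of that idea in NumPy (this is an illustrative toy, not the paper's exact algorithm; the learning rate, base-2 representation, and RMS gradient normalization are assumptions):

```python
import numpy as np

def lns_multiplicative_step(log_w, sign_w, grad, eta=0.01):
    """One multiplicative update performed in the log domain.

    Weights are represented as (sign_w, log_w) with w = sign_w * 2**log_w.
    A multiplicative step w <- w * 2**(-eta * sign(w) * g~) reduces to a
    subtraction on log_w -- no multiply needed on the weight itself.
    """
    # Scale-free gradient normalization (Madam-style; details assumed).
    g_norm = grad / (np.sqrt(np.mean(grad**2)) + 1e-12)
    return log_w - eta * sign_w * g_norm, sign_w

# Toy usage: store a small weight vector in log form, take one step.
w = np.array([0.5, -0.25, 1.0])
log_w, sign_w = np.log2(np.abs(w)), np.sign(w)
grad = np.array([0.1, -0.2, 0.05])
log_w, sign_w = lns_multiplicative_step(log_w, sign_w, grad)
w_new = sign_w * np.exp2(log_w)  # decode back to linear domain
```

Because the update only shifts `log_w`, each weight's sign is preserved and the step size is proportional to the weight's magnitude, which is what makes the update "multiplicative."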