The performance difference between 32-bit and 8-bit AdamW is negligible: the 8-bit version essentially matches the full-precision optimizer, and sometimes even slightly outperforms it. I'm less certain about the 4-bit variant. In my tests it converges comparably on short training runs, but in my longer experiments it tends to fall behind.
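
For some rough intuition on why 4-bit state could lose ground over a long run, here is a minimal sketch that round-trips a block of fake optimizer-state values through 8-bit and 4-bit quantization and compares the error. It assumes a plain symmetric absmax scheme, which is a simplification I picked for illustration and not necessarily what any particular low-bit AdamW implementation does (real ones typically use block-wise, non-linear quantization maps). The function names and the Gaussian stand-in data are hypothetical.

```python
import random


def quantize_roundtrip(values, bits):
    """Quantize to a signed `bits`-bit grid with absmax scaling, then dequantize.

    Illustrative linear scheme only; not a specific library's method.
    """
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / levels
    if scale == 0.0:
        return list(values)  # all zeros, nothing to quantize
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


random.seed(0)
# Hypothetical stand-in for one block of Adam first-moment values.
state = [random.gauss(0.0, 1e-3) for _ in range(4096)]

err8 = mean_abs_error(state, bits=8)
err4 = mean_abs_error(state, bits=4)
print(f"8-bit mean abs error: {err8:.3e}")
print(f"4-bit mean abs error: {err4:.3e}")
```

With only 15 usable levels per block instead of 255, the 4-bit round trip is much coarser on every step. That may be harmless over a short run and add up over a long one, which would be consistent with what I saw, though this toy does not prove it.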