Surprising that implementing ADAM with numpy is like 20 lines
I would've thought SOTA to be more complicated than that
a Schelling point for those who seek one