@TetraspaceGrouping
Hm, true.
Per the universal approximation theorem, neural networks can approximate any continuous function on a compact domain, but some functions are clearly easier to approximate than others.
And the horribly discontinuous (or very steep) ones are probably very hard to approximate.
Perhaps it's that K-Lipschitz continuous functions are easier to approximate for smaller K?
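Not from the thread, just a quick sketch to make that concrete: sin(kx) is k-Lipschitz, so fitting it with the same small MLP for small vs. large k is a crude test of the "smaller K is easier" hunch. The architecture, k values, and training budget are arbitrary choices of mine.

```python
# Fit sin(k*x) -- a k-Lipschitz function -- with the same small MLP for
# small vs. large k and compare final training error. Hypothesis (hedged):
# the steeper function (larger k) ends up with visibly higher MSE.
import torch

torch.manual_seed(0)
x = torch.linspace(-1, 1, 512).unsqueeze(1)

def fit(k, steps=2000):
    y = torch.sin(k * x)
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for k in (1.0, 30.0):
    print(f"k = {k:5.1f}  final MSE = {fit(k):.6f}")
```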
Okay, new question: which prior do neural networks trained with gradient descent actually implement?
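One way to poke at that question empirically (again my own sketch, not anything established in the thread): fit a wide net to a handful of random points, then measure how "wiggly" the learned interpolant is. If gradient descent carried no bias, any interpolant through the points would be equally likely; in practice the fit tends to stay close to the smoothest curve the data allows, which is the usual spectral-bias / simplicity-prior observation.

```python
# Fit 8 random points with a wide one-hidden-layer net, then compare the
# empirical Lipschitz constant of the learned function against the smallest
# Lipschitz constant ANY interpolant of the data must have (max slope
# between consecutive points, by the mean value theorem). All hyperparameters
# here are arbitrary illustrative choices.
import torch

torch.manual_seed(1)
xs, _ = torch.sort(torch.rand(8, 1) * 2 - 1, dim=0)  # 8 sorted points in [-1, 1]
ys = torch.rand(8, 1) * 2 - 1

net = torch.nn.Sequential(
    torch.nn.Linear(1, 256), torch.nn.Tanh(),
    torch.nn.Linear(256, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(5000):
    opt.zero_grad()
    torch.nn.functional.mse_loss(net(xs), ys).backward()
    opt.step()

# Empirical Lipschitz constant of the learned function on a dense grid...
grid = torch.linspace(-1, 1, 2001).unsqueeze(1)
with torch.no_grad():
    out = net(grid).squeeze()
lip_net = ((out[1:] - out[:-1]).abs()
           / (grid[1:] - grid[:-1]).squeeze().abs()).max()

# ...versus the minimum slope the data itself forces on any interpolant.
slopes = (ys[1:] - ys[:-1]).abs() / (xs[1:] - xs[:-1]).abs()
print(f"net Lipschitz ~ {lip_net.item():.2f}, data needs >= {slopes.max().item():.2f}")
```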