it would be helpful if we knew whether, for almost all random functions from ℝⁿ→ℝ, slightly changing any one input changes the output a lot
then one might also prove (or disprove) the same thing for functions implementable by some classes of neural networks
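Rough way to poke at the intuition empirically (my construction, nothing proved): model a "random function" as i.i.d. standard-normal values on a very fine grid of [0,1]ⁿ and see how much a tiny input perturbation moves the output.

```python
import numpy as np

def random_function(x, cells=10_000):
    # value of a fixed "random function" at x in [0,1]^n: each grid cell gets
    # an independent N(0,1) value, generated lazily by seeding an RNG with the
    # cell index
    idx = np.clip((np.asarray(x) * cells).astype(np.int64), 0, cells - 1)
    seed = int.from_bytes(idx.tobytes(), "little") % (2**32)
    return np.random.default_rng(seed).standard_normal()

rng = np.random.default_rng(0)
n = 3
deltas = []
for _ in range(1_000):
    x = rng.random(n)
    eps = 1e-3 * rng.standard_normal(n)   # tiny change to the input
    deltas.append(abs(random_function(x) - random_function(x + eps)))
print("mean |f(x) - f(x+eps)|:", np.mean(deltas))
# for i.i.d. cell values this comes out O(1) (≈ 2/√π ≈ 1.13): almost any
# small input change lands in a different cell and gets an unrelated value
```

Which is basically the point below: with no smoothness constraint, knowing f at x tells you nothing about f anywhere nearby.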
@niplav no free lunch version is obviously true, because they’re all horribly discontinuous 🤔
@TetraspaceGrouping
Hm, true.
Per the universal approximation theorem, neural networks can approximate any continuous function on a compact domain arbitrarily well, but some functions are clearly easier to approximate than others
And the horribly discontinuous ones are probably very hard to approximate
Perhaps it's that K-Lipschitz continuous functions are easier to approximate for smaller K?
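One cheap sanity check (a sketch, assuming sin(K·x) on [0,1], which is K-Lipschitz, is a fair family of test functions): fit the same small net to it for growing K with a fixed training budget and watch the error.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train, X_test = rng.random((2_000, 1)), rng.random((500, 1))

for K in (1, 4, 16, 64):
    # target sin(K·x) has Lipschitz constant K on [0, 1]
    y_train, y_test = np.sin(K * X_train).ravel(), np.sin(K * X_test).ravel()
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2_000,
                       random_state=0)
    net.fit(X_train, y_train)
    mse = np.mean((net.predict(X_test) - y_test) ** 2)
    print(f"K = {K:3d}  test MSE = {mse:.4f}")
# with a fixed architecture and training budget the error tends to grow
# with K, consistent with "smaller Lipschitz constant ⇒ easier to approximate"
```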
Okay, new question: which prior do neural networks trained with gradient descent implement?
is this just reinventing the power-seeking theorems? gotta check
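Not an answer to the prior question, but one way to look at it empirically (my setup, nothing canonical): train the same tiny net on the same handful of points from many random initializations and look at how spread out its off-data predictions are.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.array([[0.0], [0.25], [0.75], [1.0]])   # four fixed training points
y = np.array([0.0, 0.5, 0.5, 0.0])
x_query = np.array([[0.5]])                    # held-out input

preds = []
for seed in range(50):                         # 50 random initializations
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5_000,
                       random_state=seed)
    net.fit(X, y)
    preds.append(net.predict(x_query)[0])
print(f"prediction at x=0.5: mean {np.mean(preds):.3f}, std {np.std(preds):.3f}")
# if the predictions cluster tightly around one smooth interpolation, that's
# (weak) evidence that net + gradient descent encode a strong simplicity bias;
# a wide spread would suggest a flatter prior over fits to the same data
```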