it would be helpful if we knew whether, for almost all random functions from ℝⁿ→ℝ, slightly changing any one input changes the output a lot

then one might also prove (or disprove) the same thing for functions implementable by some classes of neural networks

@niplav the no-free-lunch version is obviously true, because almost all of them are horribly discontinuous 🤔

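(An aside that isn't part of the thread: one hedged way to make "almost all random functions are horribly discontinuous" precise is via the second Borel–Cantelli lemma, under the assumption that the function's values at distinct inputs are i.i.d. and non-degenerate, and glossing over measurability of the full product space.)

```latex
% Sketch, assuming the "random function" assigns i.i.d., non-degenerate values
% f(x) to each input x -- an illustrative choice of measure, not from the thread.
\begin{align*}
&\text{Fix } x_0 \in \mathbb{R}^n \text{ and distinct points } x_k \to x_0.
  \text{ Condition on } f(x_0) = c. \\
&\text{Non-degeneracy gives some } \varepsilon(c) > 0 \text{ with }
  \Pr\!\left[\lvert f(x_k) - c\rvert > \varepsilon(c)\right] > 0 \text{ for every } k, \\
&\text{and these events are independent, so the second Borel--Cantelli lemma gives} \\
&\qquad \Pr\!\left[\lvert f(x_k) - c\rvert > \varepsilon(c) \text{ infinitely often}\right] = 1. \\
&\text{Averaging over } c\text{: } f \text{ is discontinuous at } x_0
  \text{ with probability } 1, \text{ for each fixed } x_0.
\end{align*}
```
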
@TetraspaceGrouping
Hm, true.

Per the universal approximation theorem, neural networks can approximate any continuous function on a compact domain, but some functions are clearly easier to approximate than others

And the horribly discontinuous ones are probably very hard to approximate

Perhaps it's that K-Lipschitz continuous functions are easier to approximate for smaller K?
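(Another aside, my own illustration rather than anything from the thread: a rough numerical way to poke at that guess is to fix an approximation budget — here random ReLU features fit by least squares — and watch the error grow with the Lipschitz constant K of a target like sin(Kx). All concrete choices below are arbitrary.)

```python
# Rough sketch: fix an approximation budget (random ReLU features, fit by
# least squares) and see how well the K-Lipschitz target sin(K*x) can be
# approximated as K grows.  Target family, feature count, and weight
# ranges are illustrative assumptions, not anything from the thread.
import numpy as np

rng = np.random.default_rng(0)

def relu_features(x, w, b):
    """Random ReLU features max(0, w_j * x + b_j), one column per feature."""
    return np.maximum(0.0, np.outer(x, w) + b)

n_features = 50                        # fixed approximation budget
w = rng.normal(0.0, 30.0, n_features)  # random feature slopes
b = rng.uniform(-3.0, 3.0, n_features) # random feature offsets

x_train = np.linspace(-1.0, 1.0, 500)
x_test = np.linspace(-1.0, 1.0, 1000)

for K in [1, 5, 25, 125]:
    target = lambda x: np.sin(K * x)   # sin(K*x) is K-Lipschitz
    Phi = relu_features(x_train, w, b)
    coef, *_ = np.linalg.lstsq(Phi, target(x_train), rcond=None)
    err = np.max(np.abs(relu_features(x_test, w, b) @ coef - target(x_test)))
    print(f"K = {K:3d}   sup-norm error on [-1, 1] ≈ {err:.3f}")
```

With the budget held fixed, the printed error should creep upward as K grows; needing a budget that scales with K is the flip side of the same picture.
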

Okay, new question: which prior do neural networks trained with gradient descent implement?

@TetraspaceGrouping

When I said prior, I meant inductive bias, of course

Me dummy
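(Last aside, again my own toy sketch rather than an answer to that question: one inductive bias that is frequently reported for gradient-descent-trained nets is a preference for low frequencies, sometimes called spectral bias. The snippet below trains a small tanh network with hand-rolled full-batch gradient descent on a two-frequency target and tracks how much of each frequency component is left in the residual; the architecture, target, and hyperparameters are arbitrary choices.)

```python
# Toy illustration of one candidate inductive bias ("spectral bias"): a small
# net trained by plain full-batch gradient descent tends to remove the
# low-frequency part of the residual before the high-frequency part.
# All concrete choices (architecture, target, learning rate) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Training grid and a two-frequency target on [0, 1)
X = np.linspace(0.0, 1.0, 256)[:, None]     # shape (N, 1)
low = np.sin(2 * np.pi * X[:, 0])           # low-frequency component
high = np.sin(12 * np.pi * X[:, 0])         # high-frequency component
y = low + high

# One-hidden-layer tanh network, parameters updated by hand-rolled backprop
m = 128
W1 = rng.normal(0.0, 1.0, (1, m))
b1 = np.zeros(m)
W2 = rng.normal(0.0, 0.1, m)
b2 = 0.0
lr = 0.02

def remaining_fraction(residual, component):
    """|<residual, component>| / ||component||^2: how much of the component is left."""
    return abs(residual @ component) / (component @ component)

for step in range(1, 10001):
    H = np.tanh(X @ W1 + b1)                # (N, m) hidden activations
    pred = H @ W2 + b2                      # (N,) network output
    err = pred - y
    # Gradients of the mean squared error, by hand
    d = 2.0 * err / len(y)
    gW2 = H.T @ d
    gb2 = d.sum()
    dH = np.outer(d, W2) * (1.0 - H ** 2)
    gW1 = X.T @ dH
    gb1 = dH.sum(axis=0)
    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2
    b2 -= lr * gb2
    if step % 2000 == 0:
        res = y - pred
        print(f"step {step:5d}  low-freq left ≈ {remaining_fraction(res, low):.3f}"
              f"  high-freq left ≈ {remaining_fraction(res, high):.3f}")
```

If the usual spectral-bias picture holds, the low-frequency number should drop well before the high-frequency one does; that ordering, rather than the final fit, is the kind of inductive bias the question is about.
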
