Wait, do neural networks implement a sensible prior?
(like the speed or simplicity prior?)
If yes, which one?
@mira
Makes sense, hadn't connected regularization with simplicity.
(tho I don't think I understand regularization well enough yet to see why it'd result in simplicity).
@niplav Don't they just pour a bucket of neural nodes over two buckets of training data and hardcode the bad stuff away?
@rune I think that we mostly can't reach into the resulting buckets of neural nodes and change stuff we don't like.
@niplav I was thinking they just have a word filter in front of the output and hardcode some default responses
@rune Ah, that's what you meant. Seems true 👍
@niplav The closest thing to a simplicity prior is a regularization term in the loss function with a penalty for large weights.
Having a large range of weight magnitudes leads to a sort of ordinal hierarchy of floats, where some activations dominate so strongly that smaller ones can never interact with them again. That lets the network memorize restricted cases, i.e. overfit, i.e. implement higher-complexity functions, so penalizing large weights nudges it toward simpler ones.
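In code, that regularization term is just an extra penalty added to the training loss. A minimal sketch, assuming PyTorch; the toy MLP, the dummy data, and the `lam` value are all made up for illustration:

```python
import torch
import torch.nn as nn

# Toy MLP whose training loss includes an explicit L2 penalty on the
# weights, the "closest thing to a simplicity prior" mentioned above.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
lam = 1e-3  # regularization strength (hypothetical value)

x, y = torch.randn(64, 10), torch.randn(64, 1)  # dummy data

for step in range(100):
    optimizer.zero_grad()
    pred = model(x)
    data_loss = nn.functional.mse_loss(pred, y)
    # Sum of squared weights: grows with large weights, so minimizing
    # the total loss trades fit against keeping the weights small.
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())
    loss = data_loss + lam * l2_penalty
    loss.backward()
    optimizer.step()
```

With `lam = 0` you get plain fitting; cranking `lam` up trades training accuracy for smaller weights, which is the sense in which the penalty acts like a (weak) simplicity prior.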