
Wait, do neural networks implement a sensible prior?

(like the speed or simplicity prior?)

If yes, which one?

@niplav The closest thing to a simplicity prior is a regularization term in the loss function with a penalty for large weights.

Having a large range of weight magnitudes leads to a sort of ordinal hierarchy of floats, where some values can never interact again. That means overfitting, i.e. memorizing restricted cases, i.e. higher complexity.
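
To make the weight-penalty idea concrete, here is a minimal sketch of an L2-regularized loss for a tiny linear model. The function names (`mse`, `l2_penalty`, `regularized_loss`), the coefficient `lam`, and the toy data are all made up for illustration and are not from any particular library. Both weight settings fit the data exactly, and only the penalty term tells them apart.

```python
def mse(weights, xs, ys):
    """Mean squared error of a linear model: prediction = sum(w_i * x_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(xs)


def l2_penalty(weights, lam):
    """Penalty that grows with the squared size of the weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(weights, xs, ys, lam=0.1):
    """Data-fit term plus the weight penalty (lam is an assumed coefficient)."""
    return mse(weights, xs, ys) + l2_penalty(weights, lam)


# Toy data where the two input features are identical, so many weight
# settings fit equally well.
xs = [(1.0, 1.0), (2.0, 2.0)]
ys = [2.0, 4.0]

small = (1.0, 1.0)     # modest weights
large = (101.0, -99.0) # huge weights that cancel each other out

for name, w in (("small", small), ("large", large)):
    print(name, "fit error:", mse(w, xs, ys), "total loss:", regularized_loss(w, xs, ys))
```

Without the penalty, nothing distinguishes the two solutions. With it, the optimizer is pushed toward the small-weight one, which is the loose sense in which the penalty acts like a preference for simpler fits.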

Makes sense, hadn't connected regularization with simplicity.

(tho I don't think I understand regularization enough yet to understand why it'd result in simplicity).

@niplav Don't they just pour a bucket of neural nodes over two buckets of training data and hardcode the bad stuff away?

@rune I think that we mostly can't reach into the resulting buckets of neural nodes and change stuff we don't like.

@niplav I was thinking they just have a word filter in front of the output and hardcode some default responses

@rune Ah, that's what you meant. Seems true 👍
