I think it would be extremely cool if the "categorical cybernetics" bag of methods could say something about the relationship between inner and outer models, in particular the observed fact that transformers learn to run gradient descent as one of the steps in their algorithm!

@julesh @bgavran @mc
