I think it would be extremely cool if the "categorical cybernetics" bag of methods could say something about the relationship between inner and outer models - in particular, the reported finding that transformers trained for in-context learning can end up implementing gradient descent as one of the steps of their learned algorithm!
@julesh @bgavran @mc