'we can interpret a model’s cross-entropy loss as “how distinguishable” the model is from its training distribution, and use that to upper bound the difficulty of training a model to perform reliable, high-quality reasoning over long sequences.'
https://epochai.org/blog/the-direct-approach
Includes an open review, bravely solicited from the grumpiest man in the neighbourhood
https://epochai.org/files/direct-approach-review-nuno-sempere.pdf
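
A toy sketch of the "distinguishability" reading in the quote above, under assumptions that are mine rather than the post's: the reducible part of the cross-entropy loss is taken to be the per-token KL divergence KL(true || model), and an ideal observer is assumed to accumulate log-likelihood ratios token by token. The function names, the three-token distributions, and the odds threshold of 10 are all made up for illustration.

```python
import math

def kl_divergence_nats(p, q):
    """KL(p || q) in nats for two categorical distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tokens_to_distinguish(kl_per_token, odds_threshold=10.0):
    """Expected number of tokens before the accumulated log-likelihood ratio
    reaches log(odds_threshold), assuming roughly independent tokens."""
    return math.log(odds_threshold) / kl_per_token

# Hypothetical next-token distributions (illustrative numbers only).
true_dist = [0.5, 0.3, 0.2]
model_dist = [0.4, 0.4, 0.2]

kl = kl_divergence_nats(true_dist, model_dist)
n = tokens_to_distinguish(kl)
print(f"KL per token: {kl:.4f} nats; tokens to reach 10:1 odds: {n:.0f}")
```

On this reading, halving the per-token divergence doubles the length of text the model can produce before it becomes distinguishable at the chosen odds, which is roughly why a lower loss would correspond to reliable output over longer sequences.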