'we can interpret a model’s cross-entropy loss as “how distinguishable” the model is from its training distribution, and use that to upper bound the difficulty of training a model to perform reliable, high-quality reasoning over long sequences.'
https://epochai.org/blog/the-direct-approach
Empirical scaling laws can help predict the cross-entropy loss associated with training inputs, such as compute and data. However, in order to predict when AI will…
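A rough sketch of the "distinguishability" reading in the quote above, and not the post's actual method: it assumes the reducible part of the loss equals the per-token KL divergence (in nats) between the true distribution and the model, and that an ideal observer accumulates that much log-likelihood-ratio evidence per token on average. The names `kl_divergence` and `tokens_to_distinguish`, and the default odds threshold of 10, are hypothetical choices for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tokens_to_distinguish(reducible_loss_nats, odds_threshold=10.0):
    """Approximate number of tokens before the expected log-likelihood ratio
    reaches log(odds_threshold). Assumes independent tokens and an ideal
    observer; returns infinity when the model matches the true distribution."""
    if reducible_loss_nats <= 0:
        return math.inf
    return math.log(odds_threshold) / reducible_loss_nats

# Toy example: the true next-token distribution vs. a slightly-off model.
true_dist = [0.5, 0.5]
model_dist = [0.6, 0.4]
gap = kl_divergence(true_dist, model_dist)
print(f"per-token KL: {gap:.4f} nats")
print(f"tokens until 10:1 odds: {tokens_to_distinguish(gap):.0f}")
```

Under these assumptions, a lower reducible loss means more tokens are needed before the model's output can be told apart from real text, which is the sense in which loss bounds the length of sequence a model can handle reliably.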
Includes an open review, bravely solicited from the grumpiest man in the neighbourhood
https://epochai.org/files/direct-approach-review-nuno-sempere.pdf