Has anyone tried training neural networks to predict sudden drops in loss during LLM training?
We can surely observe plenty of scaling curves across many different tasks.
@Paradox Loss as in predictive loss: a simple measure of how accurate the model's predictions are.
We want loss to be as low as possible, because lower loss corresponds to better performance.
Sometimes capabilities emerge with a sudden fall in loss, and sometimes loss barely changes. We'd like to know when capabilities will change or loss will decline sharply.
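For reference, a minimal sketch of what "predictive loss" could look like, assuming it means the average negative log-probability the model assigns to the tokens that actually came next. The thread doesn't pin down the exact loss, and `predictive_loss` plus the sample probabilities are made up for illustration.

```python
import math

def predictive_loss(probs_for_true_tokens):
    """Mean negative log-likelihood (in nats) of the tokens that actually occurred.

    Hypothetical helper: each entry is the probability the model gave
    to the correct next token at one position.
    """
    total = sum(math.log(p) for p in probs_for_true_tokens)
    return -total / len(probs_for_true_tokens)

# Illustrative numbers only, not real training data.
confident = predictive_loss([0.9, 0.8, 0.95])  # model mostly right
unsure = predictive_loss([0.2, 0.1, 0.3])      # model mostly wrong

print(f"confident model loss: {confident:.3f}")
print(f"unsure model loss:    {unsure:.3f}")
```

The better the model's guesses, the closer the loss gets to zero, which is why lower loss reads as better performance.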
@niplav This is a feature of LLMs I'm not familiar with. I only know the basics of such systems.
@niplav What kind of loss is that? How does that happen?