@Paradox loss as in predictive loss, a simple way of measuring how accurate a model's predictions are
We want loss to be as low as possible, because lower loss corresponds to better performance
Sometimes capabilities emerge with sudden falls in loss, sometimes loss doesn't change much. We'd like to know in advance when these capabilities will emerge or when loss will decline sharply
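For concreteness, here's a rough sketch of what "predictive loss" usually means for a language model, assuming it's the average negative log-probability (cross-entropy) the model assigned to what actually came next. The function name and the example probabilities are made up for illustration, not taken from any real model.

```python
import math

def predictive_loss(probs_assigned_to_truth):
    """Average negative log-probability given to the outcomes that actually occurred.

    Hypothetical helper: each entry is the probability the model assigned
    to the token that really came next. Lower return value = better predictions.
    """
    total = sum(math.log(p) for p in probs_assigned_to_truth)
    return -total / len(probs_assigned_to_truth)

# Illustrative numbers only: a model that is usually right and confident...
confident = predictive_loss([0.9, 0.8, 0.95])
# ...versus one that gives the true next token little probability.
unsure = predictive_loss([0.2, 0.1, 0.3])

print(confident < unsure)  # the better predictor has the lower loss
```

A perfect predictor (probability 1 on every true outcome) would get a loss of 0, which is why lower is better.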
@niplav This is a feature of LLMs I'm not educated on. I know the basics of such systems.