Characteristically open-minded + grounded post from @fhuszar on the deep learning shock to learning theory and the looming possibility of an LM shock
"[Previously] I said if your objective function doesn't reflect the task, no amount of engineering or hacks will help you bridge that gap...
I have now abandoned this argument as well... we have barely a clue what inductive biases SGD on a model like GPT-3 has..."
"the fact we can't describe it doesn't mean unreasonably helpful inductive biases can't be there. evidence is mounting that they are.
As intellectually unsatisfying as this is, the LLM approach works, but most likely not for any of the reasons we know. We may be surprised again"