I need to find evidence for/against the claim that there was a training run of GPT-2 that maximized negative log-loss. I've heard it a couple of times on the internet and have already spread the meme myself, but I haven't seen it in a paper or blog post.
