Does anyone use "reinforcement learning from compiler feedback" to train LLMs for code generation? It seems like e.g. Codex was just made by doing extra training on GitHub code.

What I'm imagining is you generate a bunch of completions and penalize the ones that don't compile (or that fail some other statically performable check, like producing no syntax errors).
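
A minimal sketch of that reward signal, assuming Python as the target language and "parses without a syntax error" as a stand-in for "compiles". The names `compile_reward` and `score_completions` are made up for illustration, the +1/-1 values are arbitrary, and the actual policy update that would consume these rewards is left out.

```python
import ast


def compile_reward(source: str) -> float:
    """Hypothetical reward: +1.0 if the text parses as Python, -1.0 otherwise."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError can be raised for
        # inputs such as null bytes on some Python versions.
        return -1.0
    return 1.0


def score_completions(completions: list[str]) -> list[float]:
    """Score each sampled completion with the static check above."""
    return [compile_reward(c) for c in completions]


# Two stand-in "model completions": one valid, one missing a colon.
completions = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",
]
rewards = score_completions(completions)
print(rewards)
```

These per-completion rewards are what an RL step would then use to upweight the samples that pass and downweight the ones that fail.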
