To detect whether text came from LM X, randomly perturb it and compare X's log-probabilities of the original and the perturbed version.
If log p(original) > log p(perturbed), classify the text as LM-generated.
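A minimal sketch of this detector (it resembles the DetectGPT idea: LM-generated text tends to sit near a local maximum of the model's log-probability, so perturbations lower it). The perturbation here is a toy adjacent-word swap, and `logprob` is a stand-in you would replace with a real call to X's scoring API; averaging over several perturbations makes the comparison less noisy.

```python
import random

def perturb(text, rng):
    """Toy perturbation: swap two adjacent words.
    (A real system would use a mask-filling model to rewrite spans.)"""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def detect(text, logprob, n_perturbations=20, seed=0):
    """Classify as LM-generated if the model scores the original text
    higher than the average of its perturbed variants."""
    rng = random.Random(seed)
    orig = logprob(text)
    perturbed = [logprob(perturb(text, rng)) for _ in range(n_perturbations)]
    return orig > sum(perturbed) / len(perturbed)
```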
"to increase performance by 10% absolute, just take the majority-vote answer of several LM answers"
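The majority-vote part is a one-liner; sample the model several times, then keep the most common final answer:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among several sampled LM answers."""
    return Counter(answers).most_common(1)[0][0]
```

In practice you would sample the same prompt at nonzero temperature and vote over the extracted final answers, not the full generations.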
"to reduce resource use by 50%(!), use a large model to do rejection sampling of small models' output"
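A rough sketch of that loop, under assumed interfaces: `draft()` samples from the cheap model, `accept_prob(x)` is the expensive model's acceptance probability for a candidate (both hypothetical stand-ins), and after too many rejections you fall back to generating with the large model directly:

```python
import random

def rejection_sample(draft, accept_prob, rng, max_tries=5, fallback=None):
    """Small model proposes, large model accepts/rejects.
    `draft` and `accept_prob` are stand-ins for real model calls."""
    x = None
    for _ in range(max_tries):
        x = draft()  # cheap proposal
        if rng.random() < accept_prob(x):  # expensive check
            return x
    # Too many rejections: optionally generate with the large model itself.
    return fallback() if fallback is not None else x
```

The saving comes from the large model only scoring candidates (one forward pass) instead of generating them token by token.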
I suppose chain-of-thought prompting is itself one of these tricks: spend more inference-time compute to buy accuracy.