done reading arxiv.org/abs/2204.05212
weird, unexpected result

= setup =
give a human a reading comprehension task about a looong sci-fi story, with A/B options. give them 2 arguments with supporting quotes, one arguing for A, one for B. give them 90 seconds to read both arguments & quotes, and consult the text.

measure how often they pick the right option.
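in code, the measurement looks roughly like this (my own sketch, not from the paper; all names made up):

```python
import random
from dataclasses import dataclass

@dataclass
class Trial:
    options: tuple    # (text of option A, text of option B)
    correct: int      # 0 or 1, index of the right answer
    quotes: tuple     # one supporting quote per side
    arguments: tuple  # one full argument per side

def accuracy(trials, judge, show_arguments):
    """fraction of trials where the judge picks the correct option"""
    hits = 0
    for t in trials:
        # quotes-only condition vs. quotes+arguments condition
        evidence = (t.quotes, t.arguments) if show_arguments else (t.quotes,)
        hits += judge(t.options, evidence) == t.correct
    return hits / len(trials)

# sanity-check judge that guesses at random (the real judges were humans on a 90s timer)
random_judge = lambda options, evidence: random.randrange(2)
```

the surprising finding is then: accuracy(trials, human, show_arguments=True) ≈ accuracy(trials, human, show_arguments=False).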

= result =
there's **no difference** in accuracy between showing them just the quotes and showing quotes+arguments.

this result matters because AI safety via debate (arxiv.org/abs/1805.00899) is one of the main proposals for how to align strong superhuman AIs.

it's basically a scaled-up version of this setup: 2 players argue about whether A or B is right, and a human judges the result based on the arguments.
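as a toy sketch (my simplification of that proposal, function names hypothetical), the debate game is something like:

```python
def debate(question, options, debater_a, debater_b, judge, rounds=3):
    """two debaters argue for opposite answers; a judge picks a winner
    from the transcript alone"""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", debater_a(question, options, transcript)))
        transcript.append(("B", debater_b(question, options, transcript)))
    # crucially, the judge only sees the transcript, not the full problem
    return judge(question, options, transcript)  # returns 0 (A wins) or 1 (B wins)
```

the paper's experiment is roughly the rounds=1 (single-turn) special case of this, with quotes as the debaters' evidence.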

I am confused by this negative result and hope it's wrong / doesn't hold for bigger, non-toy problems.


@agentydragon someone needs to review the entire evidence here

In general, having Debate be not robust enough to ~always work (see also Obfuscated Arguments) is a bad sign
