**niplav** @niplav@schelling.pt · Apr 23, 2023

Thinking out loud about what still doesn't work with giving AutoGPT agents instructions like "do X but respect human preferences while doing so".

• Inner optimizers are still a problem if they exist in the GPT models
• Do LLM agents have sufficient goal stability? I.e. when delegating & delegating further does the original goal get perturbed or even lost?
• Limited to the models' understanding of "human values"
• Doesn't solve ambitious value learning, model might generalise badly once in new domains

**rime** @rime@schelling.pt · 2023-04-24T05:58:25Z

@niplav On the one hand, I'd be really happy if recursive LLMs could reach very high intelligence before anything else, because the capabilities are built out of parts (text-based communication) we can inspect. AI research turns into network epistemology.

On the other hand, if GPT-5 is accessible via an API that doesn't control for this, some idiots are going to try their hardest to destroy the world with it (cf. ChaosGPT).

**niplav** @niplav@schelling.pt · Apr 24, 2023

@rime
Always Less Dignified™

Network epistemology — sounds really nice :-) Perhaps that's what CoEms are getting at
