Thinking out loud about what still doesn't work with giving AutoGPT agents instructions like "do X but respect human preferences while doing so".

• Inner optimizers are still a problem if they exist in the GPT models
• Do LLM agents have sufficient goal stability? I.e., when an agent delegates and its sub-agents delegate further, does the original goal get perturbed or even lost? (toy sketch after this list)
• Limited to the models' understanding of "human values"
• Doesn't solve ambitious value learning; the model might generalise badly once it's operating in new domains
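
To make the goal-stability worry concrete, here's a toy sketch of how a side-constraint can silently drop out over a couple of delegation hops. This is not AutoGPT's actual code; `restate_for_subagent` is a made-up stand-in for an LLM call that compresses the task description, which is all it takes to show the failure mode.

```python
# Toy sketch (not AutoGPT's real delegation logic): a constraint attached to a
# goal can vanish when each agent restates the task for its sub-agent.

def restate_for_subagent(goal: str, max_words: int = 7) -> str:
    """Hypothetical stand-in for 'ask the model to rephrase this task for a
    sub-agent'. Here it just keeps the first few words."""
    return " ".join(goal.split()[:max_words])

def delegate(goal: str, depth: int) -> str:
    """Hand the goal down `depth` times, restating it at every hop."""
    for hop in range(1, depth + 1):
        goal = restate_for_subagent(goal)
        print(f"hop {hop}: {goal!r}")
    return goal

original = "Book a venue for the offsite but respect human preferences while doing so"
delegate(original, depth=3)
# hop 1 already reads 'Book a venue for the offsite but': the
# 'respect human preferences' clause is gone, and no single hop
# looked obviously destructive.
```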

@niplav On the one hand, I'd be really happy if recursive LLMs could reach very high intelligence before anything else, because their capabilities are built out of parts (text-based communication) we can inspect. AI research turns into network epistemology.

On the other hand, if GPT-5 is accessible via an API that doesn't control for this, some idiots are going to try their hardest to destroy the world with it (cf. ChaosGPT).
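
To illustrate the "parts we can inspect" point: if every message between agents goes through one text channel, the collective's whole thought process is just a transcript. A minimal sketch, with made-up names (nothing here is a real AutoGPT or ChaosGPT interface):

```python
# Minimal sketch of why text-based inter-agent traffic is inspectable: route
# every message through one logged channel and the "reasoning" of the agent
# collective is a plain-text transcript you can read or grep.

from dataclasses import dataclass, field

@dataclass
class MessageBus:
    # Each entry is (sender, receiver, message text).
    log: list = field(default_factory=list)

    def send(self, sender: str, receiver: str, text: str) -> None:
        self.log.append((sender, receiver, text))

bus = MessageBus()
bus.send("planner", "researcher", "Find three candidate venues under budget.")
bus.send("researcher", "planner", "Venues A and B fit; C needs sign-off from the humans.")

# Auditing the agents' exchange is just reading the log.
for sender, receiver, text in bus.log:
    print(f"{sender} -> {receiver}: {text}")
```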

@rime Network epistemology — sounds really nice :-) Perhaps that's what CoEms are getting at
