Let f be a function that takes as input an inconsistent preference i and produces a set C of consistent preferences (here preferences are directed graphs over options, and consistent preferences are acyclic tournaments). Let S be the set of subgraphs of i for which it holds that every s∈S is acyclic, is a subgraph of i, and is maximal in i with respect to those properties; S contains all s that fulfill these conditions.

Let PCS be the property that every s∈S is a subgraph of at least one c∈C (every consistent subpreference appears at least once in a consistent version).

Then you have the following impossibilities:

• PCS and f has polynomial runtime. (Proof sketch: finding a largest s is NP-hard, since it is the maximum acyclic subgraph problem, and from the output of a PCS-satisfying f one could read off such an s in polynomial time, so, assuming P≠NP, f can't run in polynomial time.)
• PCS and C has polynomial size. (Proof sketch: you can construct a graph with exponentially many acyclic tournaments as maximal acyclic subgraphs, e.g. by putting edges in both directions between every pair of options; since no single acyclic c can contain two distinct elements of S, C must be exponentially large.)
• PCS and all c∈C have minimum graph edit distance to i. (Proof sketch: there is a graph for which all acyclic tournaments at the same (minimal) graph edit distance fail to contain a specific subgraph. For the graph in the picture, the minimal edit distance is 3 and the non-preserved consistent subgraph is a2→a4. This extends to arbitrarily large consistent subgraphs: replace every edge with an acyclic tournament on n nodes.)

In the context of making inconsistent preferences consistent, these are fairly strong results. I'm not sure about their approximation behavior, but I think this makes becoming a coherent agent very difficult.
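To make the definitions concrete, a brute-force sketch (illustrative only: the helper names are made up, and everything here is deliberately exponential-time, in line with the results above). It takes f to be the trivial function that returns every acyclic tournament over the options and checks PCS on a tiny cyclic preference:

```python
# Illustrative brute-force sketch only; helper names are made up, not from the post.
from itertools import combinations, permutations
import networkx as nx

def maximal_acyclic_subgraphs(i: nx.DiGraph):
    """The set S: edge sets of i that are acyclic and maximal with that property."""
    edges = list(i.edges())
    acyclic = []
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            g = nx.DiGraph(list(subset))
            g.add_nodes_from(i.nodes())
            if nx.is_directed_acyclic_graph(g):
                acyclic.append(frozenset(subset))
    # keep only the maximal acyclic edge sets
    return [s for s in acyclic if not any(s < t for t in acyclic)]

def all_acyclic_tournaments(nodes):
    """Every consistent preference (acyclic tournament) over `nodes`: one per total order."""
    nodes = list(nodes)
    for order in permutations(nodes):
        yield frozenset(
            (order[a], order[b])
            for a in range(len(order))
            for b in range(a + 1, len(order))
        )

def satisfies_pcs(i: nx.DiGraph, C):
    """PCS: every s in S is contained in at least one c in C."""
    return all(any(s <= c for c in C) for s in maximal_acyclic_subgraphs(i))

# Tiny example: the cyclic (inconsistent) preference a > b > c > a.
i = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a")])
C = list(all_acyclic_tournaments(i.nodes()))  # the trivial f: return every tournament
print(satisfies_pcs(i, C))                    # True, but |C| = n! is not polynomial
```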
I just watched the BBC clip on Emmy Noether :) Absolutely astonished, and I recommend it to anybody on mathstodon.xyz. I knew she was famous and intelligent; I never knew how extensive and foundational her other work was: "modern algebra", topology, gauge theory, ....
From the broadcast, I would put her up with the heroes of mathematics: Euler, Gauss, ...
I know I am smart but I also know that there are people who are _really_ really smart, and it's clear that Noether was one of them.
Before I watched it this morning, it occurred to me: what would the world be like if Gödel and Noether had married (or worked together)? I found it beyond even my sci-fi-fueled imagination :)
Thinking out loud about what still doesn't work when giving AutoGPT agents instructions like "do X but respect human preferences while doing so":

• Inner optimizers are still a problem if they exist in the GPT models
• Do LLM agents have sufficient goal stability? I.e., when delegating and delegating further, does the original goal get perturbed or even lost?
• Limited to the models' understanding of "human values"
• Doesn't solve ambitious value learning; the model might generalise badly once in new domains
• The last point is especially crucial in situations where such an agent starts recursively improving itself (e.g. training new models)

The last points especially might be ameliorated by literally just appending "and don't optimize too hard" and "let yourself be shut down by a human" to the prompt? (A toy version of that is sketched below.)
Man, I feel confused, but assuming that language models aren't infested with inner optimizers, I'm now more hopeful?
Or am I missing something crucial here…
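What appending those clauses could look like, as a minimal sketch (`run_agent` is a hypothetical stand-in for however an AutoGPT-style agent receives its goal, not a real API; whether natural-language constraints like these actually bind the agent's behaviour is exactly the open question above):

```python
# Hypothetical sketch: wrap a task with corrigibility clauses before handing it
# to an AutoGPT-style agent. `run_agent` below is a made-up placeholder.
SAFETY_SUFFIX = (
    " While doing so, respect human preferences, don't optimize too hard,"
    " and let yourself be shut down by a human."
)

def constrained_goal(task: str) -> str:
    """Append the corrigibility clauses to the task before any delegation."""
    return task.rstrip(".") + "." + SAFETY_SUFFIX

# run_agent(constrained_goal("Do X"))  # run_agent is hypothetical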
I operate by Crocker's rules[1].