@Alephwyr ah no pls
@rime I wouldn't go as far as Ngo and say all of alignment risk comes from here, but it seems like a rather large source
@rime staring at ontological crises for a while makes me believe this too
More parsimonious AI values might be pretty weird to humans as an axis, just as simplicity priors are strange
@rime love this explanation! Explains some tension: if some parts generalize toward altruism and others toward selfishness, you have to find the equilibrium
Only white people can be racist. Other ethnicities lack the moral, intellectual and cultural abilities to be racist.
@Paradox yeah, that tracks with my model
@Paradox and drives are fulfilled at some point, and then you gotta go satisfy another one, running from drive to drive
@Paradox drives seem really important, as do desires built on abstractions of those drives
@Paradox not sure I understand
@Paradox They claim that human learning is a lot like current AI training: a lot of self-supervised pre-training + some fine-tuning + a little bit of RL (and in this view then multi-agent RL on top)
@Paradox If you're interested: https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
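@Paradox very rough toy sketch of those three stages below, just to make the proportions concrete. Everything in it (ToyLearner, the data, the reward function) is made up for illustration and isn't from the post.

```python
# Toy illustration of the staged-training analogy:
# lots of self-supervised pre-training, some fine-tuning, a little RL on top.
# All names and numbers here are illustrative stand-ins, not anything from the post.
import random

class ToyLearner:
    def __init__(self):
        self.weights = {}  # token -> score, a stand-in for learned parameters

    def update(self, token, signal, lr):
        self.weights[token] = self.weights.get(token, 0.0) + lr * signal

def pretrain(learner, corpus, steps=100):
    # Self-supervised: the "label" is just the data itself, lots of cheap updates.
    for _ in range(steps):
        learner.update(random.choice(corpus), signal=1.0, lr=0.01)

def finetune(learner, curated, steps=10):
    # Fine-tuning: far fewer examples, but each one carries more weight.
    for _ in range(steps):
        learner.update(random.choice(curated), signal=1.0, lr=0.1)

def rl_phase(learner, reward_fn, steps=5):
    # A little RL: act on the current strongest tendency, reinforce by reward.
    for _ in range(steps):
        action = max(learner.weights, key=learner.weights.get)
        learner.update(action, signal=reward_fn(action), lr=0.5)

if __name__ == "__main__":
    learner = ToyLearner()
    pretrain(learner, corpus=["a", "b", "c", "d"] * 25)  # lots of raw data
    finetune(learner, curated=["c", "d"])                # a little curated data
    rl_phase(learner, reward_fn=lambda a: 1.0 if a == "d" else -1.0)
    print(sorted(learner.weights.items(), key=lambda kv: -kv[1]))
```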
I operate by Crocker's rules[1].