**niplav** @niplav@schelling.pt · Feb 16, 2024

**niplav** @niplav@schelling.pt · Feb 16, 2024

niplav @niplav@schelling.pt

Feb 16, 2024

If I were a Shard Theory person, I'd say that constitutional AI is a next step in training AIs in the similar way that humans are trained: Reinforcement learning from interacting with other agents, starting with a simple set of values

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

Feb 16, 2024

Willow Brook @Paradox@raru.re

@niplav I dunno what shard theory is, but I agree with this notion.

**niplav** @niplav@schelling.pt · Feb 16, 2024

**niplav** @niplav@schelling.pt · Feb 16, 2024

Feb 16, 2024

niplav @niplav@schelling.pt

@Paradox If you're interested: https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values

**niplav** @niplav@schelling.pt · Feb 16, 2024

**niplav** @niplav@schelling.pt · Feb 16, 2024

Feb 16, 2024

niplav @niplav@schelling.pt

@Paradox They claim that human learning is a lot like current AI training: A lot of self-supervised pre-training+some fine-tuning+a little bit of RL (and in this view then multi-agent RL on top)

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

Feb 16, 2024

Willow Brook @Paradox@raru.re

@niplav I would also agree with this. I think humans are just flesh AI.

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

Feb 16, 2024

Willow Brook @Paradox@raru.re

@niplav We seem to have complicated values because we have competing values. A long list of stuff we care about, at varying priorities, and these change depending on the situation, ie mood, environment, what's on our mind, etc. Things feel complicated when we're far from understanding every influencing factor. Pretty sure even AI kinda operates on this concept. It's not quite a blackbox, but it's rather dark. We know the framework, but not the details (they use millions of parameters, no shot we do).
Of course, arbitrary rules can also make stuff seem complication, even if you know them all, but our brain are pretty straightforward. We draw conceptual associations between a bunch of stuff, even if we all do it slightly differently, and there's always a pattern to them.

**niplav** @niplav@schelling.pt · 2024-02-16T21:18:07Z

niplav @niplav@schelling.pt

@Paradox not sure I understand

February 16, 2024 at 9:18 PM · · Tusky · · ·

**niplav** @niplav@schelling.pt · Feb 16, 2024

**niplav** @niplav@schelling.pt · Feb 16, 2024

Feb 16, 2024

niplav @niplav@schelling.pt

@Paradox drives seem really importamt, as do desires built on abstractions of those drives

**niplav** @niplav@schelling.pt · Feb 16, 2024

**niplav** @niplav@schelling.pt · Feb 16, 2024

Feb 16, 2024

niplav @niplav@schelling.pt

@Paradox adn drives are fulfilled at some point and then you gotta go to satisfy another one, running frpm drive to drive

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

**Willow Brook** @Paradox@raru.re · Feb 16, 2024

Feb 16, 2024

Willow Brook @Paradox@raru.re

@niplav So take this example. You like cheese, cars, and comedy.
Are you hungry? Do you like jokes about cheese? Cars with cheese patterns? Do you prefer stand up or tv shows? Those three values can be permutated. Now let's say you hate the color blue. Do you like blue cheese? Is your hatred of blue or your love for cheese more important?
Also drives usually aren't sated permanently. You fulfill it, you're fine for a while, then you get hungry for it again. Some specific goals and broad goals do get satisfied, like writing a certain poem or getting the job you want, but others like engaging in a hobby or spending time with a friend don't work like that.

**niplav** @niplav@schelling.pt · Feb 17, 2024

**niplav** @niplav@schelling.pt · Feb 17, 2024

Feb 17, 2024

niplav @niplav@schelling.pt

@Paradox yeah, thattracks with my model

Trending now

Resources

Developers

What is Mastodon?

schelling.pt

More…