Intelligent Agent Foundations Forum
by Vladimir Nesov 981 days ago | Abram Demski likes this

UDT, in its global-policy form, tries to solve two problems: (1) coordination between the instances of an agent faced with alternative environments, and (2) not losing interest in counterfactuals as soon as observations contradict them. I think that in practice UDT is a wrong approach to problem (1), and that the way it solves problem (2) obscures the nature of that problem.
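
To make problem (2) concrete, here is a minimal sketch (my illustration, not part of the original comment) using counterfactual mugging: a global policy is scored across both coin outcomes, while an agent that updates on seeing heads only compares actions inside the heads branch, and so stops caring about the tails counterfactual. The payoff numbers are assumptions made for the example.

```python
# Counterfactual mugging (illustrative payoffs): Omega flips a fair coin; on
# heads it asks the agent for $100, on tails it pays $10,000 iff it predicts
# the agent would have paid on heads.

ENVIRONMENTS = ["heads", "tails"]              # equiprobable alternative worlds
POLICIES = ["pay_on_heads", "refuse_on_heads"]

def payoff(policy, world):
    pays = policy == "pay_on_heads"
    if world == "heads":
        return -100 if pays else 0
    # On tails the reward depends on what the agent *would* do on heads.
    return 10_000 if pays else 0

def global_value(policy):
    # UDT-style scoring: average over all alternative environments,
    # including the one that will never be observed.
    return sum(payoff(policy, w) for w in ENVIRONMENTS) / len(ENVIRONMENTS)

print(max(POLICIES, key=global_value))         # pay_on_heads (worth 4950)

# An agent that updates on observing "heads" compares actions only inside
# that branch, so it refuses to pay -- the tails counterfactual no longer
# matters to it once the observation contradicts it.
print(max(POLICIES, key=lambda p: payoff(p, "heads")))   # refuse_on_heads
```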

Coordination achieved with UDT is like using identical agents to get cooperation in the Prisoner’s Dilemma. Already in simple use cases, the instances of a UDT agent can have different amounts of computational resources, which could make their decision processes diverge; hence the workarounds of tracking how much computation each decision process gets, so that coordination doesn’t break, or of building hierarchies of decision processes that can access more and more resources. Even worse, the instances could be working on different problems and don’t need to coordinate at the level of the computational resources those problems require. But we know that cooperation is possible in much greater generality, even between unrelated agents, and I think this is the right way of handling the differences between the instances of an agent.
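
As a rough illustration of how copy-based coordination is fragile, here is a toy sketch of my own (not from the comment), assuming agents cooperate exactly when their descriptions are identical: two exact copies cooperate in a one-shot PD, but two instances that differ only in an irrelevant resource budget no longer recognize each other.

```python
# Cooperation via exact-copy identification in a one-shot Prisoner's Dilemma,
# and how it breaks when instances differ in an irrelevant detail such as a
# compute budget.

def make_agent(compute_budget):
    # The budget never affects the decision rule, but it does make the
    # agents' descriptions unequal.
    def decide(own_source, opponent_source):
        return "C" if own_source == opponent_source else "D"
    decide.source = f"clone_agent(budget={compute_budget})"
    return decide

def play(a, b):
    return a(a.source, b.source), b(b.source, a.source)

twin_a, twin_b = make_agent(100), make_agent(100)
print(play(twin_a, twin_b))     # ('C', 'C'): identical copies coordinate

rich, poor = make_agent(1000), make_agent(10)
print(play(rich, poor))         # ('D', 'D'): same rule, different budgets,
                                # and the copy-based coordination collapses
```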

It’s useful to restate the problem of not ignoring counterfactuals as a problem of preserving values. It’s not quite reflective stability, since it’s stability under external observations rather than under reflection; but when an agent plans for future observations, it can change itself so that its values are preserved when those observations arrive (hence the “Son of CDT” that one-boxes). One issue is that the resulting values are still not right: they ignore counterfactuals that are not in the future of the point where the self-modification took place, and it’s even less clear how self-modification addresses computational uncertainty. So the problem is not just preserving values, but formulating them in the first place so that they can already talk about counterfactuals and computational resources.
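
A small sketch of the asymmetry described above (my own toy model, reusing the illustrative counterfactual-mugging payoffs): a commitment made by self-modification only helps for forks that lie in the agent’s future, so counterfactuals already “behind” the modification are still ignored.

```python
# Toy model (illustrative assumptions): an agent self-modifies at commit_time
# to "pay on heads".  The commitment matters only if Omega's coin flip and
# prediction (the fork) lie in the agent's future.

def value_of_commitment(commit_time, fork_time):
    effective = fork_time >= commit_time     # fork sees the modified agent
    heads = -100 if effective else 0
    tails = 10_000 if effective else 0
    return 0.5 * heads + 0.5 * tails

print(value_of_commitment(commit_time=0, fork_time=1))   # 4950.0: fork ahead
print(value_of_commitment(commit_time=2, fork_time=1))   # 0.0: fork behind
```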

I think that, to a first approximation, the thing in common between instances of an agent (within a world, between alternative worlds, and at different times) should be a fixed definition of values, while the decision algorithms should be allowed to differ and to coordinate with each other as unrelated agents would. This requires an explanation of what kind of thing values are (their semantics), so that the same values (1) can be interpreted in unrelated situations to guide decisions, including in worlds that don’t have our physical laws and by agents that don’t know the physical laws of the situations they inhabit, yet (2) retain valuation of all the other situations, which should in particular motivate acausal coordination as an instrumental drive. Each one of these points is relatively straightforward to address, but not together. I’m utterly confused about this problem, and I think it deserves more attention.



by Wei Dai 981 days ago | Ryan Carey likes this

> But we know that cooperation is possible in much greater generality, even between unrelated agents

It seems to me that cooperation might be possible in much greater generality, but I don’t see how we know that it is. Can you explain?

> Each one of these points is relatively straightforward to address, but not together.

I’m having trouble following you here. Can you explain more about each point, and how they can be addressed separately?
