Intelligent Agent Foundations Forum
by Vadim Kosoy 137 days ago

A few comments.

Traps are a somewhat bigger issue than you seem to think, once you take acausal attack into account. Your prior contains low-complexity hypotheses that intentionally produce good predictions until some critical pivot point, at which they switch to something manipulative. So, every time you reach such a pivot point, you are facing a decision with irreversible consequences and there is no prior evidence to help you. Delegative learning gets around this by having the agent delegate precisely at the pivot point.

Even disregarding that, “Once the agent has figured out some of how the world works, most environments/hypotheses where there is a trap have evidential clues elsewhere to rule them out” is not quite true.

The notion of a “trap” is relative to the way you organize your uncertainty about the world. Saying that the environment might contain traps is saying that the class of environments you consider is unlearnable. However, the specification of a Bayesian agent only depends on the prior that you get by “averaging” the environments in this class. Different ways of decomposing the prior into hypotheses might yield learnable or unlearnable classes.

For example, consider an environment in which taking action A leads to heaven with probability 70% and hell with probability 30% whereas taking action B leads to heaven with probability 50% and hell with probability 50%. In this environment, taking action A is the better choice and there is no problem. However, if you decompose it into a mixture of deterministic environments then, from that perspective, you have a trap.
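
To make the decomposition concrete, here is a minimal sketch in Python. It splits the stochastic environment above into deterministic hypotheses, each of which fixes the outcome of every action in advance. The function names are hypothetical, and treating the outcomes of A and B as independent is my assumption: for a single decision, any coupling with the same marginals yields the same mixture.

```python
from itertools import product

# Stochastic environment from the example: P(heaven | action).
P_HEAVEN = {"A": 0.7, "B": 0.5}

def decompose(p_heaven):
    """Return a list of (deterministic hypothesis, weight) pairs.

    A hypothesis maps each action to its outcome (1 = heaven, 0 = hell).
    Assumption: the outcomes of different actions are independent.
    """
    actions = sorted(p_heaven)
    hypotheses = []
    for outcomes in product([1, 0], repeat=len(actions)):
        weight = 1.0
        for action, outcome in zip(actions, outcomes):
            weight *= p_heaven[action] if outcome == 1 else 1.0 - p_heaven[action]
        hypotheses.append((dict(zip(actions, outcomes)), weight))
    return hypotheses

def bayes_value(hypotheses, action):
    """Expected outcome of an action under the mixture."""
    return sum(weight * hyp[action] for hyp, weight in hypotheses)

def regret(hypothesis, action):
    """Value lost relative to the best action inside one hypothesis."""
    return max(hypothesis.values()) - hypothesis[action]

hypotheses = decompose(P_HEAVEN)

# The mixture reproduces the original environment, so A is still the best bet.
best = max(P_HEAVEN, key=lambda a: bayes_value(hypotheses, a))

# Viewed as a class of deterministic hypotheses, every action is fatal in
# at least one hypothesis, so no single action has zero worst-case regret.
worst_case_regret = {
    a: max(regret(hyp, a) for hyp, _ in hypotheses) for a in P_HEAVEN
}
print(best, worst_case_regret)
```

The Bayesian agent's choice is the same under both descriptions. Only the regret accounting changes: relative to the deterministic class, the hypothesis where A leads to hell and B leads to heaven makes A an irreversible mistake.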

To give a “realistic” example, imagine that we think quantum randomness is truly random, but there is actually an underlying theory that allows predicting quantum events deterministically. Then a strategy that seems optimal from the perspective of an agent that only knows quantum theory might be “suicidal” (permanently sacrificing value) from the perspective of an agent that knows the deeper underlying theory.

As another example, imagine that (i) the only way to escape the heat death of our universe is by controlled vacuum collapse and (ii) because we don’t know in which string vacuum we are, there is no way to be certain about the outcome of a controlled vacuum collapse without high energy experiments that have a significant chance of triggering an uncontrolled vacuum collapse. AFAIK this situation is consistent with our knowledge of physics. So, if you consider the different string vacua to be different hypotheses, we are facing a trap. On the other hand, if you have some theory that gives you a probability distribution over these vacua then there is a well-defined optimal strategy.

The point is, it is probably not meaningful/realistic to claim that we can design an agent that will almost certainly deal successfully with all traps, but it is meaningful and realistic to claim that we can design an agent that will be optimal relative to our own posterior belief state (which is, more or less by definition, the sort of agent that it is a good idea to build).

The reason “explorative” algorithms such as PSRL (Posterior Sampling Reinforcement Learning) cannot be trivially replaced by the Bayes optimal policy is that the Bayes optimal policy is (more) computationally intractable. For example, if you consider a finite set of finite MDP hypotheses, then PSRL can be implemented in polynomial time but the Bayes optimal policy cannot (in fact, I am not 100% sure about this, but I think that hole can be patched using the PCP theorem).
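
For illustration, here is a rough sketch of PSRL over a finite set of finite MDP hypotheses. It is not taken from any particular paper or library. The assumptions are mine: episodic interaction with a fixed horizon, known rewards, a start state of 0, and a true MDP that belongs to the hypothesis set. All names (`solve`, `psrl`, `H0`, `H1`) are hypothetical. Each episode costs one posterior sample plus one ordinary planning pass on a single MDP, whereas a Bayes optimal policy would have to plan over belief states.

```python
import random

def solve(mdp, horizon):
    """Finite-horizon value iteration. Returns policy[t][state] -> action."""
    P, R = mdp["P"], mdp["R"]
    n_states, n_actions = len(P), len(P[0])
    value = [0.0] * n_states
    policy = []
    for _ in range(horizon):  # backward induction from the last step
        q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], value))
              for a in range(n_actions)] for s in range(n_states)]
        policy.append([max(range(n_actions), key=lambda a, s=s: q[s][a])
                       for s in range(n_states)])
        value = [max(row) for row in q]
    policy.reverse()  # policy[0] is the rule used at the first step
    return policy

def psrl(hypotheses, prior, true_mdp, horizon, episodes, rng):
    """Sample a hypothesis, act optimally for it, update the posterior."""
    posterior = list(prior)
    n_states = len(true_mdp["P"])
    for _ in range(episodes):
        k = rng.choices(range(len(hypotheses)), weights=posterior)[0]
        policy = solve(hypotheses[k], horizon)
        state = 0  # assumed fixed start state
        for t in range(horizon):
            action = policy[t][state]
            nxt = rng.choices(range(n_states),
                              weights=true_mdp["P"][state][action])[0]
            # Bayes update on the observed transition.
            posterior = [w * h["P"][state][action][nxt]
                         for w, h in zip(posterior, hypotheses)]
            total = sum(posterior)  # positive while the true MDP has weight
            posterior = [w / total for w in posterior]
            state = nxt
    return posterior

# Two hypotheses that disagree on which action reaches the rewarding state 1.
REWARD = [[0.0, 0.0], [1.0, 1.0]]       # R[state][action]
BACK_TO_START = [[1.0, 0.0], [1.0, 0.0]]  # state 1 always returns to state 0
H0 = {"P": [[[0.1, 0.9], [0.9, 0.1]], BACK_TO_START], "R": REWARD}
H1 = {"P": [[[0.9, 0.1], [0.1, 0.9]], BACK_TO_START], "R": REWARD}

posterior = psrl([H0, H1], [0.5, 0.5], true_mdp=H0,
                 horizon=5, episodes=50, rng=random.Random(0))
print(posterior)  # weight should concentrate on the true hypothesis H0
```

Note that this sketch explores by sampling and therefore has no protection against traps. It only illustrates why the per-episode cost is modest.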


