Intelligent Agent Foundations Forumsign up / log in
by Alex Appel 333 days ago | Abram Demski likes this | link | parent

Hm, I got the same result from a different direction.

(probably very confused/not-even-wrong thoughts ahead)

It’s possible to view a policy of the form “I’ll compute X and respond based on what X outputs” as… tying your output to X, in a sense. Logical link formation, if you will.

And policies of the form “I’ll compute X and respond in a way that makes that output of X impossible/improbable” (can’t always do this) correspond to logical link cutting.

And with this, we see what the chicken rule in MUDT/exploration in LIDT is doing. It’s systematically cutting all the logical links it can, and going “well, if the statement remains correlated with me despite me trying my best to shake off anything that predicts me too well, I guess I”cause" it."

But some potentially-useful links were cut by this process, such as “having short abstract reasoning available that lets others predict what you will do” (a partner in a prisoner’s dilemma, the troll in troll bridge, etc..)

At the same time, some links should be cut by a policy that diagonalizes against predictions/calls upon an unpredictable process (anything that can be used to predict your behavior in matching pennies, evading Death when Death can’t crack your random number generator, etc…)

So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.

Predictably choosing an easy-to-predict policy is easy-to-predict, predictably choosing a hard-to-predict policy is hard-to-predict.

This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.



by Abram Demski 333 days ago | link

So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.

Agreed!

I’m reading this as “You want to make decisions as early as you can, because when you decide one of the things you can do is decide to put the decision off for later; but when you make a decision later, you can’t decide to put it earlier.”

And “logical time” here determines whether others can see your move when they decide to make theirs. You place yourself upstream of more things if you think less before deciding.

This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.

Here’s where I’m saying “just use the chicken rule again, in this stepped-back reasoning”. It likely re-introduces versions the same problems at the higher level, but perhaps iterating this process as many times as we can afford is in some sense the best we can do.

reply

by Abram Demski 332 days ago | link

Thinking about this more, I think there’s an important disanalogy between trying to make policy decisions with earlier market states vs smaller proof-searches.

In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics).

However, with proof-based DTs, no analogous move is possible.

Consider a version of Agent Simulates Predictor in which Omega searches for a proof that you one-box in PA+Con(PA); if one is found, Omega fills the $1m box. Otherwise, not. Omega has \(T_1\) time to think. The agent has \(T_2\) time to think, \(T_2 >> T_1\). The agent reasons in PA.

If the agent refused to use all its time, and only ran for \(T_0 << T_1\) time, but still had enough time to find interesting proofs, then it could reason as follows: “If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M.” It may not know if PA+Con(PA) is sound, but that doesn’t matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn’t find any proofs leading to higher utility that this, so it would one-box and get $1M.

Unfortunately, I don’t see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that’s beneficial.

We might want the agent to reason: “If I stop and one-box right now, Omega will be able to prove that I one-box, and I’ll get $1M. If I wait longer, Omega won’t be able to prove what I do, so I’ll at most be able to get $100. So, I’ll stop now and one-box.” However, this reasoning would have to take place at a proof-length in which several things hold at once:

  • The agent can prove that it’s still “early” enough that its action would be provable to Omega if it acted now.
  • It’s “late” enough that the agent can see that Omega’s predictions are sound (IE, it can check that Omega doesn’t reach false results in the limited time it has). This allows the agent to see that it’ll never get money from both boxes.

It seems very unlikely that there is a proof length where these can both be true, due to bounded Löb.

For logical induction, on the other hand, there’s quite likely to be a window with analogous properties.

reply



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

There should be a chat icon
by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

> For another thing, consider
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

RSS

Privacy & Terms