Intelligent Agent Foundations Forum
by Alex Appel 30 days ago | Abram Demski likes this | link | parent

Hm, I got the same result from a different direction.

(probably very confused/not-even-wrong thoughts ahead)

It’s possible to view a policy of the form “I’ll compute X and respond based on what X outputs” as… tying your output to X, in a sense. Logical link formation, if you will.

And policies of the form “I’ll compute X and respond in a way that makes that output of X impossible/improbable” (can’t always do this) correspond to logical link cutting.
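As a toy illustration of the two kinds of policy above (a minimal sketch, not anything from MUDT or LIDT): here "X" is modelled as a hypothetical zero-argument function returning a bit, and `link_policy`, `cut_policy`, and `agreement` are names invented for this example. The sketch assumes X does not itself depend on the policy's output, which is exactly the case where cutting is possible.

```python
from typing import Callable, List

Bit = int  # 0 or 1


def link_policy(x: Callable[[], Bit]) -> Bit:
    """Link formation: compute X and respond with whatever X outputs."""
    return x()


def cut_policy(x: Callable[[], Bit]) -> Bit:
    """Link cutting: compute X and respond with the opposite of its output."""
    return 1 - x()


def agreement(policy: Callable[[Callable[[], Bit]], Bit],
              computations: List[Callable[[], Bit]]) -> float:
    """Fraction of computations X whose output equals the policy's action."""
    matches = sum(1 for x in computations if policy(x) == x())
    return matches / len(computations)


# Hypothetical stand-ins for "X": arbitrary computations that return a bit
# and do not look at the agent (an assumption of this sketch).
computations = [lambda: 0, lambda: 1, lambda: (7 * 3) % 2]

link_rate = agreement(link_policy, computations)  # action always equals X
cut_rate = agreement(cut_policy, computations)    # action never equals X
```

If X were allowed to simulate the agent, `cut_policy` would no longer be well-defined, which is one way to read the "can't always do this" caveat.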

And with this, we see what the chicken rule in MUDT/exploration in LIDT is doing. It’s systematically cutting all the logical links it can, and going “well, if the statement remains correlated with me despite me trying my best to shake off anything that predicts me too well, I guess I ‘cause’ it.”

But some potentially useful links are cut by this process, such as “having short abstract reasoning available that lets others predict what you will do” (a partner in a prisoner’s dilemma, the troll in troll bridge, etc.).

At the same time, some links should be cut by a policy that diagonalizes against predictions or calls upon an unpredictable process (anything that can be used to predict your behavior in matching pennies, evading Death when Death can’t crack your random number generator, etc.).

So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.

Predictably choosing an easy-to-predict policy is easy-to-predict, predictably choosing a hard-to-predict policy is hard-to-predict.
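A rough matching-pennies sketch of this point, under loudly stated assumptions: the predictor here is a hypothetical frequency counter (`frequency_predictor`), it knows nothing but the agent's past moves, and it cannot see the agent's random number generator. None of these names come from the thread.

```python
import random
from collections import Counter
from typing import Callable, List


def frequency_predictor(history: List[int]) -> int:
    """Guess the agent's next move as its most common past move (ties go to 0)."""
    counts = Counter(history)
    return 1 if counts[1] > counts[0] else 0


def play(agent: Callable[[List[int], random.Random], int],
         rounds: int, rng: random.Random) -> float:
    """Return the fraction of rounds in which the predictor guessed correctly."""
    history: List[int] = []
    hits = 0
    for _ in range(rounds):
        guess = frequency_predictor(history)  # predictor moves first, blind to rng
        move = agent(history, rng)
        hits += int(guess == move)
        history.append(move)
    return hits / rounds


def constant_agent(history: List[int], rng: random.Random) -> int:
    """An easy-to-predict policy: always play 1."""
    return 1


def random_agent(history: List[int], rng: random.Random) -> int:
    """A hard-to-predict policy: flip a fair coin the predictor cannot see."""
    return rng.randrange(2)


rng = random.Random(0)  # fixed seed only so the example is reproducible
constant_hit_rate = play(constant_agent, 1000, rng)
random_hit_rate = play(random_agent, 1000, rng)
```

In both runs the choice of policy is itself fully predictable. What differs is whether knowing the policy helps predict the move: the constant policy is guessed almost every round, while the coin-flipping policy is guessed only about half the time.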

This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.



by Abram Demski 30 days ago | link

> So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.

Agreed!

I’m reading this as “You want to make decisions as early as you can, because when you decide one of the things you can do is decide to put the decision off for later; but when you make a decision later, you can’t decide to put it earlier.”

And “logical time” here determines whether others can see your move when they decide to make theirs. You place yourself upstream of more things if you think less before deciding.

> This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.

Here’s where I’m saying “just use the chicken rule again, in this stepped-back reasoning”. It likely re-introduces versions of the same problems at the higher level, but perhaps iterating this process as many times as we can afford is in some sense the best we can do.


by Abram Demski 29 days ago | link

Thinking about this more, I think there’s an important disanalogy between trying to make policy decisions with earlier market states vs smaller proof-searches.

In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics).

However, with proof-based DTs, no analogous move is possible.

Consider a version of Agent Simulates Predictor in which Omega searches for a proof in PA+Con(PA) that you one-box; if one is found, Omega fills the $1M box, and otherwise leaves it empty. Omega has \(T_1\) time to think. The agent has \(T_2\) time to think, \(T_2 \gg T_1\). The agent reasons in PA.

If the agent refused to use all its time, and only ran for \(T_0 \ll T_1\) time, but still had enough time to find interesting proofs, then it could reason as follows: “If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M.” It may not know if PA+Con(PA) is sound, but that doesn’t matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn’t find any proofs leading to higher utility than this, so it would one-box and get $1M.
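The payoff structure of this setup can be sketched in a few lines, under a deliberately crude assumption: Omega's proof search is modelled as succeeding exactly when the agent one-boxes and its whole run fits inside \(T_1\) steps. That stand-in (`omega_fills_box`), the concrete step counts, and the function names are all hypothetical; the small-box value of $100 follows the figure used later in this comment.

```python
BIG = 1_000_000  # big box contents when Omega finds a proof of one-boxing
SMALL = 100      # small box value (figure taken from later in the comment)


def omega_fills_box(agent_steps: int, action: str, t1: int) -> bool:
    """Crude stand-in for proof search (an assumption of this sketch):
    Omega finds a proof iff the agent one-boxes within Omega's time bound."""
    return action == "one-box" and agent_steps <= t1


def payoff(agent_steps: int, action: str, t1: int) -> int:
    """Total money received for a given runtime and action."""
    big = BIG if omega_fills_box(agent_steps, action, t1) else 0
    small = SMALL if action == "two-box" else 0
    return big + small


# Illustrative time bounds with T0 << T1 << T2 (numbers are arbitrary).
T0, T1, T2 = 10, 1_000, 1_000_000

early_one_box = payoff(T0, "one-box", T1)  # stops early, Omega can follow
late_one_box = payoff(T2, "one-box", T1)   # thinks too long to be provable
late_two_box = payoff(T2, "two-box", T1)   # best available after thinking long
```

This captures only the incentive to stop early. It says nothing about whether the agent can prove, inside PA, that it is in the early regime, which is the hard part of the problem.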

Unfortunately, I don’t see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that’s beneficial.

We might want the agent to reason: “If I stop and one-box right now, Omega will be able to prove that I one-box, and I’ll get $1M. If I wait longer, Omega won’t be able to prove what I do, so I’ll at most be able to get $100. So, I’ll stop now and one-box.” However, this reasoning would have to take place at a proof-length in which several things hold at once:

  • The agent can prove that it’s still “early” enough that its action would be provable to Omega if it acted now.
  • It’s “late” enough that the agent can see that Omega’s predictions are sound (i.e., it can check that Omega doesn’t reach false results in the limited time it has). This allows the agent to see that it’ll never get money from both boxes.

It seems very unlikely that there is a proof length where these can both be true, due to bounded Löb.

For logical induction, on the other hand, there’s quite likely to be a window with analogous properties.



