Intelligent Agent Foundations Forumsign up / log in
by Abram Demski 118 days ago | link | parent

Thinking about this more, I think there’s an important disanalogy between trying to make policy decisions with earlier market states vs smaller proof-searches.

In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics).

However, with proof-based DTs, no analogous move is possible.

Consider a version of Agent Simulates Predictor in which Omega searches for a proof that you one-box in PA+Con(PA); if one is found, Omega fills the $1m box. Otherwise, not. Omega has \(T_1\) time to think. The agent has \(T_2\) time to think, \(T_2 >> T_1\). The agent reasons in PA.

If the agent refused to use all its time, and only ran for \(T_0 << T_1\) time, but still had enough time to find interesting proofs, then it could reason as follows: “If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M.” It may not know if PA+Con(PA) is sound, but that doesn’t matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn’t find any proofs leading to higher utility that this, so it would one-box and get $1M.

Unfortunately, I don’t see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that’s beneficial.

We might want the agent to reason: “If I stop and one-box right now, Omega will be able to prove that I one-box, and I’ll get $1M. If I wait longer, Omega won’t be able to prove what I do, so I’ll at most be able to get $100. So, I’ll stop now and one-box.” However, this reasoning would have to take place at a proof-length in which several things hold at once:

  • The agent can prove that it’s still “early” enough that its action would be provable to Omega if it acted now.
  • It’s “late” enough that the agent can see that Omega’s predictions are sound (IE, it can check that Omega doesn’t reach false results in the limited time it has). This allows the agent to see that it’ll never get money from both boxes.

It seems very unlikely that there is a proof length where these can both be true, due to bounded Löb.

For logical induction, on the other hand, there’s quite likely to be a window with analogous properties.





[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes


Privacy & Terms