Intelligent Agent Foundations Forumsign up / log in
Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 823 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable and no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.

Obstacle 1: Untaken Actions are not Observable

Consider the following formalization of the 5 and 10 problem:

Let \(\{\mathbb{P}_n\}\) is a logical inductor. Let \(A_n\) be an agent which uses this logical inductor, to output either 5 or 10 as follows. The utility function \(U_n\) for agent \(A_n\) is simply \(U_n=A_n/10\), and the source code for agent \(A_n\) is given by \(A_n=\begin{cases}5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}\)

Ideally, we would be able to say something like \(lim_{n\rightarrow\infty}A_n=10\). Unfortunately, this is not the case. There exists a logical inductor such that \(lim_{n\rightarrow\infty}A_n=5\). Consider a construction of a logical inductor similar to the one in the paper, but for which there is a single trader that starts with most of the wealth. This trader spends all of its wealth on conditional contracts forcing \(\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)\sim .5\) and \(\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\sim 0\). Note that the bets made conditioned on \(A_n=5\) are accurate, while the bets made conditioned on \(A_n=10\) do not matter, since the condition will be false. (No other trader will have enough wealth to substantially change the expectations). This trader will therefore lose no money, and be able to do the same thing again next round. (This is assuming that the value of A_n and U_n are computed in time for round n+1 of the deductive process. If this is not the case, we could do the same trick on a subsequence with this property.)

This same phenomenon has been observed in many other contexts. The spurious counterfactuals that can arise in proof based systems are another manifestation of the same problem.

One attempt at a fix is epsilon exploration. (The analogous fix in the proof based world is the chicken rule) Here, you take every possible action with probability \(\varepsilon\). Then, when conditioning on taking an action you normally wouldn’t take, you will have some data on what happened when simpler versions of yourself randomly explored and took that action. The epsilon exploration version of the above agent is \(A_n=\begin{cases}5&{\rm if }\ \mathbb{P}_n(A_n=5)<\varepsilon\\10&{\rm if }\ \mathbb{P}_n(A_n=10)<\varepsilon\\5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}\)

This agent uses pseudorandomness the explore, and does in fact converge to choosing 10 all but epsilon proportion or the time (The lower density of taking 5 is at most \(\varepsilon\)). This fix has major problems. The obvious problem is that taking a bad action with probability epsilon could be disastrous for an agent that makes many different decisions.

There is a larger problem with this approach. There are now two different ways you could take any given action. You could take that action because it produces the highest expected utility, or you could take that action because your exploration clause was triggered. These two different ways of taking the action could be very different. Trivially, they could have different runtimes. More importantly, they could effect the extent to which your action is predictable.

This problem will, for example, cause a logical inductor EDT agent to defect in a prisoner’s dilemma against a similar power agent that is trying to imitate it. If such an agent were to start out cooperating, random defection will be uncorrelated with the opponent’s prediction. Thus the explored defection will be profitable, and the agent will learn to defect. The opponent will learn this and start predicting defection more and more, but in the long run, the agent view this as independent with its action.

To see concretely how epsilon exploration fails, consider a different utility function \(U_n=\frac{A_n+100\cdot \mathbb{P}_n(A_n=5)}{1000}\), and the same agent above with exploration. This agent gets a reward for taking the 10, but a much bigger reward for being predicted to take the 5. This is basically Newcomb’s problem, but the predictor is no more powerful than the agent, and in fact has the same epistemic state as the agent.

The problem is that in the \(\mathbb{E}(U_n)\), \(\mathbb{P}_n(A_n=5)\) is a constant that the agent has arbitrary access to. It is not changed by whether the agent takes the 5 or the 10 when exploring, since \(\mathbb{P}_n\) does not know if the agent will explore.

Thus, if the agent almost always took the 5, the rare occasions where it took the 10 would produce more utility, and similarly if the agent almost always took the 10, the rare occasions where it took the 5 would produce less utility. The agent will converge to almost always taking the 10, just like last time, and will miss out on the large reward for being predicted to take the 5.

Obstacle 2: No Updatelessness for Computations

The second obstacle is related to the counterfactual mugging problem. There is an agent \(A_n\), and a predictor \(\Omega_n\). The predictor is much more powerful than the agent, except there is a bit \(b_n\in\{0,1\}\) that the agent knows, but the predictor does not. The predictor’s beliefs about \(b_n\) are uniform. Again, the agent can take 5 or 10, However if b=1, then the agent will get a large reward if \(Q\) predicts that conditioned on \(b=0\), the agent takes the 5.

We will formalize this by giving the agent and the predictor two different logical inductors. The agent will use \(\mathbb{P}_n\) and the predictor will use \(\mathbb{Q}_n\). There will be a sequence of bits \(b_n\), which are sentences mutually independent with \(PA\). They will be given truth values independently, uniformly at random. \(P_n\) will use a deductive process knows the value of \(b_i\) for \(i\leq n\), and further will be inexplicable by traders that have oracle access to \(b_i\) for \(i\leq n\) on day \(n\). (Technically, logical inductors have computable deductive processes and traders, but it is not a problem to modify the definition to give oracle access to the traders and the program that computes the deductive process.)

\(\mathbb{Q}_n\) will be a very powerful logical inductor, but will receive the bits much slower. \(\mathbb{Q}_n\) will know all the bits \(b_i\) with \(f(i)<n\), for some fast growing function \(f\). \(\mathbb{Q}_n\) will also have perfectly accurate counterfactuals about what \(A_n\) will do if fed various fake values for \(b_{\leq n}\).

The utility function will be \(U_n=\frac{A_n+b_n\cdot 100\cdot \mathbb{Q}_n(A_n=5|b_n=0)}{1000}\).

First, observe that if \(b_n\) were not yet known by the agent, everyone would assign probability 1/2 to \(b_n\), and this problem would be similar to the Newcomblike problem above. However, by making \(\mathbb{Q}_n\) very powerful, we eliminated the difficulties above. The predictor can predict when the agent will explore, so exploring into 5 is rewarded and exploring into 10 is punished, and the trader would converge to almost always taking 10. However, by adding the hidden bit, we created new problems.

In particular, if \(b_n=1\), the agent has no control over what it would do if \(b_n=0\), and if \(b_n=0\), the agent does not care about what it is predicted to do. This agent will again converge to taking the \(10\) almost always, and miss out on the large reward (if \(b_n=1\)) for being predicted to take the 5 if \(b_n=0\). Ideally, the agent will take the 10 if \(b_n=1\), and take the 5 if \(b_n=0\).

Although this problem may seem contrived, it is very important. This kind of thing actually does show up all the time. If you know do not know a secret, it might be a good idea to keep plausible deniability that you know a secret. This might incur a social cost, which you are willing to pay, since it causes you to act the same way regardless of whether or not you know a secret, and thus cause yourself to counterfactually be able to keep the secret better if you had one. Poker is all about this phenomenon.

More importantly, this problem needs to be understood for reflective stability. If an agent does not know the value of \(b_n\) yet, but knows that it will take the 10 either way, the agent might want to commit to taking the 5. This is a failure of reflective stability. The agent would prefer to modify to use a different decision theory. The fact that this happens even in theory is a bad sign for any decision theory, and is an especially bad sign for our ability to understand the output of that decision theory.

In a Bayesian framework, this would solved using Updateless Decision Theory. The agent would not update on its observation of \(b_n\). It would instead use its prior about \(b_n\) to choose a policy, a function from its observation, \(b_n\), to its action \(A_n\). This strategy would work, and the agent would take the 10 only if \(b_n=1\).

Unfortunately, we do not know how to combine this strategy with logical uncertainty. The beliefs of a logical inductor do not look like bayesian beliefs where you can go back to your prior. (Universal Inductors were an attempt to do this, but they do not work for this purpose.)

by Vladimir Slepnev 810 days ago | Sam Eisenstat and Abram Demski like this | link

I’ll just note that in a modal logic or halting oracle setting you don’t need the chicken rule, as we found in this old post: So it seems like at least the first problem is about the approximation, not the thing being approximated.


by Sam Eisenstat 770 days ago | Abram Demski likes this | link

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty natually with logical induction. The motivation here is to work with decision problems where you can’t prove statements \(A = a \to U = u\) for agent \(A\), utility function \(U\), action \(a\), and utility value \(u\), at least not with the amount of computing power provided, but you want to use inductive generalizations instead. That isn’t necessary in this example, so it’s more of an illustration.

To say a bit more, if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign things that have been proven already probability 1, then they will work on the 5 and 10 problem in the post.

It would be interesting if there was more of an analogy to explore between the provability oracle setting and the inductive setting, and more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas.






[Note: This comment is three
by Ryan Carey on A brief note on factoring out certain variables | 0 likes

There should be a chat icon
by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes


Privacy & Terms