Two Major Obstacles for Logical Inductor Decision Theory

post by Scott Garrabrant 823 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable, and there is no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.

Obstacle 1: Untaken Actions are not Observable

Consider the following formalization of the 5 and 10 problem. Let $$\{\mathbb{P}_n\}$$ be a logical inductor, and let $$A_n$$ be an agent which uses this logical inductor to output either 5 or 10, as follows. The utility function $$U_n$$ for agent $$A_n$$ is simply $$U_n=A_n/10$$, and the source code for agent $$A_n$$ is given by $$A_n=\begin{cases}5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}$$

Ideally, we would be able to say something like $$\lim_{n\rightarrow\infty}A_n=10$$. Unfortunately, this is not the case: there exists a logical inductor such that $$\lim_{n\rightarrow\infty}A_n=5$$. Consider a construction of a logical inductor similar to the one in the paper, but for which there is a single trader that starts with most of the wealth. This trader spends all of its wealth on conditional contracts forcing $$\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)\sim .5$$ and $$\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\sim 0$$. Note that the bets made conditioned on $$A_n=5$$ are accurate, while the bets made conditioned on $$A_n=10$$ do not matter, since the condition will be false. (No other trader will have enough wealth to substantially change the expectations.) This trader will therefore lose no money, and will be able to do the same thing again next round.
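As a concrete illustration, here is a minimal Python sketch of this failure mode. It is a hypothetical toy, not a real logical inductor: the dominant trader's pinned conditional expectations are hard-coded as constants, and the agent's decision rule then ratifies them forever.

```python
# Minimal sketch of the 5-and-10 failure (hypothetical toy, not a real
# logical inductor): a dominant trader pins the conditional expectations
# E[U | A=5] ~ 0.5 and E[U | A=10] ~ 0, and the agent ratifies them.

def agent(e_u_given_5: float, e_u_given_10: float) -> int:
    """The agent from the post: take 5 iff E[U | A=5] > E[U | A=10]."""
    return 5 if e_u_given_5 > e_u_given_10 else 10

def run(rounds: int) -> list:
    actions = []
    for _ in range(rounds):
        e5, e10 = 0.5, 0.0  # expectations forced by the wealthy trader
        a = agent(e5, e10)
        u = a / 10          # U_n = A_n / 10
        # The conditional bet on A=5 settles exactly right (u == e5), so
        # the trader loses nothing; the A=10 bet never settles at all.
        assert a == 5 and u == e5
        actions.append(a)
    return actions

print(run(10))  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Because the only bets that ever settle are the accurate ones conditioned on the taken action, nothing in the market pushes back against the trader, which is exactly why the agent gets stuck on 5.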
(This is assuming that the values of $$A_n$$ and $$U_n$$ are computed in time for round $$n+1$$ of the deductive process. If this is not the case, we could do the same trick on a subsequence with this property.)

This same phenomenon has been observed in many other contexts. The spurious counterfactuals that can arise in proof based systems are another manifestation of the same problem.

One attempted fix is epsilon exploration. (The analogous fix in the proof based world is the chicken rule.) Here, you take every possible action with probability $$\varepsilon$$. Then, when conditioning on taking an action you normally wouldn't take, you will have some data on what happened when simpler versions of yourself randomly explored and took that action. The epsilon exploration version of the above agent is $$A_n=\begin{cases}5&{\rm if }\ \mathbb{P}_n(A_n=5)<\varepsilon\\10&{\rm if }\ \mathbb{P}_n(A_n=10)<\varepsilon\\5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}$$

This agent uses pseudorandomness to explore, and does in fact converge to choosing 10 all but epsilon proportion of the time (the lower density of taking 5 is at most $$\varepsilon$$).

This fix has major problems. The obvious problem is that taking a bad action with probability epsilon could be disastrous for an agent that makes many different decisions. There is a larger problem with this approach: there are now two different ways you could take any given action. You could take that action because it produces the highest expected utility, or you could take that action because your exploration clause was triggered. These two ways of taking the action could be very different. Trivially, they could have different runtimes. More importantly, they could affect the extent to which your action is predictable.
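A toy simulation of the epsilon-exploration agent makes the convergence claim concrete. This is a sketch under loose assumptions: the "beliefs" $$\mathbb{P}_n(A_n=5)$$ are modeled as Laplace-smoothed action frequencies rather than real logical-inductor prices, conditional expectations are learned from explored outcomes, and the value of `EPS` is an arbitrary choice.

```python
# Toy epsilon-exploration agent on the 5-and-10 problem (a sketch:
# "beliefs" are Laplace-smoothed action frequencies, not real
# logical-inductor prices).

EPS = 0.05

def simulate(rounds: int) -> float:
    """Return the fraction of rounds on which the agent took 5."""
    count5 = count10 = 1           # smoothed action counts -> P(A=5)
    sum_u = {5: 0.0, 10: 0.0}      # utilities observed after each action
    n_obs = {5: 0, 10: 0}
    took5 = 0
    for _ in range(rounds):
        p5 = count5 / (count5 + count10)
        # Conditional expectations, estimated only from explored data.
        exp_u = {a: (sum_u[a] / n_obs[a] if n_obs[a] else 0.5)
                 for a in (5, 10)}
        if p5 < EPS:               # exploration clause for 5
            a = 5
        elif (1 - p5) < EPS:       # exploration clause for 10
            a = 10
        elif exp_u[5] > exp_u[10]:
            a = 5
        else:
            a = 10
        u = a / 10                 # U_n = A_n / 10
        count5 += (a == 5)
        count10 += (a == 10)
        sum_u[a] += u
        n_obs[a] += 1
        took5 += (a == 5)
    return took5 / rounds

freq5 = simulate(20_000)
print(freq5)  # small: 5 is taken only when the exploration clause fires
```

Since exploration reveals that taking 5 yields less utility, the agent settles on 10, taking 5 only on the roughly epsilon fraction of rounds where the first clause fires.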
This problem will, for example, cause a logical inductor EDT agent to defect in a prisoner's dilemma against a similarly powerful agent that is trying to imitate it. If such an agent were to start out cooperating, random defection will be uncorrelated with the opponent's prediction. Thus the explored defection will be profitable, and the agent will learn to defect. The opponent will learn this and start predicting defection more and more, but in the long run, the agent will view this as independent of its action.

To see concretely how epsilon exploration fails, consider a different utility function $$U_n=\frac{A_n+100\cdot \mathbb{P}_n(A_n=5)}{1000}$$, and the same agent above with exploration. This agent gets a reward for taking the 10, but a much bigger reward for being predicted to take the 5. This is basically Newcomb's problem, but the predictor is no more powerful than the agent, and in fact has the same epistemic state as the agent.

The problem is that in $$\mathbb{E}(U_n)$$, $$\mathbb{P}_n(A_n=5)$$ is a constant that the agent has arbitrary access to. It is not changed by whether the agent takes the 5 or the 10 when exploring, since $$\mathbb{P}_n$$ does not know if the agent will explore. Thus, if the agent almost always took the 5, the rare occasions where it took the 10 would produce more utility, and similarly if the agent almost always took the 10, the rare occasions where it took the 5 would produce less utility. The agent will converge to almost always taking the 10, just like last time, and will miss out on the large reward for being predicted to take the 5.

Obstacle 2: No Updatelessness for Computations

The second obstacle is related to the counterfactual mugging problem. There is an agent $$A_n$$ and a predictor $$\Omega_n$$. The predictor is much more powerful than the agent, except that there is a bit $$b_n\in\{0,1\}$$ that the agent knows, but the predictor does not. The predictor's beliefs about $$b_n$$ are uniform.
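A minimal toy of this setup contrasts an updateful policy with an updateless commitment. The payoff structure here is an assumption chosen for concreteness (a payout of the action divided by 10, plus a bonus of 100 on the $$b=1$$ branch when the agent's policy takes 5 on the $$b=0$$ branch); it is not the post's exact formalization.

```python
# Toy counterfactual mugging (assumed payoffs, not the post's exact
# formalization): the agent sees the bit b before acting; the powerful
# predictor does not, and rewards "would take 5 given b = 0".

BONUS = 100.0  # assumed size of the "large reward"

def updateful(b: int) -> int:
    # Having already updated on b, taking 10 dominates on every branch.
    return 10

def updateless(b: int) -> int:
    # Commit in advance: sacrifice utility on the b = 0 branch to be
    # predicted as a 5-taker, earning the bonus on the b = 1 branch.
    return 5 if b == 0 else 10

def expected_utility(policy) -> float:
    total = 0.0
    for b in (0, 1):                    # b is uniform
        u = policy(b) / 10              # payout for the action taken
        if b == 1 and policy(0) == 5:   # predictor checks the b = 0 branch
            u += BONUS
        total += 0.5 * u
    return total

print(expected_utility(updateful))    # 1.0
print(expected_utility(updateless))   # 50.75
```

An agent that has already conditioned on $$b_n$$ by the time it acts behaves like `updateful` on each branch and forfeits the bonus; the updateless commitment does far better in expectation. The obstacle is that a logical inductor agent cannot help updating on $$b_n$$.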
Again, the agent can take 5 or 10. However, if $$b_n=1$$, then the agent will get a large reward if $$\Omega_n$$ predicts that, conditioned on $$b_n=0$$, the agent takes the 5.

We will formalize this by giving the agent and the predictor two different logical inductors. The agent will use $$\mathbb{P}_n$$ and the predictor will use $$\mathbb{Q}_n$$. There will be a sequence of bits $$b_n$$, which are sentences mutually independent with $$PA$$. They will be given truth values independently, uniformly at random. $$\mathbb{P}_n$$ will use a deductive process that knows the value of $$b_i$$ for $$i\leq n$$, and further will be inexploitable by traders that have oracle access to $$b_i$$ for $$i\leq n$$ on day $$n$$. (Technically, logical inductors have computable deductive processes and traders, but it is not a problem to modify the definition to give oracle access to the traders and the program that computes the deductive process.) $$\mathbb{Q}_n$$ will be a very powerful logical inductor, but will receive the bits much more slowly: $$\mathbb{Q}_n$$ will know all the bits $$b_i$$ with $$f(i)$$ …

by Vladimir Slepnev 810 days ago | Sam Eisenstat and Abram Demski like this | link

I'll just note that in a modal logic or halting oracle setting you don't need the chicken rule, as we found in this old post: https://agentfoundations.org/item?id=4 So it seems like at least the first problem is about the approximation, not the thing being approximated.

reply

by Sam Eisenstat 770 days ago | Abram Demski likes this | link

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty naturally with logical induction. The motivation here is to work with decision problems where you can't prove statements $$A = a \to U = u$$ for agent $$A$$, utility function $$U$$, action $$a$$, and utility value $$u$$, at least not with the amount of computing power provided, but you want to use inductive generalizations instead.
That isn't necessary in this example, so it's more of an illustration. To say a bit more, if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign probability 1 to things that have already been proven, then they will work on the 5 and 10 problem in the post. It would be interesting if there were more of an analogy to explore between the provability oracle setting and the inductive setting, and more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas.

reply
