Two Major Obstacles for Logical Inductor Decision Theory post by Scott Garrabrant 217 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable and no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework. Obstacle 1: Untaken Actions are not Observable Consider the following formalization of the 5 and 10 problem: Let $$\{\mathbb{P}_n\}$$ is a logical inductor. Let $$A_n$$ be an agent which uses this logical inductor, to output either 5 or 10 as follows. The utility function $$U_n$$ for agent $$A_n$$ is simply $$U_n=A_n/10$$, and the source code for agent $$A_n$$ is given by $$A_n=\begin{cases}5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}$$ Ideally, we would be able to say something like $$lim_{n\rightarrow\infty}A_n=10$$. Unfortunately, this is not the case. There exists a logical inductor such that $$lim_{n\rightarrow\infty}A_n=5$$. Consider a construction of a logical inductor similar to the one in the paper, but for which there is a single trader that starts with most of the wealth. This trader spends all of its wealth on conditional contracts forcing $$\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)\sim .5$$ and $$\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\sim 0$$. Note that the bets made conditioned on $$A_n=5$$ are accurate, while the bets made conditioned on $$A_n=10$$ do not matter, since the condition will be false. (No other trader will have enough wealth to substantially change the expectations). This trader will therefore lose no money, and be able to do the same thing again next round. (This is assuming that the value of A_n and U_n are computed in time for round n+1 of the deductive process. If this is not the case, we could do the same trick on a subsequence with this property.) This same phenomenon has been observed in many other contexts. The spurious counterfactuals that can arise in proof based systems are another manifestation of the same problem. One attempt at a fix is epsilon exploration. (The analogous fix in the proof based world is the chicken rule) Here, you take every possible action with probability $$\varepsilon$$. Then, when conditioning on taking an action you normally wouldn’t take, you will have some data on what happened when simpler versions of yourself randomly explored and took that action. The epsilon exploration version of the above agent is $$A_n=\begin{cases}5&{\rm if }\ \mathbb{P}_n(A_n=5)<\varepsilon\\10&{\rm if }\ \mathbb{P}_n(A_n=10)<\varepsilon\\5&{\rm if }\ \mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=5)>\mathbb{E}^{\mathbb{P}_n}_n(U_n|A_n=10)\\10&{\rm otherwise.}\end{cases}$$ This agent uses pseudorandomness the explore, and does in fact converge to choosing 10 all but epsilon proportion or the time (The lower density of taking 5 is at most $$\varepsilon$$). This fix has major problems. The obvious problem is that taking a bad action with probability epsilon could be disastrous for an agent that makes many different decisions. There is a larger problem with this approach. There are now two different ways you could take any given action. You could take that action because it produces the highest expected utility, or you could take that action because your exploration clause was triggered. These two different ways of taking the action could be very different. Trivially, they could have different runtimes. More importantly, they could effect the extent to which your action is predictable. This problem will, for example, cause a logical inductor EDT agent to defect in a prisoner’s dilemma against a similar power agent that is trying to imitate it. If such an agent were to start out cooperating, random defection will be uncorrelated with the opponent’s prediction. Thus the explored defection will be profitable, and the agent will learn to defect. The opponent will learn this and start predicting defection more and more, but in the long run, the agent view this as independent with its action. To see concretely how epsilon exploration fails, consider a different utility function $$U_n=\frac{A_n+100\cdot \mathbb{P}_n(A_n=5)}{1000}$$, and the same agent above with exploration. This agent gets a reward for taking the 10, but a much bigger reward for being predicted to take the 5. This is basically Newcomb’s problem, but the predictor is no more powerful than the agent, and in fact has the same epistemic state as the agent. The problem is that in the $$\mathbb{E}(U_n)$$, $$\mathbb{P}_n(A_n=5)$$ is a constant that the agent has arbitrary access to. It is not changed by whether the agent takes the 5 or the 10 when exploring, since $$\mathbb{P}_n$$ does not know if the agent will explore. Thus, if the agent almost always took the 5, the rare occasions where it took the 10 would produce more utility, and similarly if the agent almost always took the 10, the rare occasions where it took the 5 would produce less utility. The agent will converge to almost always taking the 10, just like last time, and will miss out on the large reward for being predicted to take the 5. Obstacle 2: No Updatelessness for Computations The second obstacle is related to the counterfactual mugging problem. There is an agent $$A_n$$, and a predictor $$\Omega_n$$. The predictor is much more powerful than the agent, except there is a bit $$b_n\in\{0,1\}$$ that the agent knows, but the predictor does not. The predictor’s beliefs about $$b_n$$ are uniform. Again, the agent can take 5 or 10, However if b=1, then the agent will get a large reward if $$Q$$ predicts that conditioned on $$b=0$$, the agent takes the 5. We will formalize this by giving the agent and the predictor two different logical inductors. The agent will use $$\mathbb{P}_n$$ and the predictor will use $$\mathbb{Q}_n$$. There will be a sequence of bits $$b_n$$, which are sentences mutually independent with $$PA$$. They will be given truth values independently, uniformly at random. $$P_n$$ will use a deductive process knows the value of $$b_i$$ for $$i\leq n$$, and further will be inexplicable by traders that have oracle access to $$b_i$$ for $$i\leq n$$ on day $$n$$. (Technically, logical inductors have computable deductive processes and traders, but it is not a problem to modify the definition to give oracle access to the traders and the program that computes the deductive process.) $$\mathbb{Q}_n$$ will be a very powerful logical inductor, but will receive the bits much slower. $$\mathbb{Q}_n$$ will know all the bits $$b_i$$ with $$f(i)  by Vladimir Slepnev 204 days ago | Sam Eisenstat and Abram Demski like this | link I’ll just note that in a modal logic or halting oracle setting you don’t need the chicken rule, as we found in this old post: https://agentfoundations.org/item?id=4 So it seems like at least the first problem is about the approximation, not the thing being approximated. reply  by Sam Eisenstat 163 days ago | Abram Demski likes this | link Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty natually with logical induction. The motivation here is to work with decision problems where you can’t prove statements \(A = a \to U = u$$ for agent $$A$$, utility function $$U$$, action $$a$$, and utility value $$u$$, at least not with the amount of computing power provided, but you want to use inductive generalizations instead. That isn’t necessary in this example, so it’s more of an illustration. To say a bit more, if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign things that have been proven already probability 1, then they will work on the 5 and 10 problem in the post. It would be interesting if there was more of an analogy to explore between the provability oracle setting and the inductive setting, and more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas. reply

### NEW DISCUSSION POSTS

Unfortunately, it's not just
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
 by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
 by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
 by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
 by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
 by Stuart Armstrong on Predictable Exploration | 0 likes

 by Abram Demski on Predictable Exploration | 0 likes

> So I wound up with
 by Abram Demski on Predictable Exploration | 0 likes

Hm, I got the same result
 by Alex Appel on Predictable Exploration | 1 like

Paul - how widely do you want
 by David Krueger on Funding opportunity for AI alignment research | 0 likes

I agree, my intuition is that
 by Abram Demski on Smoking Lesion Steelman III: Revenge of the Tickle... | 0 likes