Intelligent Agent Foundations Forum
Using modal fixed points to formalize logical causality
post by Vladimir Slepnev 1356 days ago | Abram Demski, Benja Fallenstein and Patrick LaVictoire like this | 10 comments

This post is a simplified introduction to existing ideas by Eliezer Yudkowsky, Wei Dai, Vladimir Nesov and myself. For those who already understand them, this post probably won’t contain anything new. As always, I take no personal credit for the ideas, only for the specific mathematical model.

People usually think about decision-making in terms of causality, where an agent’s action causes an outcome to happen. In this post I will outline a different idea of “causality”, which can work even if regular causality isn’t available. For example, in the world of natural numbers it doesn’t make sense to say that the sentence \(1+1=2\) somehow “causes” the sentence \(2+2=4\) to be true. Yet we can devise a notion of “logical causality” that will work in such worlds, and allow us to make decisions which maximize utility in some sense. The rest of this post is devoted to making these claims precise.


We start off with a simple formal model. Recall that provability logic is a modal logic where the operator \(\Box\) is interpreted as provability in Peano arithmetic. Let’s define a propositional formula with two variables \(F(a,o)=\Box(a→o)\). Since all occurrences of propositional variables are inside \(\Box\), we can build fixed points \(A\) such that \(A↔F(A,o)\) for different values of \(o\), and see what happens.

  1. If \(o=\top\) (the true sentence), then \(A=\top\), which is easy to verify directly.

  2. If \(o=\bot\) (the false sentence), then \(A=\Box\bot\), which can be verified by Löb’s theorem.

Now let’s go a little deeper, and let \(o\) itself be defined in terms of \(a\), using the fixed point to tie everything together:

  3. In the simplest case where \(o=a\), we see that \(A=\top\).

  4. If \(o=\neg a\) (the negation of \(a\)), we see that \(A=\Box\bot\). You can verify that by hand, using the methods from the previous points.
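All four fixed points above can be checked mechanically using the Kripke semantics of GL: a letterless formula is provable iff it holds at every world of the frame on the natural numbers where \(\Box\varphi\) holds at world \(k\) iff \(\varphi\) holds at all worlds \(j<k\). Because \(\Box\) only consults strictly smaller worlds, the fixed point of \(A↔\Box(A→o)\) can be computed world by world. Here is a minimal Python sketch; inspecting only the first few worlds is an illustration of the claims, not a proof of provable equivalence:

```python
N = 8  # how many worlds of the frame to inspect

def box(vals):
    """box(phi) holds at world k iff phi holds at all worlds j < k."""
    return [all(vals[:k]) for k in range(len(vals))]

def fixed_point(o_of_a):
    """Compute A with A <-> box(A -> o), world by world; box only
    consults strictly smaller worlds, so the recursion is well-founded."""
    A = []
    for k in range(N):
        A.append(all((not A[j]) or o_of_a(A[j]) for j in range(k)))
    return A

top = [True] * N
box_bot = box([False] * N)       # the valuation of box(falsum)

assert fixed_point(lambda a: True) == top        # o = top    : A = top
assert fixed_point(lambda a: False) == box_bot   # o = bottom : A = box(falsum)
assert fixed_point(lambda a: a) == top           # o = a      : A = top
assert fixed_point(lambda a: not a) == box_bot   # o = not a  : A = box(falsum)
```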

By now an interesting pattern is starting to emerge. Let’s say \(O\) is the formula that results from the variable \(o\) after applying the fixed point. In all cases except (2), where \(O\) is false by fiat, \(A\) tries to make \(O\) true and succeeds! Metaphorically, here’s what \(A\) is doing: “If I can prove that my truth implies the truth of \(O\), then I choose to be true, otherwise I choose to be false”. That’s the basic idea of “logical causality”, in the simplest possible setting where it works.

Note that all sentences above can be interpreted as sentences about natural numbers within Peano arithmetic, by using Gödel numbering to spell out the provability operator. For example, in point (4) \(A\) would be a long sentence about the natural numbers, and \(O\) would be a slightly longer sentence that has \(A\) embedded inside it. In decision-making terms, \(A\) is the “action” and \(O\) is the “outcome” that logically depends on the action, but \(O\)’s dependence on \(A\) is not explicit, because both are closed formulas without free variables. Instead, the dependence is due to \(A\) knowing the Gödel number of \(O\).

To skip ahead a bit, it’s easy to go from a formalism about natural numbers to a formalism about computer programs, which know their own source code by quining. The examples above can be directly translated to programs that have access to a provability oracle for Peano arithmetic, or (with some caveats) to programs that successively search for proofs and check them manually. In fact, that was the original motivation for this line of research, because “programs trying to influence a larger program that they are embedded in” might be a good description of our own world, at least from the perspective of a program :-)
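As an illustrative sketch of the proof-search variant (everything here is hypothetical: `proves(n, s)` is a mock stand-in for “\(n\) encodes a valid PA-proof of the sentence \(s\)”, and the sentence string is a placeholder rather than a real Gödel encoding obtained by quining):

```python
# Hedged sketch: an agent that searches for a proof of "if I act,
# the outcome is good" up to a length bound, instead of calling a
# halting oracle. The names below are illustrative placeholders.
def agent(proves, max_steps):
    target = "A() = 1 -> O() = 1"      # "my action implies the good outcome"
    for n in range(max_steps):
        if proves(n, target):
            return 1                   # proof found: take the action
    return 0                           # no proof found: default action

# A mock checker that "finds" a proof at step 3:
assert agent(lambda n, s: n == 3, 10) == 1
# A mock checker that never finds one:
assert agent(lambda n, s: False, 10) == 0
```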

Going back to the math, perhaps the approach will break down if we have more possible choices than just true or false? Let’s assume a more general setting, where we have formulas \(F_1(a_1,\dotsc,a_m,o_1,\dotsc,o_n),\dotsc,F_m(a_1,\dotsc,a_m,o_1,\dotsc,o_n)\). We will denote truth assignments to \(a_1,\dotsc,a_m\) as \(\vec{a}\), and truth assignments to \(o_1,\dotsc,o_n\) as \(\vec{o}\). We will have a preference ordering on all possible \(\vec{o}\), and will be interested in fixed points \(A=\{A_1,\dotsc,A_m\}\) such that \(A_i↔F_i(A_1,\dotsc,A_m,o_1,\dotsc,o_n)\) for all \(i\). The formulas \(F_1,\dotsc,F_m\) will encode the execution of this algorithm:

  1. There are finitely many sentences of the form “if such-and-such \(\vec{a}\) holds, then such-and-such \(\vec{o}\) holds”. Find all such sentences that are provable, recording the corresponding pairs \((\vec{a},\vec{o})\). If no such sentences were found, choose any \(\vec{a}\), e.g. all false.

  2. From all pairs \((\vec{a},\vec{o})\) found in the previous step, choose the \(\vec{a}\) whose \(\vec{o}\) is highest in our preference ordering. If there are multiple such \(\vec{a}\), choose any one, e.g. the lexicographically smallest.

  3. For each \(i\), define \(F_i\) to be true iff the chosen \(\vec{a}\) assigns true to \(a_i\).
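The algorithm itself (as distinct from its modal encoding via the formulas \(F_i\)) can be sketched directly in Python. This is a hedged illustration: `provable` is a stand-in oracle supplied by the caller, and keeping the first maximal pair is one possible reading of the lexicographic tie-break.

```python
from itertools import product

def decide(m, n, provable, better):
    """Sketch of the three steps above.
    provable(a, o): stand-in for provability of "a implies o".
    better(o): preference key for outcomes; higher is preferred."""
    # Step 1: find all provable "if a then o" sentences.
    found = [(a, o)
             for a in product([True, False], repeat=m)
             for o in product([True, False], repeat=n)
             if provable(a, o)]
    if not found:
        return (False,) * m          # nothing provable: default action
    # Step 2: best outcome wins; max keeps the first of any tied pairs.
    return max(found, key=lambda ao: better(ao[1]))[0]

# With a sound mock oracle for the problem o1 = (a1 and not a2),
# discussed further below, the agent settles on (true, false):
oracle = lambda a, o: o[0] == (a[0] and not a[1])
assert decide(2, 1, oracle, lambda o: o[0]) == (True, False)
```

Step 3, the modal encoding, then just asks whether the chosen \(\vec{a}\) sets each \(a_i\) true.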

To illustrate the definition of \(F_i\) in more detail, let’s work through the case where \(m=n=1\) and our preference ordering wants \(o_1\) to be true. On step 1 we have four possible sentences in lexicographic order: \(a_1→o_1\), \(a_1→\neg o_1\), \(\neg a_1→o_1\), and \(\neg a_1→\neg o_1\). Then \(F_1\) is true iff the chosen \(\vec{a}\) assigns true to \(a_1\), which can only happen on step 2. The corresponding \(\vec{o}\) can either assign true to \(o_1\), which happens iff the first sentence is provable, or assign false to \(o_1\), which happens iff the second sentence is provable but the first and third aren’t. Simplifying, we obtain the definition for \(F_1(a_1,o_1)=\Box(a_1→o_1)\vee(\Box(a_1→\neg o_1)\wedge\neg\Box(\neg a_1→o_1))\). By now it should be clear how to use the same algorithm for \(m,n>1\).
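As a sanity check on this derivation, treat the provability of the four sentences as free booleans \(p_1,\dotsc,p_4\) and compare the algorithm’s choice of \(a_1\) against the derived formula in all sixteen cases (a brute-force sketch; the default action when nothing is provable is taken to be \(a_1\) false):

```python
from itertools import product

# The four candidate (a1, o1) pairs, in the lexicographic order listed
# above, with the provability of each sentence a free boolean p1..p4.
pairs = [(True, True), (True, False), (False, True), (False, False)]

def algorithm_choice(p):
    """a_1 as chosen by the algorithm: best o_1 wins, the earlier-listed
    sentence breaks ties, default action is a_1 = false."""
    found = [ao for ao, prov in zip(pairs, p) if prov]
    if not found:
        return False
    return max(found, key=lambda ao: ao[1])[0]  # max keeps the first tie

for p in product([False, True], repeat=4):
    p1, p2, p3, p4 = p
    # box(a1 -> o1) or (box(a1 -> not o1) and not box(not a1 -> o1))
    formula = p1 or (p2 and not p3)
    assert algorithm_choice(p) == formula
```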

Which choices of \(o_1,\dotsc,o_n\) are amenable to this approach? Intuitively, it seems that “fair” deterministic problems are those where every choice of “actions” \(\vec{a}\) logically implies at least one “outcome” \(\vec{o}\), and these implications are apparent to the agent (i.e. provable). But that’s exactly the class of problems where our approach obviously gives the right answer! So it seems that having multiple possible choices doesn’t cause problems.

For example, let’s take \(m=2,n=1,o_1=a_1\wedge\neg a_2\), and assume that the preference ordering wants \(o_1\) to be true. Then it’s easy to see that the chosen \(\vec{a}\) is either (true, false), which provably implies \(o_1\), or something else that also provably implies \(o_1\). But the latter is impossible, because choosing any other \(\vec{a}\) makes \(o_1\) false, so it can’t be provable as long as the logic is sound. (Of course the logic doesn’t prove its own soundness, but we’re reasoning from the outside now.) Therefore the chosen \(\vec{a}\) is (true, false), and \(o_1\) is true.

One counterintuitive feature of our approach is that some “actions” \(\vec{a}\) might logically imply multiple different “outcomes” \(\vec{o}\) after taking the fixed point, because if an action is in fact not taken, it logically implies anything at all. However, the approach is designed so that the existence of such “spurious” logical implications can never lead to a suboptimal outcome. The proof of that is left as an easy exercise.



by Benja Fallenstein 1356 days ago | Patrick LaVictoire likes this | link

> However, the approach is designed so that these “spurious” logical implications are unprovable, so they don’t interfere with decision-making. The proof of that is left as an easy exercise.

I don’t think this is technically true as stated; it seems to be possible that the agent proves some spurious counterfactuals as long as the outcome it does in fact obtain is the best possible one. (This is of course harmless!) Say the agent has two possible actions, \(\bar a^{(5)}\) and \(\bar a^{(10)}\), leading to outcomes \(\bar o^{(5)}\) and \(\bar o^{(10)}\), respectively. The latter is preferred, and these are the only two outcomes. Suppose that \(\bar a^{(10)}\) happens to be lexicographically lower than \(\bar a^{(5)}\) in the agent’s reasoning. Then it seems to be provable that the agent will in fact choose \(\bar a^{(10)}\), meaning that it’s provable that it won’t choose \(\bar a^{(5)}\), meaning that it finds both \((\bar a^{(5)},\bar o^{(5)})\) and the spurious \((\bar a^{(5)},\bar o^{(10)})\) in the first step.

So I think the correct statement is a disjunction: The agent obtains the highest possible outcome or it finds no spurious counterfactuals.
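The scenario can be checked with a toy model (a hedged sketch; the names, ranks and tie-breaking order are illustrative stand-ins):

```python
# Provable pairs found in step 1, including the spurious (a5, o10):
found = [("a5", "o5"), ("a5", "o10"), ("a10", "o10")]
rank = {"o5": 0, "o10": 1}      # o10 is the preferred outcome
order = {"a10": 0, "a5": 1}     # a10 is lexicographically lower
best = max(found, key=lambda ao: (rank[ao[1]], -order[ao[0]]))
# Despite the spurious pair, the chosen action still yields o10:
assert best == ("a10", "o10")
```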

reply

by Vladimir Slepnev 1356 days ago | link

Yes, that’s correct. Thanks!

reply

by Patrick LaVictoire 1351 days ago | link

Ooh, nice: we don’t need to eliminate all spurious counterfactuals, only the malignant ones!

reply

by Abram Demski 1342 days ago | link

This implicitly contains a sort of chicken rule, since if the agent can prove that it will not take a particular \(\vec{a}\), it can proceed to prove arbitrarily good \((\vec{a}, \vec{o})\) for that \(\vec{a}\). So, it will want to take that action.

I wouldn’t really call this logical causality, by the way; to me, that suggests the ability to take arbitrary mathematical counterfactuals (“What if \(\pi = 3\)?”), not only counterfactuals in service of actions.

reply

by Abram Demski 1342 days ago | link

This can evidently deal with probabilistic settings; the \(\vec{o}\) would encode an expected value which the agent is trying to maximize, with the obvious preference ordering. It would get unrealistic for very complicated situations, since expected values are very often intractable and must be approximated. Given that we know the exact probabilistic model, this is a sort of logical uncertainty.

How might this be modified to deal with logical uncertainty?

A simple idea that comes to mind is to modify the procedure to look for lower bounds on expected values, rather than trying to prove exact expected values. The procedure of choosing the best & breaking ties remains unmodified. (I’m not really trying to think about the modalized form at the moment.)

This should work for a lot more cases, but still has a feeling of being potentially unrealistic. It doesn’t take advantage of any approaches to logical uncertainty that have been discussed.

reply

by Abram Demski 1314 days ago | link

One really nice thing about the approach is that we do not have to identify the agent within the universe. All that is needed is the logical implications of supposing that the agent takes various actions. As a result, the agent will treat programs provably equivalent to its own as copies of itself with no fuss.

We can take a prior on all programs as our universe, with no special interface between the agent and the program as in AIXI. (We do need to somehow specify utilities on programs, which is a non-obvious procedure.) The agent then searches for proofs that if it takes some particular action, then at least some particular expected utility is achieved. There are no longer finitely many possible outcomes, so modal tricks to do this without a halting oracle won’t work. Instead, the agent must be doing this for some bounded time. It then takes the action with highest proven utility.

reply

by Benja Fallenstein 1356 days ago | link

I know this is supposed to be just introductory, but I actually think that the complete reformulation of UDT-with-a-halting-oracle in terms of modal logic is really interesting! For starters, it allows us to compare UDT and modal agents in the same framework (with the right \(o\)’s, we can see this version of UDT as a modal agent). It would also be really neat if we could write an “interpreter” that allows us to write UDT as a program calling a halting oracle, and then evaluate what it does by way of modal logic.

But also, it allows us to give a nice definition of “decision theory” and “decision problem” in the context with halting oracles. I was planning on posting soon about the definition that I showed you when you were visiting, which is designed for actually computable agents with bounded proof lengths, and is more complicated because of that. As a stepping stone to that, using the provability logic framework, I think we can define:

  • a decision theory to be a set of fully modalized formulas \(F_i(a_1,\dotsc,a_m,o_1,\dotsc,o_n)\), for \(i=1,\dotsc,m\) (fully modalized meaning that the propositional arguments only appear inside boxes);
  • and a decision problem to be a set of formulas \(G_j(a_1,\dotsc,a_m)\), for \(j=1,\dotsc,n\), which do not need to be modalized (which isn’t a problem because they’re not self-referential).

The condition that \(F_i\) must be fully modalized, but \(G_j\) doesn’t need to be, seems to be the natural thing corresponding to how in the bounded case, we allow the universe to run the agent, but the agent must use abstract reasoning about the universe; it can’t just run it.

Given such a set, we can define \(\tilde F_i(a_1,\dotsc,a_m)\), for \(i=1,\dotsc,m\), as follows: \[\tilde F_i(a_1,\dotsc,a_m) \,:=\, F_i(a_1,\dotsc,a_m,G_1(a_1,\dotsc,a_m),\dotsc,G_n(a_1,\dotsc,a_m)).\] Then the actual action taken by decision theory \(F\) when run on the decision problem \(G\) is the modal fixed point of the equations \(A_i\leftrightarrow\tilde F_i(A_1,\dotsc,A_m)\), for \(i=1,\dotsc,m\).

reply

by Vladimir Slepnev 1356 days ago | link

Yes, modal logic seems to be the most natural setting for these kinds of ideas. Also the “chicken rule” from the usual oracle formulations is gone now, I can’t remember why we needed it anymore.

reply

by Abram Demski 1342 days ago | link

In a probabilistic setting (with a prior over logical theories), EDT wants to condition on the possible actions with a Bayesian conditional, in order to then find the expected utility of each action.

If the agent can prove that it will take a particular action, then conditioning on this action may yield inconsistent stuff (divide-by-zero error for Bayesian conditional). This makes the result ill-defined.

The chicken rule makes this impossible, ensuring that the conditional probabilities are well defined.

So, the chicken rule at least seems useful for EDT.

reply

by Benja Fallenstein 1356 days ago | link

A model of UDT with a halting oracle searches only for one utility value for each action. I’m guessing the other formulation just wasn’t obvious at the time? (I don’t remember realizing the possibility of playing chicken implicitly before Will Sawin advertised it to me, though I think he attributed it to you.)

reply


