Using modal fixed points to formalize logical causality
post by Vladimir Slepnev

This post is a simplified introduction to existing ideas by Eliezer Yudkowsky, Wei Dai, Vladimir Nesov and myself. For those who already understand them, this post probably won’t contain anything new. As always, I take no personal credit for the ideas, only for the specific mathematical model.

People usually think about decision-making in terms of causality, where an agent’s action causes an outcome to happen. In this post I will outline a different idea of “causality”, which can work even if regular causality isn’t available. For example, in the world of natural numbers it doesn’t make sense to say that the sentence \(1+1=2\) somehow “causes” the sentence \(2+2=4\) to be true. Yet we can devise a notion of “logical causality” that will work in such worlds, and allow us to make decisions which maximize utility in some sense. The rest of this post is devoted to making these claims precise.


We start off with a simple formal model. Recall that provability logic is a modal logic where the operator \(\Box\) is interpreted as provability in Peano arithmetic. Let’s define a propositional formula with two variables \(F(a,o)=\Box(a→o)\). Since all occurrences of propositional variables are inside \(\Box\), we can build fixed points \(A\) such that \(A↔F(A,o)\) for different values of \(o\), and see what happens.

  1. If \(o=\top\) (the true sentence), then \(A=\top\), which is easy to verify directly.

  2. If \(o=\bot\) (the false sentence), then \(A=\Box\bot\), which can be verified by Löb’s theorem.
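
To spell out case 2: substituting \(o=\bot\) turns the fixed point condition into \(A↔\Box(A→\bot)↔\Box\neg A\). The formula \(\Box\bot\) satisfies this condition, because \(\Box\bot→\Box\neg\Box\bot\) holds (everything follows from \(\bot\) under the box), while the converse \(\Box\neg\Box\bot→\Box\bot\) is Löb's theorem \(\Box(\Box p→p)→\Box p\) instantiated at \(p=\bot\), since \(\neg\Box\bot\) is just \(\Box\bot→\bot\).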

Now let’s go a little deeper, and let \(o\) itself be defined in terms of \(a\), using the fixed point to tie everything together:

  3. In the simplest case where \(o=a\), we see that \(A=\top\).

  4. If \(o=\neg a\) (the negation of \(a\)), we see that \(A=\Box\bot\). You can verify that by hand, using the methods from the previous points.
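
Spelling out case 4: since \(a→\neg a\) is propositionally equivalent to \(\neg a\), the fixed point condition becomes \(A↔\Box\neg A\), which is the same condition as in case 2, so \(A=\Box\bot\) again.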

By now an interesting pattern is starting to emerge. Let’s say \(O\) is the formula that results from the variable \(o\) after applying the fixed point. In all cases except (2) where \(O\) is false by fiat, we see that \(A\) tries and succeeds to make \(O\) true! Metaphorically, here’s what \(A\) is doing: “If I can prove that my truth implies the truth of \(O\), then I choose to be true, otherwise I choose to be false”. That’s the basic idea of “logical causality”, in the simplest possible setting where it works.

Note that all sentences above can be interpreted as sentences about natural numbers within Peano arithmetic, by using Gödel numbering to spell out the provability operator. For example, in point (4) \(A\) would be a long sentence about the natural numbers, and \(O\) would be a slightly longer sentence that has \(A\) embedded inside it. In decision-making terms, \(A\) is the “action” and \(O\) is the “outcome” that logically depends on the action, but \(O\)’s dependence on \(A\) is not explicit, because both are closed formulas without free variables. Instead, the dependence is due to \(A\) knowing the Gödel number of \(O\).

To skip ahead a bit, it’s easy to go from a formalism about natural numbers to a formalism about computer programs, which know their own source code by quining. The examples above can be directly translated to programs that have access to a provability oracle for Peano arithmetic, or (with some caveats) to programs that successively search for proofs and check them manually. In fact, that was the original motivation for this line of research, because “programs trying to influence a larger program that they are embedded in” might be a good description of our own world, at least from the perspective of a program :-)
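
To make the translation to programs a bit more concrete, here is a minimal sketch in Python. The `provable` oracle, the quined source strings, and the exact sentence format are placeholders assumed for illustration; a real implementation would replace the oracle with a (possibly non-terminating) proof search over Peano arithmetic.

    def agent(provable, my_source, outcome_source):
        # The single-bit rule from the examples above: "if I can prove that my
        # returning True implies the outcome, I return True, otherwise False".
        # `provable` is a hypothetical provability oracle; `my_source` would be
        # obtained by quining in a real implementation.
        sentence = f"if [{my_source}] returns True then [{outcome_source}] holds"
        return provable(sentence)

    # Toy usage with a stub oracle that proves nothing, so the agent returns False.
    print(agent(lambda s: False, "<agent source>", "<outcome definition>"))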

Going back to the math, perhaps the approach will break down if we have more possible choices than just true or false? Let’s assume a more general setting, where we have formulas \(F_1(a_1,\dotsc,a_m,o_1,\dotsc,o_n),\dotsc,F_m(a_1,\dotsc,a_m,o_1,\dotsc,o_n)\). We will denote truth assignments to \(a_1,\dotsc,a_m\) as \(\vec{a}\), and truth assignments to \(o_1,\dotsc,o_n\) as \(\vec{o}\). We will have a preference ordering on all possible \(\vec{o}\), and will be interested in fixed points \(A=\{A_1,\dotsc,A_m\}\) such that \(A_i↔F_i(A_1,\dotsc,A_m,o_1,\dotsc,o_n)\) for all \(i\). The formulas \(F_1,\dotsc,F_m\) will encode the execution of this algorithm:

  1. There are finitely many sentences of the form “if such-and-such \(\vec{a}\) holds, then such-and-such \(\vec{o}\) holds”. Find all such sentences that are provable. If no such sentences are found, choose any \(\vec{a}\), e.g. all false.

  2. From all pairs \((\vec{a},\vec{o})\) found in the previous step, choose the \(\vec{a}\) whose \(\vec{o}\) is highest in our preference ordering. If there are multiple such \(\vec{a}\), choose any one, e.g. the lexicographically smallest.

  3. For each \(i\), define \(F_i\) to be true iff the chosen \(\vec{a}\) assigns true to \(a_i\).

To illustrate the definition of \(F_i\) in more detail, let’s work through the case where \(m=n=1\) and our preference ordering wants \(o_1\) to be true. On step 1 we have four possible sentences in lexicographic order: \(a_1→o_1\), \(a_1→\neg o_1\), \(\neg a_1→o_1\), and \(\neg a_1→\neg o_1\). Then \(F_1\) is true iff the chosen \(\vec{a}\) assigns true to \(a_1\), which can only happen on step 2. The corresponding \(\vec{o}\) can either assign true to \(o_1\), which happens iff the first sentence is provable, or assign false to \(o_1\), which happens iff the second sentence is provable but the first and third aren’t. Simplifying, we obtain the definition for \(F_1(a_1,o_1)=\Box(a_1→o_1)\vee(\Box(a_1→\neg o_1)\wedge\neg\Box(\neg a_1→o_1))\). By now it should be clear how to use the same algorithm for \(m,n>1\).
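
For concreteness, here is one way the algorithm in steps 1-3 could be written down as a program, with the provability oracle and the preference ordering passed in as black boxes. The function names and the tie-breaking rule (first provable pair in enumeration order) are illustrative choices, not part of the original construction.

    from itertools import product

    def decide(m, n, provable_implication, better):
        # Step 1: find all provable sentences "if a holds then o holds", where
        # a and o range over truth assignments to the action and outcome
        # variables. `provable_implication` stands in for a provability oracle.
        actions = list(product([False, True], repeat=m))
        outcomes = list(product([False, True], repeat=n))
        found = [(a, o) for a in actions for o in outcomes
                 if provable_implication(a, o)]
        if not found:
            # No provable implications: choose any action, e.g. all false.
            return (False,) * m

        # Step 2: among the pairs found, choose the action whose outcome is
        # highest in the preference ordering; ties go to the pair found first.
        best_a, best_o = found[0]
        for a, o in found[1:]:
            if better(o, best_o):
                best_a, best_o = a, o

        # Step 3: the i-th component of the result plays the role of F_i.
        return best_a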

Which choices of \(o_1,\dotsc,o_n\) are amenable to this approach? Intuitively, it seems that “fair” deterministic problems are those where every choice of “actions” \(\vec{a}\) logically implies at least one “outcome” \(\vec{o}\), and these implications are apparent to the agent (i.e. provable). But that’s exactly the class of problems where our approach obviously gives the right answer! So it seems that having multiple possible choices doesn’t cause problems.

For example, let’s take \(m=2,n=1,o_1=a_1\wedge\neg a_2\), and assume that the preference ordering wants \(o_1\) to be true. Then it’s easy to see that the chosen \(\vec{a}\) is either (true, false), which provably implies \(o_1\), or something else that also provably implies \(o_1\). But the latter is impossible, because choosing any other \(\vec{a}\) makes \(o_1\) false, so it can’t be provable as long as the logic is sound. (Of course the logic doesn’t prove its own soundness, but we’re reasoning from the outside now.) Therefore the chosen \(\vec{a}\) is (true, false), and \(o_1\) is true.
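
As a toy sanity check of this example, we can run the `decide` sketch above with an “oracle” that only certifies implications following directly from the definition \(o_1=a_1\wedge\neg a_2\) (a sound stand-in for real provability in this fair problem), and a preference that ranks \(o_1\) true above \(o_1\) false:

    def oracle(a, o):
        # Certify "a implies o" exactly when o matches the definition of o_1.
        a1, a2 = a
        return o == (a1 and not a2,)

    def prefer_o1_true(x, y):
        # x is strictly better than y iff x makes o_1 true and y doesn't.
        return x[0] and not y[0]

    print(decide(2, 1, oracle, prefer_o1_true))  # prints (True, False)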

One counterintuitive feature of our approach is that some “actions” \(\vec{a}\) might logically imply multiple different “outcomes” \(\vec{o}\) after taking the fixed point, because if an action is in fact not taken, it logically implies anything at all. However, the approach is designed so that the existence of such “spurious” logical implications can never lead to a suboptimal outcome. The proof of that is left as an easy exercise.


