Fixed point theorem in the finite and infinite case
discussion post by Victoria Krakovna 958 days ago | Jim Babcock and Patrick LaVictoire like this | discuss

Janos and I started working on extending the fixed point theorem to the infinite case at MIRI’s June decision theory workshop with Patrick and Benja. This post attempts to exhibit the finite version of the theorem and speculates about a plausible extension.

# Algorithm for the finite case

Suppose we’re given variables $$p_1,\dots,p_n$$, and statements $$p_i\leftrightarrow \phi_i(p_1,\dots,p_n)$$ where each $$\phi_i$$ is fully modalized (no variables occur outside modal subformulas). We will describe an algorithm for constructing sentences $$\psi_i$$ with no free variables, such that the original statements are equivalent to $$p_1\leftrightarrow \psi_1,\dots,p_n\leftrightarrow \psi_n$$.

For simplification purposes, we will assume that each $$\phi_i$$ is only singly modalized (none of the modal subformulas contain further modal subformulas). If not, we can introduce new variables for each subformula of the form $$\square \phi_{i,j}$$ that occurs in a fully modalized context.

Now consider a sequence of theories $$W_0,W_1,\dots$$, where $$W_i\equiv PA\cup\{\square^{i+1} \bot,\lnot\square^i\bot\}$$.

In $$W_0$$, it’s easy to determine the truth value of each $$p_i$$: every modal subformula can be replaced with $$\top$$, leaving a propositional formula with no variables.

Now suppose we’ve done this for $$W_0,\dots,W_{n-1}$$. Then, a statement $$\square \phi$$ will be true in $$W_n$$ iff

$PA\vdash(\square^{n+1}\bot\land\lnot\square^n\bot)\to\square\phi \Leftrightarrow\square^n\bot\lor\square(\square^n \bot \rightarrow \phi) \Leftrightarrow\square^n\bot\lor\square\bigwedge_{i=0}^{n-1}((\square^{i+1} \bot\land \lnot\square^i \bot)\rightarrow \phi)$

Therefore $$\square \phi$$ will be true in $$W_n$$ iff $$\phi$$ is true in $$W_0,\dots,W_{n-1}$$; this will let us evaluate the truth value of $$p_i$$ in all of the theories $$W_1,\dots,W_n$$.

It’s clearly the case that every modal subformula will have truth value stabilizing, therefore every $$p_i$$ will also. So there is an $$N$$ such that $$W_n$$ has the same truth values as $$W_N$$ for $$n>N$$. Now if $$W_N\vdash p_i$$, construct $$\psi_i\equiv\lnot\bigvee_{i:W_i\vdash \lnot p_i} (\square^{i+1}\bot\land\lnot\square^i\bot)$$; otherwise construct $$\psi_i\equiv\bigvee_{i:W_i\vdash p_i} (\square^{i+1}\bot\land\lnot\square^i\bot)$$. These are variable-free formulas.

# Infinite case example: Procrastination Bot

def ProcrastinationBot$$_N$$(X):

if $$PA+N \vdash \square X(PB_{N+1}) = C$$:

return C

else: return D

Constructing the fixed point

Let $$p_{ij}$$ denote whether $$PB_i$$ cooperates with $$PB_j$$. Then $p_{ij} \leftrightarrow \square_i p_{j, i+1} =\square_i \square_j p_{i+1,j+1}$ where $$\square_i$$ stands for $$\lnot \square^i \bot \rightarrow \square$$ (e.g. $$\square_0 = \square$$).

Then the following statements hold $p_{00} \leftrightarrow\square \square p_{11}$ $p_{11} \leftrightarrow \square_1 \square_1 p_{22} \equiv \lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$ $\dots$

In $$W_0$$, any statement with a box in front of it is true, so any statement where a boxed statement is implied is also true. Thus, all the relevant statements are true in $$W_0$$:

Theory $$p_{00}$$ $$\square \square p_{11}$$ $$p_{11}$$ $$\lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$$ $$\lnot \square \bot \rightarrow \square p_{22}$$ $$p_{22}$$ $$\dots$$
$$W_0$$ $$\top$$ $$\top$$ $$\top$$

In $$W_{i+1}$$, all the implied statements are boxed statements that were true in $$W_i$$, e.g. $$\square p_{22}$$ or $$\lnot \square \bot \rightarrow \square p_{22}$$. Thus, all the relevant statements are true in $$W_{i+1}$$:

Theory $$p_{00}$$ $$\square \square p_{11}$$ $$p_{11}$$ $$\lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$$ $$\lnot \square \bot \rightarrow \square p_{22}$$ $$p_{22}$$ $$\dots$$
$$W_0$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$
$$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$
$$W_{i+1}$$ $$\top$$ $$\top$$ $$\top$$

Thus, any Procrastination Bot cooperates with any other Procrastination Bot.

Existence and uniqueness of the fixed point via quining

Given a formula $$\psi(m,n)$$, there exists formula $$\phi(n)$$ such that $$\vdash\phi(n)\leftrightarrow\psi(\ulcorner\phi\urcorner,n)$$. The quine construction (thanks Benja!) is $\phi(n)\equiv\psi(\mbox{let }k=\ulcorner\psi(\mbox{let }k=x\mbox{ in }\operatorname{subst}_{\ulcorner x\urcorner}(\operatorname{quote}(k),k))\urcorner\mbox{ in }\operatorname{subst}_{\ulcorner x\urcorner}(\operatorname{quote}(k),k),n)$

Let $$\psi(\ulcorner \phi \urcorner, n) = \square\ulcorner \phi(\overline{n+1}) \urcorner$$. Then we have $PA \vdash \forall n: (\phi(n) \leftrightarrow \square\ulcorner \phi(\overline{n+1}) \urcorner).$

Now consider that $$\square \ulcorner \forall n: \phi(n) \urcorner \rightarrow \square\ulcorner \phi(\overline{n+1}) \urcorner$$. Using the above, we have $PA \vdash \square \ulcorner \forall n: \phi(n) \urcorner \rightarrow \forall n: \phi(n).$

By Lob’s theorem, $$PA \vdash \forall n: \phi(n)$$, so a fixed point exists.

Assume there are two fixed points $$\phi$$ and $$\phi'$$. Then we have $PA \vdash \forall n: [\phi(n) \leftrightarrow \psi(\ulcorner \phi \urcorner,n)] \land [\phi'(n) \leftrightarrow \psi(\ulcorner \phi' \urcorner,n)],$ $PA \vdash \forall \ulcorner \phi \urcorner, \ulcorner \phi' \urcorner: \square \ulcorner \forall n: \phi(n) \leftrightarrow \phi'(n) \urcorner \rightarrow [\psi(\ulcorner \phi \urcorner, n) \leftrightarrow \psi(\ulcorner \phi' \urcorner, n)]$ Thus, $$PA \vdash \forall n: \phi(n) \leftrightarrow \phi'(n)$$, so the fixed points are the same.

### NEW DISCUSSION POSTS

[Delegative Reinforcement
 by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
 by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
 by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
 by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
 by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
 by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
 by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
 by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
 by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
 by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
 by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
 by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
 by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
 by Abram Demski on Policy Selection Solves Most Problems | 0 likes