Fixed point theorem in the finite and infinite case
discussion post by Victoria Krakovna 872 days ago | Jim Babcock and Patrick LaVictoire like this | discuss

Janos and I started working on extending the fixed point theorem to the infinite case at MIRI’s June decision theory workshop with Patrick and Benja. This post attempts to exhibit the finite version of the theorem and speculates about a plausible extension.

Algorithm for the finite case

Suppose we’re given variables $$p_1,\dots,p_n$$, and statements $$p_i\leftrightarrow \phi_i(p_1,\dots,p_n)$$ where each $$\phi_i$$ is fully modalized (no variables occur outside modal subformulas). We will describe an algorithm for constructing sentences $$\psi_i$$ with no free variables, such that the original statements are equivalent to $$p_1\leftrightarrow \psi_1,\dots,p_n\leftrightarrow \psi_n$$.

For simplification purposes, we will assume that each $$\phi_i$$ is only singly modalized (none of the modal subformulas contain further modal subformulas). If not, we can introduce new variables for each subformula of the form $$\square \phi_{i,j}$$ that occurs in a fully modalized context.

Now consider a sequence of theories $$W_0,W_1,\dots$$, where $$W_i\equiv PA\cup\{\square^{i+1} \bot,\lnot\square^i\bot\}$$.

In $$W_0$$, it’s easy to determine the truth value of each $$p_i$$: every modal subformula can be replaced with $$\top$$, leaving a propositional formula with no variables.

Now suppose we’ve done this for $$W_0,\dots,W_{n-1}$$. Then, a statement $$\square \phi$$ will be true in $$W_n$$ iff

$PA\vdash(\square^{n+1}\bot\land\lnot\square^n\bot)\to\square\phi \Leftrightarrow\square^n\bot\lor\square(\square^n \bot \rightarrow \phi) \Leftrightarrow\square^n\bot\lor\square\bigwedge_{i=0}^{n-1}((\square^{i+1} \bot\land \lnot\square^i \bot)\rightarrow \phi)$

Therefore $$\square \phi$$ will be true in $$W_n$$ iff $$\phi$$ is true in $$W_0,\dots,W_{n-1}$$; this will let us evaluate the truth value of $$p_i$$ in all of the theories $$W_1,\dots,W_n$$.

It’s clearly the case that every modal subformula will have truth value stabilizing, therefore every $$p_i$$ will also. So there is an $$N$$ such that $$W_n$$ has the same truth values as $$W_N$$ for $$n>N$$. Now if $$W_N\vdash p_i$$, construct $$\psi_i\equiv\lnot\bigvee_{i:W_i\vdash \lnot p_i} (\square^{i+1}\bot\land\lnot\square^i\bot)$$; otherwise construct $$\psi_i\equiv\bigvee_{i:W_i\vdash p_i} (\square^{i+1}\bot\land\lnot\square^i\bot)$$. These are variable-free formulas.

Infinite case example: Procrastination Bot

def ProcrastinationBot$$_N$$(X):

if $$PA+N \vdash \square X(PB_{N+1}) = C$$:

return C

else: return D

Constructing the fixed point

Let $$p_{ij}$$ denote whether $$PB_i$$ cooperates with $$PB_j$$. Then $p_{ij} \leftrightarrow \square_i p_{j, i+1} =\square_i \square_j p_{i+1,j+1}$ where $$\square_i$$ stands for $$\lnot \square^i \bot \rightarrow \square$$ (e.g. $$\square_0 = \square$$).

Then the following statements hold $p_{00} \leftrightarrow\square \square p_{11}$ $p_{11} \leftrightarrow \square_1 \square_1 p_{22} \equiv \lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$ $\dots$

In $$W_0$$, any statement with a box in front of it is true, so any statement where a boxed statement is implied is also true. Thus, all the relevant statements are true in $$W_0$$:

Theory $$p_{00}$$ $$\square \square p_{11}$$ $$p_{11}$$ $$\lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$$ $$\lnot \square \bot \rightarrow \square p_{22}$$ $$p_{22}$$ $$\dots$$
$$W_0$$ $$\top$$ $$\top$$ $$\top$$

In $$W_{i+1}$$, all the implied statements are boxed statements that were true in $$W_i$$, e.g. $$\square p_{22}$$ or $$\lnot \square \bot \rightarrow \square p_{22}$$. Thus, all the relevant statements are true in $$W_{i+1}$$:

Theory $$p_{00}$$ $$\square \square p_{11}$$ $$p_{11}$$ $$\lnot \square \bot \rightarrow \square (\lnot \square \bot \rightarrow \square p_{22})$$ $$\lnot \square \bot \rightarrow \square p_{22}$$ $$p_{22}$$ $$\dots$$
$$W_0$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$ $$\top$$
$$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$ $$\vdots$$
$$W_{i+1}$$ $$\top$$ $$\top$$ $$\top$$

Thus, any Procrastination Bot cooperates with any other Procrastination Bot.

Existence and uniqueness of the fixed point via quining

Given a formula $$\psi(m,n)$$, there exists formula $$\phi(n)$$ such that $$\vdash\phi(n)\leftrightarrow\psi(\ulcorner\phi\urcorner,n)$$. The quine construction (thanks Benja!) is $\phi(n)\equiv\psi(\mbox{let }k=\ulcorner\psi(\mbox{let }k=x\mbox{ in }\operatorname{subst}_{\ulcorner x\urcorner}(\operatorname{quote}(k),k))\urcorner\mbox{ in }\operatorname{subst}_{\ulcorner x\urcorner}(\operatorname{quote}(k),k),n)$

Let $$\psi(\ulcorner \phi \urcorner, n) = \square\ulcorner \phi(\overline{n+1}) \urcorner$$. Then we have $PA \vdash \forall n: (\phi(n) \leftrightarrow \square\ulcorner \phi(\overline{n+1}) \urcorner).$

Now consider that $$\square \ulcorner \forall n: \phi(n) \urcorner \rightarrow \square\ulcorner \phi(\overline{n+1}) \urcorner$$. Using the above, we have $PA \vdash \square \ulcorner \forall n: \phi(n) \urcorner \rightarrow \forall n: \phi(n).$

By Lob’s theorem, $$PA \vdash \forall n: \phi(n)$$, so a fixed point exists.

Assume there are two fixed points $$\phi$$ and $$\phi'$$. Then we have $PA \vdash \forall n: [\phi(n) \leftrightarrow \psi(\ulcorner \phi \urcorner,n)] \land [\phi'(n) \leftrightarrow \psi(\ulcorner \phi' \urcorner,n)],$ $PA \vdash \forall \ulcorner \phi \urcorner, \ulcorner \phi' \urcorner: \square \ulcorner \forall n: \phi(n) \leftrightarrow \phi'(n) \urcorner \rightarrow [\psi(\ulcorner \phi \urcorner, n) \leftrightarrow \psi(\ulcorner \phi' \urcorner, n)]$ Thus, $$PA \vdash \forall n: \phi(n) \leftrightarrow \phi'(n)$$, so the fixed points are the same.

NEW DISCUSSION POSTS

Indeed there is some kind of
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

Very nice. I wonder whether
 by Vadim Kosoy on Hyperreal Brouwer | 0 likes

Freezing the reward seems
 by Vadim Kosoy on Resolving human inconsistency in a simple model | 0 likes

Unfortunately, it's not just
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
 by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
 by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
 by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
 by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
 by Stuart Armstrong on Predictable Exploration | 0 likes