Reflective Probability Distributions and Standard Models of Arithmetic
post by Kaya Stechly 719 days ago

Thanks to Benya for the result and to Rafael for helping write this. This result is included in the latest draft of the new Definability of Truth paper.

Working in the framework presented in Definability of Truth, we prove that every reflective, coherent $$\mathbb{P}$$ is supported only on non-standard models. We construct a finitely additive measure over the language which is supported on standard models and satisfies many of our desiderata. This shows that $$\mathbb{P}$$’s non-standard support is a direct result of requiring countable additivity.

## Background

Note that some of the notation has changed from Definability of Truth: what was written $$\mathbb{P}(\varphi)$$ is now written $$\mathbb{P}[\varphi]$$. This distinguishes the function from sentences of the language to the reals from $$\mathbb{P}(A)$$, which takes a set $$A$$ of theories and is therefore a function from the powerset of the set of all theories to the reals. Benya has shown that $$\mathbb{P}[\varphi]=\mathbb{P}(\{T:\varphi\in T\})$$, so we abuse notation and use the same symbol for both functions.

We now write $$P$$ for the inner language symbol instead of $$\mathbb{P}$$, as they are importantly different entities. Recall that $$\mathbb{P}$$ is a metalanguage function from the language to $$[0,1]$$. The definition of $$P$$ has changed enough that we discuss it separately in the next section.

### $$P$$ as a three-place relation

Since we are now working in the realm of Peano arithmetic, we can weaken our definition of the inner probability symbol. Instead of defining $$P$$ as a function in the language, it is sufficient for $$P$$ to be a three-place relation which approximates some metalanguage function arbitrarily well.

Definition 1. Let $$L'$$ be the extension of the language $$L$$ (which contains at least the language of arithmetic) by a relation symbol $$P$$. A standard theory is a complete theory over $$L'$$ which extends the theory of the standard natural numbers (over $$L$$) and satisfies the following two conditions: $$P\subset\mathbb{N}^3$$ and $\exists f : L'\rightarrow[0, 1] \text{ s.t. } \forall \varphi~\forall a, b\in\mathbb{Q}: P(\dot a,\ulcorner \varphi \urcorner, \dot b) \iff f(\varphi) \in (a, b).$

Intuitively, $$P$$ is a three-place relation which approximates some function $$f$$ arbitrarily well. We read $$P(\dot a,\ulcorner\varphi\urcorner,\dot b)$$ as saying that $$a<f(\varphi)<b$$, where $$f$$ is the function whose existence Definition 1 requires.
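To make the intuition concrete, here is a minimal Python sketch of how a three-place relation can pin down a function value using only open-interval queries. The names `make_relation` and `approximate` are hypothetical helpers introduced for illustration, sentences are stood in for by plain strings, and $$f$$ is assumed to take rational values so that exact arithmetic can be used; none of this appears in the original result.

```python
from fractions import Fraction

def make_relation(f):
    """Return a relation P with P(a, phi, b) true iff a < f(phi) < b."""
    def P(a, phi, b):
        return a < f(phi) < b
    return P

def approximate(P, phi, steps=20):
    """Shrink a rational interval around f(phi) using only queries to P."""
    # f(phi) lies in [0, 1], so it is strictly inside the open interval (-1, 2).
    lo, hi = Fraction(-1), Fraction(2)
    for _ in range(steps):
        third = (hi - lo) / 3
        # The overlapping open intervals (lo, hi - third) and (lo + third, hi)
        # cover (lo, hi), so at least one of them contains f(phi).
        if P(lo, phi, hi - third):
            hi = hi - third
        else:
            lo = lo + third
    return lo, hi

# Hypothetical values of f on two placeholder sentences.
values = {"A": Fraction(1, 3), "B": Fraction(1)}
P = make_relation(lambda phi: values[phi])
```

Each query cuts the interval to two thirds of its width, so the relation determines $$f(\varphi)$$ to any desired precision. Because the intervals are open, a sentence with $$f(\varphi)=1$$ never satisfies $$P(\dot a,\ulcorner\varphi\urcorner,\dot 1)$$ for any $$a$$.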

## Disbelief in Standard Theories

We would like to construct coherent, reflective distributions which are supported on standard theories of Peano Arithmetic. We prove that not only do such distributions not exist, but every coherent, reflective distribution must assign zero probability to any set of standard theories.

Theorem 1. Let $$T$$ be a consistent theory which extends Peano Arithmetic. Let $$\mathbb{P}$$ be a coherent, reflective probability distribution over $$T$$. Then, $$\mathbb{P}$$ must assign probability zero to any set of standard theories of $$T$$.

Proof. Consider the sentence $$G$$ defined by $$G \iff P(-\dot 1,\ulcorner G\urcorner,\dot 1)$$. Then, $$\mathbb{P}[G]=1$$. Applying the reflection principle, we get that $\forall\epsilon>0: \mathbb{P}(\{T:P(\dot 1-\dot \epsilon,\ulcorner G\urcorner,\dot 1)\in T\})=1.$ Now, $\{T:\forall \epsilon>0,P(\dot 1-\dot \epsilon,\ulcorner G\urcorner,\dot 1)\in T\}=\bigcap_{\epsilon>0}\{T:P(\dot 1-\dot \epsilon,\ulcorner G\urcorner,\dot 1)\in T\},$ where $$\epsilon$$ ranges over the positive rationals, so the intersection is countable. Each set in the intersection has probability one, so we apply countable additivity and De Morgan’s laws to the complements to get that $\mathbb{P}(\{T:\forall \epsilon>0,P(\dot 1-\dot \epsilon,\ulcorner G\urcorner,\dot 1)\in T\})=1.$

This means that $$\mathbb{P}$$ assigns probability zero to the set of theories which omit $$P(\dot 1-\dot \epsilon,\ulcorner G\urcorner,\dot 1)$$ for some $$\epsilon>0$$, and hence to every subset of that set. In particular, no standard theory can contain all of these sentences: otherwise the function $$f$$ from Definition 1 would satisfy $$1-\epsilon<f(G)<1$$ for every rational $$\epsilon>0$$, so $$f(G)$$ would be a real number smaller than one yet greater than every rational smaller than one, which is impossible. $$\square$$
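The countable additivity step in the proof can be spelled out as follows. This is a sketch under the assumption that it suffices to take $$\epsilon_n = 1/n$$; the names $$\epsilon_n$$ and $$A_n$$ are introduced here for illustration and do not appear in the original argument.

```latex
% Assumed notation: \epsilon_n = 1/n and
% A_n = \{ T : P(\dot 1 - \dot\epsilon_n, \ulcorner G \urcorner, \dot 1) \in T \}.
% From the reflection step, \mathbb{P}(A_n) = 1 for every n \ge 1.
\begin{align*}
\mathbb{P}\Bigl(\bigcap_{n \ge 1} A_n\Bigr)
  &= 1 - \mathbb{P}\Bigl(\bigcup_{n \ge 1} A_n^{c}\Bigr)
     && \text{(De Morgan)} \\
  &\ge 1 - \sum_{n \ge 1} \mathbb{P}(A_n^{c})
     && \text{(countable subadditivity)} \\
  &= 1 - \sum_{n \ge 1} 0 = 1 .
\end{align*}
% A standard theory in \bigcap_n A_n would need 1 - 1/n < f(G) < 1 for all n,
% which no real number f(G) satisfies.
```

The second line is exactly where countable additivity enters: with only finite additivity, the bound holds for each finite intersection but need not pass to the limit.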

We can trace the cause of the result in the last section back to the countable additivity condition. To show this, we construct a finitely additive measure which fulfills many of our desiderata and is supported on standard theories.

Definition 2. Let $$\mathcal{N}$$ be the set of all standard theories. For each sentence $$\varphi$$, define $$\mathcal{N}_{\varphi}\subset \mathcal{N}$$ by $$\mathcal{N}_{\varphi}:=\{T\in\mathcal{N}:\varphi\in T\}$$. We define the base theory $$T_0$$ as $$T_0:=\bigcap \mathcal{N}$$.

Definition 3. Define a function $$\mu$$ such that $$\mu(\mathcal{N}_{\varphi}):=\mathbb{P}[\varphi]$$.

Theorem 2. Existence of Finitely Additive Measure Supported on Standard Theories. $$\mu$$ is well-defined, finitely additive, and satisfies a version of the reflection principle: $a < \mu(\mathcal{N}_{\varphi}) < b \Rightarrow \mu(\mathcal{N}_{P(\dot a, \ulcorner \varphi \urcorner,\dot b)}) = 1.$

Proof. We first show that $$\mu$$ is well-defined. By consistency of the reflection principle, we have that there exists a coherent, reflective $$\mathbb{P}$$ over our base theory $$T_0$$.

Say that $$\mathcal{N}_{\varphi}=\mathcal{N}_{\psi}$$. Then, by definition, $$\varphi$$ and $$\psi$$ are in exactly the same standard theories. Every standard theory is complete, so each one contains $$\varphi\leftrightarrow\psi$$, and therefore $$\varphi\leftrightarrow\psi\in T_0=\bigcap\mathcal{N}$$. Since $$\mathbb{P}$$ is coherent over $$T_0$$, Gaifman coherence gives $$\mathbb{P}[\varphi]=\mathbb{P}[\psi]$$. Hence, $$\mu$$ is well defined.

We now check that $$\mu$$ is a finitely additive measure. Clearly, $$\mu(\mathcal{N}_{\varphi})\in[0,1]$$ and $$\mu(\emptyset)=\mathbb{P}[\bot]=0$$. We need only check that it is finitely additive. Since standard theories are complete, $$\mathcal{N}_{\varphi}\cup \mathcal{N}_{\psi}=\mathcal{N}_{\varphi\vee\psi}$$, so $\mu(\mathcal{N}_{\varphi}\cup \mathcal{N}_{\psi})=\mu(\mathcal{N}_{\varphi\vee\psi})=\mathbb{P}[\varphi\vee\psi].$

If $$\mathcal{N}_{\varphi}\cap \mathcal{N}_{\psi}=\emptyset$$, then no standard theory contains both $$\varphi$$ and $$\psi$$, so, by completeness, every standard theory contains $$\neg(\varphi\wedge\psi)$$, and hence $$\neg(\varphi\wedge\psi)\in T_0$$. Thus, by Gaifman coherence, $\mathbb{P}[\varphi\vee\psi]=\mathbb{P}[\varphi]+\mathbb{P}[\psi],$ which is to say $$\mu(\mathcal{N}_{\varphi}\cup \mathcal{N}_{\psi})=\mu(\mathcal{N}_{\varphi})+\mu(\mathcal{N}_{\psi})$$.
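The bookkeeping in this argument can be checked on a finite toy analogue. In the sketch below, truth assignments over two atoms stand in for complete theories, sentences are Python predicates, and the weights are arbitrary illustrative numbers; the names `N`, `prob`, and `mu_of_N` are hypothetical. In a finite setting every measure is trivially countably additive, so this only illustrates well-definedness and additivity, not the contrast with Theorem 1.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("p", "q")
# Each world is a truth assignment, standing in for a complete theory.
# Order: (F, F), (F, T), (T, F), (T, T).
WORLDS = [dict(zip(ATOMS, v)) for v in product([False, True], repeat=2)]
# Arbitrary illustrative weights summing to one.
WEIGHT = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]

def N(phi):
    """Toy analogue of N_phi: the set of worlds in which phi holds."""
    return frozenset(i for i, w in enumerate(WORLDS) if phi(w))

def prob(phi):
    """Toy analogue of P[phi]: total weight of the worlds satisfying phi."""
    return sum((WEIGHT[i] for i in N(phi)), Fraction(0))

def mu_of_N(phi):
    """Toy analogue of Definition 3: mu(N_phi) := P[phi]."""
    return prob(phi)

p_and_q = lambda w: w["p"] and w["q"]
p_and_not_q = lambda w: w["p"] and not w["q"]
either = lambda w: p_and_q(w) or p_and_not_q(w)
just_p = lambda w: w["p"]
```

Here `either` and `just_p` are different predicates with the same set `N(...)`, and they receive the same value, which is the toy version of well-definedness. The sets for `p_and_q` and `p_and_not_q` are disjoint, and their values add up to the value for `either`.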

By the way we defined it, $$\mu$$ clearly satisfies the modified reflection principle we gave. $$\square$$
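For completeness, the reflection claim unwinds as the short chain below. It assumes that $$\mathbb{P}$$ satisfies the reflection principle in the form $$a<\mathbb{P}[\varphi]<b \Rightarrow \mathbb{P}[P(\dot a,\ulcorner\varphi\urcorner,\dot b)]=1$$, which is how the principle from Definability of Truth reads once $$P$$ is taken to be a three-place relation.

```latex
\begin{align*}
a < \mu(\mathcal{N}_{\varphi}) < b
  &\;\Longrightarrow\; a < \mathbb{P}[\varphi] < b
     && \text{(Definition 3)} \\
  &\;\Longrightarrow\; \mathbb{P}\bigl[P(\dot a, \ulcorner \varphi \urcorner, \dot b)\bigr] = 1
     && \text{(reflection for } \mathbb{P}\text{, assumed form)} \\
  &\;\Longrightarrow\; \mu\bigl(\mathcal{N}_{P(\dot a, \ulcorner \varphi \urcorner, \dot b)}\bigr) = 1
     && \text{(Definition 3)} .
\end{align*}
```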
