Waterfall Truth Predicates
post by Abram Demski 800 days ago | Benja Fallenstein, Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this

The waterfall-based approaches to the Löbian obstacle offer a way around the finitely-terminating sequences of trust which we get by adding towers of soundness schemas to $$PA$$. They create a kind of illusion of self-trust by way of a non-well-founded chain of trust.

Another familiar situation where we can construct arbitrarily high towers but not a single self-referential system is that of truth predicates. Tarski’s undefinability theorem blocks the existence of a full truth predicate within the same language as the one which it describes. Perhaps a similar waterfall construction can be applied, to get an infinite descending chain of languages.

Extend the language of $$PA$$ with a family of truth predicates $$Tr_n$$. A Tarski-style approach would assert a T-schema $$Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi$$ for $$\phi$$ which contain only truth predicates indexed strictly lower than $$n$$. ($$\ulcorner \phi \urcorner$$ is the Gödel number of $$\phi$$.) Here, we wish to flip this, and assert a T-schema which allows only strictly higher indices.

This brings to mind Yablo’s Paradox. A contradiction can likely be worked out in a way resembling that, but instead I’ll note that this theory implies the naive soundness waterfall, in which we construct a sequence of theories $$T_{n} = T_{n+1} + Sound(T_{n+1})$$. This is because we can use the truth predicate $$Tr_n$$ to carry out a proof of soundness for the axioms and inference rules, with the exception of instances of the T-schema involving $$Tr_{m \leq n}$$. This gives us the naive soundness waterfall, which we know to be inconsistent. (Note that I have not checked this in detail, however.)
My idea for fixing the T-schema, then, is to introduce the same $$\psi(n)$$ predicate as in the waterfall approaches, which asserts that $$n$$ is not the Gödel number of a proof of contradiction in $$ZFC$$. We make the new schema: $$\psi(n) \rightarrow \big[ Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi \big]$$, where $$\phi$$ contains only $$Tr_{m>n}$$. Because we can prove any particular $$\psi(n)$$, we can still apply the schema in specific cases. It seems likely that we can still carry out soundness arguments as well, constructing the consistent version of the soundness waterfall. If so, the theory ends up being unsound as a result.

Here’s a proof that the theory is unsound. Consider again Yablo’s paradox. We construct an infinite sequence of statements $$A_0, A_1, ...$$, each of which asserts that the subsequent statements in the sequence are all false. Specifically:

$A_n \leftrightarrow \forall_{m>n}: \neg Tr_n(\ulcorner A_m \urcorner)$

Considering any particular $$A_n$$, we see that it implies $$\neg Tr_n(\ulcorner A_{n+1} \urcorner)$$, and also $$\forall_{m>n+1}: \neg Tr_n(\ulcorner A_m \urcorner)$$. By an application of the T-schema, however, these two statements are just the negation of each other. Therefore, the theory proves $$\neg A_n$$. The choice of $$n$$ was generic, so we can see that the system eventually proves every sentence in the sequence false. From outside, however, we can see that this implies that each of them is true. If the system proposed here can also carry out that reasoning, then it will be inconsistent. Even if it is consistent, it’s still unsound, so it’s unlikely to be very useful.

It would be interesting if truth-predicate versions of the other solutions to the Löbian obstacle could be constructed. (My intuition is that this won’t be possible for the consistency waterfall, but is likely possible for model polymorphism.)
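
The Yablo-style argument above can be spelled out step by step. What follows is a sketch, not a checked proof. It assumes that the guarded T-schema can be applied uniformly in the parameter $$m$$ (that is, to $$A_m$$ with $$m$$ as a free variable, which the post does not state explicitly), and it uses the fact that $$\psi(n)$$ and $$\psi(n+1)$$ are provable for each particular $$n$$.

```latex
\begin{align*}
&(1)\;\; A_n \leftrightarrow \forall_{m>n}: \neg Tr_n(\ulcorner A_m \urcorner)
  && \text{definition of } A_n \\
&(2)\;\; \forall_{m>n}: \big[ Tr_n(\ulcorner A_m \urcorner) \leftrightarrow A_m \big]
  && \text{schema at level } n \text{, using } \psi(n) \\
&(3)\;\; \forall_{m>n+1}: \big[ Tr_{n+1}(\ulcorner A_m \urcorner) \leftrightarrow A_m \big]
  && \text{schema at level } n+1 \text{, using } \psi(n+1) \\
&(4)\;\; A_n \rightarrow \neg A_{n+1}
  && \text{from (1), (2) with } m = n+1 \\
&(5)\;\; A_n \rightarrow \forall_{m>n+1}: \neg A_m
  && \text{from (1), (2)} \\
&(6)\;\; \big[ \forall_{m>n+1}: \neg A_m \big] \leftrightarrow A_{n+1}
  && \text{definition of } A_{n+1} \text{ and (3)} \\
&(7)\;\; A_n \rightarrow \big( A_{n+1} \wedge \neg A_{n+1} \big)
  && \text{from (4), (5), (6)} \\
&(8)\;\; \neg A_n
  && \text{from (7)}
\end{align*}
```

Steps (2) and (3) are where the schema is used: each $$A_m$$ with $$m>n$$ mentions only $$Tr_m$$, whose index is strictly higher than $$n$$, so the side condition on $$\phi$$ is met under the uniformity assumption stated above.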

by Benja Fallenstein 798 days ago | Patrick LaVictoire likes this

I would suggest changing this system by defining $$\psi(n)$$ to mean that no $$m\le n$$ is the Gödel number of a proof of an inconsistency in ZFC (instead of just asserting that $$n$$ isn’t). The purpose of this is to make it so that if ZFC were inconsistent, then we only end up talking about a finite number of levels of truth predicate. More specifically, I’d define $$T_n$$ to be PA plus the axiom schema

$\forall m\ge n.\;\psi(m)\to\forall x.\;\mathrm{Tr}_m(\ulcorner\varphi(\overline x)\urcorner)\leftrightarrow\varphi(x).$

Then, it seems that Jacob Hilton’s proof that the waterfalls are consistent goes through for this waterfall:

Work in ZFC and assume that ZFC is inconsistent. Let $$n$$ be the lowest Gödel number of a proof of an inconsistency. Let $$M$$ be the following model of our language: Start with the standard model of PA; it remains to give interpretations of the truth predicates. If $$m\ge n$$, then $$\mathrm{Tr}_m(k)$$ is false for all $$k$$. If $$m<n$$, then $$\mathrm{Tr}_m(k)$$ is true if and only if $$k$$ is the Gödel number of a sentence that is true in $$M$$ and involves only $$\mathrm{Tr}_{m'}$$ for $$m'>m$$. Then, it’s clear that $$T_0$$, and hence all $$T_m$$ (since $$T_0$$ is the strongest of the systems), are sound on $$M$$, and therefore consistent.

Thus, we have proven in ZFC that if ZFC is inconsistent, then $$T_0$$ is consistent; or equivalently, that if $$T_0$$ is inconsistent, then ZFC is consistent. Stepping out of ZFC, we can see that if $$T_0$$ is inconsistent, then ZFC proves this, and therefore in this case ZFC proves its own consistency, implying that it is inconsistent. Hence, if ZFC is consistent, then so is $$T_0$$.

(Moreover, we can formalize this reasoning in ZFC. Hence, we can prove in ZFC (i) that if ZFC is inconsistent, then $$T_0$$ is consistent, and (ii) that if ZFC is consistent, then $$T_0$$ is consistent. By the law of the excluded middle, ZFC proves that $$T_0$$ is consistent.)
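
The case split in the argument above can be laid out schematically. This is only a restatement, writing $$\mathrm{Con}(\cdot)$$ for the arithmetized consistency statement (a notation the comment itself does not use). The middle chain relies on ZFC proving any true statement of the form “$$T_0$$ is inconsistent”, and on Gödel’s second incompleteness theorem.

```latex
\begin{align*}
&\text{(i)}  && ZFC \vdash \neg\mathrm{Con}(ZFC) \rightarrow \mathrm{Con}(T_0)
  && \text{via the model } M \\
&            && ZFC \vdash \neg\mathrm{Con}(T_0) \rightarrow \mathrm{Con}(ZFC)
  && \text{contrapositive of (i)} \\
&            && T_0 \text{ inconsistent}
                 \;\Rightarrow\; ZFC \vdash \neg\mathrm{Con}(T_0)
                 \;\Rightarrow\; ZFC \vdash \mathrm{Con}(ZFC)
                 \;\Rightarrow\; ZFC \text{ inconsistent}
  && \text{reasoning outside } ZFC \\
&\text{(ii)} && ZFC \vdash \mathrm{Con}(ZFC) \rightarrow \mathrm{Con}(T_0)
  && \text{formalizing the previous line in } ZFC \\
&            && ZFC \vdash \mathrm{Con}(T_0)
  && \text{from (i), (ii) and excluded middle}
\end{align*}
```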
by Benja Fallenstein 798 days ago

We should be more careful, though, about what we mean by saying that $$\varphi(x)$$ only depends on $$\mathrm{Tr}_{m}$$ for $$m>n$$, since this cannot be a purely syntactic criterion if we allow quantification over the subscript (as I did here). I’m pretty sure that something can be worked out, but I’ll leave it for the moment.
