Waterfall Truth Predicates post by Abram Demski 862 days ago | Benja Fallenstein, Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 2 comments The waterfall-based approaches to the Löbian obstacle offer a way around finitely-terminating sequences of trust which we get by adding towers of soundness schemas to $$PA$$. This creates a kind of illusion of self-trust by way of a non-well-founded chain of trust. Another familiar situation where we are normally faced with the ability to construct arbitrarily high towers but not a single self-referential system is that of truth predicates. Tarski’s undefinability theorem blocks the existence of a full truth predicate within the same language as the one which it describes. Perhaps a similar waterfall construction can be applied, to get an infinite descending chain of languages. Extend the language of $$PA$$ with a family of truth predicates $$Tr_n$$. A Tarski-style approach would assert a T-schema $$Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi$$ for $$\phi$$ which contain truth predicates indexed strictly lower than $$n$$. ($$\ulcorner \phi \urcorner$$ is the Gödel number of $$\phi$$.) Here, we wish to flip this, and assert a T-schema which allows strictly higher $$n$$. This brings to mind Yablo’s Paradox. A contradiction can likely be worked out in a way resembling that, but instead I’ll note that this theory implies the naive soundness waterfall in which we construct a sequence of theories $$T_{n} = T_{n+1} + Sound(T_{n+1})$$. This is because we can use the truth predicate $$Tr_n$$ to carry out a proof of soundness for the axioms and inference rules, with the exception of instances of the T-schema involving $$Tr_{m \leq n}$$. This gives us the naive soundness waterfall, which we know to be inconsistent. (Note that I have not checked this in detail, however.) My idea for fixing the T-schema, then, is to introduce the same $$\psi(n)$$ predicate which asserts that $$n$$ is not the Gödel number of a proof of contradiction in $$ZFC$$. We make the new schema: $$\psi(n) \rightarrow \big[ Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi \big]$$, where $$\phi$$ contains only $$Tr_{m>n}$$. Because we can prove any particular $$\psi(n)$$, we can still apply the schema in specific cases. It seems likely that we can still carry out soundness arguments, as well, constructing the consistent version of the soundness waterfall. If so, the theory ends up being unsound as a result. Here’s a proof that the theory is unsound. Consider again Yablo’s paradox. We construct an infinite sequence of statements, $$A_0, A_1, ...$$ each of which assert that the subsequent statements in the sequence are all false. Specifically: $A_n \leftrightarrow \forall_{m>n}: \neg Tr_n(\ulcorner A_m \urcorner)$ Considering any particular $$A_n$$, we see that it implies $$\neg Tr_n(\ulcorner A_{n+1} \urcorner)$$, and also $$\forall_{m>n+1}: \neg Tr_n(\ulcorner A_m \urcorner)$$. By an application of the T-schema, however, these two statements are just the negation of each other. Therefore, the theory proves $$\neg A_n$$. The choice of $$n$$ was generic, so we can see that the system eventually proves every sentence in the sequence false. From outside, we can see that this implies that each of them is true, however. If the system proposed here can also carry out that reasoning, then it will be inconsistent. Even if it is consistent, it’s still unsound, so it’s unlikely to be very useful. It would be interesting if truth-predicate versions of the other solutions to the Löbian obstacle could be constructed. (My intuition is that this won’t be possible for the consistency waterfall, but is likely possible for model polymorphism.)

 by Benja Fallenstein 861 days ago | Patrick LaVictoire likes this | link I would suggest changing this system by defining $$\psi(n)$$ to mean that no $$m\le n$$ is the Gödel number of a proof of an inconsistency in ZFC (instead of just asserting that $$n$$ isn’t). The purpose of this is to make it so that if ZFC were inconsistent, then we only end up talking about a finite number of levels of truth predicate. More specifically, I’d define $$T_n$$ to be PA plus the axiom schema $\forall m\ge n.\;\psi(m)\to\forall x.\;\mathrm{Tr}_m(\ulcorner\varphi(\overline x)\urcorner)\leftrightarrow\varphi(x).$ Then, it seems that Jacob Hilton’s proof that the waterfalls are consistent goes through for this waterfall: Work in ZFC and assume that ZFC is inconsistent. Let $$n$$ be the lowest Gödel number of a proof of an inconsistency. Let $$M$$ be the following model of our language: Start with the standard model of PA; it remains to give interpretations of the truth predicates. If $$m\ge n$$, then $$\mathrm{Tr}_m(k)$$ is false for all $$k$$. If $$mm$$. Then, it’s clear that $$T_0$$, and hence all $$T_m$$ (since $$T_0$$ is the strongest of the systems) is sound on $$M$$, and therefore consistent. Thus, we have proven in ZFC that if ZFC is inconsistent, then $$T_0$$ is consistent; or equivalently, that if $$T_0$$ is inconsistent, then ZFC is consistent. Stepping out of ZFC, we can see that if $$T_0$$ is inconsistent, then ZFC proves this, and therefore in this case ZFC proves its own consistency, implying that it is inconsistent. Hence, if ZFC is consistent, then so is $$T_0$$. (Moreover, we can formalize this reasoning in ZFC. Hence, we can prove in ZFC (i) that if ZFC is inconsistent, then $$T_0$$ is consistent, and (ii) that if ZFC is consistent, then $$T_0$$ is consistent. By the law of the excluded middle, ZFC proves that $$T_0$$ is consistent.) reply
 by Benja Fallenstein 861 days ago | link We should be more careful, though, about what we mean by saying that $$\varphi(x)$$ only depends on $$\mathrm{Tr}_{m}$$ for $$m>n$$, though, since this cannot be a purely syntactic criterion if we allow quantification over the subscript (as I did here). I’m pretty sure that something can be worked out, but I’ll leave it for the moment. reply

### NEW DISCUSSION POSTS

Unfortunately, it's not just
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
 by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
 by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
 by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
 by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
 by Stuart Armstrong on Predictable Exploration | 0 likes

 by Abram Demski on Predictable Exploration | 0 likes

> So I wound up with
 by Abram Demski on Predictable Exploration | 0 likes

Hm, I got the same result
 by Alex Appel on Predictable Exploration | 1 like

Paul - how widely do you want
 by David Krueger on Funding opportunity for AI alignment research | 0 likes

I agree, my intuition is that
 by Abram Demski on Smoking Lesion Steelman III: Revenge of the Tickle... | 0 likes