Intelligent Agent Foundations Forumsign up / log in
Questioning GLS-Coherence
discussion post by Abram Demski 1130 days ago | discuss

At the end of my trolling post, I said that naturalistic logical updates stand or fall with the strength of the reflective consistency argument for them, and that this has not been sufficiently articulated yet. Indeed, the choice of \(GLS\)-coherence as a model of bounded rationality is highly questionable. While previous posts have discussed the motivation a fair amount, I was writing with the hopes of providing a solution, so I added certain assumptions that I thought might be sufficient, if not necessary. The trolling theorem, among other concerns, casts doubt on those assumptions. Here, I describe where I think I was wrong.

Are the updates naturalistic?

My argument for using \(GLS\)-coherence was that the agent, planning ahead for the possibility that \(x\) is proved, should be coherent with respect to new propositional constraints which \(x\) contains. \(GLS\) formalizes this as \(\Box x \rightarrow x\), which allows us to conclude that a \(GLS\)-coherent prior is coherent with respect to \(x\) when updated on \(\Box x\). Coherence updating on \(\Box x\) isn’t really what my argument asked for, though: I asked for coherence conditioning on future-self having access to a proof of \(x\). A \(GLS\)-coherent prior isn’t good enough to fill in this inferential gap on its own. It can imagine future-self having access to a proof of \(x\) without \(\Box x\) being true.

Hence, my updates were not really “naturalistic”; they were not able to look at the imagined future world and extract \(\Box x\). In order for the UDT to make plans which depend on whether \(\Box x\) holds, it would give the plans direct access to the theorem prover in a non-naturalistic way (it would trust its own theorem prover in a special way which wouldn’t automatically extend to other theorem provers in the environment).

I think this objection isn’t totally fatal to \(GLS\)-coherence; it means there has to be special attention to the “pipe” which connects \(\Box x\) with actually-having-proven-\(x\) in the overall decision theory. However, it does suggest that we might be able to accomplish the job easier without \(GLS\)-coherence, only using this “pipe” between proof and belief.

Bayes-Endorsement \(\neq\) Planning-Endorsement

The two-update problem is the observation that in systems of logical probabilities so far, there seems to be a non-Bayesian “second update” which adjusts beliefs based on proofs. (Let’s call this the “logical update”.) I’ve argued in previous posts that there is a strong connection between reflective consistency and Bayesian updates, which led me to attempt to pack a sufficiently good logical update into a Bayesian update on proofs. The trolling theorem shows, at best, severe difficulties making this work. My proposal of \(GLS\)-coherence solves the two-update problem, but abandons essentially every other good property which we’ve been aiming for in logical updates. I now see that this was based on faulty assumptions about the importance of the Bayesian update.

I’ve been failing to distinguish sufficiently well between two different notions of “endorsement”. My argument for \(GLS\)-coherence, as stated in the previous section, implicitly relies on a connection between plan endorsement and Bayesian endorsement. Plan-endorsement means estimating high utility for plans optimized according to future-self’s beliefs. Bayes-endorsement means that future-self’s beliefs match current self’s conditional probabilities on the evidence future-self would see to change its beliefs (in this case, proofs). Intuitively (for me at least), there seemed to be an almost trivial relationship between these two: if your conditional probabilities differ from future-self’s beliefs on the same evidence, there should be some scenarios in which these belief differences cause you to endorse different plans, right? This connection is not as strong as it appears.

Let’s get a bit more formal. Let \(P_{\{ b_n \} }\) be the possibly-not-Bayes update of \(P\) on some evidence \(B=b_n\). Also let \(P_{new}=f\) be an event representing that beliefs have become some function \(f\) based on the update.

Bayes-endorsement: \(\forall \phi, P_{\{ b_n\}}(\phi) = P(\phi|B=b_n)\).

Plan-endorsement: \(\arg\max_{plan} \mathbb{E}_{P_{\{ b_n \} }} [U(outcome)|plan] = \arg\max_{plan} \mathbb{E}_p [U(outcome) | plan \wedge P_{new}=P_{\{ b_n \} }]\)

These are somewhat difficult to compare, but the following implies plan-endorsement:

Strong plan-endorsement: \(\forall \phi, P(\phi | P_{new} = f \wedge f(\phi)=p)=p\)

This says that current-self would defer to future-self’s probability estimates, if it knew what they were. This implies plan-endorsement, because if current-self imagines that future-self has a specific belief distribution, then it evaluates plans in that scenario with exactly that belief distribution; so, of course it agrees with future-self about which plans are good.

But, even strong plan-endorsement is weaker than Bayes-endorsement. Strong plan-endorsement implies Bayes-endorsement under the condition that \(P(P_{new}=P_{\{ b_n \} }|B=b_n)=1\). If \(P\) knows exactly how future-self reacts to evidence, and it trusts that reaction, then its conditional probabilities must embody that reaction. However, in situations of logical uncertainty such as $GLS-coherence was attempting to address, this need not be the case. Future-self’s computations in response to observing proofs can be more complicated than current-self can predict.

As in the previous section, this makes the problem appear easier rather than harder. Bayes-endorsement seemed quite problematic, but it turns out we don’t need it. Perhaps logically ignorant UDT can use a really simplistic prior which has almost no structure except strong plan-endorsement of a really good logical update.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

[Note: This comment is three
by Ryan Carey on A brief note on factoring out certain variables | 0 likes

There should be a chat icon
by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

RSS

Privacy & Terms