Intelligent Agent Foundations Forum
A failed attempt at Updatelessness using Universal Inductors
discussion post by Scott Garrabrant 719 days ago | Jessica Taylor and Patrick LaVictoire like this | 1 comment

Here, I present a failed attempt to build an updateless decision theory out of universal inductors. It fails because it mistakes updatelessness about which logical theory it is in for true logical updatelessness about computations. I will use Tsvi’s notation.

Fix a UGI, \((\mathbb{P}_n)\). Fix a sequence of utility functions \((U_n:2^\omega\rightarrow \mathbb{R})\), each of which assigns a utility to every propositionally consistent world (represented by an infinite bit string). We assume that \(U_n(W)\) is a computable function of \(n\) and the first \(k(n)\) bits of \(W\), for some computable function \(k\). In the simplest example, \(U_n\) is just equal to a single bit in the string.
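To make the setup concrete, here is a minimal sketch (my own illustration, not part of the original construction) of the simplest example, where worlds are represented by finite bit prefixes and \(U_n\) just reads bit \(n\), so \(k(n) = n+1\):

```python
# A minimal sketch of the simplest example: U_n reads a single bit of the world,
# so it depends only on the first k(n) = n + 1 bits of the (finite) world prefix.

def utility(n, world_prefix):
    """U_n(W): here just bit n of the world."""
    assert len(world_prefix) >= n + 1, "need at least k(n) = n + 1 bits of the world"
    return world_prefix[n]

# Example: U_2 of a world whose prefix starts 0, 1, 1, 0, ... is 1 (its bit 2).
print(utility(2, [0, 1, 1, 0]))  # -> 1
```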

We define a sequence of agents \((A_n)\), each of which outputs a single bit, \(1\) or \(0\). Each agent is broken into two pieces: a deductive process, which outputs a collection of logical facts, and a decision process, which chooses a policy in the form of a function from the possible outputs of the deductive process to \(\{0,1\}\).

Let \(P_n\) denote the set of policies that the decision process can output. There is a computable partition of worlds according to which policy is output in them, given by \(S_n:2^\omega\rightarrow P_n\). For each \(p\in P_n\), we can compute the expectation \(\mathbb{E}(U_n(W)|S_n(W)=p)\), where \(W\) is sampled according to \(\mathbb{P}_n\). The decision process outputs the policy \(p\) that maximizes \(\mathbb{E}(U_n(W)|S_n(W)=p)\), and the agent \(A_n\) outputs the result of applying that policy to the output of the deductive process.
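The decision process can be sketched as follows. This is only an illustrative Monte Carlo approximation under assumed helpers: `sample_world(n)` draws a long-enough world prefix approximately from \(\mathbb{P}_n\), `induced_policy(n, world)` plays the role of \(S_n\), and `utility(n, world)` plays the role of \(U_n\); none of these names come from the original construction.

```python
# Illustrative sketch: estimate E[U_n(W) | S_n(W) = p] for each policy p by sampling
# worlds from P_n, then pick the argmax. The helpers sample_world, induced_policy,
# and utility are assumptions for this sketch, not part of the original construction.

from collections import defaultdict

def choose_policy(n, policy_set, sample_world, induced_policy, utility, num_samples=10_000):
    totals = defaultdict(float)   # running sum of U_n over sampled worlds, per policy
    counts = defaultdict(int)     # number of sampled worlds falling in each cell of S_n

    for _ in range(num_samples):
        world = sample_world(n)           # world prefix sampled (approximately) from P_n
        p = induced_policy(n, world)      # the policy output in this world, i.e. S_n(W)
        totals[p] += utility(n, world)    # U_n(W), computable from the first k(n) bits
        counts[p] += 1

    # Conditional-expectation estimates; policies never sampled are skipped.
    estimates = {p: totals[p] / counts[p] for p in policy_set if counts[p] > 0}
    return max(estimates, key=estimates.get)

def agent(n, deductive_output, best_policy):
    # A_n applies the chosen policy to the deductive process's output, yielding a bit.
    return best_policy(deductive_output)
```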

There are actually many things wrong with the above proposal, and many similar proposals that fail in similar or different ways. However, I want to focus on the one problem that proposals like this have in common:

Universal Induction is a model for uncertainty about what theory/model you are in; it is not a model for uncertainty about the output of computations.

It is easiest to see why this is a problem using the counterfactual mugging problem. We would like to use a universal inductor to be uncertain about a digit of \(\pi\), and thus reason about the world in which it went the other way. The problem is that a powerful universal inductor has the digits of \(\pi\) in its probabilities, even if it does not know that it is in PA. This is because the Kolmogorov complexity of an infinite string of the digits of \(\pi\) is very low, while the Kolmogorov complexity of a string that looks like \(\pi\) for a very long time and then changes is high. We do not have to direct our UGI at PA for it to have good beliefs about late bits in a string that starts out looking like \(\pi\).
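As a toy illustration of this point (my own numerical example, with made-up description lengths standing in for Kolmogorov complexity, not the actual universal inductor): a simplicity prior puts exponentially more weight on “the digits of \(\pi\)” than on “looks like \(\pi\) for a long time and then deviates”, and observing the shared prefix does not shift those odds at all, so the worlds where the digit “went the other way” never get appreciable mass.

```python
# Toy two-hypothesis illustration (made-up description lengths standing in for
# Kolmogorov complexity; not the actual universal inductor).
# H1: "output the binary digits of pi" -- a short program.
# H2: "output the digits of pi up to bit M, then deviate" -- must also encode M,
#     so its program is longer and its simplicity-prior weight exponentially smaller.

K_PI = 100                               # assumed description length of H1, in bits
M = 1_000_000                            # the bit at which H2 deviates from pi
K_DEVIATE = K_PI + M.bit_length() + 8    # H2 must additionally encode M (and the deviation)

prior_pi = 2.0 ** (-K_PI)
prior_deviate = 2.0 ** (-K_DEVIATE)

# Both hypotheses predict exactly the same first M bits, so observing them leaves the
# odds untouched: the posterior odds equal the prior odds.
posterior_odds = prior_pi / prior_deviate
print(f"odds of 'really pi' vs 'deviates at bit {M}': 2^{K_DEVIATE - K_PI} = {posterior_odds:.3g}")
```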

I will use the phrase “logical updatelessness” to refer to updatelessness about computations. I think updatelessness about the logical system is mostly a distraction from the more important concept of logical updatelessness. (Similarly, I believe that early work in logical uncertainty about distributions over complete theories was mostly a distraction from the later work that focused on uncertainty about computations.)



by Paul Christiano 717 days ago

From my perspective, the point of reasoning about complete theories isn’t that we actually care about them, it’s that “what does this halting oracle output?” might be a useful analogy for “what does this long-running computation output?” I still think it is/was a useful analogy, though the time eventually came to move on to smaller and better things.



