Updatelessness and Son of X

post by Scott Garrabrant 385 days ago | Ryan Carey, Abram Demski and Jessica Taylor like this | 8 comments

The purpose of this post is to discuss the relationship between the concepts of Updatelessness and the “Son of” operator.

Making a decision theory that is reflectively stable is hard. Most agents would self-modify into a different agent if given the chance. For example, if a CDT agent knows that it is going to be put in a Newcomb’s problem, it would precommit to one-box, “causing” Omega to predict that it one-boxes. We say “son of CDT” to refer to the agent that a CDT agent would self-modify into, and more generally “son of X” to refer to the agent that agent X would self-modify into.

Thinking about these “son of” agents is unsatisfying for a couple of reasons. First, it is very opaque. There is an extra level of indirection: you can’t just directly reason about what agent X will do. Instead, you have to reason about what agent X will modify into, which gives you a new agent that you probably understand much less than agent X, and then you have to reason about what that new agent will do. Second, it is unmotivated. If you had a good reason to like Son of X, you would probably not be calling it Son of X. Important concepts get short names, and you probably don’t have as many philosophical reasons to like Son of X as you have to like X.

Wei Dai’s Updateless Decision Theory is perhaps our current best decision theory proposal. A UDT agent chooses a function from its possible observations to its actions, without taking its observations into account, and then applies that function. The main problem with this proposal is formalizing it in a logical uncertainty framework. Some of the observations that an agent makes are going to be logical observations; for example, an agent may observe the millionth digit of $$\pi$$.
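The “son of CDT” example from the Newcomb’s problem discussion above can be made concrete with a toy sketch. It assumes the standard Newcomb payoffs (1,000,000 in the opaque box iff Omega predicts one-boxing, plus 1,000 in the transparent box) and, as a labeled assumption, that Omega forms its prediction by reading the agent’s precommitment. The function names are hypothetical illustrations.

```python
ACTIONS = ("one-box", "two-box")


def payoff(prediction: str, action: str) -> int:
    """Standard Newcomb payoffs: the opaque box holds 1,000,000 iff Omega
    predicted one-boxing; two-boxing also collects the transparent 1,000."""
    opaque = 1_000_000 if prediction == "one-box" else 0
    return opaque + (1_000 if action == "two-box" else 0)


def cdt_act(p_predicted_one_box: float) -> str:
    """CDT after Omega has already predicted: the prediction is causally fixed,
    so every action is scored against the same distribution over predictions."""
    def expected_utility(action: str) -> float:
        return (p_predicted_one_box * payoff("one-box", action)
                + (1 - p_predicted_one_box) * payoff("two-box", action))
    return max(ACTIONS, key=expected_utility)


def cdt_precommit() -> str:
    """CDT before the prediction. Assumption: Omega reads the commitment, so
    committing to an action causally fixes the prediction to that action."""
    return max(ACTIONS, key=lambda action: payoff(action, action))


print(cdt_act(0.99))    # two-box: two-boxing dominates once the box is filled
print(cdt_precommit())  # one-box: the agent CDT self-modifies into one-boxes
```

The agent that executes `cdt_precommit()`'s choice is the “son of CDT” in this toy: it behaves differently from the CDT agent that reasons only after the prediction is fixed.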
For such a logical observation, it is not clear how the agent can avoid taking the digit of $$\pi$$ into account in its calculation of the best policy. If we do not tell it the digit through the standard channel, it might still compute the digit while computing the best policy. As I said here, it is important to note that logical updatelessness is about computations and complexity, not about what logical system you are in.

So what would true logical updatelessness look like? Well, the agent would have to not update on computations. Since the agent is itself a computation, we cannot keep it independent of all computations, but we can restrict it to some small class of computations. The way we do this is by giving the updateless part of the decision theory finite computational bounds. Computational facts not computable within those bounds are still observed, but we do not take them into account when choosing a policy. Instead, we use our limited computation to choose a policy in the form of a function from how the more difficult computations turn out to actions.

The standard way to express a policy is as a set of input/output pairs. However, since the inputs here are results of computations, a policy can equivalently be expressed as a single computation that gives an output. (To see the equivalence, note that from a set of pairs we can write down a single computation which computes all the inputs and produces the corresponding output. Conversely, given a single computation, we can just supply the identity function on the output of that computation.) Thus, logical updatelessness consists of a severely resource-bounded agent choosing what policy (in the form of a computation) it wants to run once it has more resources. Under this model, it seems that whenever you have an agent collecting more computational resources over time, with the ability to rewrite itself, you get an updateless agent.
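The equivalence between a policy given as input/output pairs and a single computation can be sketched as follows. This is a toy under stated assumptions: a policy in “pairs” form is represented as a (computation, map from its result to an action) tuple, and `hard_parity`, `to_single_computation`, and `to_pairs` are hypothetical names; `hard_parity` is only a cheap stand-in for a computation beyond the bounded agent’s budget.

```python
from typing import Callable, Dict, Hashable, Tuple

# A deferred computation whose result the bounded agent has not yet observed.
Computation = Callable[[], Hashable]
# "Pairs" form: a computation plus a map from its result (input) to an action (output).
PairsPolicy = Tuple[Computation, Callable[[Hashable], Hashable]]


def hard_parity() -> int:
    """Hypothetical stand-in for a computation too expensive for the bounded stage."""
    return sum(range(10**6)) % 2


def to_single_computation(policy: PairsPolicy) -> Computation:
    """Pairs -> single computation: compute the input, then output the paired action."""
    compute_input, action_for = policy
    return lambda: action_for(compute_input())


def to_pairs(program: Computation) -> PairsPolicy:
    """Single computation -> pairs: the input is the program's own output,
    and the map from input to action is the identity function."""
    return program, (lambda result: result)


# A policy written as explicit input/output pairs.
table: Dict[Hashable, str] = {0: "left", 1: "right"}
pairs_policy: PairsPolicy = (hard_parity, table.__getitem__)

single = to_single_computation(pairs_policy)
round_trip = to_single_computation(to_pairs(single))
print(single(), round_trip())  # both give the same action
```

Converting in either direction preserves the action the policy produces, which is all the equivalence in the text requires.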
The limited agent uses its bounded resources to choose which algorithm it wants to run to select its output once it has collected more computational resources. The future version of the agent with more resources is the updateless version of the original agent, in that it follows the policy specified by the original agent before updating on all the computational facts. However, this is also exactly what we mean when we say that the later agent is the son of the bounded agent.

There is still a free parameter in logical updatelessness, namely which decision procedure the limited version uses to select its policy. This is also underspecified in standard UDT, but I believe it is often taken to be EDT. Thus, we have logically updateless versions of many decision procedures, which I claim are actually pointing at the same thing as Son of those various procedures (in an environment where computational resources are collected over time).
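A toy sketch of the construction above, in which the bounded stage picks a policy and its better-resourced successor (its “son”) executes it. Several assumptions are made, all hypothetical: the “hard” computation is a cheap parity stand-in rather than a digit of $$\pi$$; the bounded stage’s logical uncertainty is a fixed 50/50 prior; the payoffs form an invented logical counterfactual mugging; and the decision procedure is plain expected-utility maximization under that prior, a simplification rather than a full EDT. `son_of` takes the decision procedure as an argument, mirroring the free parameter in the text.

```python
from itertools import product
from typing import Callable, Dict

ACTIONS = ("pay", "refuse")
RESULTS = (0, 1)          # possible values of the hard computation
Policy = Dict[int, str]   # map from the hard computation's result to an action


def hard_computation() -> int:
    """Hypothetical stand-in for a computation beyond the bounded stage's budget."""
    return sum(range(10**6 + 2)) % 2


def utility(result: int, policy: Policy) -> float:
    """Hypothetical logical counterfactual mugging: if the result is 1, paying
    costs 100; if it is 0, Omega pays 10000 iff the policy pays when it is 1."""
    pays_on_1 = policy[1] == "pay"
    if result == 1:
        return -100.0 if pays_on_1 else 0.0
    return 10000.0 if pays_on_1 else 0.0


def expected_utility_choice(prior: Dict[int, float]) -> Policy:
    """One possible decision procedure for the bounded stage: pick the policy
    with the highest expected utility under its cheap prior over results."""
    policies = [dict(zip(RESULTS, acts)) for acts in product(ACTIONS, repeat=len(RESULTS))]
    return max(policies, key=lambda p: sum(prior[r] * utility(r, p) for r in RESULTS))


def son_of(choose: Callable[[Dict[int, float]], Policy]) -> Callable[[], str]:
    """Bounded stage: choose a policy without running the hard computation.
    Returned agent (the 'son'): run the computation and follow that policy."""
    bounded_prior = {0: 0.5, 1: 0.5}  # assumption: the bounded stage cannot tell
    policy = choose(bounded_prior)

    def later_agent() -> str:
        return policy[hard_computation()]

    return later_agent


def updated_choice(result: int) -> str:
    """For contrast: choose the best action after already knowing the result."""
    return max(ACTIONS, key=lambda a: utility(result, {r: a for r in RESULTS}))


son = son_of(expected_utility_choice)
print(son())                               # pay
print(updated_choice(hard_computation()))  # refuse
```

Here the hard computation evaluates to 1, yet the son still pays because it follows the policy chosen before that fact was computed; an agent that updates on the result first refuses. Swapping in a different `choose` function gives the logically updateless (equivalently, “son of”) version of a different decision procedure.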

by Wei Dai 383 days ago | Ryan Carey, Abram Demski and Scott Garrabrant like this

This does seem to be the “obvious” next step in the UDT approach. I proposed something similar as “UDT2” in a 2011 post to the “decision theory workshop” mailing list, and others have made similar proposals. But there is a problem with having to choose how much time/computing resources to give to the initial decision process. If you give it too little, then its logical probabilities might be very noisy and you could end up with a terrible decision, but if you give it too much, then it could update on too many logical facts and lose on acausal bargaining problems.

With multiple AI builders, UDT2 seems to imply a costly arms-race situation where each has an incentive to give their initial decision process less time (than would otherwise be optimal) so that their AI could commit faster (and hopefully be logically updated upon by other AIs) and also avoid updating on other AIs’ commitments. I’d like to avoid this but don’t know how. I’m also sympathetic to Nesov’s (and others such as Gary Drescher’s) sentiment that maybe there is a better approach to the problems that UDT is trying to solve, but I don’t know what that is either.
by Scott Garrabrant 383 days ago

So my plan is to “solve” the problem of choosing how much time to give it by having a parameter (which stage of a logical inductor to use), and trying to get results saying that if we set the parameter sufficiently high, and we only consider the output on sufficiently far out problems, then we can prove that it does well. It does not solve the problem, but it might let us analyze what we would get if we did solve the problem.
by Wei Dai 383 days ago | Ryan Carey likes this

> But we know that cooperation is possible in much greater generality, even between unrelated agents

It seems to me like cooperation might be possible in much greater generality. I don’t see how we know that it is possible. Please explain?

> Each one of these points is relatively straightforward to address, but not together.

I’m having trouble following you here. Can you explain more about each point, and how they can be addressed separately?
by Vadim Kosoy 384 days ago

This is more or less what I was talking about here (see last paragraph). This should also give us superrationality, provided that instead of allowing an arbitrary “future version”, we constrain the future version to be a limited agent with access to a powerful “oracle” for queries of the form $$E[U \mid \pi]$$ for all possible policies $$\pi$$ (which might involve constructing another, even more powerful, agent). If we don’t impose this constraint, we run into the problem of “self-stabilizing mutually detrimental blackmail” in multi-agent scenarios.
by Wei Dai 383 days ago

I may be misunderstanding what you’re proposing, but assuming that each decision process has the option to output “I’ve thought enough, no need for another version of me, it’s time to take action X” and have X be “construct this other agent and transfer my resources to it”, the constraint on future versions doesn’t seem to actually do much.
by Vadim Kosoy 380 days ago

Well, the time to take a decision is limited. I guess that for this to work in full generality, we would need the total computing time of the future agents over a time discount horizon to be insufficient to simulate the “oracle” of even the first agent, which might be too harsh a restriction. Perhaps restricting space will help, since space aggregates as a max rather than as a sum. I don’t have a detailed understanding of this, but IMO any decision theory that yields robust superrationality (i.e. not only for symmetric games and perfectly identical agents) needs to have some aspect that behaves like this.
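The point that space aggregates as a max while time aggregates as a sum can be illustrated with a small sketch. It is a simplification under stated assumptions: successive future agents run one after another and reuse the same memory (ignoring any state handed between them), and the per-agent budgets and oracle costs below are hypothetical numbers, not anything from the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Budget:
    time: int   # steps one agent runs before handing off
    space: int  # memory cells one agent needs


def chain_cost(budgets: List[Budget]) -> Budget:
    """Resources used by agents that run one after another and reuse the same
    memory: running times add up, while peak memory is the largest single need."""
    return Budget(time=sum(b.time for b in budgets),
                  space=max(b.space for b in budgets))


def could_simulate(total: Budget, oracle: Budget) -> bool:
    """Simulating the oracle needs both enough time and enough space."""
    return total.time >= oracle.time and total.space >= oracle.space


# Hypothetical numbers: 100 successive agents within the discount horizon.
chain = [Budget(time=10, space=5)] * 100
oracle = Budget(time=500, space=8)
total = chain_cost(chain)
print(total)                          # Budget(time=1000, space=5)
print(could_simulate(total, oracle))  # False
```

With these numbers, the chain's combined time (1000) exceeds the oracle's time cost (500), so a per-agent time limit alone would not prevent the chain from simulating the oracle over a long enough horizon; the per-agent space limit still does, however many agents are chained.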
