An Approach to Logically Updateless Decisions
discussion post by Abram Demski 209 days ago | Sam Eisenstat, Jack Gallagher and Scott Garrabrant like this | 4 comments

Scott describes two obstacles for logical inductor decision theory: reasoning sanely about untaken actions, and getting a working notion of logical updatelessness. Here, I consider the question: if we could solve the first problem, could we solve the second? I provide a somewhat plausible substitute for updatelessness which relies on taking a logical counterfactual.

## Motivation

Suppose we have a notion of logical counterfactual, $$\Box \mkern-7mu \rightarrow$$, represented by a hypothetical axiomatic system $$LC$$ which is also rich enough to talk about computations. We trust $$LC$$; specifically, if it is a theorem of $$LC$$ that an agent would have achieved a particular utility had it followed an alternative strategy, we put as much or more faith in this than in our own intuition about what could have been. The 5-and-10 problem is solved by decree: if an agent reasoning with $$LC$$ thinks taking the $10 would be worse, then it thinks so because it’s true. I take it that this does not solve problems which involve updateless-like reasoning about logical uncertainty. In counterfactual mugging with a logical coin, the agent can see that counterfactually giving Omega $100 doesn’t get it anything in the real mathematical universe; it doesn’t seem like the correct hypothetical would be anything like “If I give Omega $100 having seen that the 1000th digit of pi is even, then the digit might turn out to be odd rather than even, so I might get $10,000.” This is the realm of updateless reasoning rather than counterfactuals; but we have no satisfactory account of logical updatelessness.

So, we can’t directly say that it would be better to give Omega $100. However, in a sequence of scenarios using consecutive digits of pi, we can see that the strategy which always gives the $100 when Omega asks for it will do better in the average case than the strategy which refuses: half the time we get $10K, if we’re the sort of person who gives up $100 the other half of the time.
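The average-case comparison above can be sketched numerically. This is only an illustration, not anything from the post: since we can't cheaply compute arbitrary digits of pi here, a hash-based bit stands in for "the nth digit of pi is even", and the payoffs follow the post's convention (an even digit means Omega asks for $100; an odd digit means Omega pays $10,000 to agents that would have paid).

```python
import hashlib

def coin(n):
    # Stand-in "logical coin": the parity of a hash byte replaces the
    # parity of the nth digit of pi (purely for illustration).
    return hashlib.sha256(str(n).encode()).digest()[0] % 2 == 0

def utility(n, pays_when_asked):
    # Even digit: Omega asks for $100. Odd digit: Omega pays $10,000,
    # but only to agents that would have paid when asked.
    if coin(n):
        return -100 if pays_when_asked else 0
    return 10_000 if pays_when_asked else 0

N = 10_000
avg_pay    = sum(utility(n, True)  for n in range(N)) / N
avg_refuse = sum(utility(n, False) for n in range(N)) / N
print(avg_pay, avg_refuse)
```

With a roughly fair coin, the paying strategy averages about $4,950 per instance while the refusing strategy averages exactly $0, which is the sense in which always paying "does better in the average case".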

## Average DT

I’ll represent a decision problem $$U$$ as a computation which takes no arguments and yields a rational number; the evaluation of $$U$$ will be written $$\texttt{Eval}[U]$$. A sequence of decision problems, $$U_n$$, can be defined by a function from natural numbers to computations. The average utility up to $$n$$ is $$\bar U_n = \frac{1}{n} \sum_{i=1}^n \texttt{Eval}[U_i]$$. An agent $$A(n)$$ will be a function of $$n$$. We’ll also consider the utility $$\bar U_n^{B}$$ which might have been achieved had agent $$A$$ acted like agent $$B$$, which is the value $$u$$ such that $$A := B \; \Box \mkern-7mu \rightarrow \bar U_n = u$$. ($$:=$$ is supposed to be a directional assignment from the language of $$LC$$, which says that $$A(n)$$ outputs what $$B(n)$$ normally would for all $$n$$. Hopefully $$LC$$ can define such a concept.) I make the assumption that $$\bar U_n^{B}$$ is computable.

Definition: Average DT. Take a sequence of decision problems $$U_n$$, an accuracy $$\epsilon > 0$$, and a finite set of alternative strategies $$B_m(n)$$. The average-DT agent $$A(n)$$ computes the strategy which $$A(n-1)$$ selected; call it $$B_x$$. It then computes $$\bar U_{n-1}^{B_m}$$ for all $$B_m$$. If $$\bar U_{n-1}^{B_y} > \bar U_{n-1}^{B_x} + \epsilon$$ for any $$B_y$$, then $$A(n)$$ selects $$B_y$$ to determine the output of this round; $$A(n) = B_y(n)$$. Otherwise, $$A(n) = B_x(n)$$.

The idea is that a good way to achieve high utility for large $$n$$ is to choose outputs according to the strategy which does best on average in the limit. We can eventually do that if we keep switching to whatever strategy currently does best on average (but with $$\epsilon$$ “stickiness” to prevent us from oscillating strategies forever). This only works under some conditions, though.
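The switching rule can be sketched as follows. This is a loose illustration under strong assumptions: the counterfactual averages $$\bar U_{n-1}^{B_m}$$ are taken to be directly computable by simulation (sidestepping the logical-counterfactual machinery entirely), the strategies are just indices, and the counterfactual mugging from the motivation section (with the same hash-based stand-in coin) supplies the decision problems. All names here are illustrative, not from the post.

```python
import hashlib

def coin(n):
    # Stand-in "logical coin" for the parity of the nth digit of pi.
    return hashlib.sha256(str(n).encode()).digest()[0] % 2 == 0

def mugging_utility(m, n):
    # Counterfactual utility of round n under strategy m.
    # m = 0: always pay Omega when asked; m = 1: always refuse.
    if coin(n):                       # even digit: Omega asks for $100
        return -100 if m == 0 else 0
    return 10_000 if m == 0 else 0    # odd digit: Omega pays payers

def avdt(counterfactual_utility, n_strategies, n_rounds, eps, start=0):
    current = start                   # the strategy A(n-1) selected
    totals = [0.0] * n_strategies     # running counterfactual totals
    choices = []
    for n in range(n_rounds):
        if n > 0:
            avgs = [t / n for t in totals]      # the U-bar_{n-1}^{B_m}
            best = max(range(n_strategies), key=lambda m: avgs[m])
            if avgs[best] > avgs[current] + eps:  # epsilon "stickiness"
                current = best
        choices.append(current)
        for m in range(n_strategies):
            totals[m] += counterfactual_utility(m, n)
    return choices

choices = avdt(mugging_utility, 2, 2_000, eps=1.0, start=1)
print(choices[0], choices[-1])
```

Started on the refusing strategy, the agent switches to paying once the running counterfactual average for paying exceeds that for refusing by more than $$\epsilon$$, and then never switches back.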

Definition: Independent. A sequence of decision problems $$U_n$$ is independent if $$m \neq n$$ implies $$U_n=u \rightarrow ( A(m) := a \; \; \Box \mkern-7mu \rightarrow U_n=u)$$. That is, hypothetically changing the output of the agent on $$m$$ has no effect on the utility at $$n$$.

Definition: Tail-dependent. Call a pair of strategies $$B, C$$ convergent if there exists an $$n$$ such that for all $$m > n$$, $$B(m) = C(m)$$. A sequence of decision problems $$U_n$$ is tail-dependent if, for a convergent pair $$B,C$$, if $$\lim_{n \rightarrow \infty} \bar U_n^{B}$$ exists, then $$\lim_{n \rightarrow \infty} \bar U_n^{C}$$ also exists, and $$\lim_{n \rightarrow \infty} \bar U_n^{B} = \lim_{n \rightarrow \infty} \bar U_n^{C}$$.

Tail-dependence is a more general condition than independence. Whereas independence says that results at $$n$$ only depend on actions at $$n$$, tail-dependence allows $$\texttt{Eval}[U_n]$$ to depend on $$A(n-1)$$, $$A(n-2)$$, and so on, as long as the effects of any initial mistakes can eventually be washed out by following an optimal strategy in the limit. Independence is only defined here because I think of it as the more typical case. The subscript $$n$$ does not represent time; each instance is thought of as a decision problem of its own, and the aggregation of the utility which average-DT does is only in service of getting high utility on individual instances. If there is some dependence between instances, it is to be thought of as interaction with a separate version of oneself.
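Here is a toy example (my own, not from the post) of a tail-dependent but non-independent sequence: the payoff at round $$n$$ is 1 if the action matches the previous round's action and 0 otherwise, so $$\texttt{Eval}[U_n]$$ depends on $$A(n-1)$$, but any early mismatches wash out of the running average.

```python
def avg_utility(strategy, n_rounds):
    # Payoff 1 at round n when the action repeats the previous action;
    # this makes U_n depend on the action at n-1 (not independent).
    total, prev = 0.0, None
    for n in range(n_rounds):
        a = strategy(n)
        total += 1.0 if a == prev else 0.0
        prev = a
    return total / n_rounds

B = lambda n: 0                     # always act 0
C = lambda n: 1 if n < 10 else 0    # disagrees with B only for n < 10

b, c = avg_utility(B, 100_000), avg_utility(C, 100_000)
print(b, c)
```

The two strategies are convergent (they agree for all $$n \geq 10$$), and both running averages tend to the same limit of 1, as tail-dependence requires.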

Theorem. For an average-decision-theory agent $$A$$, if $$U_n$$ is tail-dependent, and the limits $$\lim_{n \rightarrow \infty} \bar U_n^{B_m}$$ exist for all $$B_m$$ in the set of strategies, then the limit $$\lim_{n \rightarrow \infty} \bar U_n^{A}$$ exists, and furthermore $$\lim_{n \rightarrow \infty} \bar U_n^{A} \geq \lim_{n \rightarrow \infty} \bar U_n^{B_m} - \epsilon$$ for all $$B_m$$.

Proof. In the tail-dependent case, since the limits of the $$\bar U_n^{B_m}$$ exist, eventually $$\bar U_n^{B_m}$$ will vary by less than $$\epsilon / 2$$ for all $$m$$. At that point, the agent will never switch strategies again, because the running average for the strategy currently in use, $$B_x$$, will never fall more than $$\epsilon$$ below the running average for any other. This means $$A$$ is convergent with $$B_x$$. Since $$\lim_{n \rightarrow \infty} \bar U_n^{B_x}$$ exists, $$\lim_{n \rightarrow \infty} \bar U_n^{A}$$ does as well, and the two are equal. $$\Box$$

(If we knew that only one of the $$B_m$$ is optimal, we could allow $$\epsilon = 0$$, since the ordering of the running-average utility of strategies can only oscillate forever when the limits are the same.)

Note that the definition of $$A(n)$$ computes $$\bar U_{n-1}^{B_m}$$ rather than $$\bar U_{n}^{B_m}$$. This is because, at least in the independent case, $$\bar U_n$$ will typically involve $$A(n)$$ itself; it seems potentially unrealistic to assume that this quantity can be computed by $$A(n)$$ (although the fact that we’re only evaluating it under the hypothetical case that we act like $$B_m$$ may make this feasible). It also costs us nothing, in terms of the optimality guarantee. However, it may be that even $$U_{n-1}$$ is too intractable for $$A(n)$$ to realistically evaluate. We can solve this in theory by using logical induction to approximate $$\bar U_{n}^{B_m}$$ for policy selection. No matter how slowly the $$\bar U_{n}^{B_m}$$ become computable, $$\lim_{n \rightarrow \infty} \bar U_{n}^{B_m} = \lim_{n \rightarrow \infty} \mathbb{E}_n [\bar U_{n}^{B_m}]$$, and the same optimality proof applies.

## Discussion

This is very similar in flavor to asymptotic decision theory. I’ll use the abbreviations AsDT and AvDT. Like AsDT, AvDT has an $$\epsilon$$ accuracy parameter, has a finite set of strategies which we choose from, treats the method of taking counterfactuals as given, deals with infinite sequences of decisions, and cares only about properties in the limit. Unlike AsDT, I choose the highest average reward over all rounds so far rather than the highest expected reward on this round. AsDT is also an epsilon-exploration approach, whereas AvDT relies on $$\Box \mkern-7mu \rightarrow$$ to give the correct expected utility alternate actions would have yielded.

These differences make AvDT behave in a more updateless manner than AsDT. In counterfactual mugging with a logical coin, AsDT always uses a logical inductor’s best-estimate of the utility it would get right now, so it sees the coin as already determined, and sees no benefit from giving Omega money in the cases where Omega asks for money. AvDT uses logical induction (or simple evaluation) to determine the average utility which a policy gets, so it sees giving Omega money as beneficial.

I doubt that an approach to learning $$\Box \mkern-7mu \rightarrow$$ analogous to AsDT’s optimistic learning of counterfactuals will work very well here, but that is mostly because I don’t think it works very well for AsDT in the first place. (It does not learn the counterfactual structure we would like it to when it plays prisoner’s dilemma against a copy of itself, for example.)

A big limitation of this approach is the need to represent decision problems as part of some infinite sequence. Different infinite sequences will yield different ‘optimal’ approaches for the same decision problem. For example, AvDT’s strategy for counterfactual mugging would be much different if $$U_n$$ were the subsequence of cases where the digit of pi is even. On the one hand, this is like saying that Omega only goes through the whole procedure in cases where it will make Omega money. But on the other hand, we’re facing the exact same situation in the world – it’s just the representation as part of a sequence of problems which has changed. It seems the sequence $$U_n$$ plays a role similar to the prior in UDT, telling us what spread of possibilities to prepare for.

Perhaps there are some conditions under which we can guarantee good asymptotic behavior for not only one sequence of decision problems, but many. If none of the sequences are subsequences of each other, then it seems possible to guarantee eventual optimality for all of them. However, that’s a very limiting condition, as illustrated by the counterfactual mugging example in the previous paragraph; we need to know which sequence to pick when one is a subsequence of the other.

But my uneasiness here may mainly be due to the fact that I am accustomed to an agent requiring an arbitrary prior, but unaccustomed to an agent requiring an arbitrary problem sequence. As with the problem of selecting a prior, it seems a good rule of thumb will be to pick the broader one, in absence of contrary information. I’m not holding my breath for a good notion of “universal problem sequence” here, but it would be nice if there could be such a concept.

The desirability of the approach will also depend on whether the dependence on $$\epsilon$$ and the restriction to finite sets of strategies can be eliminated. I’m skeptical it will be possible to get rid of either one.

Oh, and another problem: the behavior breaks down quickly outside the optimality conditions. If the decision in even one early instance $$A(n)$$ matters forever, the agent will ignore this in its choices. This seems problematic.

Despite all these issues, my current intuition is that this approach has “the right flavor” in some sense: an optimality condition showing that we converge to the best strategy in the limit seems about as good a property as we can hope for in a decision theory compatible with logical uncertainty.

by Sam Eisenstat 196 days ago | Jack Gallagher and Abram Demski like this

> In counterfactual mugging with a logical coin, AsDT always uses a logical inductor’s best-estimate of the utility it would get right now, so it sees the coin as already determined, and sees no benefit from giving Omega money in the cases where Omega asks for money.

The way I would think about what’s going on is that if the coin is already known at the time that the expectations are evaluated, then the problem isn’t convergent in the sense of AsDT. The agent that pays up whenever asked has a constant action, but it doesn’t receive a constant expected utility. You can think of the averaging as introducing artificial logical uncertainty to make more things convergent, which is why it’s more updateless. (My understanding is that this is pretty close to how you’re thinking of it already.)
by Abram Demski 188 days ago | Sam Eisenstat and Jack Gallagher like this

I think AsDT has a limited notion of convergent problem. It can only handle situations where the optimal strategy is to make the same move each time. Tail-dependence opens this up, partly by looking at the limit of average payoff rather than the limit of raw payoff. This allows us to deal with problems where the optimal strategy is complicated (and even somewhat dependent on what’s done in earlier instances in the sequence). I wasn’t thinking of it as introducing artificial logical uncertainty, but I can see it that way.
by Sam Eisenstat 187 days ago

Yeah, I like tail dependence. There’s this question of whether for logical uncertainty we should think of it more as trying to “un-update” from a more logically informed perspective rather than trying to use some logical prior that exists at the beginning of time. Maybe you’ve heard such ideas from Scott? I’m not sure if that’s the right perspective, but it’s what I’m alluding to when I say you’re introducing artificial logical uncertainty.
by Abram Demski 186 days ago

I don’t think it’s much like un-updating. Un-updating takes a specific fact we’d like to pretend we don’t know. Plus, the idea there is to back up the inductor. Here I’m looking at average performance as estimated by the latest stage of the inductor. The artificial uncertainty is more like pretending you don’t know which problem in the sequence you’re at.
