Intelligent Agent Foundations Forum
Why conditioning on "the agent takes action a" isn't enough
post by Nate Soares 1254 days ago

This post expands a bit on a point that I didn’t have enough space to make in the paper Toward Idealized Decision Theory.

Say we have a description of an agent program, and a description of a universe program \(\texttt{U()}\), and a set of actions \(A\), and a Bayesian probability distribution over propositions about the world. Say further that for each \(a \in A\) we can form the proposition “the agent takes action \(a\)”.

Part of the problem with evidential decision theory (EDT) is that we can’t, in fact, use this distribution to evaluate \(\mathbb{E}[\texttt{U()}|\text{the agent takes action }a]\). Why not? Because the probability that the agent takes action \(a\) may be zero (if the agent does not in fact take action \(a\)), and so evaluating the above might require conditioning on an event of probability zero.
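To make the difficulty concrete, here is the conditional expectation written out. This is a sketch under the assumption (not stated in the post) that \(\texttt{U()}\) takes only countably many values \(u\):

\[
\mathbb{E}[\texttt{U()} \mid \text{the agent takes action } a] = \sum_{u} u \cdot \frac{\mathbb{P}(\texttt{U()} = u \,\wedge\, \text{the agent takes action } a)}{\mathbb{P}(\text{the agent takes action } a)}
\]

If \(\mathbb{P}(\text{the agent takes action } a) = 0\), the denominator vanishes and the expression is undefined.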

There are two common reflexive responses: one is to modify the agent so that there is no action which will definitely not be taken (say, by adding code to the agent which iterates over each action, checks whether the probability of executing that action is zero, and then executes the action if it is definitely not going to be executed). The second response is to say “Yeah, but no Bayesian would be certain that an action won’t be taken, in reality. There’s always some chance of cosmic rays, and so on. So these events will never actually have probability zero.”
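The first response can be sketched roughly as follows. This is only an illustration: `chicken_wrapper`, `prob_of_action`, and `base_decision` are hypothetical names, and the sketch assumes the agent is simply handed a function reporting the probability it assigns to each of its own actions, which glosses over how a real agent would compute that about itself.

```python
def chicken_wrapper(actions, prob_of_action, base_decision):
    """Wrap a decision procedure so that no action has probability zero.

    actions:        iterable of available actions
    prob_of_action: hypothetical oracle, action -> probability the agent takes it
    base_decision:  zero-argument function returning the action the
                    unmodified agent would otherwise choose
    """
    for a in actions:
        # If the agent is certain it will not take `a`, take `a` anyway.
        # An agent running this code can therefore never be certain
        # that any particular action will not be taken.
        if prob_of_action(a) == 0:
            return a
    # No action was ruled out with certainty: decide as usual.
    return base_decision()


# Usage with made-up beliefs: the agent is certain it will not pick green.
certain = {"red": 1.0, "green": 0.0}
uncertain = {"red": 0.999, "green": 0.001}
picked_when_certain = chicken_wrapper(["red", "green"], certain.get, lambda: "red")
picked_when_uncertain = chicken_wrapper(["red", "green"], uncertain.get, lambda: "red")
```

With the made-up beliefs above, the wrapper overrides the base decision only in the first case, where "green" had been assigned probability exactly zero.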

But while both of these responses work – in the sense that in most realistic universes, \(v_a := \mathbb{E}[\texttt{U()}|\text{the agent takes action }a]\) will be defined for all actions \(a\) – neither fixes the problem. You’ll be able to get a value \(v_a\) for each action \(a\), perhaps, but this value will not necessarily correspond to the utility that the agent would get if it did take that action.

Why not? Because conditioning on unlikely events can put you into very strange parts of the probability space.

Consider a universe where the agent first has to choose between a red box (worth $1) and a green box (worth $100), and then must decide whether or not to pay $1000 to meticulously go through its hardware and correct for bits flipped by cosmic rays. Say that this agent reasons according to EDT. It may be the case that this agent has extremely high probability mass on choosing “red” but nonzero mass on choosing “green” (because it might get hit by cosmic rays). But if it chooses green, it expects that it would notice that this only happens when it’s been hit by cosmic rays, and so would pay $1000 to get its hardware checked. That is, \(v_{\mathrm{red}}=1\) and \(v_{\mathrm{green}}=100-1000=-900\).
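The numbers in this example can be reproduced with a small sketch of the agent's joint distribution. The cosmic-ray probability `EPS` is an assumed, made-up value (the post does not give one), and the two-world model below is a simplification introduced here for illustration.

```python
from fractions import Fraction

# Assumption: the post gives the payoffs but no cosmic-ray probability.
EPS = Fraction(1, 10**6)

RED_VALUE, GREEN_VALUE, REPAIR_COST = 1, 100, 1000  # dollar amounts from the post


def joint_distribution():
    """Yield (probability, action, utility) for each world in the toy model."""
    # No cosmic ray: the agent picks red and does not pay for a hardware check.
    yield (1 - EPS, "red", RED_VALUE)
    # Cosmic ray: the agent picks green, infers the ray, and pays for the check.
    yield (EPS, "green", GREEN_VALUE - REPAIR_COST)


def conditional_value(action):
    """E[U() | the agent takes `action`], by ordinary Bayesian conditioning."""
    worlds = [(p, u) for p, a, u in joint_distribution() if a == action]
    mass = sum(p for p, _ in worlds)
    if mass == 0:
        raise ZeroDivisionError(f"P(agent takes {action!r}) is zero")
    return sum(p * u for p, u in worlds) / mass


v_red = conditional_value("red")      # 1
v_green = conditional_value("green")  # -900
```

Note that `v_green` does not depend on how small `EPS` is, as long as it is nonzero: the conditioning step divides it out, leaving only the worlds in which a cosmic ray struck.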

What went wrong? In brief, “green” having nonzero probability does not imply that conditioning on “the agent takes the green box” is the same as the counterfactual assumption that the agent takes the green box. The conditional probability distribution may be very different from the unconditioned probability distribution (as in the example above, where conditioned on “the agent takes the green box”, the agent would expect that it had been hit by cosmic rays). More generally, conditioning the distribution on “the agent takes the green box” may introduce spurious correlations with explanations for the action (e.g., cosmic rays), and therefore \(v_a\) does not measure the counterfactual value that the agent would get if it did take the green box “of its own volition” / “for good reasons”.

Roughly speaking, evidential decision theory has us look at the probability distribution where the agent does in fact take a particular action, whereas (when doing decision theory) we want the probability distribution over what would happen if the agent did take the action. Forcing the event “the agent takes action \(a\)” to have positive probability does not make the former distribution look like the latter distribution: indeed, if the event has positive probability for strange reasons (cosmic rays, small probability that reality is a hallucination, or because you played chicken with your distribution) then it’s quite unlikely that the conditional distribution will look like the desired counterfactual distribution.

We don’t want to ask “tell me about the (potentially crazy) corner of the probability distribution where the agent actually does take action \(a\)”; we want to ask “tell me about the probability distribution that is as close as possible to the current world model, except imagining that the agent takes action \(a\).”
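The contrast between the two questions can be illustrated on the red/green box example with a toy sketch. It is not a proposed formalization: "as close as possible to the current world model" is modeled here, crudely, by overwriting the action while leaving the prior belief about cosmic rays untouched. `EPS`, `REPAIR_THRESHOLD`, and the `policy` function are assumptions introduced for this sketch.

```python
from fractions import Fraction

EPS = Fraction(1, 10**6)               # assumed prior probability of a cosmic ray
PRIOR = {True: EPS, False: 1 - EPS}    # belief over "a ray hit the agent"
BOX_VALUE = {"red": 1, "green": 100}   # payoffs from the post
REPAIR_COST = 1000                     # cost from the post
REPAIR_THRESHOLD = Fraction(1, 2)      # assumption: pay for repair above this belief


def policy(ray):
    """Assumed actual behaviour: the agent picks green only when hit by a ray."""
    return "green" if ray else "red"


def utility(action, p_ray):
    """Box value, minus the repair cost if the agent thinks a ray is likely."""
    repairs = p_ray > REPAIR_THRESHOLD
    return BOX_VALUE[action] - (REPAIR_COST if repairs else 0)


def conditioned_value(action):
    """Condition the world model on the action; the belief about rays shifts."""
    mass = sum(p for ray, p in PRIOR.items() if policy(ray) == action)
    p_ray = sum(p for ray, p in PRIOR.items() if ray and policy(ray) == action) / mass
    return utility(action, p_ray)


def intervened_value(action):
    """Overwrite the action but keep the prior belief about rays unchanged."""
    return utility(action, PRIOR[True])


conditioned_green = conditioned_value("green")  # -900: the agent concludes it was hit
intervened_green = intervened_value("green")    # 100: belief about rays stays at EPS
```

Under these assumptions the two questions give very different answers for the green box, while they agree for the red box (both give 1).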

The latter thing is still vague and underspecified, of course; figuring out how to formalize it is pretty much our goal with studying decision theory.




