Intelligent Agent Foundations Forum
Why conditioning on "the agent takes action a" isn't enough
post by Nate Soares 1129 days ago | Ryan Carey, Benja Fallenstein, Daniel Dewey, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | discuss

This post expands a bit on a point that I didn’t have enough space to make in the paper Toward Idealized Decision Theory.

Say we have a description of an agent program, and a description of a universe program \(\texttt{U()}\), and a set of actions \(A\), and a Bayesian probability distribution over propositions about the world. Say further that for each \(a \in A\) we can form the proposition “the agent takes action \(a\)”.

Part of the problem with evidential decision theory (EDT) is that we can’t, in fact, use this to evaluate \(\mathbb{E}[\texttt{U()}|\text{the agent takes action }a]\). Why not? Because the probability that the agent takes action \(a\) may be zero (if the agent does not in fact take action \(a\)), and so evaluating the above might require conditioning on an event of probability zero.
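To make the zero-denominator problem concrete, here is a minimal sketch (my own toy model, not anything from the paper): with a discrete joint distribution over actions and utilities, \(\mathbb{E}[\texttt{U()}|\text{the agent takes action }a]\) is a ratio whose denominator is the probability of taking \(a\), and it is simply undefined when that probability is zero. The joint distribution and helper name below are made up for illustration.

```python
# Toy illustration: conditioning on a probability-zero action.
# The joint distribution over (action, utility) is entirely made up.
joint = {
    ("red", 1): 1.0,     # the agent provably takes "red"
    ("green", 100): 0.0, # "green" is never taken, so it has probability zero
}

def expected_utility_given_action(joint, a):
    """E[U | action = a] = sum_u u * P(a, u) / P(a); undefined when P(a) = 0."""
    p_a = sum(p for (act, _), p in joint.items() if act == a)
    if p_a == 0.0:
        raise ZeroDivisionError(f"P(action = {a!r}) is zero; the conditional is undefined")
    return sum(u * p for (act, u), p in joint.items() if act == a) / p_a

print(expected_utility_given_action(joint, "red"))  # 1.0
try:
    print(expected_utility_given_action(joint, "green"))
except ZeroDivisionError as e:
    print(e)  # conditioning on a probability-zero action is undefined
```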

There are two common reflexive responses. The first is to modify the agent so that there is no action it will definitely not take (say, by adding code that iterates over each action, checks whether the probability of executing that action is zero, and then executes that action if it is definitely not going to be executed). The second is to say “Yeah, but no Bayesian would be certain that an action won’t be taken, in reality. There’s always some chance of cosmic rays, and so on. So these events will never actually have probability zero.”
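For concreteness, here is a schematic sketch of the first response (the “playing chicken” move referred to later in the post). The function names and the toy probability oracle are my own illustrative inventions, not code from any actual agent.

```python
# Schematic sketch of the "chicken" modification. The probability oracle and
# fallback decision procedure are stand-ins, made up for illustration.

def chicken_wrapper(actions, prob_of_taking, fallback_choice):
    """If any action provably has probability zero of being taken, take it;
    otherwise defer to the agent's ordinary decision procedure."""
    for a in actions:
        if prob_of_taking(a) == 0.0:
            return a  # by construction, no action can end up with probability zero
    return fallback_choice(actions)

# Toy usage: a made-up distribution under which "green" would never be taken.
toy_probs = {"red": 1.0, "green": 0.0}
chosen = chicken_wrapper(["red", "green"],
                         prob_of_taking=lambda a: toy_probs[a],
                         fallback_choice=lambda acts: "red")
print(chosen)  # "green": the wrapper forces every action to have nonzero probability
```

The wrapper only guarantees that no action ends up with probability zero of being taken; it says nothing about whether the resulting conditional values are sensible, which is the issue the rest of the post takes up.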

But while both of these responses work, in the sense that in most realistic universes \(v_a := \mathbb{E}[\texttt{U()}|\text{the agent takes action }a]\) will be defined for every action \(a\), they do not fix the problem. You’ll be able to get a value \(v_a\) for each action \(a\), perhaps, but this value will not necessarily correspond to the utility that the agent would get if it did take that action.

Why not? Because conditioning on unlikely events can put you into very strange parts of the probability space.

Consider a universe where the agent first has to choose between a red box (worth $1) and a green box (worth $100), and then must decide whether or not to pay $1000 to meticulously go through its hardware and correct for bits flipped by cosmic rays. Say that this agent reasons according to EDT. It may be the case that this agent places extremely high probability mass on choosing “red” but nonzero mass on choosing “green” (because it might get hit by cosmic rays). But conditional on choosing green, it expects to find that this only happens when it has been hit by cosmic rays, and so it would pay the $1000 to get its hardware checked. That is, \(v_{\mathrm{red}}=1\) and \(v_{\mathrm{green}}=-900\).
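To spell out the arithmetic under one made-up joint model (my assumptions, chosen to match the numbers in the example): suppose the agent takes the green box only in worlds where a cosmic ray has corrupted it, and that whenever it finds itself having chosen green it pays the $1000 hardware check. Then conditioning on “green” puts all the mass on those cosmic-ray worlds.

```python
# Worked version of the example under a made-up joint model: the agent takes green
# only in cosmic-ray worlds, and pays the $1000 hardware check whenever it does.
P_RAY = 1e-6  # made-up prior probability of a cosmic-ray bit flip

# Joint distribution over (action, utility): in ray worlds the agent picks green
# ($100) and then pays $1000 for the hardware check; otherwise it picks red ($1).
joint = {
    ("red", 1): 1.0 - P_RAY,
    ("green", 100 - 1000): P_RAY,
}

def conditional_value(joint, a):
    """E[U | action = a] under the made-up joint distribution."""
    p_a = sum(p for (act, _), p in joint.items() if act == a)
    return sum(u * p for (act, u), p in joint.items() if act == a) / p_a

print(conditional_value(joint, "red"))    # 1.0
print(conditional_value(joint, "green"))  # -900.0: conditioning lands in the cosmic-ray worlds
```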

What went wrong? In brief, “green” having nonzero probability does not imply that conditioning on “the agent takes the green box” is the same as the counterfactual assumption that the agent takes the green box. The conditional probability distribution may be very different from the unconditioned probability distribution (as in the example above, where, conditioned on “the agent takes the green box”, the agent would expect that it had been hit by cosmic rays). More generally, conditioning the distribution on “the agent takes the green box” may introduce spurious correlations with explanations for the action (e.g., cosmic rays), and therefore \(v_a\) does not measure the counterfactual value that the agent would get if it did take the green box “of its own volition” / “for good reasons”.

Roughly speaking, evidential decision theory has us look at the probability distribution where the agent does in fact take a particular action, whereas (when doing decision theory) we want the probability distribution over what would happen if the agent did take the action. Forcing the event “the agent takes action \(a\)” to have positive probability does not make the former distribution look like the latter distribution: indeed, if the event has positive probability for strange reasons (cosmic rays, small probability that reality is a hallucination, or because you played chicken with your distribution) then it’s quite unlikely that the conditional distribution will look like the desired counterfactual distribution.
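To see the gap between the two distributions numerically, here is a crude contrast under the same made-up model as above. The “intervened” calculation is only an intervention-style surrogate for the counterfactual, not the idealized counterfactual the post is asking for.

```python
P_RAY = 1e-6  # same made-up prior on a cosmic-ray bit flip as above

def conditional_green_value():
    # Conditioning: in this model "green" is only ever taken in cosmic-ray worlds,
    # and there the agent also pays the $1000 hardware check.
    return 100 - 1000

def intervened_green_value():
    # Crude intervention-style surrogate (NOT the idealized counterfactual): force
    # "green" while leaving the belief in cosmic rays at its prior. With P_RAY tiny,
    # the agent would skip the $1000 hardware check and just keep the $100.
    pays_for_check = P_RAY > 0.5
    return 100 - (1000 if pays_for_check else 0)

print(conditional_green_value())  # -900: the strange corner of the distribution
print(intervened_green_value())   # 100: closer to "what would happen if it took green"
```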

We don’t want to ask “tell me about the (potentially crazy) corner of the probability distribution where the agent actually does take action \(a\)”, we want to ask “tell me about the probability distribution that is as close as possible to the current world model, except imagining that the agent takes action \(a\).”

The latter thing is still vague and underspecified, of course; figuring out how to formalize it is pretty much our goal in studying decision theory.




