Forum Digest: Updateless Decision Theory
post by Patrick LaVictoire 1066 days ago | Abram Demski, Benja Fallenstein, Jessica Taylor, Luke Muehlhauser and Nate Soares like this | discuss

Summary: This is a quick expository recap, with links, of the posts on this forum on the topic of updateless decision theory, through 3/19/15. Read this if you want to learn more about UDT, or if you’re curious about what we’ve been working on lately!

# Introduction

Updateless decision theory (henceforth UDT) is a proposed algorithm for making good decisions in a world that can contain predictions and other echoes of the agent’s decision algorithm. It is motivated by a set of Newcomblike problems on which the classical decision theories (evidential decision theory and causal decision theory) have known failure modes. For more context and motivation for UDT, see the MIRI research document Toward Idealized Decision Theory.

Philosophically speaking, UDT looks over all possible strategies (maps from “agent observes X” to “agent does Y”), picks the global strategy that would be best for all copies of UDT in aggregate, and then performs the action that the strategy recommends for an agent in its position. (One explanation for choosing strategies rather than actions is that it defends against blackmail: if you are being blackmailed, then paying up appears to be the better action; but since the strategy of refusing to pay deters all savvy blackmailers, UDT knowably refuses to pay up and thus is rarely blackmailed. Or so the theory goes.)

This leads to some interesting ambiguities, most importantly the counterfactuals about “what would happen” if the deterministic UDT algorithm chose strategy B, when in fact it will end up choosing strategy A. These logical counterfactuals can be confronted straightforwardly in mathematical models by trying to formally prove that if UDT selects strategy S, then the universe is in a state of utility U. But then one faces the problem of spurious counterfactuals (“if the agent chose X then Y would happen”, which is only logically valid because we can prove that the agent does not choose X), which lead to further complications.

The Gödel-Löb modal logic GL (also called provability logic) allows us to construct a good heuristic model of UDT in a mathematical universe, using Gödel numbering and quining to capture self-reference, and evaluating counterfactuals via unbounded proof searches (i.e. a halting oracle). This “modal UDT” has some interesting properties: for instance, it is provably optimal on decision problems that “only care what it does”, if given sufficiently powerful reasoning axioms. Moreover, the tools of Kripke semantics allow the actions of modal UDT and other modally defined agents to be formally verified in polynomial time!

Lest we get carried away, modal UDT is neither a perfect instantiation of the philosophical UDT algorithm, nor is it optimal in problems involving universes that predict the agent’s decisions via provability. Even so, it is an incredibly fruitful object for further study in decision theory!

Without further ado, here is an overview of all the posts on this forum thus far that relate to updateless decision theory:

# New research on modal UDT

• Using modal fixed points to formalize logical causality, Vladimir Slepnev. I believe this is actually the first time that someone constructed a UDT algorithm in provability logic!

• “Evil” decision problems in provability logic, Benja Fallenstein. For every modal decision theory, you can set up a fair decision problem (i.e. one that simply maps actions to outcomes rather than depending on the inner details of how the agent deliberates) which punishes the agent, such that other decision theories can get a better outcome. This is an important limitation on any optimality results, but as we shall see, it is not the final word.

• An optimality result for modal UDT, Benja Fallenstein. For each fixed fair decision problem, there is a set of axioms (in this case, simply the true axioms about the mapping from actions to outcomes) such that modal UDT equipped with those axioms will achieve the best possible outcome.

• Improving the modal UDT optimality result, Benja Fallenstein. We can use more general axioms than the ones in the previous post, and still achieve the best possible outcome.

• Obstacle to modal optimality when you’re being modalized, Patrick LaVictoire. The optimality result cannot carry over to even simple decision problems where the outcome depends on whether the agent provably takes a certain action.

• On notation for modal UDT, Benja Fallenstein. This emphasizes that our use of modal agents and decision theories relies on fixed points of modal logic operators, and thus we should be careful about using functional notation.

• An implementation of modal UDT, Benja Fallenstein. The polynomial-time algorithm for figuring out what a modal UDT agent does in a decision problem, now in a simple Haskell program!

# New research on logical counterfactuals

• Uniqueness of UDT for transparent universes, Tsvi Benson-Tilson. This is a different abstract model of UDT, using what we call “playing chicken with the universe”: before reasoning about consequences, take each possible action and attempt to prove you don’t take it; if you ever succeed, immediately take that action. This is more useful than it sounds, because it rules out certain spurious counterfactuals.

• A different angle on UDT, Nate Soares. An exploration of spurious counterfactuals, with an example (where UDT is used to analyze another agent, rather than to take an action itself).

• Why conditioning on “the agent takes action a” isn’t enough, Nate Soares. Continuing the exploration of spurious counterfactuals, with probabilistic agents.

• Third-person counterfactuals, Benja Fallenstein. There is at least some set of assumptions (a “sufficiently informative” decision problem) for which modal UDT analyzes an algorithm (perhaps itself) adequately and avoids acting on spurious counterfactuals… or does it?

• The odd counterfactuals of playing chicken, Benja Fallenstein. Nope, modal UDT is perfectly susceptible to spurious counterfactuals when analyzing other algorithms, even if the decision problem is sufficiently informative.

# Other new research relevant to UDT

### NEW DISCUSSION POSTS

[Delegative Reinforcement
 by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
 by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
 by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
 by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
 by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
 by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
 by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
 by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
 by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
 by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
 by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
 by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
 by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
 by Abram Demski on Policy Selection Solves Most Problems | 0 likes