Intelligent Agent Foundations Forum
postCDT: Decision Theory using post-selected Bayes nets
post by Scott Garrabrant 468 days ago | Ryan Carey, Patrick LaVictoire and Paul Christiano like this | 1 comment

The purpose of this post is to document a minor idea about a new type of decision theory that works using a Bayes net. This is not a concrete proposal, since I will give no insight on which Bayes net to use. I am not that excited by this proposal, but think it is worth writing up anyway.

The main object used in this proposal has more information than a standard Bayes net. It is the result you get by taking a Bayes net and conditioning on the values of some of the nodes. The data type is a Bayes net, along with a subset of the nodes, and values for each of those nodes. I will call this a post-selected Bayes net.
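As a rough sketch of this data type (the class and field names here are my own, purely illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class PostSelectedBayesNet:
    """A Bayes net together with fixed values for some of its nodes.

    parents[n] is the tuple of n's parent nodes; cpts[n] maps each tuple
    of parent values to a distribution {value: probability}; observed
    holds the post-selected nodes and their fixed values.
    """
    parents: dict
    cpts: dict
    observed: dict = field(default_factory=dict)

# Example: a coin X with a deterministic child E, post-selected to be 1.
net = PostSelectedBayesNet(
    parents={"X": (), "E": ("X",)},
    cpts={"X": {(): {0: 0.5, 1: 0.5}},
          "E": {(0,): {1: 1.0}, (1,): {1: 1.0}}},
    observed={"E": 1},
)
```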

First, an observation:

Given a collection of dependent random variables, one can sometimes infer a Bayes net relating them, for example by taking the network with the fewest parameters. Sometimes a network can be made simpler by adding in extra hidden variables, which are common causes. Sometimes it can be made simpler still by adding extra hidden variables which are common effects, and conditioning on the values of these common effects.
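A toy illustration of the last move (variable names are mine): two independent coins look perfectly correlated once we condition on a common-effect child.

```python
import itertools

def joint(condition_on_z=None):
    """Joint distribution of two independent fair coins X, Y with a
    common effect Z = 1 iff X == Y, optionally conditioned on Z."""
    table = {}
    for x, y in itertools.product([0, 1], repeat=2):
        z = 1 if x == y else 0
        if condition_on_z is not None and z != condition_on_z:
            continue  # reject worlds inconsistent with the post-selection
        table[(x, y)] = table.get((x, y), 0.0) + 0.25
    total = sum(table.values())
    return {k: v / total for k, v in table.items()}

# Unconditionally, P(Y=1 | X=1) = 1/2; post-selecting Z = 1 forces Y = X,
# so the two independent coins become perfectly correlated.
```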

Philosophically, this is kind of weird. It is hard to imagine what would happen if we were trying to discover the causal structure of the universe, and we discover that there is some hidden variable that always turns out to be a certain way, in spite of appearing to be causally downstream from the other variables.

And now a decision theory, postCDT:

Given a post-selected Bayes net, and a special node representing my action, and a special node representing my utility, sever all edges directed into my action. For each possible action, compute the expectation of the value of the utility node in the newly severed network, given I take that action, and all the post-selected nodes have their specified value. Take the action resulting in the highest expected utility.
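This rule can be sketched by brute-force enumeration (the representation and names below are my own, and enumeration is only feasible for toy nets):

```python
import itertools

def post_cdt(net, action_node, utility_node, observed, actions):
    """postCDT on a small discrete net.

    net maps node -> (parent_tuple, cpt), where cpt maps a tuple of
    parent values to a distribution {value: probability}.  We sever the
    edges into the action node, then pick the action maximizing expected
    utility conditional on the post-selected (observed) values.
    """
    def domain(cpt):
        vals = set()
        for dist in cpt.values():
            vals |= set(dist)
        return sorted(vals)

    def expected_utility(a):
        severed = dict(net)
        severed[action_node] = ((), {(): {a: 1.0}})  # action is now exogenous
        nodes = list(severed)
        z = eu = 0.0
        for values in itertools.product(*(domain(severed[n][1]) for n in nodes)):
            world = dict(zip(nodes, values))
            if any(world[n] != v for n, v in observed.items()):
                continue  # condition on the post-selected nodes
            p = 1.0
            for n in nodes:
                parents, cpt = severed[n]
                p *= cpt[tuple(world[q] for q in parents)].get(world[n], 0.0)
            z += p
            eu += p * world[utility_node]
        return eu / z if z > 0 else float("-inf")

    return max(actions, key=expected_utility)
```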

This can do some things that decision theories based on non-post-selected Bayes nets cannot:

Consider a twin prisoner’s dilemma. There is a node representing your output, a node representing your twin’s output, a node representing your utility, a node representing your twin’s utility, and possibly some other nodes correlating you with your twin. The two utilities are descendants of the two actions.

Observe that if you and your twin share the same non-post-selected diagram, then at least one of you must not be causally downstream of the other. Without loss of generality, assume that your twin is not causally downstream from you.

In this case, you do not affect your twin’s action, and CDT says you should defect. You could try to get past this by pretending that you are deciding for some abstract algorithm that is a parent of your action, but unless the two players are deciding for the exact same node, you will run into the same problem. It is even clearer that pretending you are choosing for an ancestor node does not work in a prisoner’s dilemma with different agents that are mutually predicting each other.

Now consider a post-selected Bayes net where we add an extra node which is a child of both your and your twin’s actions. This node is (approximately) 1 if your actions are the same, and 0 otherwise. This node is post-selected to be 1, explaining the correlation between you and your twin. In this case, postCDT will recommend cooperation, since your cooperation is correlated with your twin’s cooperation, even after severing the edges from your parents (which do not exist in this simplified example).
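Plugging in illustrative payoffs (2 for mutual cooperation, 1 for mutual defection, 3 and 0 for unilateral defection and cooperation) and assuming a uniform prior over the twin's action after severing, the calculation comes out as:

```python
# Payoffs to me; both the payoff matrix and the twin's 50/50 prior after
# severing are assumptions of this toy model, not from the post.
payoff = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def expected_utility(my_action):
    """E[utility | my action, equality node post-selected to 1]."""
    z, eu = 0.0, 0.0
    for twin_action in ("C", "D"):
        if my_action != twin_action:
            continue  # the post-selected equality node rules this world out
        p = 0.5  # twin's prior probability of this action
        z += p
        eu += p * payoff[(my_action, twin_action)]
    return eu / z

# Cooperation wins: E[U | C] = 2 > E[U | D] = 1.
```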

A weird bug (or feature):

Note that if you connect up your action to logically equivalent nodes through common post-selected effects, then when you sever your parents, you do not also sever the parents of the other logically equivalent nodes. This seems weird, and makes it feel to me like this mostly just gets through CDT’s issues by effectively becoming a weird, unmotivated modification of EDT.

On the other hand, maybe the effect is to be like CDT when you are the only copy of yourself, but then becomes like EDT when other copies are involved, thus allowing it to be like an EDT that smokes in the smoking legion problem.


I am not particularly excited by this proposal, partially because it seems not to be very philosophically motivated, and partially because I think that causality-based decision theories do not add anything to EDT. EDT has problems, all of which are in my mind orthogonal to the changes proposed in CDT. However, I think that it does provide an (ugly) fix to what is otherwise a hole in CDT-like proposals, and is probably what I would think about if I were to think about that class of proposals.

by Devi Borg 417 days ago | Jessica Taylor and Scott Garrabrant like this | link

[edit: Thanks to Jessica for pointing out that the literal thing I suggested here does not work unless one of the nodes (my decision) is deterministic. After discussing the CTC-like phenomena below, we summarized it as finding time loops and then removing them by being updateless in the right places.]

> Philosophically, this is kind of weird. It is hard to imagine what would happen if we were trying to discover the causal structure of the universe, and we discover that there is some hidden variable that always turns out to be a certain way, in spite of appearing to be causally downstream from the other variables.

This actually feels very motivated to me: I think about the new epiphenomenal variables as logical facts. It feels intuitive to think about logical facts as being caused by all their concrete instances out in the world rather than as causing the concrete instances. The operation of post-selection then functions as a post-facto condition on logic being consistent (or computations being the same way everywhere). This feels very close to how humans intuit these situations which I think is a good sign.

> On the other hand, maybe the effect is to be like CDT when you are the only copy of yourself, but then becomes like EDT when other copies are involved, thus allowing it to be like an EDT that smokes in the smoking legion problem.

Smoking legion is an apt name, as the behavior, counterintuitively, depends on the number of copies you consider. Still, I don’t think it’s an illuminating problem for computational decision theories. Problems akin to Agent Simulates Predictor (I looked at Transparent Newcomb’s Problem and Parfit’s Hitchhiker) feel clearer when reasoning about algorithms, and there postCDT fails to be optimal, but approaches the right behavior as you add more copies that are logically entangled.

These problems seem to come from what’s almost a directed cycle in the causal structure: if you consider yourself as causing logical facts but everyone else to be influenced by them you get DAGs in Newcomb’s Problem and Prisoner’s Dilemma, but directed cycles in the above problems that postCDT struggles with. I think this is actually fundamental to how we should think about the problems: not too different from time-travel (intelligence/prediction as time-travel is an interesting metaphor). Being updateless about some inputs looks a lot like finding the most desirable CTC.

Concretely, I propose that we determine the set of parents \(P\) that become pseudo-descendants (according to the above point of view) and then, instead of conditioning on the value of \(x\in P\) and choosing an action from the set \(A\), we choose a look-up table from \(A^I\) (where \(I=\prod_{x\in P} x\) is the space of possible inputs, and we always condition normally on the parents not in \(P\)). Now when we sample, we treat all the nodes in this pseudo-directed cycle as undetermined, other than our decision, which is given by the look-up table. This makes us updateless, but only about a bounded set of decision-relevant things. I’m not sure to what extent this looks like something different from UDT to you, but it strikes me as more modularizable.
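This policy-selection step can be sketched as follows (names are illustrative, and the scorer that evaluates a table is left abstract):

```python
import itertools

def best_policy(A, P_domains, policy_score):
    """Choose a look-up table in A^I, where I is the product of the
    value spaces of the pseudo-descendant parents in P, maximizing
    the given scoring function over tables."""
    inputs = list(itertools.product(*P_domains))  # the input space I
    tables = (dict(zip(inputs, outs))
              for outs in itertools.product(A, repeat=len(inputs)))
    return max(tables, key=policy_score)
```

For instance, with a toy scorer that simply rewards outputting "a", the winning table maps every input to "a".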

In terms of implementation I see two options that both feel unsatisfactory at my current level of understanding:

Firstly, if we’re only doing the standard postCDT (without the loop-based fix), we can sample the causal structure using some kind of Monte Carlo method, creating a sizable table of the outcomes of all the nodes; we then use a GI (with suitable background information) to estimate how likely each of the outcomes is, and discard overrepresented outcomes at random until we have the same distribution. In the Prisoner’s Dilemma it will correctly know facts like my twin and me making the same decision, and therefore we’ll discard most of the samples where this is not the case.
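One way to read the discard step (my paraphrase; here `target` stands in for the inductor's probability estimates, which I leave abstract):

```python
import random
from collections import Counter

def discard_to_match(samples, target, rng=None):
    """Thin overrepresented outcomes at random so the kept samples
    approximately follow the target distribution (rejection reweighting)."""
    rng = rng or random.Random(0)
    counts = Counter(samples)
    n = len(samples)
    # Keep outcome x with probability proportional to target(x) / empirical(x),
    # rescaled so the best-represented outcome is always kept.
    ratio = {x: target[x] * n / c for x, c in counts.items()}
    scale = max(ratio.values())
    return [x for x in samples if rng.random() < ratio[x] / scale]
```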

I haven’t thought about this approach much, and I’m not sure if it has problems with spurious counterfactuals (is this clearly avoidable with \(\varepsilon\)-exploration and normalizing each action separately?) and/or weird linkage between the causal structure and the inductor. Because this method can’t be used to find the CTC-like loops, it feels less promising overall.

Secondly, we can identify facts about larger computations implementing smaller ones and use these for our logical links, so that the physical world causing the logical facts instantiated in it becomes a special case of larger computations causing facts about the smaller computations they implement. Here the very large computation of physics contains the two smaller computations of me and whoever is computing my decision (the latter of which also contains a simulation of the former). Adding these two nodes and three links gives the necessary structure; post-selection follows simply from logic being consistent (i.e. strictly speaking, we add one node for every incoming edge to the new nodes and, if there is more than one, a common child of all of them that checks whether they are all the same, and post-select on that child saying “same”).

This approach feels closer to my intuitions about how to make decisions, but also feels very incomplete. Finding all sub-computations of a computation seems somewhat tractable, but it’ll be tricky to both locate all the relevant computations (a robot moving around means that its memory locations don’t correspond to fixed physical locations, so its computations can’t be had by only forgetting details from our simulation of physics) and not be overwhelmed by irrelevant translations (cf. David Chalmers’s paper Does a Rock Implement Every Finite-State Automaton? for an example of what we don’t want).

It now looks like we’ll have to deal with lots and lots of different sub-computations, but there are two helpful facts to note here: (i) some of these sub-computations factorize (e.g. if I use a sorting algorithm in my computation and someone else simulates me, then we can find entanglement through the fact that the sorting algorithm works the same way in both places, but this fact is completely captured by me and the simulation of me working the same way, and is thus redundant); (ii) we don’t care about getting all sub-computations (which we do when thinking about ontologies in general, because they could contain facts relevant to our utility function/morality), only those that give rise to decision-relevant entanglement between different parts of our causal graph.

Finally, there’s the problem of reasoning about programs without simulating them: this isn’t picked up as a sub-computation for any reasonable notion of sub-computation, but the difference between simulating and abstract reasoning should just be an implementation detail and not decision-relevant.

It looks like there are also common problems to all approaches along these lines: we have to derive the high-level causal structure of the world (which would require multi-layer world models for all but the simplest problems), and once we’ve done this there’s the artificial boundary between knowledge that’s encoded as the causal structure and knowledge about the distribution of the nodes. The latter doesn’t seem very troubling to me: it actually seems to echo the phenomenon seen in PSRL/Thompson sampling of alternating stages of inference and optimization. In fact, this duality between epistemic and instrumental rationality is starting to feel as important as the duality between proof and counter-example in my thinking about thinking.





