Intelligent Agent Foundations Forum
by Abram Demski 93 days ago

> First of all, it seems to me that “updateless CDT” and “updateless EDT” are the same for agents with access to their own internal states immediately prior to the decision theory computation: on an appropriate causal graph such internal states would be the only nodes with arrows leading to the nodes “output of decision theory”, so if their value is known, then severing those arrows does not affect the computation for updating on an observation of the value of the “output of decision theory” node. So the counterfactual and conditional probability distributions are the same, and thus CDT and EDT are the same.

I don’t think “any appropriate causal graph” necessarily has the structure you suggest. (We don’t have a good idea of what causal graphs over logical uncertainty look like.) Your assertion may well be true, but it is not obvious.
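To make the quoted claim concrete under the graph structure it assumes (which, as noted above, is not guaranteed), here is a minimal Python sketch. The three-node graph S → A, S → U, A → U, the variable names, and all the probabilities are invented for illustration: S stands for the internal state, A for the “output of decision theory” node, and U for an outcome. The sketch compares conditioning on A with intervening on A, both with and without knowledge of S.

```python
from itertools import product

# Hypothetical toy model: every number below is made up for illustration.
P_S = {0: 0.7, 1: 0.3}                      # prior over internal state S
P_A1_GIVEN_S = {0: 0.2, 1: 0.9}             # P(A=1 | S), the arrow S -> A
P_U1_GIVEN_SA = {(0, 0): 0.1, (0, 1): 0.4,  # P(U=1 | S, A)
                 (1, 0): 0.5, (1, 1): 0.8}


def joint(do_a=None):
    """Return {(s, a, u): prob}. If do_a is set, sever S -> A and force A = do_a."""
    table = {}
    for s, a, u in product((0, 1), repeat=3):
        if do_a is None:
            p_a = P_A1_GIVEN_S[s] if a == 1 else 1 - P_A1_GIVEN_S[s]
        else:
            p_a = 1.0 if a == do_a else 0.0
        p_u = P_U1_GIVEN_SA[(s, a)] if u == 1 else 1 - P_U1_GIVEN_SA[(s, a)]
        table[(s, a, u)] = P_S[s] * p_a * p_u
    return table


def prob_u1(table, a, s=None):
    """P(U=1 | A=a) or, if s is given, P(U=1 | A=a, S=s), from a joint table."""
    def match(key):
        return key[1] == a and (s is None or key[0] == s)
    den = sum(p for key, p in table.items() if match(key))
    num = sum(p for key, p in table.items() if match(key) and key[2] == 1)
    return num / den


def conditional(a, s=None):
    """EDT-style quantity: condition on observing A=a in the original graph."""
    return prob_u1(joint(), a, s)


def interventional(a, s=None):
    """CDT-style quantity: set A=a in the graph with arrows into A severed."""
    return prob_u1(joint(do_a=a), a, s)


if __name__ == "__main__":
    for s, a in product((0, 1), repeat=2):
        print("S known:", s, a, conditional(a, s), interventional(a, s))
    for a in (0, 1):
        print("S unknown:", a, conditional(a), interventional(a))
```

In this toy graph the two quantities agree whenever S is known and come apart when it is not. That agreement depends entirely on S being the only parent of A, which is exactly the structural assumption in question.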

> (If the agent observes itself trying, it infers that it must have done so because it computed the probability as 99%, and thus the probability of success must be 99%.)

EDT isn’t nearly this bad. I think a lot of people have the idea that EDT goes around wagging dogs’ tails to try to make the dogs happy. But EDT doesn’t condition on the dog’s tail wagging: it conditions on personally wagging the dog’s tail, which has no a priori reason to be correlated with the dog’s happiness.
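The distinction can be checked by brute-force enumeration. The following is a minimal sketch with invented numbers, assuming the dog's happiness drives natural wagging, the agent's choice to wag the tail by hand is independent of happiness, and the tail is seen wagging if either happens. All names and probabilities are hypothetical.

```python
from itertools import product

# Hypothetical numbers, chosen only to illustrate the point.
P_HAPPY = 0.5
P_NATURAL_WAG = {1: 0.9, 0: 0.1}  # P(dog wags on its own | happy / unhappy)
P_AGENT_WAGS = 0.5                # agent's action, independent of happiness


def joint():
    """Return {(happy, natural_wag, agent_wags, tail_wagging): prob}."""
    table = {}
    for h, n, a in product((0, 1), repeat=3):
        p_h = P_HAPPY if h else 1 - P_HAPPY
        p_n = P_NATURAL_WAG[h] if n else 1 - P_NATURAL_WAG[h]
        p_a = P_AGENT_WAGS if a else 1 - P_AGENT_WAGS
        w = 1 if (n or a) else 0  # tail wags if the dog or the agent wags it
        table[(h, n, a, w)] = p_h * p_n * p_a
    return table


def p_happy_given(table, index, value):
    """P(happy=1 | key[index] == value), computed from the joint table."""
    den = sum(p for key, p in table.items() if key[index] == value)
    num = sum(p for key, p in table.items()
              if key[index] == value and key[0] == 1)
    return num / den


TABLE = joint()
# Conditioning on the tail wagging (index 3) raises the probability of happiness.
p_happy_given_tail_wagging = p_happy_given(TABLE, 3, 1)
# Conditioning on the agent's own action (index 2) leaves it at the prior.
p_happy_given_agent_wags = p_happy_given(TABLE, 2, 1)

if __name__ == "__main__":
    print(p_happy_given_tail_wagging, p_happy_given_agent_wags)
```

Under these assumptions, observing a wagging tail is evidence of happiness, while the agent wagging the tail itself is not, so an agent that conditions on its own action sees no gain from wagging.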

Similarly, EDT doesn’t just condition on “trying”: it conditions on everything it knows, including that it hasn’t yet performed the computation. The only equilibrium solution will be for the AI to run the computation every time except on exploration rounds. It sees that it does quite poorly on the exploration rounds where it tries without running the computation, so it never chooses to do that.
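As a rough illustration of the exploration argument, here is a toy simulation. It is one possible reading of the scenario, not a model taken from the discussion: the payoffs (+1 for success, -1 for failure, 0 for not trying), the two hidden success probabilities, the action names `compute_first` and `try_blind`, and the epsilon-greedy learner are all assumptions made for the sketch.

```python
import random


def run(rounds=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent choosing between computing first and trying blind."""
    rng = random.Random(seed)
    totals = {"compute_first": 0.0, "try_blind": 0.0}
    counts = {"compute_first": 0, "try_blind": 0}

    def mean(action):
        # Empirical average reward; 0.0 before the action has been tried.
        return totals[action] / counts[action] if counts[action] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration round: pick an action at random, possibly trying blind.
            action = rng.choice(["compute_first", "try_blind"])
        else:
            # Otherwise pick the action with the best record so far.
            action = max(totals, key=mean)

        # Hypothetical environment: success chance is either very high or very low.
        p_success = rng.choice([0.99, 0.01])

        if action == "compute_first":
            attempt = p_success > 0.5  # the computation reveals p_success
        else:
            attempt = True             # tries without knowing p_success

        if attempt:
            reward = 1.0 if rng.random() < p_success else -1.0
        else:
            reward = 0.0

        totals[action] += reward
        counts[action] += 1

    return {a: mean(a) for a in totals}, counts


if __name__ == "__main__":
    means, counts = run()
    print(means, counts)
```

In this setup the agent's own record shows that trying blind averages close to zero while computing first averages close to 0.49, so outside exploration rounds it settles on running the computation.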




