Intelligent Agent Foundations Forum
The Doomsday argument in anthropic decision theory
post by Stuart Armstrong 19 days ago | Abram Demski likes this

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, to date, I hadn’t found a good way to express the doomsday argument within ADT.

This post fills that gap, by showing that there is a natural doomsday-like behaviour for average utilitarian agents within ADT.


Anthropic behaviour

The comparable phrasings of the two doomsday arguments (probability and decision-based) are:

  • In the standard doomsday argument, the probability of extinction is increased for an agent that uses SSA probability versus one that doesn’t.
  • In the ADT doomsday argument, an average utilitarian behaves as if it were a total utilitarian with a higher revealed probability of doom.

Thus in both cases, the doomsday agent believes or behaves as if it were a non-doomsday agent facing a higher probability of doom.

Revealed probability of events

What are these revealed probabilities?

Well, suppose that \(X\) and \(X'\) are two events that may happen. The agent has a choice between betting on one or the other: if they bet on the first, they get a reward of \(r\) if \(X\) happens; if they bet on the second, they get a reward of \(r'\) if \(X'\) happens.

If an agent is an expected utility maximiser and chooses \(X\) over \(X'\), this implies that \(rP(X) \geq r'P(X')\), where \(P(X)\) and \(P(X')\) are the probabilities the agent assigns to \(X\) and \(X'\).

Thus, observing the behaviour of the agent allows one to deduce their probability estimation for \(X\) and \(X'\).
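This inference can be sketched in a few lines of code. All the numbers below (rewards, credences) are illustrative assumptions, not from the post; the point is just that an observed choice reveals a bound on the agent's probability ratio:

```python
# Sketch: inferring a bound on an agent's credences from a betting choice.
# Rewards and probabilities here are illustrative assumptions.

def chooses_first_bet(p_x, r, p_xp, rp):
    """An expected utility maximiser bets on X iff r*P(X) >= r'*P(X')."""
    return r * p_x >= rp * p_xp

# Suppose the agent's (hidden) credences are P(X) = 0.5 and P(X') = 0.1,
# and we offer reward r = 1 on X against r' = 4 on X'.
r, rp = 1.0, 4.0
p_x, p_xp = 0.5, 0.1

# The agent takes the X bet, since 1 * 0.5 >= 4 * 0.1 ...
assert chooses_first_bet(p_x, r, p_xp, rp)

# ... and from that observed choice alone we deduce P(X)/P(X') >= r'/r = 4,
# without ever seeing the credences directly.
assert p_x / p_xp >= rp / r
```

Varying the reward ratio \(r'/r\) and watching where the agent switches bets pins down the probability ratio exactly, which is how the revealed probabilities below are extracted.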

Revealed anthropic and non-anthropic probabilities

To simplify comparisons, assume that \(Y\) is an event that will happen with probability \(1\); if the agent bets on \(Y\), it will get a reward of \(1\). \(Y\)'s only purpose is to serve as a baseline against which to compare the other events.

Then \(X\) is an event that will happen with an unknown probability; if the agent bets on it, they will get a reward of \(r\). In comparison, \(X_{s}\) is an event that will happen with certainty if and only if humanity survives for a certain amount of time. If the agent bets on \(X_s\) and it happens, it will then give a reward of \(r_{s}\).

The agent needs to bet on exactly one of \(Y\), \(X\), and \(X_s\). Suppose that the agent is an average utilitarian, and that their actual estimated probability for human survival is \(p\); thus \(P(X_s)=p\). If humanity survives, the total human population will be \(\Omega\); if it doesn’t, then it will be limited to \(\omega \leq \Omega\).\(\newcommand{\avstan}{\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)}\)\(\newcommand{\avanth}{\left(\frac{p}{\Omega}\right)}\)

Then the following table gives the three possible bets and the expected utility the average utilitarian will derive from them. Since the average utilitarian needs to divide their utility by total population, this expected utility will be a function of the probabilities of the different population numbers.

By varying \(r\) and \(r_s\), we can establish what probabilities the agent actually gives to each event, by comparing with the situation where it bets on \(Y\). If we did that, but assumed that the agent was a total utilitarian rather than an average one, we would get the apparent revealed probabilities given in the third column:

  Bet        Expected utility        Apparent revealed probability (as total utilitarian)
  \(Y\)      \(\avstan\)             \(1\)
  \(X\)      \(rP(X)\avstan\)        \(P(X)\)
  \(X_s\)    \(r_s\avanth\)          \(p'=\avanth/\avstan\)
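A quick numeric check of these formulas, with illustrative values assumed for \(p\), \(\omega\), \(\Omega\), \(r\), \(r_s\) and \(P(X)\):

```python
# Expected utilities of the three bets for an average utilitarian,
# following the formulas in the table. All numbers are illustrative.
p, omega, Omega = 0.5, 100, 1000   # survival probability, doom and survival populations
r, r_s, P_X = 2.0, 2.0, 0.3

avstan = (1 - p) / omega + p / Omega   # expected value of 1/population
avanth = p / Omega                      # 1/population, weighted by survival

u_Y  = 1 * avstan          # certain reward of 1, divided by population
u_X  = r * P_X * avstan    # reward r with probability P(X), population-independent
u_Xs = r_s * avanth        # reward r_s only in the survival (large-population) world

# Reading these utilities as a total utilitarian's, the revealed probability
# of survival is the ratio of the X_s row to the Y row:
p_prime = avanth / avstan

# With Omega = 10 * omega, the apparent probability is well below the real one.
assert p_prime < p
```

Here \(p' = (0.5/1000)/(0.5/100 + 0.5/1000) = 1/11 \approx 0.09\), far below the agent's actual credence \(p = 0.5\): the large survival population dilutes the average utilitarian's stake in the survival bet.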

Note that if \(\Omega=\omega\) — if the population is fixed, so that the average utilitarian behaves the same as a total utilitarian — then \(p'\) simplifies to \(\left(p/\omega\right)/(1/\omega)=p\), the actual probability of survival.

It’s also not hard to see that \(p'\) strictly decreases as \(\Omega\) increases, so it will always be less than \(p\) if \(\Omega > \omega\).
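Both properties, recovery of the true probability when \(\Omega=\omega\) and strict decrease in \(\Omega\), are easy to verify numerically (the particular values of \(p\) and \(\omega\) below are assumed for illustration):

```python
def p_prime(p, omega, Omega):
    """Apparent revealed survival probability of an average utilitarian,
    p' = (p/Omega) / ((1-p)/omega + p/Omega)."""
    avstan = (1 - p) / omega + p / Omega
    avanth = p / Omega
    return avanth / avstan

p, omega = 0.5, 100

# Omega = omega: fixed population, so p' recovers the true probability p.
assert abs(p_prime(p, omega, omega) - p) < 1e-12

# p' strictly decreases as Omega grows, so p' < p whenever Omega > omega.
values = [p_prime(p, omega, Om) for Om in (100, 200, 500, 1000, 10000)]
assert all(a > b for a, b in zip(values, values[1:]))
assert all(v < p for v in values[1:])
```

Algebraically, \(p' = p/\big((1-p)\Omega/\omega + p\big)\), which makes the strict decrease in \(\Omega\) (for \(p<1\)) immediate.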

Thus if we interpret the actions of an average utilitarian as if they were a total utilitarian, then for rewards conditional on human survival (and only for those rewards, not for others like betting on \(X\)), their actions will seem to imply that they assign a lower probability to human survival than they actually do.

Conclusion

The standard doomsday argument argues that we are more likely to be in the first \(50\%\) of the list of all humans that will ever live, rather than in the first \(10\%\), which is still more likely than us being in the first \(1\%\), and so on. The argument is also vulnerable to changes of reference class; it gives different implications if we consider ‘the list of all humans’, ‘the list of all mammals’, or ‘the list of all people with my name’. The doomsday argument has no effect on probabilities not connected with human survival.

All these effects are reproduced in this new framework. Being in the first \(n\%\) means that the total human population will be at least \(\omega \cdot 100/n\), so the total population \(\Omega\) grows as \(n\) shrinks, and \(p'\), the apparent revealed probability of survival, shrinks as well. Similarly, average utilitarianism gives different answers depending on what reference class is used to define its population. And the apparent revealed probabilities that are not connected with human survival are unchanged from a total utilitarian.

Thus this seems like a very close replication of the doomsday argument in ADT, in terms of behaviour and apparent revealed probabilities. But note that it is not a genuine doomsday argument. It’s all due to the quirky nature of average utilitarianism: the agent doesn’t really believe that the probability of survival goes down; they merely behave in a way that would lead us to infer that belief, if we modelled them as a total utilitarian. So there is no actual increased risk.


