Intelligent Agent Foundations Forum
The Doomsday argument in anthropic decision theory
post by Stuart Armstrong

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, until now, I hadn’t found a good way to express the doomsday argument within ADT.

This post fills that gap, by showing that there is a natural doomsday-like behaviour for average utilitarian agents within ADT.

Anthropic behaviour

The comparable phrasings of the two doomsday arguments (probability and decision-based) are:

  • In the standard doomsday argument, the probability of extinction is increased for an agent that uses SSA probability versus one that doesn’t.
  • In the ADT doomsday argument, an average utilitarian behaves as if it were a total utilitarian with a higher revealed probability of doom.

Thus in both cases, the doomsday agent believes/behaves as if it were a non-doomsday agent with a higher probability of doom.

Revealed probability of events

What are these revealed probabilities?

Well, suppose that \(X\) and \(X'\) are two events that may happen. The agent has a choice between betting on one or the other: if they bet on the first, they get a reward of \(r\) if \(X\) happens; if they bet on the second, they get a reward of \(r'\) if \(X'\) happens.

If an agent is an expected utility maximiser and chooses \(X\) over \(X'\), this implies that \(rP(X) \geq r'P(X')\), where \(P(X)\) and \(P(X')\) are the probabilities the agent assigns to \(X\) and \(X'\).

Thus, observing the behaviour of the agent allows one to deduce their probability estimation for \(X\) and \(X'\).
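As a sketch of this revealed-probability reading (all numerical values here are illustrative assumptions, not from the post), one can check where the agent's preference switches as the reward ratio varies:

```python
# An expected-utility maximiser bets on X over X' exactly when
# r * P(X) >= r' * P(X'); observing the choice at varying reward
# ratios brackets the ratio P(X) / P(X').

def prefers_X(r, r_prime, p_X, p_X_prime):
    """True if betting on X has at least the expected utility of betting on X'."""
    return r * p_X >= r_prime * p_X_prime

# Illustrative probabilities (assumptions for this sketch):
p_X, p_X_prime = 0.3, 0.6

# The preference switches where r / r' = P(X') / P(X) = 2.
assert prefers_X(2.1, 1.0, p_X, p_X_prime)       # 0.63 >= 0.60
assert not prefers_X(1.9, 1.0, p_X, p_X_prime)   # 0.57 <  0.60
```

Scanning over reward ratios in this way recovers \(P(X)/P(X')\) from behaviour alone, which is all "revealed probability" means here.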

Revealed anthropic and non-anthropic probabilities

To simplify comparisons, assume that \(Y\) is an event that will happen with probability \(1\); if the agent bets on \(Y\), it will get a reward of \(1\). The \(Y\)’s only purpose is to compare with other events.

Then \(X\) is an event that will happen with an unknown probability; if bet on, the agent will get a reward of \(r\). In comparison, \(X_{s}\) is an event that will happen with certainty if and only if humanity survives for a certain amount of time. If the agent bets on \(X_s\) and it happens, they will then receive a reward of \(r_{s}\).

The agent needs to bet on one of \(Y\), \(X\), and \(X_s\). Suppose that the agent is an average utilitarian, and that their actual estimated probability for human survival is \(p\); thus \(P(X_s)=p\). If humanity survives, the total human population will be \(\Omega\); if it doesn’t, then it will be limited to \(\omega \leq \Omega\).\(\newcommand{\avstan}{\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)}\)\(\newcommand{\avanth}{\left(\frac{p}{\Omega}\right)}\)

Then the following table gives the three possible bets and the expected utility the average utilitarian will derive from them. Since the average utilitarian needs to divide their utility by total population, this expected utility will be a function of the probabilities of the different population numbers.

By varying \(r\) and \(r_s\), we can establish what probabilities the agent effectively assigns to each event, by comparing with the situation when it bets on \(Y\). If we did that, but assumed that the agent was a total utilitarian rather than an average one, we would get the apparent revealed probabilities given in the third column:

| Bet | Utility | Apparent revealed probability, if total utilitarian |
| --- | --- | --- |
| \(Y\) | \(\avstan\) | \(1\) |
| \(X\) | \(rP(X)\avstan\) | \(P(X)\) |
| \(X_s\) | \(r_s\avanth\) | \(p'=\avanth/\avstan\) |
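A minimal numerical check of the table (the population and probability values below are illustrative assumptions):

```python
# Average-utilitarian expected utilities for the three bets, and the
# apparent revealed probability p' of X_s when the agent is read as a
# total utilitarian.

p, omega, Omega = 0.5, 10.0, 1000.0   # illustrative: P(survival), doom/survival populations
r, r_s, P_X = 1.0, 1.0, 0.4           # illustrative rewards and P(X)

avstan = (1 - p) / omega + p / Omega  # expected 1/population over both outcomes
avanth = p / Omega                    # survival branch only

u_Y = avstan                          # reward 1, certain
u_X = r * P_X * avstan                # reward r with probability P(X)
u_Xs = r_s * avanth                   # reward r_s only if humanity survives

p_prime = avanth / avstan             # apparent revealed probability of X_s
assert p_prime < p                    # depressed relative to the true p
```

With these numbers \(p' \approx 0.0099\), far below the true survival probability \(p = 0.5\), illustrating the depression effect discussed below.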

Note that if \(\Omega=\omega\) — if the population is fixed, so that the average utilitarian behaves the same as a total utilitarian — then \(p'\) simplifies to \(\left(p/\omega\right)/(1/\omega)=p\), the actual probability of survival.

It’s also not hard to see that \(p'\) strictly decreases as \(\Omega\) increases, so it will always be less than \(p\) if \(\Omega > \omega\).
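The algebra behind both observations is a one-line simplification:

\[
p' \;=\; \frac{p/\Omega}{\frac{1-p}{\omega}+\frac{p}{\Omega}} \;=\; \frac{p}{\frac{\Omega}{\omega}(1-p)+p},
\]

which equals \(p\) when \(\Omega=\omega\), and is strictly decreasing in \(\Omega\) whenever \(p < 1\).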

Thus if we interpret the actions of an average utilitarian as if they were a total utilitarian, then for rewards conditional on human survival — and only for those rewards, not for others like betting on \(X\) — their actions will seem to imply that they give a lower probability of human survival than they actually do.


The standard doomsday argument argues that we are more likely to be in the first \(50\%\) of the list of all humans that will ever live, rather than in the first \(10\%\), which is still more likely than us being in the first \(1\%\), and so on. The argument is also vulnerable to changes of reference class; it gives different implications if we consider ‘the list of all humans’, ‘the list of all mammals’, or ‘the list of all people with my name’. The doomsday argument has no effect on probabilities not connected with human survival.

All these effects are reproduced in this new framework. Being in the first \(n\%\) means that the total human population will be at least \(\omega 100/n\), so the total population \(\Omega\) grows as \(n\) shrinks, and \(p'\), the apparent revealed probability of survival, shrinks as well. Similarly, average utilitarianism gives different answers depending on what reference class is used to define its population. And the apparent revealed probabilities that are not connected with human survival are unchanged from a total utilitarian.
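The doomsday-like shrinking of \(p'\) can be sketched numerically (values of \(p\) and \(\omega\) below are illustrative assumptions):

```python
# Being in the first n% of all humans implies a total population of at
# least Omega = omega * 100 / n; p' then shrinks as n shrinks.

def apparent_survival_prob(p, omega, Omega):
    # p' = (p / Omega) / ((1 - p) / omega + p / Omega)
    return (p / Omega) / ((1 - p) / omega + p / Omega)

p, omega = 0.5, 100.0   # illustrative true survival probability and doom population

p_primes = [apparent_survival_prob(p, omega, omega * 100.0 / n)
            for n in (50.0, 10.0, 1.0)]

# Smaller n (earlier in the human list) -> larger Omega -> smaller p'.
assert p_primes[0] > p_primes[1] > p_primes[2]
```

For these values \(p'\) falls from \(1/3\) at \(n=50\) to roughly \(0.01\) at \(n=1\), mirroring how the standard doomsday argument makes earlier positions imply gloomier prospects.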

Thus this seems like a very close replication of the doomsday argument in ADT, in terms of behaviour and apparent revealed probabilities. But note that it is not a genuine doomsday argument. It’s all due to the quirky nature of average utilitarianism; the agent doesn’t really believe that the probability of survival goes down, they just behave in a way that would make us infer that they believed that, if we saw them as being a total utilitarian. So there is no actual increased risk.
