Intelligent Agent Foundations Forum
The Doomsday argument in anthropic decision theory
post by Stuart Armstrong 174 days ago

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, until now, I hadn’t found a good way to express the doomsday argument within ADT.

This post will fill that gap by showing that average utilitarian agents within ADT exhibit a natural doomsday-like behaviour.

Anthropic behaviour

The comparable phrasings of the two doomsday arguments (probability and decision-based) are:

  • In the standard doomsday argument, the probability of extinction is increased for an agent that uses SSA probability versus one that doesn’t.
  • In the ADT doomsday argument, an average utilitarian behaves as if it were a total utilitarian with a higher revealed probability of doom.

Thus, in both cases, the doomsday agent believes (or behaves) as if it were a non-doomsday agent with a higher probability of doom.

Revealed probability of events

What are these revealed probabilities?

Well, suppose that \(X\) and \(X'\) are two events that may happen. The agent has a choice between betting on one or the other: if they bet on the first, they get a reward of \(r\) if \(X\) happens; if they bet on the second, they get a reward of \(r'\) if \(X'\) happens.

If an agent is an expected utility maximiser and chooses \(X\) over \(X'\), this implies that \(rP(X) \geq r'P(X')\), where \(P(X)\) and \(P(X')\) are the probabilities the agent assigns to \(X\) and \(X'\).

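Thus, observing the behaviour of the agent (for instance, the reward ratio \(r'/r\) at which they switch from one bet to the other) allows one to deduce the ratio of their probability estimates for \(X\) and \(X'\).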
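A minimal Python sketch of this revealed-preference reasoning, assuming an expected utility maximiser whose utility is linear in the reward. The function names and the probabilities used are hypothetical illustrations, not taken from the post; the point is that the reward ratio at which the agent becomes indifferent recovers \(P(X)/P(X')\).

```python
from fractions import Fraction


def prefers_first(r, p_x, r_prime, p_x_prime):
    """True if an expected utility maximiser (utility linear in reward)
    weakly prefers betting on X (reward r) over X' (reward r_prime)."""
    return r * p_x >= r_prime * p_x_prime


def revealed_ratio(r, r_prime):
    """At the indifference point r * P(X) == r_prime * P(X'),
    an observer can read off P(X) / P(X') = r_prime / r."""
    return Fraction(r_prime) / Fraction(r)


# Hypothetical agent that privately assigns P(X) = 1/4 and P(X') = 1/2.
p_x, p_x_prime = Fraction(1, 4), Fraction(1, 2)
print(prefers_first(3, p_x, 1, p_x_prime))  # True: 3/4 >= 1/2
print(prefers_first(1, p_x, 1, p_x_prime))  # False: 1/4 < 1/2
# With r' = 1 the agent is indifferent at r = 2, revealing P(X)/P(X') = 1/2.
print(revealed_ratio(2, 1))
```

Only the ratio is identified: scaling both probabilities by the same factor would produce exactly the same choices.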

Revealed anthropic and non-anthropic probabilities

To simplify comparisons, assume that \(Y\) is an event that will happen with probability \(1\); if the agent bets on \(Y\), it will get a reward of \(1\). \(Y\)’s only purpose is to serve as a baseline for comparison with the other events.

Then \(X\) is an event that will happen with some unknown probability \(P(X)\), independently of whether humanity survives; if the agent bets on it, it will get a reward of \(r\). In comparison, \(X_{s}\) is an event that will happen if and only if humanity survives for a certain amount of time. If the agent bets on \(X_s\) and it happens, it will get a reward of \(r_{s}\).

The agent needs to bet on one of \(Y\), \(X\), and \(X_s\). Suppose that the agent is an average utilitarian, and that their actual estimated probability for human survival is \(p\); thus \(P(X_s)=p\). If humanity survives, the total human population will be \(\Omega\); if it doesn’t, then it will be limited to \(\omega \leq \Omega\).\(\newcommand{\avstan}{\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)}\)\(\newcommand{\avanth}{\left(\frac{p}{\Omega}\right)}\)

Then the following table gives the three possible bets and the expected utility the average utilitarian will derive from them. Since the average utilitarian needs to divide their utility by total population, this expected utility will be a function of the probabilities of the different population numbers.

By varying \(r\) and \(r_s\) and comparing each bet with the bet on \(Y\), we can establish what probabilities the agent actually gives to each event. If we did that, but assumed that the agent was a total utilitarian rather than an average one, we would get the apparent revealed probabilities given in the third column:

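| Bet | Expected utility (average utilitarian) | Apparent revealed probability (if assumed total utilitarian) |
|---|---|---|
| \(Y\) | \(\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)\) | \(1\) |
| \(X\) | \(rP(X)\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)\) | \(P(X)\) |
| \(X_s\) | \(r_s\left(\frac{p}{\Omega}\right)\) | \(p'=\left(\frac{p}{\Omega}\right)\Big/\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)\) |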
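The table can be checked numerically. The sketch below uses hypothetical helper names and example numbers (assumptions, not from the post) and computes each entry with exact fractions, assuming, as the table does, that \(X\) is independent of human survival.

```python
from fractions import Fraction


def average_utility_table(p, p_x, r, r_s, omega, big_omega):
    """Expected utility of each bet for an average utilitarian.

    p         -- actual probability of human survival, P(X_s)
    p_x       -- P(X); X is assumed independent of survival
    r, r_s    -- rewards for winning the X and X_s bets (Y pays 1)
    omega     -- total population if humanity does not survive
    big_omega -- total population if it survives (omega <= big_omega)
    """
    # Expected value of a reward of 1, divided by whichever population exists.
    baseline = (1 - p) / omega + p / big_omega
    return {
        "Y": baseline,
        "X": r * p_x * baseline,
        # X_s only pays in the survival world, where the population is big_omega.
        "X_s": r_s * p / big_omega,
    }


def apparent_survival_probability(p, omega, big_omega):
    """p' = (p / Omega) / ((1 - p) / omega + p / Omega)."""
    return (p / big_omega) / ((1 - p) / omega + p / big_omega)


p = Fraction(1, 2)  # hypothetical actual survival probability
print(apparent_survival_probability(p, 10, 1000))  # 1/101
print(apparent_survival_probability(p, 10, 10))    # 1/2: fixed population gives p' = p
table = average_utility_table(p, Fraction(1, 3), 3, 101, 10, 1000)
print(table["Y"] == table["X"] == table["X_s"])    # True: all three bets tie
```

With \(p=1/2\), \(\omega=10\) and \(\Omega=1000\), the average utilitarian is indifferent between \(Y\) and \(X_s\) at \(r_s=101\), which is exactly where a total utilitarian with survival probability \(1/101\) would be indifferent.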

Note that if \(\Omega=\omega\) — if the population is fixed, so that the average utilitarian behaves the same as a total utilitarian — then \(p'\) simplifies to \(\left(p/\omega\right)/(1/\omega)=p\), the actual probability of survival.

It’s also not hard to see that, for \(0<p<1\), \(p'\) strictly decreases as \(\Omega\) increases, so it will always be less than \(p\) if \(\Omega > \omega\).
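Multiplying the numerator and denominator of \(p'\) by \(\omega\Omega\) makes this explicit. The derivation assumes \(0<p<1\); at \(p=0\) or \(p=1\) the two probabilities coincide.

```latex
p' = \frac{p/\Omega}{(1-p)/\omega + p/\Omega}
   = \frac{p\,\omega}{(1-p)\,\Omega + p\,\omega},
\qquad
\frac{\partial p'}{\partial \Omega}
   = -\frac{p\,(1-p)\,\omega}{\bigl((1-p)\,\Omega + p\,\omega\bigr)^{2}} < 0 .
```

At \(\Omega=\omega\) the denominator reduces to \(\omega\), giving \(p'=p\); since \(p'\) is strictly decreasing in \(\Omega\), any \(\Omega>\omega\) gives \(p'<p\).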

Thus, if we interpret the actions of an average utilitarian as if they were those of a total utilitarian, then for rewards conditional on human survival (and only for those rewards, not for others such as the bet on \(X\)), their actions will seem to imply that they give a lower probability of human survival than they actually do.
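To make the "interpret as a total utilitarian" step concrete, here is a hypothetical simulation; the function names, the bisection bracket and the numbers are all assumptions for illustration. The observer only sees which bet the average utilitarian prefers, searches for the reward \(r^*\) at which it becomes indifferent to \(Y\), and reports \(1/r^*\), which is what a total utilitarian's indifference point would mean.

```python
def average_agent_prefers(bet, reward, p, p_x, omega, big_omega):
    """Hypothetical black-box average utilitarian: True if it strictly prefers
    `bet` ("X" or "X_s") at the given reward over the sure bet Y (reward 1)."""
    baseline = (1 - p) / omega + p / big_omega  # expected utility of Y
    if bet == "X":
        value = reward * p_x * baseline  # X assumed independent of survival
    else:  # "X_s" pays only if humanity survives, when the population is big_omega
        value = reward * p / big_omega
    return value > baseline


def revealed_probability_as_total(bet, p, p_x, omega, big_omega, steps=100):
    """Observer who assumes a *total* utilitarian: bisect for the indifference
    reward r* against Y, then report 1 / r* as the revealed probability."""
    lo, hi = 1.0, 1e9  # assumed bracket: the revealed probability lies in [1e-9, 1]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if average_agent_prefers(bet, mid, p, p_x, omega, big_omega):
            hi = mid  # agent takes the bet: r* is at most mid
        else:
            lo = mid  # agent prefers Y: r* is at least mid
    return 1 / hi


p, p_x, omega, big_omega = 0.5, 1 / 3, 10, 1000
print(round(revealed_probability_as_total("X", p, p_x, omega, big_omega), 4))    # 0.3333
print(round(revealed_probability_as_total("X_s", p, p_x, omega, big_omega), 4))  # 0.0099
```

With these numbers the observer recovers \(P(X)=1/3\) correctly, but infers a survival probability of about \(1/101\) rather than the agent's actual \(1/2\).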


The standard doomsday argument claims that we are more likely to be in the first \(50\%\) of the list of all humans that will ever live, rather than in the first \(10\%\), which is still more likely than us being in the first \(1\%\), and so on. The argument is also vulnerable to changes of reference class; it gives different implications if we consider ‘the list of all humans’, ‘the list of all mammals’, or ‘the list of all people with my name’. The doomsday argument has no effect on probabilities not connected with human survival.

All these effects reproduce in this new framework. Being in the first \(n\%\) means that the total human population will be at least \(100\omega/n\), so the total population \(\Omega\) grows as \(n\) shrinks, and \(p'\), the apparent revealed probability of survival, shrinks as well. Similarly, average utilitarianism gives different answers depending on what reference class is used to define its population. And the apparent revealed probabilities that are not connected with human survival are unchanged from those of a total utilitarian.
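A small numerical illustration, assuming (hypothetically) an actual survival probability \(p=1/2\) and measuring populations in units of \(\omega\), the population if doom comes soon. Setting \(\Omega = 100\omega/n\) for "we are in the first \(n\%\)" shows \(p'\) falling as \(n\) shrinks, even though \(p\) itself never changes.

```python
from fractions import Fraction


def apparent_survival_probability(p, omega, big_omega):
    """p' written as p*omega / (p*omega + (1 - p)*Omega)."""
    return p * omega / (p * omega + (1 - p) * big_omega)


p = Fraction(1, 2)  # hypothetical actual survival probability
omega = 1           # population if doom comes soon, used as the unit
results = {}
for n in (50, 10, 1):
    big_omega = Fraction(100, n) * omega  # being in the first n% of all humans
    results[n] = apparent_survival_probability(p, omega, big_omega)
    print(f"first {n}%: p' = {results[n]}")
```

This prints \(p' = 1/3\), \(1/11\) and \(1/101\) for \(n = 50\), \(10\) and \(1\) respectively.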

Thus this seems like a very close replication of the doomsday argument in ADT, in terms of behaviour and apparent revealed probabilities. But note that it is not a genuine doomsday argument. It’s all due to the quirky nature of average utilitarianism; the agent doesn’t really believe that the probability of survival goes down, they just behave in a way that would make us infer that they believed that, if we saw them as being a total utilitarian. So there is no actual increased risk.




