Intelligent Agent Foundations Forum
The Doomsday argument in anthropic decision theory
post by Stuart Armstrong

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, to date, I hadn’t found a good way to express the doomsday argument within ADT.

This post fills that gap, by showing that there is a natural doomsday-like behaviour for average utilitarian agents within ADT.


Anthropic behaviour

The comparable phrasings of the two doomsday arguments (probability-based and decision-based) are:

  • In the standard doomsday argument, the probability of extinction is increased for an agent that uses SSA probability versus one that doesn’t.
  • In the ADT doomsday argument, an average utilitarian behaves as if it were a total utilitarian with a higher revealed probability of doom.

Thus in both cases, the doomsday agent believes or behaves as if it were a non-doomsday agent with a higher probability of doom.

Revealed probability of events

What are these revealed probabilities?

Well, suppose that \(X\) and \(X'\) are two events that may happen. The agent has a choice between betting on one or the other; if they bet on the first, they get a reward of \(r\) if \(X\) happens; if they bet on the second, they get a reward of \(r'\) if \(X'\) happens.

If an agent is an expected utility maximiser and chooses \(X\) over \(X'\), this implies that \(rP(X) \geq r'P(X')\), where \(P(X)\) and \(P(X')\) are the probabilities the agent assigns to \(X\) and \(X'\).

Thus, observing the agent's behaviour as the rewards vary allows one to deduce their probability estimates for \(X\) and \(X'\).
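
As a concrete sketch of this revealed-probability reading (the probabilities and rewards below are purely hypothetical), the agent's switching point as the reward ratio varies reveals the ratio of its probabilities:

    # Minimal sketch: inferring an expected-utility maximiser's probabilities from its bets.
    # P_X and P_X2 are hidden from the observer; only the agent's choices are observed.
    def choose_bet(p_x, p_x2, r, r2):
        """Bet taken by an expected-utility maximiser: 'X' if r*P(X) >= r2*P(X'), else 'X2'."""
        return 'X' if r * p_x >= r2 * p_x2 else 'X2'

    P_X, P_X2 = 0.2, 0.5  # hypothetical probabilities the agent assigns to X and X'
    for ratio in [1.0, 1.5, 2.0, 2.5, 3.0]:  # sweep the reward ratio r/r'
        print(ratio, choose_bet(P_X, P_X2, r=ratio, r2=1.0))
    # The agent switches to X once r/r' reaches P(X')/P(X) = 2.5,
    # so the observed switching ratio reveals the relative probabilities.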

Revealed anthropic and non-anthropic probabilities

To simplify comparisons, assume that \(Y\) is an event that will happen with probability \(1\); if the agent bets on \(Y\), they will get a reward of \(1\). \(Y\)'s only purpose is to serve as a baseline against which the other events can be compared.

Then \(X\) is an event that will happen with an unknown probability; if the agent bets on it and it happens, they will get a reward of \(r\). In comparison, \(X_{s}\) is an event that will happen with certainty if and only if humanity survives for a certain amount of time. If the agent bets on \(X_s\) and it happens, they will get a reward of \(r_{s}\).

The agent needs to bet on one of \(Y\), \(X\), and \(X_s\). Suppose that the agent is an average utilitarian, and that their actual estimated probability for human survival is \(p\); thus \(P(X_s)=p\). If humanity survives, the total human population will be \(\Omega\); if it doesn’t, then it will be limited to \(\omega \leq \Omega\).\(\newcommand{\avstan}{\left(\frac{1-p}{\omega}+\frac{p}{\Omega}\right)}\)\(\newcommand{\avanth}{\left(\frac{p}{\Omega}\right)}\)

Then the following table gives the three possible bets and the expected utility the average utilitarian will derive from them. Since the average utilitarian needs to divide their utility by total population, this expected utility will be a function of the probabilities of the different population numbers.

By varying \(r\) and \(r_s\), we can establish what probabilities the agent actually gives to each event, by comparing with the situation where they bet on \(Y\). If we did that, but assumed that the agent was a total utilitarian rather than an average one, we would get the apparent revealed probabilities given in the third column:

Bet | Expected utility (average utilitarian) | Apparent revealed probability if total utilitarian
\(Y\) | \(\avstan\) | \(1\)
\(X\) | \(rP(X)\avstan\) | \(P(X)\)
\(X_s\) | \(r_s\avanth\) | \(p'=\avanth/\avstan\)
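
To spell out where the third column comes from: the average utilitarian is indifferent between betting on \(X_s\) and betting on \(Y\) when \(r_s\avanth = \avstan\). An observer who instead modelled them as a total utilitarian, indifferent when \(r_s\) times the probability of survival equals \(1\), would therefore read off the apparent survival probability

\[ p' \;=\; \frac{p/\Omega}{(1-p)/\omega + p/\Omega}. \]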

Note that if \(\Omega=\omega\) — if the population is fixed, so that the average utilitarian behaves the same as a total utilitarian — then \(p'\) simplifies to \(\left(p/\omega\right)/(1/\omega)=p\), the actual probability of survival.

It’s also not hard to see that \(p'\) strictly decreases as \(\Omega\) increases (writing \(a = p/\Omega\), the ratio \(a/\left(a + (1-p)/\omega\right)\) is increasing in \(a\), and \(a\) shrinks as \(\Omega\) grows), so as long as \(0 < p < 1\), \(p'\) will always be less than \(p\) if \(\Omega > \omega\).
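
A quick numerical sketch of these two facts (all numbers purely illustrative):

    # Apparent revealed probability of survival, reading an average utilitarian's bets
    # as if they came from a total utilitarian: p' = (p/Omega) / ((1-p)/omega + p/Omega).
    def apparent_survival_prob(p, omega, Omega):
        return (p / Omega) / ((1 - p) / omega + p / Omega)

    p, omega = 0.5, 100  # illustrative survival probability and doom-case population
    print(apparent_survival_prob(p, omega, Omega=100))    # Omega = omega: recovers p = 0.5
    print(apparent_survival_prob(p, omega, Omega=1000))   # ~0.091, already below p
    print(apparent_survival_prob(p, omega, Omega=10000))  # ~0.0099, shrinking as Omega grows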

Thus if we interpret the actions of an average utilitarian as if they were a total utilitarian, then for rewards conditional on human survival — and only for those rewards, not for others like betting on \(X\) — their actions will seem to imply that they give a lower probability of human survival than they actually do.

Conclusion

The standard doomsday argument argues that we are more likely to be in the first \(50\%\) of the list of all humans that will ever live than in the first \(10\%\), which is in turn more likely than being in the first \(1\%\), and so on. The argument is also vulnerable to changes of reference class; it gives different implications if we consider ‘the list of all humans’, ‘the list of all mammals’, or ‘the list of all people with my name’. The doomsday argument has no effect on probabilities not connected with human survival.

All these effects are reproduced in this new framework. Being in the first \(n\%\) means that the total human population will be at least \(100\omega/n\), so the total population \(\Omega\) grows as \(n\) shrinks – and \(p'\), the apparent revealed probability of survival, shrinks as well. Similarly, average utilitarianism gives different answers depending on what reference class is used to define its population. And the apparent revealed probabilities that are not connected with human survival are unchanged from those of a total utilitarian.
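
Continuing the illustrative numbers above: being in the first \(n\%\) of all humans maps to \(\Omega \geq 100\omega/n\), and \(p'\) falls accordingly (again with hypothetical \(p = 0.5\), \(\omega = 100\)):

    # Being in the first n% of all humans means Omega >= omega * 100 / n.
    # p' = (p/Omega) / ((1-p)/omega + p/Omega), as above.
    p, omega = 0.5, 100  # illustrative numbers only
    for n in [50, 10, 1]:
        Omega = omega * 100 / n
        p_apparent = (p / Omega) / ((1 - p) / omega + p / Omega)
        print(f"first {n}%: Omega = {Omega:.0f}, p' = {p_apparent:.4f}")
    # first 50%: Omega = 200,   p' = 0.3333
    # first 10%: Omega = 1000,  p' = 0.0909
    # first 1%:  Omega = 10000, p' = 0.0099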

Thus this seems like a very close replication of the doomsday argument in ADT, in terms of behaviour and apparent revealed probabilities. But note that it is not a genuine doomsday argument. It’s all due to the quirky nature of average utilitarianism; the agent doesn’t really believe that the probability of survival goes down, they just behave in a way that would make us infer that they believed that, if we saw them as being a total utilitarian. So there is no actual increased risk.


