This post is an informal summary of ideas that Scott and I discussed on my recent visit to Los Angeles, meant to serve as a temporary “savepoint” and reference. The approach outlined here, albeit very promising in my opinion, requires a more work to become mature.
Abstract
We introduce a new type of game theoretic equilibrium, which is a candidate game theoretic desideratum for decision theory. This can be regarded as a generalization of Hofstadter’s superrationality to arbitrary games. This equilibrium can be interpreted as the outcome of games between transparent agents running CDT with restricted selfmodification but also as the outcome of games between transparent agents employing a certain type of logical counterfactuals. We suggest it should be possible to use the formalism of optimal predictors to rigorously construct a logical decision theory which implements the latter interpretation.
Introduction
It is commonly assumed that two agents running the correct decision theory would cooperate in a transparent Prisoner’s dilemma, that is, in a game of prisoner’s dilemma in which each player knows the source code of the other player\(^1\). It is possible to argue this behavior will indeed arise in UDT, at least for sufficiently similar players, but the argument is unrigorous since an agreed upon formalisation of UDT is lacking. It is also possible to formally prove this behavior for certain setups of proofbased decision theory. However, it is not presently clear what is the correct desideratum for decision theory in arbitrary games, especially games that are asymmetric.
The correct decision theory is likely to use logical counterfactuals but current understanding of logical counterfactuals is poor. On the other hand, it is much easier to model causal counterfactuals. Although CDT often leads to bad outcomes (and in particular defects in prisoner’s dilemma) it is reflectively unstable and given the chance is expected to selfmodify into some kind of better decision theory\(^2\). We will now see that a naive formalisation of this idea fails to achieve satisfactory gametheoretic properties. This failure is later amended by restricting the “selfmodification” to only produce programs of a certain form, arriving at what we hope is the correct desideratum.
Selfmodifying CDT
Fix a universal Turing machine to be used for representing Turing machines as words (programs).
Consider a game \(G\) in normal form between \(n\) players. Denote \(\Sigma_i\) the set of pure strategies for the \(i\)th player and \(u_i : \prod_{j=1}^n \Sigma_j \rightarrow {\mathbb{R}}\) the utility function of the \(i\)th player.
Given \(t \in {\mathbb{N}}\), we define the associated selfmodification game \(G^{SM(t)}\) as follows. The pure strategies of the \(i\)th player are \(\Sigma_i^{SM(t)}:={{\{ 0, 1 \}^{t}}}\). These strategy represent programs that receive the strategies of the other players as input, run for \(t\) time steps and produce a strategy in the original game, possibly using random coin flips. The utility function of the \(i\)th player is given by
\[u_i^{SM(t)}(\pi_1, \pi_2 \ldots \pi_n) := E_{U^{nt}}[u_i(ev^t(\pi_1; \pi_2, \pi_3 \ldots \pi_n, x_1), ev^t(\pi_2; \pi_1, \pi_3 \ldots \pi_n, x_2) \ldots ev^t(\pi_n; \pi_1, \pi_2 \ldots \pi_{n1}, x_n))]\]
For example, denoting the prisoner’s dilemma \(G_{PD}\) and taking \(t >> 0\), there are Nash equilibria in \(G_{PD}^{SM(t)}\) which corresponds to the outcome \((C,C)\) in \(G_{PD}\). For example take \(q\) to be “cliquebot” program, constructed using quining:
\[q(\pi):=\begin{cases}C \text{ if } q \text{ is a prefix of } \pi \\ D \text { otherwise}\end{cases}\]
Then obviously \((q_1,q_2)\) is a Nash equilibrium of \(G_{PD}^{SM(t)}\) for any \(q_1,q_2 \in {{\{ 0, 1 \}^{t}}}\) s.t. \(q\) is a prefix of \(q_1\) and \(q_2\). However, the “defectbot” \(d(\pi):=D\) also gives a Nash equilibrium. More generally, for any \(p \in [0,1]\) a dyadic rational we can define the \(p\)cliquebot to be
\[q_p(\pi):=\begin{cases}C \text{ with probability }p \text{ if } q_p \text{ is a prefix of } \pi \\ D \text { otherwise}\end{cases}\]
The defectbot equilibrium turns out to be unstable in the sense that it’s not a thermodynamic equilibrium at zero temperature (see below). However \(p\)cliquebots for \(p > 0\) yield equilibria which are stable in this sense. This is a deficiency of the formalism which is avoided by the metathreat hierarchy (see below).
Thermodynamic Game Equilibria
Consider a game \(G\) in normal form as before. Denote \(\Delta_i\) the set of mixed strategies of the \(i\)th player i.e.
\[\Delta_i := \{\mu: \Sigma_i \rightarrow [0,1] \mid \sum_{a \in \Sigma_i} \mu(a) = 1\}\]
Consider \(T \in ({\mathbb{R}}^{>0})^n\). A thermodynamic equilibrium of \(G\) at temperature \(T\) is \(\mu:=(\mu_1, \mu_2 \ldots \mu_n) \in \prod_{i=1}^n \Delta_i\) s.t.
\[\mu_i(a)=Z_i^{1} \operatorname{exp}(\frac{E_{\mu}[u_i(a_1, a_2 \ldots a \ldots a_n)]}{T_i})\]
where \(Z_i \in {\mathbb{R}}\) are normalization constants. Alternatively, \(\mu\) is a thermodynamic equilibrium iff
\[\mu_i = \arg\max_{\mu \in \Delta_i} (E_{\mu}[u_i(a_1, a_2 \ldots a_n)] + T_i S[\mu_i])\]
where \(S[\mu_i]\) is the informationtheoretic entropy of \(\mu_i\).
At least one thermodynamic equilibrium exists for any game and temperature, as a consequence of Brouwer’s fixedpoint theorem.
\(\mu \in \prod_{i=1}^n \Delta_i\) is called a thermodynamic equilibrium of \(G\) at zero temperature when there is a sequence \(\{(\mu^k \in \prod_{i=1}^n \Delta_i, \, T^k \in ({\mathbb{R}}^{>0})^n)\}_{k \in {\mathbb{N}}}\) s.t. \({\lim_{k \rightarrow \infty}} \mu^k = \mu\), \({\lim_{k \rightarrow \infty}} T^k = 0\) and \(\mu^k\) is a thermodynamic equilibrium at temperature \(T^k\).
At least one thermodynamic equilibrium at zero temperature exists for any game, as a consequence of the BolzanoWeierstrass theorem. Any thermodynamic equilibrium at zero temperature is a trembling hand equilibrium but the converse is false in general. We believe that for finite extensiveform games, a thermodynamic equilibrium at zero temperature is a subgame perfect equilibrium.
We believe \((d,d)\) is a trembling hand equilibrium in \(G_{PD}^{SM(t)}\) but not a thermodynamic equilibrium at zero temperature. On the other hand \((q_p, q_p)\) is a thermodynamic equilibrium at zero temperature for \(p > 0\) (when random bits are appended to \(q_p\) until it becomes a string of length \(t\)).
Thermodynamic equilibria are interesting because it is possible to implement them using a natural decision theory (that is, a decision theory that treats everything as a 1player game by modeling the other players). Games between transparent CDT agents using reflective oracles lead to Nash equilibria, and it should be straightforward to “thermalize” the decision theory and get thermodynamic equilibria at finite temperature. A similar result should be possible using optimal predictor schemes instead of reflective oracles.
The \(p\)cliquebot can be informally regarded as an infinite tower of threats of increasing level of “meta”. The 1st level says any strategy other than cooperation with probability \(p\) will be answered by defection. The \(k+1\)st level says that failure to produce the \(k\)th level threat will be answered by defection. This way the players blackmail each other into complying with this suboptimal strategy. However, if we had a way to truncate the tower at any finite level, it would apart: there would be no justification for each player to produce the highest level threat and it would unwind downwards recursively. The metathreat hierarchy is a formal method to implement this truncation.
Consider a game \(G\) in normal form as before. Fix \(m \in {\mathbb{N}}\). The associated level \(0\) metathreat game with \(m\) coinflips is just \(G\) itself: \(G^{MT(0,m)}:=G\). We define the associated level \(k+1\) metathreat game with \(m\) coinflips \(G^{MT(k+1,m)}\) recursively. The pure strategies of the \(i\)th player are
\[\Sigma_i^{MT(k+1,m)}:=\{\pi: \prod_{j \ne i} \Sigma_j^{MT(k,m)} \times {{\{ 0, 1 \}^{m}}} \rightarrow \Sigma_i^{MT(k,m)} \mid \exists a \in \Sigma_i^{MT(k,m)} \forall b \in \prod_{j \ne i} \Sigma_j^{MT(k,m)} \exists x \in {{\{ 0, 1 \}^{m}}}: \pi(b,x)=a \}\]
Consider \(\pi \in \prod_{i=1}^n \Sigma_i^{MT(k+1,m)}\). Then there is \(\mu^\pi \in \prod_{i=1}^n \Delta_i^{MT(k,m)}\) unique s.t.
\[\mu_i^\pi(a) = Pr_{\mu^\pi \times U^m}[\pi_i(b,x)=a]\]
It exists because of Brouwer’s fixedpoint theorem and is unique because of standard results about Markov chains (\(\pi\) defines a Markov chain on \(\prod_{i=1}^n \Sigma_i^{MT(k,m)}\), \(\mu^\pi\) is a stationary distribution of this chain and it has to be unique because of the properties of the chain that follow from the condition on \(\pi_i\)). This allows us to define the utility functions
\[u_i^{MT(k+1,m)}(\pi) = E_{\mu^\pi}[u_i^{MT(k,m)}(a)]\]
Denote \(\Gamma^{MT(k,m)}\) the space of correlated mixed outcomes of \(G^{MT(k,m)}\):
\[\Gamma^{MT(k,m)}:=\{\nu: \prod_{i=1}^n \Sigma_i^{MT(k,m)} \rightarrow [0,1] \mid \sum_{\pi \in \prod_{i=1}^n \Sigma_i^{MT(k,m)}} \nu(\pi) = 1\}\]
There is a natural sense in which \(\prod_{i=1}^n \Delta_i^{MT(k,m)} \subseteq \Gamma^{MT(k,m)}\).
For \(k > j\) we define the unwinding operator \(\omega^{MT(k \rightarrow j,m)}: \Gamma^{MT(k,m)} \rightarrow \Gamma^{MT(j,m)}\) recursively by
\[\omega^{MT(k+1 \rightarrow k,m)}(\nu):=\sum_{\pi \in \prod_{i=1}^n \Sigma_i^{MT(k+1,m)}} \nu(\pi) \mu^\pi\]
\[\omega^{MT(k \rightarrow j,m)}(\nu):=\omega^{MT(j+1 \rightarrow j,m)}(\omega^{MT(k \rightarrow j+1,m)}(\nu))\]
Consider \(\nu \in \Gamma\), a correlated mixed outcome of \(G\). \(\nu\) is called a level \(k\) metathreat equilibrium when there is a sequence \(\{(m^i \in {\mathbb{N}}, \mu^i \in \prod_{i=1}^n \Delta_i^{MT(k,m^i)})\}_{i \in {\mathbb{N}}}\) s.t. \({\lim_{i \rightarrow \infty}} m_i = \infty\), \({\lim_{i \rightarrow \infty}} \omega^{MT(k \rightarrow 0,m_i)}(\mu^i) = \nu\) and \(\mu^i\) is a thermodynamic equilibrium of \(G^{MT(k,m^i)}\) at zero temperature. The BolzanoWeierstrass theorem ensures such \(\nu\) exists for any game and value of \(k\).
We believe that for \(m >> 0\), \(G_{PD}^{MT(1,m)}\) has a unique thermodynamic equilibrium at zero temperature which is a mixture of \(\pi\) s.t.
\[Pr_{U^m}[\pi(C,x)=C] = 1\]
\[Pr_{U^m}[\pi(D,x)=C] = 2^{m}\]
Therefore \((C,C)\) is the unique level \(1\) metathreat equilibrium of \(G_{PD}\).
Some questions for further research:
 Does the set of level \(k\) metathreat equilibria stabilize for \(k >> 0\)?
 Are level \(k\) metathreat equilibria guaranteed to be Pareto efficient for \(k >> 0\)? Or even already for \(k > 0\)?
 How do level \(k\) metathreat equilibria look like for \(G_{PD}\) in \(\Sigma^{MT(k,m)}\)space for \(k > 1\)?
 Does anything change for \(k >> 0\) if we explicitly introduce shared random bits? It seems plausible that shared random bits can always be replaced by going to higher values of \(k\).
One decision theoretic interpretation of metathreat equilibria is in terms of selfmodifying CDT. The elements of \(\Sigma^{MT(k,m)}\) can be interpreted as programs that use reflective oracles or optimal predictor schemes to make predictions about the opposing programs and decide on a strategy accordingly. Thus our selfmodification is restricted by the programs in this class.
Another interpretation is in terms of logical decision theory, at least for \(k = 1\). Such a decision theory consists of two stages. In the first stage the agent computes logical conditional expected utility (using an optimal predictor scheme) where the condition corresponds to selection of a given element in \(\pi \in \Sigma^{MT(k,1)}\). It then performs a sequence of logical coin flips to decide on \(\pi\) according to a probability distribution designed to yield a thermodynamic equilibrium. In the second stage the agent uses an optimal predictor scheme to guess the strategy of its opponent and acts according to \(\pi\), this time providing so much resources to the optical predictor scheme that the result of the first stage for the opposing agent is certain. Such a decision theory is not entirely natural since it explicitly uses the concept of “opponent actions”, but hopefully it can be used as a foundation for constructing a fully natural logical decision theory.
\(^1\) The historical origin of this idea is Hofstadter “superrationality”.
\(^2\) We don’t expect the limit of this selfmodification process to be entirely reasonable since in some situations selfmodification cannot save CDT, for example if selfmodification only begins after the agent is inspected by Omega in Newcomb’s paradox. However, it seems plausible to specify assumptions under which selfmodifying CDT will make the right choices e.g. Newcomb’s paradox where selfmodification is allowed before inspection by Omega.
