

 Logical Inductors Converge to Correlated Equilibria (Kinda)   post by Alex Appel 29 days ago  Sam Eisenstat and Jessica Taylor like this  discuss  
 Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the oneshot game, for the same reason that players that react to the past plays of their opponent converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Forster (1997), adapted to a logical inductor setting.
 


 Two Notions of Best Response   post by Alex Appel 29 days ago  discuss  
 In game theory, there are two different notions of “best response” at play. Causal bestresponse corresponds to standard gametheoretic reasoning, because it assumes that the joint probability distribution over everyone else’s moves remains unchanged if one player changes their move. The second one, Evidential bestresponse, can model cases where the actions of the various players are not subjectively independent, such as Death in Damascus, Twin Prisoner’s Dilemma, Troll Bridge, Newcomb, and Smoking Lesion, and will be useful to analyze the behavior of logical inductors in repeated games. This is just a quick rundown of the basic properties of these two notions of best response.
 






 Quantilal control for finite MDPs   post by Vadim Kosoy 85 days ago  Ryan Carey, Alex Appel and Abram Demski like this  discuss  
 We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely that (i) it can always be chosen to be Markov (ii) it can be chosen to be stationary when time discount is geometric (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges when the discount parameter \(\lambda\) approaches 1. Finally, we demonstrate a polynomialtime algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
 


 Distributed Cooperation   post by Alex Appel 98 days ago  Abram Demski and Scott Garrabrant like this  2 comments  
 Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Paretooptimal equilibrium in a game, aka, a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.
The result is not original to me, it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2player 2move case to the nplayer nmove case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3dimensional object.
 







 An Untrollable Mathematician   post by Abram Demski 152 days ago  Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this  1 comment  
 Followup to All Mathematicians are Trollable.
It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the nonconvergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild nonconvergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provabilily of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.
 


 More precise regret bound for DRL   post by Vadim Kosoy 184 days ago  Alex Appel likes this  discuss  
 We derive a regret bound for DRL reflecting dependence on:
That is, the regret bound we get is fully explicit up to a multiplicative constant (which can also be made explicit). Currently we focus on plain (as opposed to catastrophe) and uniform (finite number of hypotheses, uniform prior) DRL, although this result can and should be extended to the catastrophe and/or nonuniform settings.
 



 Being legible to other agents by committing to using weaker reasoning systems   post by Alex Mennen 203 days ago  Stuart Armstrong and Vladimir Slepnev like this  1 comment  
 Suppose that an agent \(A_{1}\) reasons in a sound theory \(T_{1}\), and an agent \(A_{2}\) reasons in a theory \(T_{2}\), such that \(T_{1}\) proves that \(T_{2}\) is sound. Now suppose \(A_{1}\) is trying to reason in a way that is legible to \(A_{2}\), in the sense that \(A_{2}\) can rely on \(A_{1}\) to reach correct conclusions. One way of doing this is for \(A_{1}\) to restrict itself to some weaker theory \(T_{3}\), which \(T_{2}\) proves is sound, for the purposes of any reasoning that it wants to be legible to \(A_{2}\). Of course, in order for this to work, not only would \(A_{1}\) have to restrict itself to using \(T_{3}\), but \(A_{2}\) would to trust that \(A_{1}\) had done so. A plausible way for that to happen is for \(A_{1}\) to reach the decision quickly enough that \(A_{2}\) can simulate \(A_{1}\) making the decision to restrict itself to using \(T_{3}\).  


Older 