Intelligent Agent Foundations Forum

Comment on A Loophole for Self-Applicative Soundness (Appel)
Logical Inductor Tiling and Why it's Hard (Appel)

(Tiling result due to Sam, exposition of obstacles due to me)

Comment on A Loophole for Self-Applicative Soundness (Eisenstat)
Comment on A Loophole for Self-Applicative Soundness (Appel)
Comment on A Loophole for Self-Applicative Soundness (Eisenstat)
A Loophole for Self-Applicative Soundness (Appel)
Logical Inductors Converge to Correlated Equilibria (Kinda) (Appel)

Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the one-shot game, for the same reason that players who react to their opponents’ past plays converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Foster and Vohra (1997), adapted to the logical inductor setting.
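The classical fact the abstract leans on is that players who adapt to the empirical history of play (calibrated learning in Foster and Vohra's sense, or Hart and Mas-Colell's regret matching) have empirical joint play converging to the set of correlated equilibria. As illustrative background only, not the logical-inductor proof itself, here is a minimal regret-matching sketch; the Chicken payoff matrix and all parameters are my own illustrative choices.

```python
import numpy as np

def regret_matching(payoffs, n_rounds=20000, mu=10.0, seed=0):
    """Hart & Mas-Colell regret matching for a 2-player normal-form game.

    payoffs: two arrays, payoffs[i][a0, a1] = player i's payoff.
    Returns the empirical joint distribution of play, which converges to the
    set of correlated equilibria of the one-shot game (mu must exceed the
    largest per-round payoff gain from switching actions).
    """
    rng = np.random.default_rng(seed)
    n = [payoffs[0].shape[0], payoffs[0].shape[1]]
    # regret[i][a, b]: cumulative gain player i would have had from playing b
    # in every past round in which they actually played a.
    regret = [np.zeros((n[i], n[i])) for i in range(2)]
    last = [0, 0]
    counts = np.zeros((n[0], n[1]))
    for t in range(n_rounds):
        acts = []
        for i in range(2):
            a = last[i]
            # Switch to b with probability proportional to positive average regret;
            # otherwise repeat the previous action.
            p = np.maximum(regret[i][a], 0.0) / max(t, 1) / mu
            p[a] = 0.0
            p[a] = max(0.0, 1.0 - p.sum())
            p /= p.sum()
            acts.append(rng.choice(n[i], p=p))
        counts[acts[0], acts[1]] += 1
        for i in range(2):
            a, opp = acts[i], acts[1 - i]
            for b in range(n[i]):
                u_actual = payoffs[i][(a, opp) if i == 0 else (opp, a)]
                u_alt = payoffs[i][(b, opp) if i == 0 else (opp, b)]
                regret[i][a, b] += u_alt - u_actual
        last = acts
    return counts / counts.sum()

# Game of Chicken (actions: Dare, Chicken): correlated equilibria can put
# weight on the asymmetric outcomes (D, C) and (C, D).
p0 = np.array([[0, 7], [2, 6]])
p1 = p0.T
dist = regret_matching([p0, p1])
```

The returned `dist` is the empirical joint distribution; its approach to the correlated-equilibrium set is driven by the per-action regrets vanishing, which is the same mechanism the logical-inductor argument exploits.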

Logical Inductor Lemmas (Appel)
Two Notions of Best Response (Appel)

In game theory, there are two different notions of “best response” at play. Causal best-response corresponds to standard game-theoretic reasoning, because it assumes that the joint probability distribution over everyone else’s moves remains unchanged if one player changes their move. The second one, Evidential best-response, can model cases where the actions of the various players are not subjectively independent, such as Death in Damascus, Twin Prisoner’s Dilemma, Troll Bridge, Newcomb, and Smoking Lesion, and will be useful to analyze the behavior of logical inductors in repeated games. This is just a quick rundown of the basic properties of these two notions of best response.
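As a toy illustration of the gap between the two notions (my own example, not from the post): given a subjective joint distribution over action profiles, the evidential best response conditions the joint on one's own action, while the causal best response holds the opponent's marginal fixed. In a Twin Prisoner's Dilemma with highly correlated actions, the two come apart:

```python
import numpy as np

# Twin Prisoner's Dilemma, row player's payoffs: actions 0 = Cooperate, 1 = Defect.
# u[a_me, a_twin] = my payoff.
u = np.array([[3, 0],
              [5, 1]])

# Subjective joint distribution over (my action, twin's action): the twin
# almost always copies me, so the actions are correlated, not independent.
joint = np.array([[0.48, 0.02],
                  [0.02, 0.48]])

def evidential_best_response(u, joint):
    """argmax_a E[u | my action = a]: condition the joint on my own action."""
    cond = joint / joint.sum(axis=1, keepdims=True)
    return int(np.argmax((cond * u).sum(axis=1)))

def causal_best_response(u, joint):
    """argmax_a E_{b ~ marginal}[u(a, b)]: hold the opponent's marginal fixed."""
    marginal = joint.sum(axis=0)
    return int(np.argmax(u @ marginal))

print(evidential_best_response(u, joint))  # 0: Cooperate
print(causal_best_response(u, joint))      # 1: Defect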

Comment on Predicting HCH using expert advice (Taylor)
Comment on Doubts about Updatelessness (Taylor)
Comment on Doubts about Updatelessness (Krueger)
Doubts about Updatelessness (Appel)
Comment on Maximally efficient agents will probably have an anti-daemon immune system (Christiano)
Comment on Smoking Lesion Steelman (Appel)
Comment on No Constant Distribution Can be a Logical Inductor (Eisenstat)
Computing an exact quantilal policy (Kosoy)
Resource-Limited Reflective Oracles (Appel)
No Constant Distribution Can be a Logical Inductor (Appel)
Comment on Musings on Exploration (Appel)
Comment on Musings on Exploration (Taylor)
Comment on Musings on Exploration (Kosoy)
Comment on Musings on Exploration (Demski)
Musings on Exploration (Appel)
Quantilal control for finite MDPs (Kosoy)

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely: (i) it can always be chosen to be Markov; (ii) it can be chosen to be stationary when the time discount is geometric; (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges as the discount parameter \(\lambda\) approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
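For background, here is the basic one-shot quantilizer that the MDP construction generalizes (a sketch of my own, not Kosoy's construction): instead of maximizing, sample from a base distribution conditioned on landing in the top-q fraction of its probability mass, ranked by estimated reward. The boundary action's mass is handled crudely here; the exact definition splits it fractionally.

```python
import numpy as np

def quantilize(base, reward, q, rng=None):
    """One-shot q-quantilizer: sample from the base distribution `base`
    conditioned on the top-q fraction (by base probability mass) of actions,
    ranked by estimated reward. q = 1 recovers the base distribution;
    q -> 0 approaches argmax, i.e. ordinary optimization.

    Simplification: an action straddling the q-quantile boundary is dropped
    entirely rather than included with fractional mass."""
    rng = rng or np.random.default_rng()
    order = np.argsort(-np.asarray(reward))  # best actions first
    mass = np.cumsum(base[order])
    keep = order[mass <= q]
    if len(keep) == 0:                       # q smaller than the best action's mass
        keep = order[:1]
    p = base[keep] / base[keep].sum()
    return rng.choice(keep, p=p)

base = np.ones(5) / 5                  # uniform base policy over 5 actions
reward = [0.0, 1.0, 5.0, 3.0, 2.0]     # estimated rewards (illustrative)
rng = np.random.default_rng(0)
samples = {quantilize(base, reward, q=0.4, rng=rng) for _ in range(200)}
```

With q = 0.4 and a uniform base over 5 actions, the quantilizer randomizes over the two highest-reward actions (indices 2 and 3); the unknown penalty term can only hurt it by a bounded factor relative to the base distribution, which is the point of quantilization.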

Comment on A Difficulty With Density-Zero Exploration (Appel)
A Difficulty With Density-Zero Exploration (Appel)
Comment on Distributed Cooperation (Appel)
Comment on Distributed Cooperation (Demski)
Distributed Cooperation (Appel)

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, i.e., a point produced by a cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.

The result is not original to me, it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.

Using lying to detect human values (Armstrong)
Intuitive examples of reward function learning? (Armstrong)
Funding for independent AI alignment research (Christiano)
Improved regret bound for DRL (Kosoy)
Beyond algorithmic equivalence: self-modelling (Armstrong)
Beyond algorithmic equivalence: algorithmic noise (Armstrong)
Using the universal prior for logical uncertainty (Slepnev)
Passing Troll Bridge (Appel)
Why we want unbiased learning processes (Armstrong)

Crossposted at Lesserwrong.

tl;dr: if an agent has a biased learning process, it may choose actions that are worse (with certainty) for every possible reward function it could be learning.

Two Types of Updatelessness (Demski)
Comment on “In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy”, on Stable Pointers to Value II: Environmental Goals (Kosoy)
Stable Pointers to Value II: Environmental Goals (Demski)
Comment on Further Progress on a Bayesian Version of Logical Uncertainty (Appel)
Further Progress on a Bayesian Version of Logical Uncertainty (Appel)

I’d like to credit Daniel Demski for helpful discussion.

Comment on “In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy”, on Logical counterfactuals and differential privacy (Stiennon)
Comment on An Untrollable Mathematician (Eisenstat)
Comment on The set of Logical Inductors is not Convex (Kosoy)
Comment on The set of Logical Inductors is not Convex (Demski)