Intelligent Agent Foundations Forum
http://agentfoundations.org/

Logical uncertainty and mathematical uncertainty (Alex Mennen) http://agentfoundations.org/item?id=1822
Meta: IAFF vs LessWrong (Vadim Kosoy) http://agentfoundations.org/item?id=1817
The Learning-Theoretic AI Alignment Research Agenda (Vadim Kosoy) http://agentfoundations.org/item?id=1816

In this essay I will try to explain the overall structure and motivation of my AI alignment research agenda. The discussion is informal, and no new theorems are proved here. The main features of the agenda, as I explain them here, are:

  • Viewing AI alignment theory as part of a general abstract theory of intelligence

  • Using desiderata and axiomatic definitions as starting points, rather than specific algorithms and constructions

  • Formulating alignment problems in the language of learning theory

  • Evaluating solutions by their formal mathematical properties, ultimately aiming at a quantitative theory of risk assessment (a standard regret-style criterion of this kind is sketched just after this list)

  • Relying on the mathematical intuition derived from learning theory to pave the way to solving philosophical questions
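As a concrete instance of the learning-theoretic framing above, a standard criterion of this kind is the regret of the agent's policy \(\pi_{\mathrm{agent}}\) against the best policy in some class \(\Pi\); a proposed design is judged by whether we can prove the regret grows sublinearly in the interaction time \(T\):

\[
\mathrm{Reg}(T) \;=\; \max_{\pi \in \Pi} \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(\pi)\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(\pi_{\mathrm{agent}})\right],
\qquad \frac{\mathrm{Reg}(T)}{T} \to 0 \text{ as } T \to \infty,
\]

where \(r_t(\pi)\) is the reward obtained at time \(t\) when following \(\pi\). These symbols are generic placeholders for illustration, not notation taken from the post.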

Logical Inductor Tiling and Why it's Hard (Alex Appel) http://agentfoundations.org/item?id=1808

(Tiling result due to Sam, exposition of obstacles due to me)

A Loophole for Self-Applicative Soundness (Alex Appel) http://agentfoundations.org/item?id=1810
Logical Inductors Converge to Correlated Equilibria (Kinda) (Alex Appel) http://agentfoundations.org/item?id=1804

Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the one-shot game, for the same reason that players who react to the past plays of their opponent converge to correlated equilibria. In fact, the proof is essentially the proof from “Calibrated Learning and Correlated Equilibrium” by Foster and Vohra (1997), adapted to the logical inductor setting.
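For a purely classical analogue of players who “react to the past plays of their opponent,” here is a minimal sketch (an illustration, not the logical-inductor argument from the post) of Hart and Mas-Colell-style regret matching, whose empirical joint play converges to the set of correlated equilibria of the stage game. The game of Chicken and all parameter values below are made-up choices.

```python
import numpy as np

# Payoff matrices for a game of Chicken (action 0 = dare, 1 = swerve).
# Rows index player 0's action, columns index player 1's action; payoffs are illustrative.
U = [np.array([[0., 7.],
               [2., 6.]]),     # player 0
     np.array([[0., 2.],
               [7., 6.]])]     # player 1

n_actions = 2
T = 100_000
mu = 20.0                      # normalization constant, larger than any one-step regret
rng = np.random.default_rng(0)

# Cumulative conditional regrets: regret[i][j, k] = total gain player i would have
# had by playing k at every past step where it actually played j.
regret = [np.zeros((n_actions, n_actions)) for _ in range(2)]
last = [0, 0]
joint_counts = np.zeros((n_actions, n_actions))

for t in range(1, T + 1):
    actions = []
    for i in range(2):
        j = last[i]
        # Switch away from the last action with probability proportional to the
        # positive part of the average conditional regret (Hart & Mas-Colell style).
        probs = np.maximum(regret[i][j], 0.0) / (mu * t)
        probs[j] = 0.0
        probs[j] = 1.0 - probs.sum()
        actions.append(int(rng.choice(n_actions, p=probs)))
    a0, a1 = actions
    joint_counts[a0, a1] += 1
    for i in range(2):
        mine, theirs = (a0, a1) if i == 0 else (a1, a0)
        got = U[i][a0, a1]
        for k in range(n_actions):
            alt = U[i][k, theirs] if i == 0 else U[i][theirs, k]
            regret[i][mine, k] += alt - got
    last = actions

# The empirical distribution of joint play approaches the set of correlated equilibria.
print(joint_counts / T)
```

The printed empirical distribution is (approximately) a correlated equilibrium of the one-shot game of Chicken, even though neither player ever reasons about the joint distribution directly.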

Logical Inductor Lemmas (Alex Appel) http://agentfoundations.org/item?id=1807
Two Notions of Best Response (Alex Appel) http://agentfoundations.org/item?id=1806

In game theory, there are two different notions of “best response” at play. Causal best-response corresponds to standard game-theoretic reasoning: it assumes that the joint probability distribution over everyone else’s moves remains unchanged if one player changes their move. The second, evidential best-response, can model cases where the players’ actions are not subjectively independent, such as Death in Damascus, the Twin Prisoner’s Dilemma, Troll Bridge, Newcomb’s problem, and the Smoking Lesion, and it will be useful for analyzing the behavior of logical inductors in repeated games. This is just a quick rundown of the basic properties of these two notions of best response.
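As a toy illustration of the two notions (my own numbers, using a Twin Prisoner's Dilemma with highly correlated play; not an excerpt from the post): the causal best response scores each of my actions against the opponent's marginal distribution, while the evidential best response scores each action against the opponent's distribution conditional on my action.

```python
import numpy as np

# Twin Prisoner's Dilemma payoffs for "me" (rows: my action, cols: twin's action).
# Actions: 0 = cooperate, 1 = defect. Numbers are illustrative.
payoff = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

# A joint distribution over (my action, twin's action) in which the two
# actions are highly correlated, as in the Twin Prisoner's Dilemma.
joint = np.array([[0.49, 0.01],
                  [0.01, 0.49]])

def causal_best_response(payoff, joint):
    """Hold the opponent's marginal fixed: standard game-theoretic best response."""
    opp_marginal = joint.sum(axis=0)      # P(twin's action)
    values = payoff @ opp_marginal        # expected payoff of each of my actions
    return values, int(values.argmax())

def evidential_best_response(payoff, joint):
    """Condition the opponent's action on my own: P(twin's action | my action)."""
    cond = joint / joint.sum(axis=1, keepdims=True)
    values = (payoff * cond).sum(axis=1)
    return values, int(values.argmax())

cv, ca = causal_best_response(payoff, joint)
ev, ea = evidential_best_response(payoff, joint)
print("causal values    ", cv, "-> best action", ca)   # defect (1): it dominates
print("evidential values", ev, "-> best action", ea)   # cooperate (0): correlation pays
```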

Doubts about Updatelessness (Alex Appel) http://agentfoundations.org/item?id=1797
Computing an exact quantilal policy (Vadim Kosoy) http://agentfoundations.org/item?id=1794
Resource-Limited Reflective Oracles (Alex Appel) http://agentfoundations.org/item?id=1793
No Constant Distribution Can be a Logical Inductor (Alex Appel) http://agentfoundations.org/item?id=1792
Musings on Exploration (Alex Appel) http://agentfoundations.org/item?id=1786
Quantilal control for finite MDPs (Vadim Kosoy) http://agentfoundations.org/item?id=1785

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost-independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely: (i) it can always be chosen to be Markov; (ii) it can be chosen to be stationary when the time discount is geometric; (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges as the discount parameter \(\lambda\) approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
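For intuition about the underlying idea, here is a minimal sketch of the basic one-shot quantilizer, not the MDP algorithm from the post: instead of taking the argmax of an estimated utility that may hide an unknown penalty, sample from a trusted base distribution conditioned on the top q fraction of its probability mass by estimated utility. The usual guarantee for this construction is that the expected unknown penalty is at most 1/q times what the base distribution itself would incur. The action set, utilities, and q below are made-up values.

```python
import numpy as np

def quantilize(utilities, base_probs, q, rng):
    """One-shot quantilizer: sample from the base distribution conditioned on the
    top q-quantile of estimated utility, instead of taking the argmax.

    utilities:  estimated utility of each action (may hide an unknown penalty)
    base_probs: a trusted base distribution over actions (numpy array)
    q:          fraction of base probability mass to keep (0 < q <= 1)
    """
    order = np.argsort(-np.asarray(utilities))   # actions from best to worst
    kept = np.zeros_like(base_probs)
    mass = 0.0
    for a in order:                              # keep top actions until q mass is reached
        take = min(base_probs[a], q - mass)
        kept[a] = take
        mass += take
        if mass >= q:
            break
    kept /= kept.sum()
    return int(rng.choice(len(base_probs), p=kept))

# Toy usage: 5 actions, uniform base distribution, q = 0.4.
rng = np.random.default_rng(0)
utilities = [1.0, 3.0, 2.5, 0.5, 2.9]
base = np.full(5, 0.2)
samples = [quantilize(utilities, base, q=0.4, rng=rng) for _ in range(10)]
print(samples)   # mixes over the top-q actions (1 and 4), never the rest
```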

A Difficulty With Density-Zero Exploration (Alex Appel) http://agentfoundations.org/item?id=1781
Distributed Cooperation (Alex Appel) http://agentfoundations.org/item?id=1777

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, i.e., a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.

The result is not original to me; it has been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.
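To make “Pareto-optimal point” concrete (a toy sketch, not the cooperative-oracle construction from the post): in the Prisoner's Dilemma, the one-shot Nash equilibrium is not on the Pareto frontier, and a cooperative oracle is meant to land on points that are. The sketch below enumerates the pure outcomes of a game and filters out the Pareto-dominated ones; payoffs are illustrative.

```python
from itertools import product

# Prisoner's Dilemma payoffs; action 0 = cooperate, 1 = defect.
# U0[a0][a1] is player 0's payoff, U1[a0][a1] is player 1's.
U0 = [[3, 0],
      [5, 1]]
U1 = [[3, 5],
      [0, 1]]

outcomes = {(a0, a1): (U0[a0][a1], U1[a0][a1])
            for a0, a1 in product(range(2), range(2))}

def pareto_optimal(outcomes):
    """Keep the outcomes whose payoff profiles no other outcome Pareto-dominates."""
    def dominates(p, q):
        return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))
    return {a: p for a, p in outcomes.items()
            if not any(dominates(other, p) for other in outcomes.values())}

print("all outcomes:         ", outcomes)
print("Pareto-optimal subset:", pareto_optimal(outcomes))
# The one-shot Nash equilibrium (defect, defect), with payoffs (1, 1), is *not*
# Pareto-optimal; a cooperative oracle is meant to select points on the Pareto frontier.
```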

Using lying to detect human values (Stuart Armstrong) http://agentfoundations.org/item?id=1776
Intuitive examples of reward function learning? (Stuart Armstrong) http://agentfoundations.org/item?id=1775
Funding for independent AI alignment research (Paul Christiano) http://agentfoundations.org/item?id=1774
Improved regret bound for DRL (Vadim Kosoy) http://agentfoundations.org/item?id=1773
Beyond algorithmic equivalence: self-modelling (Stuart Armstrong) http://agentfoundations.org/item?id=1772
Beyond algorithmic equivalence: algorithmic noise (Stuart Armstrong) http://agentfoundations.org/item?id=1771
Using the universal prior for logical uncertainty (Vladimir Slepnev) http://agentfoundations.org/item?id=1770
Passing Troll Bridge (Alex Appel) http://agentfoundations.org/item?id=1769
Why we want unbiased learning processes (Stuart Armstrong) http://agentfoundations.org/item?id=1768

Crossposted at Lesserwrong.

tl;dr: if an agent has a biased learning process, it may choose actions that are worse (with certainty) for every possible reward function it could be learning.
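A toy numeric illustration of this tl;dr, with made-up numbers rather than the example from the post: the agent is unsure whether it should be maximizing an easy-to-maximize reward R_A or a harder reward R_B. If it can cheaply rig its learning process so that it always ends up with R_A, then by the lights of the reward it expects to end up with, rigging looks best, even though rigging is strictly worse than waiting under both candidate reward functions.

```python
# Two candidate reward functions the agent might be learning:
#   R_A: easy to maximize (best achievable value 20)
#   R_B: harder to maximize (best achievable value 10)
# The prior is 50/50. The agent can either
#   "wait": learn the true reward, then optimize it, or
#   "rig":  pay a small cost to force the learning process to output R_A,
#           then optimize R_A regardless of which reward is actually true.

BEST = {"R_A": 20.0, "R_B": 10.0}   # value of optimizing each reward, measured by that reward
CROSS = 0.0                          # R_B-value obtained by an R_A-optimal policy
RIG_COST = 1.0                       # cost of rigging, paid under every reward function
PRIOR = {"R_A": 0.5, "R_B": 0.5}

# Value the agent expects *according to the reward it will end up having learned*:
expected_learned_value = {
    "wait": sum(PRIOR[r] * BEST[r] for r in PRIOR),   # 0.5*20 + 0.5*10 = 15
    "rig":  BEST["R_A"] - RIG_COST,                   # always learns R_A: 19
}

# Actual value of each plan under each possible *true* reward function:
true_value = {
    "wait": {"R_A": BEST["R_A"], "R_B": BEST["R_B"]},                  # 20 / 10
    "rig":  {"R_A": BEST["R_A"] - RIG_COST, "R_B": CROSS - RIG_COST},  # 19 / -1
}

print("value by the agent's (biased) lights:", expected_learned_value)  # rig wins: 19 > 15
print("value under each possible true reward:", true_value)
# "rig" is strictly worse than "wait" under *both* candidate reward functions,
# yet the biased learner prefers it -- which is exactly why we want unbiased learning.
```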