Intelligent Agent Foundations Forum (http://agentfoundations.org/)

  • Comment on Smoking Lesion Steelman (Alex Appel): http://agentfoundations.org/item?id=1796
  • Comment on No Constant Distribution Can be a Logical Inductor (Sam Eisenstat): http://agentfoundations.org/item?id=1795
  • Computing an exact quantilal policy (Vadim Kosoy): http://agentfoundations.org/item?id=1794

Resource-Limited Reflective Oracles (Alex Appel): http://agentfoundations.org/item?id=1793

Reflective oracles accurately answer questions about what arbitrary halting probabilistic oracle machines output. It is possible to make a variant of a reflective oracle that accurately answers questions about the outputs of sufficiently short-running Turing machines with access to the same oracle.
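For readers unfamiliar with the interface, here is a minimal sketch (in Python) of a reflective-oracle-style query with a step budget. It only illustrates the kind of guarantee described above, not the post's construction; the `Machine` record with a precomputed output probability and step count is a hypothetical stand-in for actually running a probabilistic oracle machine.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-in for a probabilistic oracle machine: we pretend we already
# know its probability of outputting 1 and how many steps it runs for.
@dataclass
class Machine:
    prob_of_one: float
    steps: int

class ResourceLimitedOracle:
    """Sketch of the query interface: accurate answers are only guaranteed for
    machines that halt within the step budget."""
    def __init__(self, budget: int):
        self.budget = budget

    def query(self, machine: Machine, p: float) -> int:
        # Outside the budget, no accuracy guarantee: answer arbitrarily.
        if machine.steps > self.budget:
            return random.randint(0, 1)
        # Within the budget, behave like a reflective oracle: report whether the
        # machine's probability of outputting 1 is above or below the threshold p,
        # with freedom to randomize exactly at the threshold.
        if machine.prob_of_one > p:
            return 1
        if machine.prob_of_one < p:
            return 0
        return random.randint(0, 1)

oracle = ResourceLimitedOracle(budget=10_000)
print(oracle.query(Machine(prob_of_one=0.7, steps=500), p=0.5))  # prints 1
```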

  • No Constant Distribution Can be a Logical Inductor (Alex Appel): http://agentfoundations.org/item?id=1792
  • Comment on Musings on Exploration (Alex Appel): http://agentfoundations.org/item?id=1791
  • Comment on Musings on Exploration (Jessica Taylor): http://agentfoundations.org/item?id=1789
  • Comment on Musings on Exploration (Vadim Kosoy): http://agentfoundations.org/item?id=1788
  • Comment on Musings on Exploration (Abram Demski): http://agentfoundations.org/item?id=1787
  • Musings on Exploration (Alex Appel): http://agentfoundations.org/item?id=1786

Quantilal control for finite MDPs (Vadim Kosoy): http://agentfoundations.org/item?id=1785

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely that (i) it can always be chosen to be Markov; (ii) it can be chosen to be stationary when the time discount is geometric; and (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges as the discount parameter \(\lambda\) approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
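As background, here is a minimal sketch (in Python) of the ordinary quantilizer that the quantilal policy generalizes: sample from a base distribution over actions conditioned on landing in roughly the top \(q\) fraction of base-probability mass when actions are ranked by utility. The action names, base distribution, and utilities below are made up for illustration; the post's MDP construction and polynomial-time algorithm are not reproduced here.

```python
import random

def quantilize(base_prob, utility, q):
    """Sample an action from the base distribution conditioned on being in
    (roughly) the top q fraction of base-probability mass, ranking actions
    by utility. Boundary handling is deliberately crude: the action that
    straddles the q threshold is kept whole."""
    ranked = sorted(base_prob, key=utility, reverse=True)
    kept, mass = [], 0.0
    for action in ranked:
        kept.append(action)
        mass += base_prob[action]
        if mass >= q:
            break
    weights = [base_prob[a] for a in kept]
    return random.choices(kept, weights=weights, k=1)[0]

# Illustrative inputs (not from the post): three actions with a uniform base
# distribution and a utility that strongly favours "exploit".
base = {"exploit": 1/3, "hedge": 1/3, "explore": 1/3}
util = {"exploit": 1.0, "hedge": 0.6, "explore": 0.1}
print(quantilize(base, util.get, q=0.5))  # samples "exploit" or "hedge"
```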

  • Comment on A Difficulty With Density-Zero Exploration (Alex Appel): http://agentfoundations.org/item?id=1784
  • A Difficulty With Density-Zero Exploration (Alex Appel): http://agentfoundations.org/item?id=1781
  • Comment on Distributed Cooperation (Alex Appel): http://agentfoundations.org/item?id=1779
  • Comment on Distributed Cooperation (Abram Demski): http://agentfoundations.org/item?id=1778

Distributed Cooperation (Alex Appel): http://agentfoundations.org/item?id=1777

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, i.e., a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.

The result is not original to me; it has been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit and generalize from the 2-player, 2-move case to the n-player, n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.
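The construction itself is not reproduced here, but since it targets Pareto-optimal points, a minimal sketch of the Pareto-optimality check may help fix intuitions (Python; the example payoff profiles are made up):

```python
def dominates(x, y):
    """Profile x Pareto-dominates y: no player is worse off, at least one is strictly better off."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def pareto_front(profiles):
    """Keep the payoff profiles that no other profile Pareto-dominates."""
    return [p for p in profiles if not any(dominates(q, p) for q in profiles)]

# Made-up payoff profiles for a 2-player game: (row player's payoff, column player's payoff).
outcomes = [(3, 3), (5, 0), (0, 5), (1, 1)]
print(pareto_front(outcomes))  # [(3, 3), (5, 0), (0, 5)]
```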

  • Using lying to detect human values (Stuart Armstrong): http://agentfoundations.org/item?id=1776
  • Intuitive examples of reward function learning? (Stuart Armstrong): http://agentfoundations.org/item?id=1775
  • Funding for independent AI alignment research (Paul Christiano): http://agentfoundations.org/item?id=1774
  • Improved regret bound for DRL (Vadim Kosoy): http://agentfoundations.org/item?id=1773
  • Beyond algorithmic equivalence: self-modelling (Stuart Armstrong): http://agentfoundations.org/item?id=1772
  • Beyond algorithmic equivalence: algorithmic noise (Stuart Armstrong): http://agentfoundations.org/item?id=1771
  • Using the universal prior for logical uncertainty (Vladimir Slepnev): http://agentfoundations.org/item?id=1770
  • Passing Troll Bridge (Alex Appel): http://agentfoundations.org/item?id=1769

Why we want unbiased learning processes (Stuart Armstrong): http://agentfoundations.org/item?id=1768

Crossposted at Lesserwrong.

tl;dr: if an agent has a biased learning process, it may choose actions that are worse (with certainty) for every possible reward function it could be learning.

  • Two Types of Updatelessness (Abram Demski): http://agentfoundations.org/item?id=1765
  • Comment on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy (258): http://agentfoundations.org/item?id=1764
  • Comment on Stable Pointers to Value II: Environmental Goals (Vadim Kosoy): http://agentfoundations.org/item?id=1763
  • Stable Pointers to Value II: Environmental Goals (Abram Demski): http://agentfoundations.org/item?id=1762
  • Comment on Further Progress on a Bayesian Version of Logical Uncertainty (Alex Appel): http://agentfoundations.org/item?id=1761

Further Progress on a Bayesian Version of Logical Uncertainty (Alex Appel): http://agentfoundations.org/item?id=1760

I’d like to credit Daniel Demski for helpful discussion.

  • Strategy Nonconvexity Induced by a Choice of Potential Oracles (Alex Appel): http://agentfoundations.org/item?id=1759
  • Comment on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy (258): http://agentfoundations.org/item?id=1758
  • Comment on Logical counterfactuals and differential privacy (Nisan Stiennon): http://agentfoundations.org/item?id=1752
  • Comment on An Untrollable Mathematician (Sam Eisenstat): http://agentfoundations.org/item?id=1751

An Untrollable Mathematician (Abram Demski): http://agentfoundations.org/item?id=1750

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as it could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Logical counterfactuals and differential privacy (Nisan Stiennon): http://agentfoundations.org/item?id=1749

Edit: This article has major flaws. See my comment below.

This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.

  • Comment on The set of Logical Inductors is not Convex (Vadim Kosoy): http://agentfoundations.org/item?id=1748
  • Comment on The set of Logical Inductors is not Convex (Abram Demski): http://agentfoundations.org/item?id=1747
  • Comment on Smoking Lesion Steelman II (Tom Everitt): http://agentfoundations.org/item?id=1746
  • Goodhart Taxonomy (Scott Garrabrant): http://agentfoundations.org/item?id=1744
  • Comment on Delegative Inverse Reinforcement Learning (Vadim Kosoy): http://agentfoundations.org/item?id=1743
  • Comment on Delegative Inverse Reinforcement Learning (Alex Appel): http://agentfoundations.org/item?id=1742
  • Comment on Delegative Inverse Reinforcement Learning (Alex Appel): http://agentfoundations.org/item?id=1741
  • Comment on Delegative Inverse Reinforcement Learning (Alex Appel): http://agentfoundations.org/item?id=1740

More precise regret bound for DRL (Vadim Kosoy): http://agentfoundations.org/item?id=1739

We derive a regret bound for DRL reflecting dependence on:

  • Number of hypotheses

  • Mixing time of MDP hypotheses

  • The probability with which the advisor takes optimal actions

That is, the regret bound we get is fully explicit up to a multiplicative constant (which can also be made explicit). Currently we focus on plain (as opposed to catastrophe) and uniform (finite number of hypotheses, uniform prior) DRL, although this result can and should be extended to the catastrophe and/or non-uniform settings.
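For reference, the quantity being bounded is regret in the usual sense: the gap between the expected discounted reward of the optimal policy for the true environment and that of the learning agent. Schematically, for time discount \(\gamma\) (the post's exact normalization is not reproduced here):

\[
\mathrm{Reg}(\gamma) \;=\; \mathbb{E}_{\pi^{*}}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right] \;-\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right],
\]

where \(\pi^{*}\) is the optimal policy for the true hypothesis and \(\pi\) is the agent's policy. The bound above controls this quantity in terms of the three parameters listed.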

  • Comment on Delegative Inverse Reinforcement Learning (Alex Appel): http://agentfoundations.org/item?id=1738
  • Value learning subproblem: learning goals of simple agents (Alex Mennen): http://agentfoundations.org/item?id=1737
  • Comment on Being legible to other agents by committing to using weaker reasoning systems (Stuart Armstrong): http://agentfoundations.org/item?id=1736
  • Comment on Where does ADT Go Wrong? (Jack Gallagher): http://agentfoundations.org/item?id=1733