Intelligent Agent Foundations Forumsign up / log in
1.Policy Selection Solves Most Problems
post by Abram Demski 50 days ago | Alex Appel and Vladimir Slepnev like this | 4 comments

It seems like logically updateless reasoning is what we would want in order to solve many decision-theory problems. I show that several of the problems which seem to require updateless reasoning can instead be solved by selecting a policy with a logical inductor that’s run a small amount of time. The policy specifies how to make use of knowledge from a logical inductor which is run longer. This addresses the difficulties which seem to block logically updateless decision theory in a fairly direct manner. On the other hand, it doesn’t seem to hold much promise for the kind of insights which we would want from a real solution.

continue reading »
2.Reflective oracles as a solution to the converse Lawvere problem
post by Sam Eisenstat 62 days ago | Alex Mennen, Alex Appel, Vadim Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this | discuss

1 Introduction

Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) such that for all computable \(g\colon\mathbb{N}\to\mathbb{N}\) there is some index \(i_{g}\) such that \(f\left(i_{g},n\right)=g\left(n\right)\) for all \(n\). If there were, we could pick \(g\left(n\right)=f\left(n,n\right)+1\), and then \[g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1,\] a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]

The miracle of Turing machines is that there is a partial computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) such that for all partial computable \(g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) there is an index \(i\) such that \(f\left(i,n\right)=g\left(n\right)\) for all \(n\). Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle \(O\), there is a (stochastic) \(O\)-computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}\) such that for any (stochastic) \(O\)-computable function \(g\colon\mathbb{N}\to\left\{ 0,1\right\}\), there is some index \(i\) such that \(f\left(i,n\right)\) and \(g\left(n\right)\) have the same distribution for all \(n\). This existence theorem seems to skirt even closer to the contradiction mentioned above.

We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.

Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.

continue reading »
3.Comparing LICDT and LIEDT
post by Abram Demski 88 days ago | Alex Appel likes this | discuss

Attempted versions of CDT and EDT can be constructed using logical inductors, called LICDT and LIEDT. It is shown, however, that LICDT fails XOR Blackmail, and LIEDT fails Newcomb. One interpretation of this is that LICDT and LIEDT do not implement CDT and EDT very well. I argue that they are indeed forms of CDT and EDT, but stray from expectations because they also implement the ratifiability condition I discussed previously. Continuing the line of thinking from that post, I discuss conditions in which LICDT=LIEDT, and try to draw out broader implications for decision theory.

continue reading »
4.Delegative Inverse Reinforcement Learning
post by Vadim Kosoy 200 days ago | Alex Appel likes this | 11 comments

We introduce a reinforcement-like learning setting we call Delegative Inverse Reinforcement Learning (DIRL). In DIRL, the agent can, at any point of time, delegate the choice of action to an “advisor”. The agent knows neither the environment nor the reward function, whereas the advisor knows both. Thus, DIRL can be regarded as a special case of CIRL. A similar setting was studied in Clouse 1997, but as far as we can tell, the relevant literature offers few theoretical results and virtually all researchers focus on the MDP case (please correct me if I’m wrong). On the other hand, we consider general environments (not necessarily MDP or even POMDP) and prove a natural performance guarantee.

The use of an advisor allows us to kill two birds with one stone: learning the reward function and safe exploration (i.e. avoiding both the Scylla of “Bayesian paranoia” and the Charybdis of falling into traps). We prove that, given certain assumption about the advisor, a Bayesian DIRL agent (whose prior is supported on some countable set of hypotheses) is guaranteed to attain most of the value in the slow falling time discount (long-term planning) limit (assuming one of the hypotheses in the prior is true). The assumption about the advisor is quite strong, but the advisor is not required to be fully optimal: a “soft maximizer” satisfies the conditions. Moreover, we allow for the existence of “corrupt states” in which the advisor stops being a relevant signal, thus demonstrating that this approach can deal with wireheading and avoid manipulating the advisor, at least in principle (the assumption about the advisor is still unrealistically strong). Finally we consider advisors that don’t know the environment but have some beliefs about the environment, and show that in this case the agent converges to Bayes-optimality w.r.t. the advisor’s beliefs, which is arguably the best we can expect.

continue reading »





This is somewhat related to
by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes

Without reading closely, this
by Paul Christiano on Policy Selection Solves Most Problems | 1 like

>policy selection converges
by Stuart Armstrong on Policy Selection Solves Most Problems | 0 likes

Indeed there is some kind of
by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

Very nice. I wonder whether
by Vadim Kosoy on Hyperreal Brouwer | 0 likes

Freezing the reward seems
by Vadim Kosoy on Resolving human inconsistency in a simple model | 0 likes


Privacy & Terms