Intelligent Agent Foundations Forum

Delegative Reinforcement Learning solves this problem by keeping humans in the loop while preserving consequentialist reasoning. Of course, the current theory rests on a lot of simplification and the ultimate learning protocol will probably look different, but I think the basic mechanism (delegation combined with model-based reasoning) is sound.
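For concreteness, here is a minimal sketch of the delegation step in that mechanism. It is not the formal DRL protocol: the posterior over environment models, the models' `expected_return` / `catastrophe_prob` methods, and `advisor_act` are all hypothetical stand-ins for the agent's model-based reasoning and the human advisor.

```python
# Minimal sketch of "delegate when uncertain" -- not the formal DRL protocol.
# `posterior`, the model methods, and `advisor_act` are hypothetical.

def delegative_step(state, actions, posterior, advisor_act, risk_threshold=0.01):
    """Choose an action, or delegate to the advisor when nothing looks safe.

    `posterior` is a list of (model, probability) pairs representing the
    agent's model-based beliefs about the environment.
    """
    scored = []
    for a in actions:
        value = sum(p * m.expected_return(state, a) for m, p in posterior)
        risk = sum(p * m.catastrophe_prob(state, a) for m, p in posterior)
        scored.append((a, value, risk))

    # Act autonomously only among actions whose estimated catastrophe
    # probability is below the threshold; otherwise hand control to the human.
    safe = [(a, value) for a, value, risk in scored if risk <= risk_threshold]
    if safe:
        return max(safe, key=lambda av: av[1])[0]
    return advisor_act(state)
```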


This is somewhat related to what I wrote about here. If you consider only what I call convex gamblers/traders and fix some weighting (“prior”) over the gamblers, then there is a natural convex set of dominant forecasters (for each history, it is the set of minimizers of some convex function on \(\Delta\mathcal{O}^\omega\)).
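Schematically (the notation here is illustrative rather than taken from the linked post; \(w(g)\) is the prior weight of gambler \(g\) and \(\ell_g(\cdot \mid h)\) a convex loss it induces on forecasts after history \(h\)):

```latex
\[
  F(h) \;=\; \operatorname*{arg\,min}_{\mu \in \Delta\mathcal{O}^\omega}
  \; \sum_{g} w(g)\, \ell_g(\mu \mid h)
\]
```

This set is convex because each \(\ell_g(\cdot \mid h)\) is convex, so the set of minimizers of their weighted sum over the convex domain \(\Delta\mathcal{O}^\omega\) is convex.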


Hi Alex!

The definition of \(h^{!k}\) makes sense for any \(h\), that is, the superscript \(!k\) in this context is a mapping from finite histories to sets of pairs as you said. In the line in question we just apply this mapping to \(x_{:n}\) where \(x\) is a bound variable coming from the expected value.

I hope this helps?


Indeed there is some kind of length limit in the website. I moved Appendices B and C to a separate post.


by Vadim Kosoy, 122 days ago, on: Hyperreal Brouwer

Very nice. I wonder whether this fixed point theorem also implies the various generalizations of Kakutani’s fixed point theorem in the literature, such as Lassonde’s theorem about compositions of Kakutani functions. It sounds like it should, because the composition of hypercontinuous functions is hypercontinuous, but I don’t see the formal argument immediately: if we have \(x \in *X,\ y \in *Y\) with standard parts \(x_\omega,\ y_\omega\) s.t. \(f(x)=y\), and \(y' \in *Y,\ z \in *Z\) with standard parts \(y'_\omega=y_\omega,\ z_\omega\) s.t. \(g(y')=z\), then it’s not clear why there should be \(x'\in *X,\ z'\in *Z\) with standard parts \(x'_\omega=x_\omega,\ z'_\omega=z_\omega\) s.t. \(g(f(x'))=z'\).


Freezing the reward seems like the correct answer by definition: if I am an agent following the utility function \(R\) and I have to design a new agent now, then it is rational for me to design the new agent to follow the utility function I am currently following (i.e., this choice is usually ranked best according to my current utility function).
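A schematic way to write the argument out (where \(\pi_{R'}\) denotes a hypothetical successor agent optimizing the utility function \(R'\)): the designer, who currently follows \(R\), chooses the successor's utility function to maximize its own expected utility,

```latex
\[
  R'^{*} \;=\; \operatorname*{arg\,max}_{R'}
  \; \mathbb{E}\!\left[\, R \,\middle|\, \text{deploy } \pi_{R'} \,\right],
\]
```

and, barring environments that specifically reward value drift, this maximum is attained at \(R' = R\), which is exactly the frozen reward.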


Unfortunately, it’s not just your browser. The website truncates the document for some reason. I emailed Matthew about it and ey are looking into it.


I think technical research should be posted here. Moreover, I think that merging IAFF and LW is a bad idea. We should be striving to attract people from mainstream academia / AI research groups rather than making ourselves seem even more eccentric / esoteric.


Note that the problem with exploration already arises in ordinary reinforcement learning, without going into “exotic” decision theories. Regarding the question of why humans don’t seem to have this problem, I think it is a combination of

  • The universe is regular (which is related to what you said about “we can’t see any plausible causal way it could happen”), so a Bayes-optimal policy with a simplicity prior has something going for it. On the other hand, sometimes you do need to experiment, so this can’t be the only explanation.

  • Any individual human has parents that teach em things, including things like “touching a hot stove is dangerous.” Later in life, ey can draw on much of the knowledge accumulated by human civilization. This tunnels the exploration into safe channels, analogously to the role of the advisor in my recent posts.

  • One may say that the previous point only passes the recursive buck, since we can consider all of humanity to be the “agent”. From this perspective, it seems that the universe just happens to be relatively safe, in the sense that it’s pretty hard for an individual human to do something that will irreparably damage all of humanity… or at least it was the case during most of human history.

  • In addition, we have some useful instincts baked in by evolution (e.g. probably some notion of existing in a three dimensional space with objects that interact mechanically). Again, you could zoom further out and say evolution works because it’s hard to create a species that will wipe out all life.


Typos on page 5:

  • “random explanation” should be “random exploration”
  • “Alpa” should be “Alpha”






