by Vadim Kosoy 42 days ago | Abram Demski likes this | on: Stable Pointers to Value II: Environmental Goals
Delegative Reinforcement Learning solves this problem by keeping humans in the loop while preserving consequentialist reasoning. Of course, the theory currently rests on many simplifications and the ultimate learning protocol will probably look different, but I think that the basic mechanism (delegation combined with model-based reasoning) is sound.
by Vadim Kosoy 70 days ago | on: The set of Logical Inductors is not Convex
This is somewhat related to what I wrote about here. If you consider only what I call convex gamblers/traders and fix some weighting (“prior”) over the gamblers, then there is a natural convex set of dominant forecasters (for each history, it is the set of minima of some convex function on $$\Delta\mathcal{O}^\omega$$).
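A short sketch of why such a set of minima is convex, using illustrative notation that does not appear in the comment ($$F$$ for the convex function, $$\mu,\nu$$ for two of its minimizers, $$m$$ for the minimum value): if $$F$$ is convex on $$\Delta\mathcal{O}^\omega$$ and $$F(\mu)=F(\nu)=m$$, then for any $$\lambda\in[0,1]$$

$$m \le F(\lambda\mu+(1-\lambda)\nu) \le \lambda F(\mu)+(1-\lambda)F(\nu) = m,$$

so $$\lambda\mu+(1-\lambda)\nu$$ is also a minimizer. This is only the standard fact that the minimizers of a convex function on a convex domain form a convex set; which function is minimized for each history is not specified in the comment.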
by Vadim Kosoy 87 days ago | on: Delegative Inverse Reinforcement Learning
Hi Alex! The definition of $$h^{!k}$$ makes sense for any $$h$$, that is, the superscript $$!k$$ in this context is a mapping from finite histories to sets of pairs as you said. In the line in question we just apply this mapping to $$x_{:n}$$ where $$x$$ is a bound variable coming from the expected value. I hope this helps?
by Vadim Kosoy 122 days ago | on: Catastrophe Mitigation Using DRL
Indeed, there is some kind of length limit on the website. I moved Appendices B and C to a separate post.
by Vadim Kosoy 122 days ago | on: Hyperreal Brouwer
Very nice. I wonder whether this fixed point theorem also implies the various generalizations of Kakutani’s fixed point theorem in the literature, such as Lassonde’s theorem about compositions of Kakutani functions. It sounds like it should, because the composition of hypercontinuous functions is hypercontinuous, but I don’t see the formal argument immediately: if we have $$x \in *X,\ y \in *Y$$ with standard parts $$x_\omega,\ y_\omega$$ s.t. $$f(x)=y$$, and $$y' \in *Y,\ z \in *Z$$ with standard parts $$y'_\omega=y_\omega,\ z_\omega$$ s.t. $$g(y')=z$$, then it’s not clear why there should be $$x'\in *X,\ z'\in *Z$$ with standard parts $$x'_\omega=x_\omega,\ z'_\omega=z_\omega$$ s.t. $$g(f(x'))=z'$$.
by Vadim Kosoy 122 days ago | on: Resolving human inconsistency in a simple model
Freezing the reward seems like the correct answer by definition, since if I am an agent following the utility function $$R$$ and I have to design a new agent now, then it is rational for me to design the new agent to follow the utility function I am following now (i.e. this action is usually rated as the best according to my current utility function).
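In symbols, as a minimal sketch under assumed notation that is not in the original comment ($$\pi$$ for a candidate policy of the new agent, $$\mathbb{E}^{\pi}[R]$$ for the expected utility the designer assigns to deploying it, and $$\pi_R$$, $$\pi_{R'}$$ for policies that are optimal for $$R$$ and for some alternative $$R'$$): if $$\pi_R\in\arg\max_\pi \mathbb{E}^{\pi}[R]$$, then

$$\mathbb{E}^{\pi_R}[R] \ge \mathbb{E}^{\pi_{R'}}[R] \quad \text{for every } R',$$

so, judged by the current $$R$$, a successor that optimizes $$R$$ is never worse than one that optimizes a different reward. The sketch idealizes the successor as an exact optimizer, which is presumably why the comment says "usually" rather than "always".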
by Vadim Kosoy 126 days ago | on: Catastrophe Mitigation Using DRL
Unfortunately, it’s not just your browser. The website truncates the document for some reason. I emailed Matthew about it and ey are looking into it.
by Vadim Kosoy 170 days ago | David Krueger, Ryan Carey, Sören Mindermann and Stuart Armstrong like this | on: Should I post technical ideas here or on LessWrong...
I think technical research should be posted here. Moreover, I think that merging IAFF and LW is a bad idea. We should be striving to attract people from mainstream academia / AI research groups rather than making ourselves seem even more eccentric / esoteric.
by Vadim Kosoy 205 days ago | on: Open Problems Regarding Counterfactuals: An Introd...
Note that the problem with exploration already arises in ordinary reinforcement learning, without going into “exotic” decision theories. Regarding the question of why humans don’t seem to have this problem, I think it is a combination of:
- The universe is regular (which is related to what you said about “we can’t see any plausible causal way it could happen”), so a Bayes-optimal policy with a simplicity prior has something going for it. On the other hand, sometimes you do need to experiment, so this can’t be the only explanation.
- Any individual human has parents that teach em things, including things like “touching a hot stove is dangerous.” Later in life, ey can draw on much of the knowledge accumulated by human civilization. This tunnels the exploration into safe channels, analogously to the role of the advisor in my recent posts.
- One may say that the previous point only passes the recursive buck, since we can consider all of humanity to be the “agent”. From this perspective, it seems that the universe just happens to be relatively safe, in the sense that it’s pretty hard for an individual human to do something that will irreparably damage all of humanity… or at least it was the case during most of human history.
- In addition, we have some useful instincts baked in by evolution (e.g. probably some notion of existing in a three-dimensional space with objects that interact mechanically). Again, you could zoom further out and say evolution works because it’s hard to create a species that will wipe out all life.
by Vadim Kosoy 205 days ago | on: Open Problems Regarding Counterfactuals: An Introd...
Typos on page 5:
- “random explanation” should be “random exploration”
- “Alpa” should be “Alpha”
