by Scott Garrabrant 618 days ago | Stuart Armstrong likes this | on: Should I post technical ideas here or on LessWrong...

I intend to cross-post often.
by Scott Garrabrant 668 days ago | on: Cooperative Oracles: Introduction

I have stopped working on this sequence, because a coauthor is trying to write it up as a more formal paper instead.
by Scott Garrabrant 741 days ago | on: Cooperative Oracles: Stratified Pareto Optima and ...

I agree with this. I think the most interesting direction for future work is figuring out how to have better notions of dependency. I plan to write more on this in the future, but basically we have not yet figured out how to deal with it.
by Scott Garrabrant 746 days ago | Abram Demski, Patrick LaVictoire and Vladimir Nesov like this | on: Futarchy Fix

The property of futarchy that I really don’t like is the fact that one person with a lot of money can bet on “Policy X will lead to bad outcome Y,” causing policy X to never be tried in the first place, and all of that person’s money to be refunded, allowing them to make the same bets next time. This may or may not be a problem in practice, but I would really like to see a good fix for it in theory.

This problem is what causes the failure to take the 10 in the 5 and 10 problem described here. One trader in the logical inductor can say that taking the 10 will lead to 0 utility, and then get all his money back, because the markets conditional on taking the 10 never get resolved. (I often refer to this problem as “the futarchy exploit.”)
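The exploit above can be seen in a toy model. Everything below is invented for illustration (the market rule, the stakes, and the function names are assumptions, not part of any actual futarchy design): each action's conditional market estimate is a wealth-weighted average of traders' claims, the highest-estimate action is taken, and bets conditional on the untaken action are voided and refunded.

```python
# Toy sketch (hypothetical names) of the "futarchy exploit": a wealthy
# trader suppresses an action by betting that it leads to low utility;
# since the action is then never taken, the conditional market is voided
# and the trader's stake is refunded.

def conditional_market_estimate(bets):
    """Wealth-weighted average of traders' utility claims."""
    total = sum(stake for stake, _ in bets)
    return sum(stake * claim for stake, claim in bets) / total

# True utilities of the two actions in the 5 and 10 problem.
true_utility = {"take_5": 5, "take_10": 10}

# Honest traders stake 1 unit on the truth; a manipulator stakes 100
# on the claim that taking the 10 yields utility 0.
bets = {
    "take_5":  [(1, 5)],
    "take_10": [(1, 10), (100, 0)],
}

estimates = {a: conditional_market_estimate(b) for a, b in bets.items()}
chosen = max(estimates, key=estimates.get)

print(estimates)  # take_10's estimate is dragged near 0
print(chosen)     # prints take_5
# The take_10 market never resolves, so the manipulator's 100 units are
# refunded and the exploit can be repeated indefinitely.
```

The manipulator pays nothing for distorting the estimate, which is exactly the failure described above.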
by Paul Christiano 746 days ago | Stuart Armstrong likes this

The only way I see to get around this is:

- Be willing to try X whenever enough people are willing to bet at sufficiently aggressive odds.
- Assume that honest (greedily log-wealth-maximizing) players have enough money that they can cause any given X to be tried if a manipulator attempts to suppress it.

It would be interesting to see this style of solution fleshed out, to see exactly how strong the assumptions have to be in order to avoid trouble.

The analog of EXP3 is to have investors put their money on policies (rather than predictions about policy outcomes), to pick each policy with probability proportional to the amount of money behind it, and then to take money away from the people who financed the chosen option based on how badly it performs relative to the best possible outcome (giving that money to the people who financed the non-chosen options). This prevents you from cheating the system in the way you describe, though it also means that investing is quite risky even if you know exactly what is going to happen.

In this analogy, futarchy corresponds to estimating Q values (with a regression loss defined by the market maker you use in the decision markets) and then picking the Q-maximizing action. This can have lower variance but has no guarantees of any kind. I suspect the optimal thing is to run both kinds of markets in parallel, to use the policy market with the EXP3 rule for picking actions, and to use the decision markets only for variance reduction.

I have thought about this a little bit in the context of online learning, and suspect that we can prove an optimality theorem along these lines. It would be nice to see the analogous claim with markets, and the market version would probably be more relevant to alignment. A clear and convincing exposition would also likely be of interest to researchers in RL.
(As usual, this comment is not intended as a land grab; if anyone executes on this idea and it works out, it’s all theirs.)
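A minimal sketch of the policy-market rule described above, with all names and the redistribution scheme invented for illustration (standard EXP3 uses exponential weights and importance-weighted loss estimates; this only captures the "pick in proportion to money, charge the chosen policy's backers their shortfall" idea):

```python
import random

def exp3_policy_market(stakes, utilities, rng=random):
    """One round of the sketched policy-market rule.

    stakes:    dict policy -> money backing it
    utilities: dict policy -> realized utility in [0, 1]
    Returns (chosen policy, updated stakes).
    """
    total = sum(stakes.values())
    policies = list(stakes)
    # Pick a policy with probability proportional to the money behind it.
    r = rng.random() * total
    chosen = policies[-1]  # fallback against float round-off
    for policy in policies:
        r -= stakes[policy]
        if r <= 0:
            chosen = policy
            break
    # Backers of the chosen policy pay in proportion to its shortfall
    # from the best achievable utility this round...
    best = max(utilities.values())
    loss = (best - utilities[chosen]) * stakes[chosen]
    new = dict(stakes)
    new[chosen] -= loss
    # ...and that money is redistributed, pro rata, to the backers of
    # the non-chosen policies.
    others = total - stakes[chosen]
    for p in new:
        if p != chosen and others > 0:
            new[p] += loss * stakes[p] / others
    return chosen, new
```

Note that total wealth is conserved each round, and a manipulator who floods a policy with money makes it *more* likely to be tried, not less, which is the point of the construction.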
by Abram Demski 745 days ago

In my current way of thinking about futarchy, it seems like the right way to do this is through good adjudication. It passes the buck, just like my assumption in a recent post that a logical inductor had a correct logical counterfactual in its underlying deductive system. But for a futarchy, the situation isn’t quite as bad. We could rely on human judgement somehow.

But another alternative for an underlying adjudication system occurred to me today: maybe the market could be adjudicated via models. My intuition is that a claim of existential risk (if made in the underlying adjudication system rather than as a bet) must be accompanied by a plausible model, a relatively short computer program which fits the data so far. A counter-claim would have to give an alternative plausible model which shows no risk. These models would lead to payouts.

This could address your problem as well, since a counterfactual claim of doom could be (partially) adjudicated as false by giving a causal model. (I don’t intend this proposal to help with logical counterfactuals; it just allows regular causal counterfactuals, described in some given formalism.) But I haven’t thought through how this would work yet.
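The model-based adjudication idea above could be sketched crudely as follows. This is a hypothetical scoring rule, not the actual proposal; the complexity penalty, the models, and the data are all invented for the example. Claims are submitted together with a predictive model, and the adjudicator favors whichever model fits the observed data with the smaller complexity-penalized error, in the spirit of minimum description length.

```python
# Hypothetical sketch of adjudication via models: each side submits a
# "model" (here just a Python callable plus a stated complexity), and
# the claim backed by the better-scoring model wins, where score is
# squared fit error plus an MDL-style simplicity penalty.

def adjudicate(models, data, complexity_weight=1.0):
    """models: list of (name, predict_fn, complexity). Returns winner's name."""
    def score(model):
        _, predict, complexity = model
        fit_error = sum((predict(x) - y) ** 2 for x, y in data)
        return fit_error + complexity_weight * complexity
    return min(models, key=score)[0]

# Observed data so far (invented): y = 2x.
data = [(x, 2 * x) for x in range(5)]
models = [
    ("doom",    lambda x: 2 * x + 5, 3),  # fits the data poorly
    ("no_doom", lambda x: 2 * x,     2),  # fits the data exactly
]
print(adjudicate(models, data))  # prints no_doom
```

A real version would need the models to be actual programs scored by length, and would have to say how counterfactual (causal) claims are extracted from them; this only illustrates the payout rule.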
by Scott Garrabrant 761 days ago | Stuart Armstrong likes this | on: Cooperative Oracles: Nonexploited Bargaining

The problem is that our original set was a product of the actions available to the players, so they were able to cut things off using their own actions. When you restrict to the Pareto frontier, this is no longer the case.
by Scott Garrabrant 761 days ago | on: Cooperative Oracles: Nonexploited Bargaining

Yeah. The original generator of these ideas was that we were trying to find (or prove impossible) an improvement on NicerBot: an agent with reflective oracles that cooperates with itself (regardless of what reflective oracle is chosen), but is never exploited in expectation (even by epsilon).
by Scott Garrabrant 761 days ago | Stuart Armstrong likes this | on: Cooperative Oracles: Nonexploited Bargaining

I don’t see how it would. The closest thing I can think of is letting agents choose randomly between different fair sets, but I don’t see what that would buy you.
by Scott Garrabrant 796 days ago | on: Formal Open Problem in Decision Theory

I give a stronger version of this problem here.
by Scott Garrabrant 838 days ago | Ryan Carey, Abram Demski and Jessica Taylor like this | on: Generalizing Foundations of Decision Theory

I am not optimistic about this project. My primary reason is that decision theory has two parts. First, there is the part that is related to this post, which I’ll call “Expected Utility Theory.” Then, there is the much harder part, which I’ll call “Naturalized Decision Theory.” I think expected utility theory is pretty well understood, and this post plays around with details of a well understood theory, while naturalized decision theory is not well understood at all.

I think we agree that the work in this post is not directly related to naturalized decision theory, but you think it is going to help anyway. My understanding of your argument (correct me if I am wrong) is that probability theory is to logical uncertainty as expected utility theory is to naturalized decision theory, and Dutch books led to LU progress, so VNM-ish arguments should lead to NDT progress. I challenge this in two ways.

First, Logical Inductors look like Dutch books, but this might be because things related to probability theory can be talked about with Dutch books. I don’t think that thinking about Dutch books led to the invention of Logical Inductors (although maybe it would have if I had followed the right path), and I don’t think that the post hoc connection provides much evidence that thinking about Dutch books is useful. Perhaps whenever you have a theory, you can do this formal justification stuff, but formal justification does not create theories. I realize that I do not actually stand behind this first challenge very much, but I still want to put it out there as a possibility.

Second, I think that in a way Logical Uncertainty is about resource-bounded Probability Theory, and this is why a weakening of Dutch books helped. On the other hand, Naturalized Decision Theory is not about resource-bounded Expected Utility Theory. We made a type of resource-bounded Probability Theory, and magically got some naturalistic reasoning out of it. I expect that we cannot do the same thing for decision theory, because the relationship is more complicated.

Expected Utility Theory is about your preferences over various worlds. If you follow the analogy with LI closely, then if you succeed, we will be able to extend it to having preferences over various worlds which contain yourself. This seems very far from a solution to naturalized decision theory. In fact, it does not feel that far from what we might be able to easily do with existing Expected Utility Theory plus logical inductors. Perhaps I am attacking a straw man, and you mean “do the same thing we did with logical induction” less literally than I am interpreting it, but in that case there is way more special sauce in the part about how you generalize expected utility theory, so I expect it to be much harder than the Logical Induction case.
by Jessica Taylor 835 days ago | Abram Demski and Scott Garrabrant like this

“On the other hand, Naturalized Decision Theory is not about resource bounded Expected Utility Theory.”

I think there’s a sense in which I buy this, but it might be worth explaining more. My current suspicion is that “agents that have utility functions over the outcome of the physics they are embedded in” is not the right concept for understanding naturalized agency (in particular, the “motive forces” of the things that emerge from processes like abiogenesis/evolution/culture/AI research and development).

This concept is often argued for using Dutch-book arguments (e.g. VNM). I think these arguments are probably invalid when applied to naturalized agents (if taken literally they assume something like a “view from nowhere” that is unachievable from within the physics, unbounded computation, etc.). As such, re-examining what arguments can be made about coherent naturalized agency while avoiding inscription errors* seems like a good path towards recovering the correct concepts for thinking about naturalized agency.

*I’m getting the term “inscription error” from Brian Cantwell Smith (On the Origin of Objects, p. 50):

“It is a phenomenon that I will in general call an inscription error: a tendency for a theorist or observer, first, to write or project or impose or inscribe a set of ontological assumptions onto a computational system (onto the system itself, onto the task domain, onto the relation between the two, and so forth), and then, second, to read those assumptions or their consequences back off the system, as if that constituted an independent empirical discovery or theoretical result.”
by Abram Demski 833 days ago | Scott Garrabrant likes this

“I think expected utility theory is pretty well understood, and this post plays around with details of a well understood theory, while naturalized decision theory is not well understood at all.”

I think most of our disagreement actually hinges on this part. My feeling is that I, at least, don’t understand EU well enough; when I look at the foundations which are supposed to argue decisively in its favor, they’re not quite as solid as I’d like. If I were happy with the VNM assumption of probability theory (which I feel is circular, since Dutch Book assumes EU), I think my position would be similar to this (linked by Alex), which strongly agrees with all of the axioms but continuity, and takes continuity as provisionally reasonable. Continuity would be something to maybe dig deeper into at some point, but not so likely to bear fruit that I’d want to investigate right away.

However, what’s really interesting is justification of EU and probability theory in one stroke. The justification of the whole thing from only money-pump/Dutch-book style arguments seems close enough to be tantalizing, while also having enough hard-to-justify parts to make it a real possibility that such a justification would be of an importantly generalized DT.

“First, […] I don’t think that thinking about Dutch books led to the invention of Logical Inductors (although maybe they would have if I followed the right path), and I don’t think that the post hoc connection provides much evidence that thinking about Dutch books is useful.”

All I have to say here is that I find it somewhat plausible outside-view; an insight from a result need not be an original generator of the result. I think max-margin classifiers in machine learning are like this; the learning theory which came from explaining why they work was then fruitful in producing other algorithms. (I could be wrong here.)

“Second, I think that in a way Logical Uncertainty is about resource-bounded Probability Theory, and this is why a weakening of Dutch books helped. On the other hand, Naturalized Decision Theory is not about resource-bounded Expected Utility Theory.”

I don’t think naturalized DT is exactly what I’m hoping to get. My highest hope that I have any concrete reason to expect is a logically-uncertain DT which is temporally consistent (without a parameter for how long to run the LI).
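The money-pump style of argument discussed above can be made concrete with a toy example (the items, the fee, and the cyclic preference are all invented for illustration): an agent whose preferences cycle can be traded around the cycle, paying a small fee at each step it regards as an improvement, and ends up strictly poorer while holding exactly what it started with.

```python
# Minimal money-pump sketch: an agent with cyclic preferences
# A > B > C > A will pay a small fee for each trade "up" its preference
# order, so a bookie can walk it in a circle and extract money.

preference_cycle = {"C": "B", "B": "A", "A": "C"}  # held item -> preferred item
fee = 0.01

wealth, held = 1.0, "C"
for _ in range(3):            # one full lap around the cycle
    held = preference_cycle[held]
    wealth -= fee             # the agent pays to trade up each time

print(held, wealth)  # back to holding "C", having lost 3 * fee
```

This is the sense in which transitivity is forced by Dutch-book considerations; the harder question raised above is which of these arguments survive for agents embedded in the physics they reason about.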
by Scott Garrabrant 854 days ago | on: Entangled Equilibria and the Twin Prisoners' Dilem...

Fixed the $$\varepsilon$$, thanks.