Intelligent Agent Foundations Forum
by Paul Christiano 777 days ago | Stuart Armstrong likes this

The only way I see to get around this is:

  • Be willing to try X whenever enough people are willing to bet at sufficiently aggressive odds.
  • Assume that honest (greedily log-wealth-maximizing) players have enough money that they can cause any given X to be tried if a manipulator attempts to suppress it.
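To make the two assumptions concrete, here is a minimal sketch. The "greedily log-wealth-maximizing" player is modeled with the standard Kelly rule for a contract that costs `price` and pays 1 if X turns out well. The trigger rule `should_try` and its thresholds `min_price` and `min_stake` are hypothetical illustrations of "enough people betting at sufficiently aggressive odds", not a mechanism specified above.

```python
def kelly_fraction(belief: float, price: float) -> float:
    """Fraction of wealth a log-wealth maximizer spends on a contract that
    costs `price` and pays 1 if the event happens, given subjective
    probability `belief`. Standard Kelly result: (p - q) / (1 - q) if p > q."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must lie strictly between 0 and 1")
    return max(0.0, (belief - price) / (1.0 - price))


def should_try(bets, min_price: float, min_stake: float) -> bool:
    """HYPOTHETICAL trigger rule: try X if the money staked on "X turns out
    well" at a price of at least `min_price` (i.e. at sufficiently aggressive
    odds) totals at least `min_stake`. Both thresholds are illustrative."""
    return sum(stake for stake, price in bets if price >= min_price) >= min_stake


# An honest player with wealth 1000 who believes X succeeds with probability
# 0.6, facing a price of 0.5, stakes about 200 on X.
honest_stake = 1000 * kelly_fraction(belief=0.6, price=0.5)
print(should_try([(honest_stake, 0.5)], min_price=0.5, min_stake=150))
```

The second bullet then amounts to requiring that honest wealth times the Kelly fraction exceeds `min_stake` for every X a manipulator might try to suppress.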

It would be interesting to see this style of solution fleshed out, to see exactly how strong the assumptions have to be in order to avoid trouble.

The analog of EXP3 is to have investors put their money on policies (rather than on predictions about policy outcomes), pick each policy with probability proportional to the amount of money behind it, and then take money away from the people who financed the chosen option, based on how badly it performs relative to the best possible outcome, giving that money to the people who financed the non-chosen options. This prevents you from cheating the system in the way you describe, though it also means that investing is quite risky even if you know exactly what is going to happen.
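One way to sketch a single round of this policy market, under stated assumptions: rewards lie in [0, 1] with the best possible outcome equal to 1, and the penalty borrows EXP3's importance-weighted exponential update (loss divided by the probability the policy was picked), with money moved between financiers so that total money is conserved and the money shares change exactly as normalized EXP3 weights would. The learning rate `eta` is a hypothetical parameter; standard EXP3 also mixes in uniform exploration, which is omitted here to follow the rule as described.

```python
import math
import random
from collections import defaultdict
from typing import Dict

Stakes = Dict[str, Dict[str, float]]  # investor -> policy -> money staked


def policy_totals(stakes: Stakes) -> Dict[str, float]:
    """Total money behind each policy."""
    totals: Dict[str, float] = defaultdict(float)
    for book in stakes.values():
        for policy, amount in book.items():
            totals[policy] += amount
    return dict(totals)


def choose_policy(stakes: Stakes, rng: random.Random) -> str:
    """Pick a policy with probability proportional to the money behind it."""
    totals = policy_totals(stakes)
    policies = list(totals)
    return rng.choices(policies, weights=[totals[p] for p in policies])[0]


def settle(stakes: Stakes, chosen: str, reward: float,
           eta: float = 0.1, best_reward: float = 1.0) -> Stakes:
    """Move money from financiers of `chosen` to financiers of the other
    policies so that money shares follow an EXP3-style multiplicative update.
    Total money is conserved."""
    totals = policy_totals(stakes)
    total = sum(totals.values())
    prob = totals[chosen] / total            # probability `chosen` was picked
    loss = best_reward - reward              # shortfall vs best possible outcome
    shrink = math.exp(-eta * loss / prob)    # importance-weighted penalty (EXP3)
    # Factor c for other policies and d = c * shrink for the chosen one keep
    # total money fixed while scaling the chosen/other ratio by `shrink`.
    c = total / (totals[chosen] * shrink + total - totals[chosen])
    d = c * shrink
    return {inv: {pol: amt * (d if pol == chosen else c) for pol, amt in book.items()}
            for inv, book in stakes.items()}


rng = random.Random(0)
stakes = {"alice": {"A": 60.0}, "bob": {"B": 40.0}}
chosen = choose_policy(stakes, rng)
stakes = settle(stakes, chosen, reward=0.5)
```

Note that a mediocre outcome costs the chosen policy's financiers money even if they predicted it perfectly, which is the riskiness mentioned above.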

In this analogy, futarchy corresponds to estimating Q values (with a regression loss defined by the market maker you use in the decision markets) and then picking the Q-maximizing action. This can have lower variance but has no guarantees of any kind.
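A minimal sketch of the futarchy side, assuming each decision market is a Hanson-style market scoring rule with a quadratic score, so that the market price is a squared-loss estimate of the Q value E[outcome | action]; a logarithmic scoring rule would correspond to log loss instead. The class and function names are illustrative, and voiding the markets for untaken actions is described in a comment rather than implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quadratic_score(estimate: float, outcome: float) -> float:
    """Negative squared loss: the regression loss this market maker induces."""
    return -(outcome - estimate) ** 2


class ConditionalMarket:
    """Estimates E[outcome | action] with a market scoring rule. A trader who
    moves the estimate from `old` to `new` is paid
    score(new, outcome) - score(old, outcome) if the action is taken."""

    def __init__(self, initial: float = 0.5) -> None:
        self.estimate = initial
        self.trades: List[Tuple[str, float, float]] = []

    def trade(self, trader: str, new_estimate: float) -> None:
        self.trades.append((trader, self.estimate, new_estimate))
        self.estimate = new_estimate

    def settle(self, outcome: float) -> Dict[str, float]:
        payouts: Dict[str, float] = defaultdict(float)
        for trader, old, new in self.trades:
            payouts[trader] += quadratic_score(new, outcome) - quadratic_score(old, outcome)
        return dict(payouts)


def futarchy_choose(markets: Dict[str, ConditionalMarket]) -> str:
    """Greedy rule: take the action with the highest estimated Q value.
    Markets for untaken actions would be voided (trades refunded)."""
    return max(markets, key=lambda action: markets[action].estimate)


markets = {"A": ConditionalMarket(), "B": ConditionalMarket()}
markets["A"].trade("alice", 0.7)
markets["B"].trade("bob", 0.4)
action = futarchy_choose(markets)       # "A"
payouts = markets[action].settle(1.0)   # alice is paid about 0.16
```

Because the argmax never deliberately tries the action the market currently undervalues, nothing here resembles a regret guarantee.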

I suspect the optimal thing is to run both kinds of markets in parallel, to use the policy market with the EXP3 rule for picking actions, and to use the decision markets only for variance reduction.
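One possible reading of "use the decision markets only for variance reduction", offered as an assumption rather than the intended construction: keep EXP3's action choice, but use the decision markets' Q estimates as a baseline (control variate) in EXP3's importance-weighted reward estimate, as in the doubly robust estimator from the bandit literature. The estimate stays unbiased whatever the market says, and its variance shrinks when the market is accurate. The numbers below are hypothetical.

```python
from typing import Dict, Tuple


def corrected_estimates(probs: Dict[str, float], q_hat: Dict[str, float],
                        chosen: str, reward: float) -> Dict[str, float]:
    """Estimate for every action b: q_hat[b] + [b == chosen] * (reward - q_hat[chosen]) / probs[chosen]."""
    est = dict(q_hat)
    est[chosen] += (reward - q_hat[chosen]) / probs[chosen]
    return est


def exact_mean_var(probs: Dict[str, float], q_hat: Dict[str, float],
                   rewards: Dict[str, float], target: str) -> Tuple[float, float]:
    """Mean and variance of the estimate for `target`, computed exactly over
    the random choice of action (rewards assumed deterministic per action)."""
    values = [(p, corrected_estimates(probs, q_hat, a, rewards[a])[target])
              for a, p in probs.items()]
    mean = sum(p * v for p, v in values)
    var = sum(p * (v - mean) ** 2 for p, v in values)
    return mean, var


probs = {"A": 0.8, "B": 0.2}          # action probabilities from the policy market
rewards = {"A": 0.3, "B": 0.9}        # hypothetical true outcomes
no_baseline = {"A": 0.0, "B": 0.0}    # plain EXP3 importance weighting
market_q = {"A": 0.35, "B": 0.85}     # hypothetical decision-market estimates
print(exact_mean_var(probs, no_baseline, rewards, "B"))  # mean 0.9, variance about 3.24
print(exact_mean_var(probs, market_q, rewards, "B"))     # mean 0.9, variance about 0.01
```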

I have thought about this a little bit in the context of online learning, and suspect that we can prove an optimality theorem along these lines. It would be nice to see the analogous claim with markets, and the market version would probably be more relevant to alignment. A clear and convincing exposition would also likely be of interest to researchers in RL.
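For reference, the standard online-learning guarantee for EXP3 itself (Auer, Cesa-Bianchi, Freund and Schapire, 2002), with suitably tuned parameters, bounds expected regret against the best fixed policy in hindsight as below. This is background on what "guarantees" means in the bandit setting, not the theorem alluded to above; a market version would presumably need an analogous bound stated in terms of investor wealth.

```latex
% K policies, T rounds, rewards r_t(a) \in [0,1] fixed in advance;
% a_t is the policy EXP3 plays in round t.
\max_{a} \sum_{t=1}^{T} r_t(a) \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_t(a_t)\Big]
  \;=\; O\big(\sqrt{T K \ln K}\big)
```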

(As usual, this comment is not intended as a land grab: if anyone executes on this idea and it works out, it’s all theirs.)