Intelligent Agent Foundations Forum

This is exactly the sort of thing I’ve wanted for ASP (Agent Simulates Predictor).

One problem that’s always blocked me is how to know when to do this, rather than using it ad hoc: is there an easy way to know that there’s an agent out in the universe using a more limited reasoning system?


> policy selection converges to giving Omega the money so long as the difficulty of computing the coin exceeds the power of the market at \(f(n)\) time.

Would it be sensible to just look for muggings (and ASPs) at the very beginning of the process, and then decide immediately what to do as soon as one is detected?

Come to think of it, precommitting to ignoring knowledge about the result of the coin seems to be the best strategy here; does this cash out into anything useful in this formalism?
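As a toy illustration of the policy-selection framing of counterfactual mugging (payoffs and the 50/50 coin are assumed for the example, not taken from the post): evaluating policies before the coin is seen favors paying, whereas updating on "tails" would favor refusing.

```python
# Toy counterfactual mugging: compare whole policies *before* the
# coin is observed. Payoffs are illustrative assumptions.

def expected_value(pays_when_asked: bool) -> float:
    """Expected utility of a policy, averaged over the fair coin.

    Heads: Omega pays 10000 only if it predicts the agent would
    have paid on tails. Tails: the agent is asked for 100.
    """
    heads = 10000 if pays_when_asked else 0
    tails = -100 if pays_when_asked else 0
    return 0.5 * heads + 0.5 * tails

# Policy selection picks the policy with the best prior expectation.
best_policy = max([True, False], key=expected_value)
print(best_policy)               # True: commit to paying
print(expected_value(True))      # 4950.0
print(expected_value(False))     # 0.0
```

Conditioning on "the coin came up tails" instead would make paying look like a pure loss of 100, which is exactly the updateful failure the precommitment avoids.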


by Abram Demski 48 days ago | link

Looking “at the very beginning” won’t work – the beliefs of the initial state of the logical inductor won’t be good enough to sensibly detect these things and decide what to do about them.

While ignoring the coin is OK as special-case reasoning, I don’t think everything falls nicely into the bucket of “information you want to ignore” vs “information you want to update on”. The more general concept which captures both is to ask “how do I want to react to this information, in terms of my action?” – which is of course the idea of policy selection.


If the other players can see what action you’ll take, then they may simply exploit you.

Isn’t this a variant of the “agent simulates predictor” problem (with you playing the role of the predictor)? Thus any agent capable of exploiting you has to prove to you that it won’t, in order to get anything from you. That’s kind of what happens with your Nicerbots; even if perfectly predictable, they’re not really exploitable in any strong sense (they won’t cooperate with a defector).
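A deterministic simplification of the Nicerbot idea (the prediction interface here is an assumption for illustration; the original construction is probabilistic) shows why perfect predictability need not mean exploitability:

```python
# Minimal sketch: an agent that is perfectly predictable, yet not
# exploitable in any strong sense, because it only cooperates with
# predicted cooperators. Deterministic simplification of Nicerbot.

def nicerbot(predicted_opponent_move: str) -> str:
    """Cooperate iff the opponent is predicted to cooperate."""
    return "C" if predicted_opponent_move == "C" else "D"

print(nicerbot("C"))  # C: mutual cooperation
print(nicerbot("D"))  # D: a would-be exploiter gains nothing
```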


by Abram Demski 81 days ago | link

I think the point I was making here was a bit less clear than I wanted it to be. I was saying that, if you use predictable exploration on actions rather than policies, then you only get to see what happens when you predictably take a certain action. This is good for learning pure equilibria in games, but doesn’t give information which would help the agent reach the right mixed equilibria when randomized actions should be preferred; and indeed, it doesn’t seem like such an agent would reach the right mixed equilibria.
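Matching pennies makes the point concrete: its only Nash equilibrium is mixed, so exploration restricted to predictable pure actions has no pure profile to settle on. A quick check (standard zero-sum payoffs, assumed for the example):

```python
# Matching pennies: verify that no pure-action profile is an
# equilibrium, so the "right" play must be a mixed (randomized) one.
import itertools

# Payoff to the row player (the "matcher"); the game is zero-sum.
payoff = {("H", "H"): 1, ("H", "T"): -1,
          ("T", "H"): -1, ("T", "T"): 1}

def is_pure_equilibrium(row: str, col: str) -> bool:
    """Neither player can gain by a unilateral pure deviation."""
    row_ok = all(payoff[(row, col)] >= payoff[(r, col)] for r in "HT")
    col_ok = all(-payoff[(row, col)] >= -payoff[(row, c)] for c in "HT")
    return row_ok and col_ok

pure_eqs = [p for p in itertools.product("HT", "HT")
            if is_pure_equilibrium(*p)]
print(pure_eqs)  # []: every pure profile is exploitable
```

An agent that only ever explores predictable pure actions keeps getting exploited at each profile, whereas the 50/50 mixture is stable.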

I believe the “predictable exploration on policies” approach solves agent-simulates-predictor just fine, along with other problems (including counterfactual mugging) which require “some degree of updatelessness” without requiring the full reflective stability which we want from updatelessness.


by Stuart Armstrong 103 days ago | link | parent | on: Hyperreal Brouwer

To quote the straw Vulcan: fascinating.


I’m not sure this would work, and it might be tied to ambiguity about what “steps” mean.


Y: Run X to completion. Then say “no” to chocolate.

Then PA proves that Y doesn’t lose in fewer steps than X (since X does nothing after N steps, while Y runs for N+1 steps before taking any action), yet it’s clear that Y loses.

I think the problem is that “lose in n steps” is not clearly defined.


by Vladimir Slepnev 204 days ago | Stuart Armstrong likes this | link

It doesn’t mean computation steps. Losing in 1 step means you say “no” to chocolate, losing in 2 steps means you accept some program that says “no” to chocolate, and so on. Sorry, I thought that was the obvious interpretation; I’ll edit the post to make it clear.
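The inductive definition can be sketched as follows, modeling a program as either a direct answer or a delegation to another program (this representation is an assumption for illustration, not the post's formalism):

```python
# Sketch of the inductive "loses in n steps" notion: losing in 1 step
# is refusing chocolate; losing in n+1 steps is accepting a program
# that loses in n steps. Programs are (kind, payload) pairs here.

def loses_in(program, n: int) -> bool:
    """True if the program loses within n delegation steps."""
    if n <= 0:
        return False
    kind, payload = program
    if kind == "answer":
        return payload == "no"          # refusing chocolate = losing
    if kind == "accept":                # hand control to another program
        return loses_in(payload, n - 1)
    return False

say_no = ("answer", "no")
say_yes = ("answer", "yes")
defer_to_no = ("accept", say_no)

print(loses_in(say_no, 1))        # True: refuses chocolate immediately
print(loses_in(defer_to_no, 2))   # True: accepts a program that refuses
print(loses_in(defer_to_no, 1))   # False: the loss is one level deeper
```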


by Stuart Armstrong 204 days ago | link

Ah, thanks! That seems more sensible.


Not sure what your argument is. Can you develop it?


by Paul Christiano 210 days ago | link

I expect a workable approach will define the operator implicitly as “that thing which has control over the input channel” rather than by giving an explicit definition. This is analogous to the way in which a sail causes your boat to move with the wind: you don’t have to define or measure the wind precisely, you just have to be easily pushed around by it.


by Stuart Armstrong 204 days ago | link

Thus anything that can control the operator becomes defined as the operator? That doesn’t seem safe…


by Paul Christiano 203 days ago | link

The AI defers to anything that can control the operator.

If the operator has physical control over the AI, then any process which controls the operator can replace the AI wholesale. It feels fine to defer to such processes, and certainly it seems much better than the situation where the operator is attempting to correct the AI’s behavior but the AI is paternalistically unresponsive.

Presumably the operator will try to secure themselves in the same way that they try to secure their AI.


by Stuart Armstrong 202 days ago | link

This also means that if the AI can figure out a way of controlling the controller, then it is itself in control from the moment it comes up with a reasonable plan?


by Paul Christiano 202 days ago | link

The AI replacing the operator is certainly a fixed point.

This doesn’t seem any different from the usual situation. Modifying your goals is always a fixed point. That doesn’t mean that our agents will inevitably do it.

An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.


by Stuart Armstrong 201 days ago | link

> An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.

I disagree (though we may be interpreting that sentence differently). Once the AI has the possibility of subverting the controller, then it is, in effect, in physical control of itself. So it itself becomes the “formal operator”, and, depending on how it’s motivated, is perfectly willing to replace the “human operator”, whose wishes are now irrelevant (because it’s no longer the formal operator).

And this never involves any goal modification at all - it’s the same goal, except that the change in control has changed the definition of the operator.


It is ‘a preference for preferences’; e.g. “my long-term needs take precedence over my short-term desires” is a meta-preference (in fact the use of the terms ‘needs’ vs ‘desires’ is itself a meta-preference, as at the lowest formal level both are just preferences).


How about using some conception of “coalition-stable”? An option has that property if there is no sub-coalition of players that can unilaterally increase their utility, whatever all the other players choose to do.


by Paul Christiano 220 days ago | Stuart Armstrong likes this | link

This is basically the core, though it’s usually defined for cooperative games, where (a) utility is transferable and (b) adding a new player never makes an existing player worse off. It’s easy to generalize though.
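A quick core-membership check for a small transferable-utility game makes the condition concrete (the 3-player characteristic function below is made up for illustration):

```python
# Core membership in a transferable-utility cooperative game: an
# allocation is in the core if it is efficient and no coalition can
# secure more for itself on its own. Numbers are illustrative.
from itertools import combinations

players = (1, 2, 3)
v = {frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 90, frozenset({1, 3}): 80,
     frozenset({2, 3}): 70, frozenset({1, 2, 3}): 120}

def in_core(allocation: dict) -> bool:
    """Efficient, and every coalition gets at least its own value."""
    if sum(allocation.values()) != v[frozenset(players)]:
        return False
    return all(sum(allocation[i] for i in coalition) >= v[frozenset(coalition)]
               for r in range(1, len(players) + 1)
               for coalition in combinations(players, r))

print(in_core({1: 50, 2: 40, 3: 30}))  # True: no coalition can do better
print(in_core({1: 40, 2: 40, 3: 40}))  # False: {1, 2} could secure 90 > 80
```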


Can you use the maximum “swing” from the other player’s decisions to get a non-binary measure of dependency?

It’s similar to my idea here.

Then the dependency of player \(P_i\) on \(P_j\) would be \(\max_{a_j} U_i - \min_{a_j} U_i\), where \(a_j\) are the actions of \(P_j\).
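Computed on a standard prisoner's dilemma payoff table (payoffs assumed for the example), under one natural reading of the formula in which player \(i\)'s own action is held fixed:

```python
# Swing dependency D(i, j) = max_{a_j} U_i - min_{a_j} U_i:
# how much player j's choice alone can move player i's utility.

# Payoffs to player 1 in a prisoner's dilemma (T=5, R=3, P=1, S=0).
U1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def swing_dependency(U, a_i, actions_j=("C", "D")):
    """Swing of U over j's actions, with player i's action a_i fixed."""
    values = [U[(a_i, a_j)] for a_j in actions_j]
    return max(values) - min(values)

print(swing_dependency(U1, "C"))  # 3 - 0 = 3
print(swing_dependency(U1, "D"))  # 5 - 1 = 4
```

The measure is non-binary as hoped: it is 0 exactly when \(U_i\) is independent of \(a_j\), and grows with how much leverage \(P_j\) has.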


by Stuart Armstrong 232 days ago | Abram Demski likes this | link | parent | on: Futarchy Fix

Now, whether a perfect market should pick up an existential risk signature is different from whether a real market would. The behaviour of the Dow around the Cuban missile crisis isn’t encouraging in that regard.

[Chart: Dow Jones around the Cuban missile crisis]