Intelligent Agent Foundations Forum

This is exactly the sort of thing I’ve wanted for ASP (Agent Simulates Predictor).

One problem that’s always blocked me is how to know when to do this, rather than applying it ad hoc: is there an easy way to know that there’s an agent out in the universe using a more limited reasoning system?


policy selection converges to giving Omega the money so long as the difficulty of computing the coin exceeds the power of the market at \(f(n)\) time.

Would it be sensible to just look for muggings (and ASPs) at the very beginning of the process, and then decide immediately what to do as soon as one is detected?

Come to think of it, precommitting to ignoring knowledge about the result of the coin seems to be the best strategy here; does this cash out into anything useful in this formalism?


by Abram Demski 113 days ago | link

Looking “at the very beginning” won’t work – the beliefs of the initial state of the logical inductor won’t be good enough to sensibly detect these things and decide what to do about them.

While ignoring the coin is OK as special-case reasoning, I don’t think everything falls nicely into the bucket of “information you want to ignore” vs “information you want to update on”. The more general concept which captures both is to ask “how do I want to react to this information, in terms of my action?” – which is of course the idea of policy selection.
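To make that framing concrete, here is a minimal sketch in Python; the payoff numbers and the stubbed-out prior are made up for illustration and are not from the post:

```python
# Counterfactual-mugging-style setup (illustrative numbers only):
# on heads Omega asks for $100; on tails Omega pays $10,000, but only
# to agents whose policy would have paid on heads.
PRIOR = {"heads": 0.5, "tails": 0.5}   # early beliefs, before the coin is computable
POLICIES = ["pay", "refuse"]           # what to do *if asked* (i.e. on heads)

def prior_expected_value(policy):
    ev_heads = -100 if policy == "pay" else 0
    ev_tails = 10_000 if policy == "pay" else 0   # Omega rewards would-be payers
    return PRIOR["heads"] * ev_heads + PRIOR["tails"] * ev_tails

# Policy selection: decide how to react to the information in advance,
# by prior expected value -- this selects "pay".
policy_choice = max(POLICIES, key=prior_expected_value)

# Updating first: once you know the coin is heads, "refuse" looks better.
updated_choice = max(POLICIES, key=lambda a: -100 if a == "pay" else 0)

print(policy_choice, updated_choice)   # pay refuse
```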


If the other players can see what action you’ll take, then they may simply exploit you.

Isn’t this a variant of the “agent simulates predictor” problem (with you playing the role of the predictor)? Thus any agent capable of exploiting you has to prove to you that it won’t, in order to get anything from you. That’s kind of what happens with your Nicerbots; even if perfectly predictable, they’re not really exploitable in any strong sense (they won’t cooperate with a defector).


by Abram Demski 146 days ago | link

I think the point I was making here was a bit less clear than I wanted it to be. I was saying that, if you use predictable exploration on actions rather than policies, then you only get to see what happens when you predictably take a certain action. This is good for learning pure equilibria in games, but doesn’t give information which would help the agent reach the right mixed equilibria when randomized actions should be preferred; and indeed, it doesn’t seem like such an agent would reach them.
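As a toy illustration of that last point (the matching pennies payoffs are standard; the way “predictable exploration” is simplified here is my own):

```python
# Matching pennies from the exploring (row) player's point of view:
# +1 if the coins match, -1 if they don't.
ACTIONS = ["H", "T"]

def row_payoff(row, col):
    return 1 if row == col else -1

# If an exploration round predictably plays a fixed action, the opponent can
# best-respond to it, so every such round scores -1 and the data collected
# ("H loses", "T loses") never points toward the value of randomizing.
predictable_scores = {a: min(row_payoff(a, b) for b in ACTIONS) for a in ACTIONS}
print(predictable_scores)   # {'H': -1, 'T': -1}

# The 50/50 mixed strategy guarantees expected payoff 0 against any response,
# which is the equilibrium value the action-explorer never gets evidence about.
mixed_value = min(sum(0.5 * row_payoff(a, b) for a in ACTIONS) for b in ACTIONS)
print(mixed_value)   # 0.0
```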

I believe the “predictable exploration on policies” approach solves agent-simulates-predictor just fine, along with other problems (including counterfactual mugging) which require “some degree of updatelessness” without requiring the full reflective stability which we want from updatelessness.


by Stuart Armstrong 169 days ago | link | parent | on: Hyperreal Brouwer

To quote the straw Vulcan: Fascinating.


I’m not sure this would work, and it might be tied to ambiguity about what “steps” mean.


Y: Run X to completion. Then say “no” to chocolate.

Then PA proves that Y doesn’t lose in fewer steps than X (since X finishes within N steps, while Y runs for N+1 steps before taking any action), yet it’s clear that Y loses.

I think that’s because “lose in n steps” is not clearly defined.


by Vladimir Slepnev 269 days ago | Stuart Armstrong likes this | link

It doesn’t mean computation steps. Losing in 1 step means you say “no” to chocolate, losing in 2 steps means you accept some program that says “no” to chocolate, and so on. Sorry, I thought that was the obvious interpretation, I’ll edit the post to make it clear.
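Spelled out as a recursion (a sketch only; representing programs as Python callables is purely for illustration):

```python
def loses_in(output, n):
    """'Loses in n steps', following the reading above:
    n = 1: the output is a direct "no" to the chocolate;
    n > 1: the output is a program that has been accepted, and that
           program's own output loses in n - 1 steps."""
    if n == 1:
        return output == "no"
    return callable(output) and loses_in(output(), n - 1)

refuser = lambda: "no"            # a program that says "no" to the chocolate
agent_output = refuser            # the agent's move: accept that program
print(loses_in("no", 1))          # True: saying "no" directly loses in 1 step
print(loses_in(agent_output, 2))  # True: accepting a refuser loses in 2 steps
```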


by Stuart Armstrong 269 days ago | link

Ah, thanks! That seems more sensible.


Not sure what your argument is. Can you develop it?


by Paul Christiano 275 days ago | link

I expect a workable approach will define the operator implicitly as “that thing which has control over the input channel” rather than by giving an explicit definition. This is analogous to the way in which a sail causes your boat to move with the wind: you don’t have to define or measure the wind precisely, you just have to be easily pushed around by it.


by Stuart Armstrong 269 days ago | link

Thus anything that can control the operator becomes defined as the operator? That doesn’t seem safe…


by Paul Christiano 268 days ago | link

The AI defers to anything that can control the operator.

If the operator has physical control over the AI, then any process which controls the operator can replace the AI wholesale. It feels fine to defer to such processes, and certainly it seems much better than the situation where the operator is attempting to correct the AI’s behavior but the AI is paternalistically unresponsive.

Presumably the operator will try to secure themselves in the same way that they try to secure their AI.


by Stuart Armstrong 268 days ago | link

This also means that if the AI can figure out a way of controlling the controller, then it is itself in control from the moment it comes up with a reasonable plan?


by Paul Christiano 267 days ago | link

The AI replacing the operator is certainly a fixed point.

This doesn’t seem any different from the usual situation. Modifying your goals is always a fixed point. That doesn’t mean that our agents will inevitably do it.

An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.


by Stuart Armstrong 266 days ago | link

An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.

I disagree (though we may be interpreting that sentence differently). Once the AI has the possibility of subverting the controller, it is, in effect, in physical control of itself. So it itself becomes the “formal operator”, and, depending on how it’s motivated, is perfectly willing to replace the “human operator”, whose wishes are now irrelevant (because the human is no longer the formal operator).

And this never involves any goal modification at all - it’s the same goal, except that the change in control has changed the definition of the operator.


It is ‘a preference for preferences’; e.g. “my long-term needs take precedence over my short-term desires” is a meta-preference (in fact the use of the terms ‘needs’ vs ‘desires’ is itself a meta-preference, since at the lowest formal level both are just preferences).


How about using some conception of “coalition-stable”? An option would have that property if there is no sub-coalition of players that can unilaterally increase their utility, whatever all the other players choose to do.


by Paul Christiano 285 days ago | Stuart Armstrong likes this | link

This is basically the core, though it’s usually defined for cooperative games, where (a) utility is transferable and (b) adding a new player never makes an existing player worse off. It’s easy to generalize though.
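For concreteness, here is a brute-force check of the core condition for a small transferable-utility game (the characteristic function below is made up):

```python
from itertools import chain, combinations

PLAYERS = (0, 1, 2)
# Made-up characteristic function: v[S] is what coalition S can guarantee alone.
v = {(): 0,
     (0,): 1, (1,): 1, (2,): 1,
     (0, 1): 4, (0, 2): 4, (1, 2): 4,
     (0, 1, 2): 9}

def coalitions():
    return chain.from_iterable(combinations(PLAYERS, r) for r in range(len(PLAYERS) + 1))

def in_core(x):
    """x is in the core if it splits v(N) exactly and no coalition S could do
    strictly better by walking away, i.e. v(S) <= sum of x_i for i in S."""
    if abs(sum(x) - v[PLAYERS]) > 1e-9:
        return False
    return all(v[S] <= sum(x[i] for i in S) + 1e-9 for S in coalitions())

print(in_core((3, 3, 3)))   # True
print(in_core((7, 1, 1)))   # False: coalition (1, 2) only gets 2 but can guarantee 4
```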


Can you use the maximum “swing” from the other player’s decisions to get a non-binary measure of dependency?

It’s similar to my idea here.

Then the dependency of player \(P_i\) on \(P_j\) would be \(\max_{a_j} U_i - \min_{a_j} U_i\), where \(a_j\) are the actions of \(P_j\).
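A sketch of what that could look like for two players with an explicit payoff matrix (the payoffs are made up, and the choice to aggregate over \(P_i\)’s own action by taking the maximum swing is mine, since the formula above leaves that open):

```python
import numpy as np

# Made-up payoff matrix for player i: rows index i's own actions,
# columns index player j's actions.
U_i = np.array([[3.0, 3.0, 3.0],    # here j's choice doesn't matter to i
                [0.0, 5.0, 1.0]])   # here j can swing i's payoff by 5

# Swing of j over i's payoff, with i's own action held fixed.
swing_per_own_action = U_i.max(axis=1) - U_i.min(axis=1)   # array([0., 5.])

# One way to get a single number: the largest swing j could ever have.
dependency_of_i_on_j = swing_per_own_action.max()
print(swing_per_own_action, dependency_of_i_on_j)          # [0. 5.] 5.0
```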


by Stuart Armstrong 297 days ago | Abram Demski likes this | link | parent | on: Futarchy Fix

Now, whether a perfect market should pick up an existential risk signature is different from whether a real market would. The behaviour of the Dow around the Cuban missile crisis isn’t encouraging in that regard:
[chart of the Dow around the Cuban missile crisis omitted]