Intelligent Agent Foundations Forum

This is exactly the sort of thing I’ve wanted for ASP (Agent Simulates Predictor).

One problem that’s always blocked me is how to know when to do this, rather than using it ad hoc: is there an easy way to know that there’s an agent out in the universe using a more limited reasoning system?


policy selection converges to giving Omega the money so long as the difficulty of computing the coin exceeds the power of the market at \(f(n)\) time.

Would it be sensible to just look for muggings (and ASPs) at the very beginning of the process, and then decide immediately what to do as soon as one is detected?

Come to think of it, precommitting to ignore knowledge about the result of the coin seems to be the best strategy here; does this cash out into anything useful in this formalism?


by Abram Demski 202 days ago

Looking “at the very beginning” won’t work – the beliefs of the initial state of the logical inductor won’t be good enough to sensibly detect these things and decide what to do about them.

While ignoring the coin is OK as special-case reasoning, I don’t think everything falls nicely into the bucket of “information you want to ignore” vs “information you want to update on”. The more general concept, which captures both, is to ask “how do I want to react to this information, in terms of my action?” – which is of course the idea of policy selection.
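A minimal sketch of policy selection on counterfactual mugging, under assumed numbers that do not come from the thread: a fair coin; on tails Omega asks for 100, and on heads Omega pays 10,000 if and only if the agent's policy would pay on tails. Policies are whole maps from observation to action and are scored from the prior, before the coin is seen; `updated_choice_on_tails` is a hypothetical contrast that chooses only after conditioning on tails. All names here are illustrative.

```python
from itertools import product

# Assumed toy setup (illustrative numbers, not from the thread).
COINS = ("heads", "tails")
PRIOR = {"heads": 0.5, "tails": 0.5}
ACTIONS = ("pay", "refuse")
COST, REWARD = 100, 10_000


def payoff(coin: str, policy: dict) -> float:
    """Payoff of a whole policy (observation -> action) in one world."""
    if coin == "tails":
        return -COST if policy["tails"] == "pay" else 0
    # On heads, Omega rewards the agent iff its policy pays on tails.
    return REWARD if policy["tails"] == "pay" else 0


def expected_value(policy: dict) -> float:
    """Score the policy from the prior, before observing the coin."""
    return sum(PRIOR[c] * payoff(c, policy) for c in COINS)


def select_policy() -> dict:
    """Policy selection: choose the best mapping from observations to actions."""
    policies = [dict(zip(COINS, acts)) for acts in product(ACTIONS, repeat=len(COINS))]
    return max(policies, key=expected_value)


def updated_choice_on_tails() -> str:
    """For contrast: pick an action only after conditioning on tails."""
    return max(ACTIONS, key=lambda a: -COST if a == "pay" else 0)
```

Here `select_policy()` returns a policy that pays on tails, with expected value 4950.0, while `updated_choice_on_tails()` returns "refuse". The heads entry of the policy does not matter in this toy setup.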


If the other players can see what action you’ll take, then they may simply exploit you.

Isn’t this a variant of the “agent simulates predictor” problem (with you playing the role of the predictor)? Thus any agent capable of exploiting you has to prove to you that it won’t, in order to get anything from you. That’s kind of what happens with your Nicerbots; even if perfectly predictable, they’re not really exploitable in any strong sense (they won’t cooperate with a defector).


by Abram Demski 235 days ago

I think the point I was making here was a bit less clear than I wanted it to be. I was saying that, if you use predictable exploration on actions rather than policies, then you only get to see what happens when you predictably take a certain action. This is good for learning pure equilibria in games, but doesn’t give information which would help the agent reach the right mixed equilibria when randomized actions should be preferred; and indeed, it doesn’t seem like such an agent would reach the right mixed equilibria.

I believe the “predictable exploration on policies” approach solves agent-simulates-predictor just fine, along with other problems (including counterfactual mugging) which require “some degree of updatelessness” without requiring the full reflective stability which we want from updatelessness.
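To make the mixed-equilibrium point concrete, here is matching pennies, chosen as a standard illustration rather than taken from the post. Every pure profile leaves some player with a profitable deviation, so an agent that only observes what happens when it predictably plays H or predictably plays T never sees an equilibrium. The row player is indifferent between H and T exactly when the column player mixes 50/50. The function names are hypothetical.

```python
from itertools import product

# Matching pennies: row wins on a match, column wins on a mismatch (zero-sum).
ACTIONS = ("H", "T")
ROW_PAYOFF = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}


def payoffs(r: str, c: str) -> tuple:
    """(row payoff, column payoff) for a pure profile."""
    u = ROW_PAYOFF[(r, c)]
    return u, -u


def pure_equilibria() -> list:
    """Pure profiles where neither player gains by a unilateral pure deviation."""
    found = []
    for r, c in product(ACTIONS, repeat=2):
        u_row, u_col = payoffs(r, c)
        row_ok = all(payoffs(r2, c)[0] <= u_row for r2 in ACTIONS)
        col_ok = all(payoffs(r, c2)[1] <= u_col for c2 in ACTIONS)
        if row_ok and col_ok:
            found.append((r, c))
    return found


def row_expected(r: str, q: float) -> float:
    """Row's expected payoff for action r when column plays H with probability q."""
    return q * ROW_PAYOFF[(r, "H")] + (1 - q) * ROW_PAYOFF[(r, "T")]
```

`pure_equilibria()` returns an empty list, and `row_expected("H", 0.5)` and `row_expected("T", 0.5)` are both 0.0, so the row player has nothing to gain from either pure action against a 50/50 opponent.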


by Stuart Armstrong 257 days ago | on: Hyperreal Brouwer

To quote the straw Vulcan: Fascinating.


I’m not sure this would work, and it might be tied to ambiguity about what “steps” mean.


Y: Run X to completion. Then say “no” to chocolate.

Then PA proves that Y doesn’t lose in fewer steps than X (since X doesn’t do anything in more than N steps, while Y runs N+1 steps before taking action), yet it’s clear that Y loses.

I think the problem is that “lose in n steps” is not clearly defined.


by Vladimir Slepnev 358 days ago | Stuart Armstrong likes this

It doesn’t mean computation steps. Losing in 1 step means you say “no” to chocolate, losing in 2 steps means you accept some program that says “no” to chocolate, and so on. Sorry, I thought that was the obvious interpretation, I’ll edit the post to make it clear.
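One way to encode this reading, as a hypothetical sketch (`Say`, `Accept` and `losing_depth` are illustrative names, not from the post): a program either answers the chocolate offer itself or accepts another program to answer for it, and the losing depth counts the chain of acceptances that ends in "no". Computation steps play no role. Real programs can fail to halt; these finite trees always terminate.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Say:
    """The program answers the chocolate offer itself: "yes" or "no"."""
    answer: str


@dataclass(frozen=True)
class Accept:
    """The program hands the decision over to another program."""
    program: "Program"


Program = Union[Say, Accept]


def losing_depth(p: Program) -> Optional[int]:
    """Return n if p loses in n steps, or None if the chain ends in "yes"."""
    depth = 1
    while isinstance(p, Accept):
        p = p.program
        depth += 1
    return depth if p.answer == "no" else None
```

Under this encoding, Y from the thread (assuming X halts) is just `Say("no")` no matter how long it spends running X, so it loses in 1 step, while `Accept(Say("no"))` loses in 2.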


by Stuart Armstrong 358 days ago

Ah, thanks! That seems more sensible.


Not sure what your argument is. Can you develop it?


by Paul Christiano 364 days ago

I expect a workable approach will define the operator implicitly as “that thing which has control over the input channel” rather than by giving an explicit definition. This is analogous to the way in which a sail causes your boat to move with the wind: you don’t have to define or measure the wind precisely, you just have to be easily pushed around by it.


by Stuart Armstrong 358 days ago

Thus anything that can control the operator becomes defined as the operator? That doesn’t seem safe…


by Paul Christiano 357 days ago

The AI defers to anything that can control the operator.

If the operator has physical control over the AI, then any process which controls the operator can replace the AI wholesale. It feels fine to defer to such processes, and it certainly seems much better than the situation where the operator is attempting to correct the AI’s behavior but the AI is paternalistically unresponsive.

Presumably the operator will try to secure themselves in the same way that they try to secure their AI.


by Stuart Armstrong 356 days ago

This also means that if the AI can figure out a way of controlling the controller, then it is itself in control from the moment it comes up with a reasonable plan?


by Paul Christiano 356 days ago

The AI replacing the operator is certainly a fixed point.

This doesn’t seem any different from the usual situation. Modifying your goals is always a fixed point. That doesn’t mean that our agents will inevitably do it.

An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.


by Stuart Armstrong 355 days ago

An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.

I disagree (though we may be interpreting that sentence differently). Once the AI has the possibility of subverting the controller, then it is, in effect, in physical control of itself. So it itself becomes the “formal operator”, and, depending on how it’s motivated, is perfectly willing to replace the “human operator”, whose wishes are now irrelevant (because it’s no longer the formal operator).

And this never involves any goal modification at all: it’s the same goal, except that the change in control has changed the definition of the operator.


It is ‘a preference for preferences’; e.g., “my long-term needs take precedence over my short-term desires” is a meta-preference (in fact, the use of the terms ‘needs’ vs ‘desires’ is itself a meta-preference, since at the lowest formal level both are just preferences).


How about using some notion of “coalition-stability”, under which an option has that property if there is no sub-coalition of players that can unilaterally increase their utility, whatever all the other players choose to do?


by Paul Christiano 374 days ago | Stuart Armstrong likes this

This is basically the core, though it’s usually defined for cooperative games, where (a) utility is transferable and (b) adding a new player never makes an existing player worse off. It’s easy to generalize though.
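A brute-force sketch of the coalition-stability check proposed above, for small strategic-form games without transferable utility. Assumptions: “increase their utility” is read as every member of the coalition doing strictly better, and a coalition blocks only if a single joint action achieves this against every response of the remaining players (close to what is sometimes called the α-core). The function names and the prisoner's dilemma payoffs are illustrative, not from the thread.

```python
from itertools import combinations, product


def coalition_blocks(action_sets, utility, outcome, coalition):
    """True if `coalition` has a joint action that makes every member strictly
    better off than at `outcome`, whatever the remaining players do."""
    n = len(action_sets)
    current = utility(outcome)
    others = [p for p in range(n) if p not in coalition]
    for joint in product(*(action_sets[p] for p in coalition)):
        guaranteed = True
        for response in product(*(action_sets[p] for p in others)):
            profile = [None] * n
            for p, a in zip(coalition, joint):
                profile[p] = a
            for p, a in zip(others, response):
                profile[p] = a
            u = utility(tuple(profile))
            if any(u[p] <= current[p] for p in coalition):
                guaranteed = False
                break
        if guaranteed:
            return True
    return False


def is_coalition_stable(action_sets, utility, outcome):
    """No non-empty coalition (including the grand coalition) can block."""
    n = len(action_sets)
    return not any(
        coalition_blocks(action_sets, utility, outcome, coalition)
        for size in range(1, n + 1)
        for coalition in combinations(range(n), size)
    )


# Illustrative prisoner's dilemma: profile -> (utility of P0, utility of P1).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
SETS = [("C", "D"), ("C", "D")]
stable = [o for o in product(*SETS) if is_coalition_stable(SETS, PD.__getitem__, o)]
```

With these payoffs only mutual cooperation survives: the pair blocks (D, D) by jointly cooperating, and in each asymmetric outcome the exploited player blocks alone by defecting, which guarantees it at least 1 > 0.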


Can you use the maximum “swing” from the other player’s decisions to get a non-binary measure of dependency?

It’s similar to my idea here.

Then the dependency of player \(P_i\) on \(P_j\) would be \(\max_{a_j} U_i - \min_{a_j} U_i\), where \(a_j\) are the actions of \(P_j\).
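A small sketch of this swing measure for a two-player game, assuming \(U_i\) is given as a function of \((a_i, a_j)\). The formula leaves \(P_i\)'s own action implicit; here it is held fixed inside `swing`, and `dependency` takes the largest swing over \(P_i\)'s actions, which is only one possible aggregation. The prisoner's dilemma payoffs are standard illustrative values, not from the thread.

```python
from typing import Callable, Hashable, Iterable

Action = Hashable


def swing(u_i: Callable[[Action, Action], float], a_i: Action,
          actions_j: Iterable[Action]) -> float:
    """max over a_j of U_i minus min over a_j of U_i, with P_i's action a_i fixed."""
    values = [u_i(a_i, a_j) for a_j in actions_j]
    return max(values) - min(values)


def dependency(u_i: Callable[[Action, Action], float],
               actions_i: Iterable[Action],
               actions_j: Iterable[Action]) -> float:
    """Largest swing over P_i's own actions (an assumed aggregation)."""
    actions_j = list(actions_j)
    return max(swing(u_i, a_i, actions_j) for a_i in actions_i)


# Illustrative prisoner's dilemma payoffs to P_1, indexed by (a_1, a_2).
PD_U1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def u1(a1: Action, a2: Action) -> float:
    return PD_U1[(a1, a2)]
```

For \(P_1\) the swing is 3 when cooperating and 4 when defecting, so `dependency(u1, ("C", "D"), ("C", "D"))` gives 4.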


by Stuart Armstrong 386 days ago | Abram Demski likes this | on: Futarchy Fix

Now, whether a perfect market should pick up an existential risk signature is different from whether a real market would. The behaviour of the Dow around the Cuban missile crisis isn’t encouraging in that regard:

[Figure: the Dow around the Cuban missile crisis]