by Stuart Armstrong 189 days ago | link | parent | on: Being legible to other agents by committing to usi... This is exactly the sort of thing I’ve wanted for ASP (Agent Simulates Predictor). One problem that’s always blocked me is how to know when to do this, rather than using it ad hoc: is there an easy way to know that there’s an agent out in the universe using a more limited reasoning system?
by Stuart Armstrong 204 days ago | link | parent | on: Policy Selection Solves Most Problems Policy selection converges to giving Omega the money so long as the difficulty of computing the coin exceeds the power of the market at $$f(n)$$ time. Would it be sensible to just look for muggings (and ASPs) at the very beginning of the process, and then decide immediately what to do as soon as one is detected? Come to think of it, precommitting to ignore knowledge about the result of the coin seems to be the best strategy here; does this cash out into anything useful in this formalism?
by Abram Demski 202 days ago | link Looking “at the very beginning” won’t work – the beliefs of the initial state of the logical inductor won’t be good enough to sensibly detect these things and decide what to do about them. While ignoring the coin is OK as special-case reasoning, I don’t think everything falls nicely into the bucket of “information you want to ignore” vs “information you want to update on”. The more general concept which captures both is to ask “how do I want to react to this information, in terms of my action?” – which is of course the idea of policy selection.
by Stuart Armstrong 236 days ago | link | parent | on: Predictable Exploration If the other players can see what action you’ll take, then they may simply exploit you. Isn’t this a variant of the “agent simulates predictor” problem (with you playing the role of the predictor)? Thus any agent capable of exploiting you has to prove to you that it won’t, in order to get anything from you. That’s kind of what happens with your Nicerbots; even if perfectly predictable, they’re not really exploitable in any strong sense (they won’t cooperate with a defector).
by Abram Demski 235 days ago | link I think the point I was making here was a bit less clear than I wanted it to be. I was saying that, if you use predictable exploration on actions rather than policies, then you only get to see what happens when you predictably take a certain action. This is good for learning pure equilibria in games, but doesn’t give information which would help the agent reach the right mixed equilibria when randomized actions should be preferred; and indeed, it doesn’t seem like such an agent would reach the right mixed equilibria. I believe the “predictable exploration on policies” approach solves agent-simulates-predictor just fine, along with other problems (including counterfactual mugging) which require “some degree of updatelessness” without requiring the full reflective stability which we want from updatelessness.
by Stuart Armstrong 257 days ago | link | parent | on: Hyperreal Brouwer To quote the straw Vulcan: Fascinating.
by Stuart Armstrong 358 days ago | link | parent | on: Loebian cooperation in the tiling agents problem I’m not sure this would work, and it might be tied to ambiguity about what “steps” means. Consider: Y: Run X to completion. Then say “no” to chocolate. Then PA proves that Y doesn’t lose in fewer steps than X (since X doesn’t do anything in more than N steps, while Y runs for N+1 steps before taking action), yet it’s clear that Y loses. I think this is because “lose in n steps” is not well defined.
by Vladimir Slepnev 358 days ago | Stuart Armstrong likes this | link It doesn’t mean computation steps. Losing in 1 step means you say “no” to chocolate, losing in 2 steps means you accept some program that says “no” to chocolate, and so on. Sorry, I thought that was the obvious interpretation, I’ll edit the post to make it clear.
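The recursion Vladimir describes can be rendered as a toy sketch. This is not the post’s formalism: the representation of an agent as a zero-argument callable returning “yes”, “no”, or a successor program is my assumption, chosen only to make the recursive base case and step explicit.

```python
def loses_within(agent, n):
    """Toy sketch of the recursive 'loses in n steps' notion.

    Losing in 1 step means answering "no" to the chocolate;
    losing in n steps means handing control to a program that
    loses within n-1 steps.  `agent` is a zero-argument callable
    returning "yes", "no", or a successor callable (an assumed
    representation, not the post's).
    """
    if n == 0:
        return False            # out of budget: not (yet) shown to lose
    act = agent()
    if act == "no":
        return True             # base case: refused the chocolate
    if act == "yes":
        return False            # accepted: does not lose
    return loses_within(act, n - 1)  # deferred to a successor program

refuser = lambda: "no"          # loses in 1 step
deferrer = lambda: refuser      # accepts a program that says "no"

print(loses_within(refuser, 1))   # True
print(loses_within(deferrer, 1))  # False: needs 2 steps to be caught
print(loses_within(deferrer, 2))  # True
```

This makes Stuart’s example concrete: a program that runs X for N steps and only then says “no” is not caught by any small step budget, even though it clearly loses.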
by Stuart Armstrong 358 days ago | link Ah, thanks! That seems more sensible.
by Stuart Armstrong 365 days ago | link | parent | on: Corrigibility thoughts II: the robot operator Not sure what your argument is. Can you develop it?
by Paul Christiano 364 days ago | link I expect a workable approach will define the operator implicitly as “that thing which has control over the input channel” rather than by giving an explicit definition. This is analogous to the way in which a sail causes your boat to move with the wind: you don’t have to define or measure the wind precisely, you just have to be easily pushed around by it.
by Stuart Armstrong 358 days ago | link Thus anything that can control the operator becomes defined as the operator? That doesn’t seem safe…
by Paul Christiano 357 days ago | link The AI defers to anything that can control the operator. If the operator has physical control over the AI, then any process which controls the operator can replace the AI wholesale. It feels fine to defer to such processes, and certainly it seems much better than the situation where the operator is attempting to correct the AI’s behavior but the AI is paternalistically unresponsive. Presumably the operator will try to secure themselves in the same way that they try to secure their AI.
by Stuart Armstrong 356 days ago | link Does this also mean that if the AI can figure out a way of controlling the controller, then it is itself in control from the moment it comes up with a reasonable plan?
by Paul Christiano 356 days ago | link The AI replacing the operator is certainly a fixed point. This doesn’t seem any different from the usual situation. Modifying your goals is always a fixed point. That doesn’t mean that our agents will inevitably do it. An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.
by Stuart Armstrong 355 days ago | link “An agent which is doing what the operator wants, where the operator is ‘whatever currently has physical control of the AI,’ won’t try to replace the operator—because that’s not what the operator wants.” I disagree (though we may be interpreting that sentence differently). Once the AI has the possibility of subverting the controller, it is, in effect, in physical control of itself. So it itself becomes the “formal operator” and, depending on how it’s motivated, is perfectly willing to replace the “human operator”, whose wishes are now irrelevant (because the human is no longer the formal operator). And this never involves any goal modification at all: it’s the same goal, except that the change in control has changed the definition of the operator.
by Stuart Armstrong 365 days ago | link | parent | on: Humans are not agents: short vs long term It is ‘a preference for preferences’; e.g. “my long-term needs take precedence over my short-term desires” is a meta-preference (in fact the use of the terms ‘needs’ vs ‘desires’ is itself a meta-preference, as at the lowest formal level both are just preferences).
by Stuart Armstrong 377 days ago | link | parent | on: Cooperative Oracles: Stratified Pareto Optima and ... How about using some conception of “coalition-stability”? An option has that property if there is no sub-coalition of players that can unilaterally increase their utility, whatever all the other players choose to do.
by Paul Christiano 374 days ago | Stuart Armstrong likes this | link This is basically the core, though it’s usually defined for cooperative games, where (a) utility is transferable and (b) adding a new player never makes an existing player worse off. It’s easy to generalize, though.
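For the transferable-utility setting Paul mentions, the core condition can be checked directly: an allocation is in the core if no coalition can secure more on its own than its members are currently paid. A minimal sketch (the dictionary representation of the characteristic function is my choice, not anything from the thread):

```python
from itertools import chain, combinations

def in_core(v, allocation):
    """Check whether an allocation is in the core of a TU cooperative game.

    v: dict mapping frozenset coalitions to the value they can secure alone
       (missing coalitions are assumed to secure 0).
    allocation: list of payoffs, one per player.
    The allocation is in the core iff for every coalition S,
    v(S) <= sum of the payoffs of S's members.
    """
    players = range(len(allocation))
    coalitions = chain.from_iterable(
        combinations(players, r) for r in range(1, len(allocation) + 1))
    return all(v.get(frozenset(s), 0) <= sum(allocation[i] for i in s)
               for s in coalitions)

# Three-player example: any pair can secure 1, the grand coalition 1.5.
v = {frozenset(s): 1 for s in combinations(range(3), 2)}
v[frozenset(range(3))] = 1.5

print(in_core(v, [0.5, 0.5, 0.5]))  # True: every pair is paid exactly 1
print(in_core(v, [1.0, 0.4, 0.1]))  # False: players 1 and 2 get 0.5 < 1
```

The second allocation is blocked exactly in the sense of Stuart’s “coalition-stability”: the pair {1, 2} can unilaterally do better, whatever player 0 does.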
by Stuart Armstrong 377 days ago | link | parent | on: Cooperative Oracles: Stratified Pareto Optima and ... Can you use the maximum “swing” from the other player’s decisions to get a non-binary measure of dependency? It’s similar to my idea here. Then the dependency of player $$P_i$$ on $$P_j$$ would be $$\max_{a_j} U_i - \min_{a_j} U_i$$, where $$a_j$$ are the actions of $$P_j$$.
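The swing measure $$\max_{a_j} U_i - \min_{a_j} U_i$$ leaves the remaining players’ actions implicit; a natural reading is to hold them fixed and take the worst case (largest swing) over their profiles, which is the assumption this sketch makes:

```python
import itertools

def dependency(u_i, actions, j):
    """Swing of player i's utility as player j varies their action.

    u_i: function from a full action profile (tuple) to player i's utility.
    actions: list of action sets, one per player.
    j: index of the player whose influence on i we measure.

    The other players' actions are held fixed and we take the largest
    swing over their profiles -- an interpretive assumption, since the
    comment's formula only quantifies over a_j.
    """
    others = [acts for k, acts in enumerate(actions) if k != j]
    best = 0.0
    for rest in itertools.product(*others):
        def profile(a_j):
            p = list(rest)
            p.insert(j, a_j)       # splice j's action back into the profile
            return tuple(p)
        vals = [u_i(profile(a)) for a in actions[j]]
        best = max(best, max(vals) - min(vals))
    return best

# Prisoner's Dilemma payoffs for the row player (C = cooperate, D = defect):
pd_row = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
acts = [["C", "D"], ["C", "D"]]

print(dependency(lambda p: pd_row[p], acts, 1))  # 4: swing when row defects
```

The measure is zero exactly when player i’s payoff is independent of player j, recovering the binary notion of dependency as a special case.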
by Stuart Armstrong 386 days ago | Abram Demski likes this | link | parent | on: Futarchy Fix Now, whether a perfect market should pick up an existential-risk signature is different from whether a real market would. The behaviour of the Dow around the Cuban missile crisis isn’t encouraging in that regard.