by Vladimir Slepnev, 383 days ago, on: Meta: IAFF vs LessWrong

I’ve kinda switched to the view that we should have the whole party happening on LW, so that people from different backgrounds can mix and get exposed to each other’s interests. But yeah, it would be nice if the new LW had tags, like the old LW did.
by Vladimir Slepnev, 701 days ago, on: Autopoietic systems and difficulty of AGI alignmen...

> figure out what my values actually are / should be

I think many human ideas are like low-resolution pictures. Sometimes they show simple things, like a circle, so we can make a higher-resolution picture of the same circle. That’s known as formalizing an idea. But if the thing in the picture looks complicated, figuring out a higher-resolution picture of it is an underspecified problem. I fear that figuring out my values over all possible futures might be that kind of problem.

So apart from hoping to define a “full-resolution picture” of human values, either by ourselves or with the help of some AI or AI-human hybrid, it might be useful to come up with approaches that avoid defining it. That was my motivation for this post, which directly uses our “low-resolution” ideas to describe some particular nice future without considering all possible ones. It’s certainly flawed, but there might be other similar ideas. Does that make sense?
by Wei Dai, 698 days ago

I think I understand what you’re saying, but my state of uncertainty is such that I put a lot of probability mass on possibilities that wouldn’t be well served by what you’re suggesting. For example: the possibility that we can achieve most value not through the consequences of our actions in this universe, but through their consequences in much larger (computationally richer) universes simulating this one. Or that spreading hedonium is actually the right thing to do, and produces orders of magnitude more value than spreading anything that resembles human civilization. Or that value scales non-linearly with brain size, so we should go for either very large or very small brains.

While discussing the VR utopia post, you wrote: “I know you want to use philosophy to extend the domain, but I don’t trust our philosophical abilities to do that, because whatever mechanism created them could only test them on normal situations.” I have some hope that there is a minimal set of philosophical abilities that would allow us to eventually solve arbitrary philosophical problems, and that we already have it. Otherwise it seems hard to explain the kinds of philosophical progress we’ve made, like realizing that other universes probably exist, and figuring out some ideas about how to make decisions when there are multiple copies of us in this universe and others.

Of course it’s also possible that that’s not the case, and that we can’t do better than to optimize the future using our current “low resolution” values. But until we’re a lot more certain of this, any attempt to do so seems to constitute a strong existential risk.
by Vladimir Slepnev, 745 days ago, on: Some Criticisms of the Logical Induction paper

Yeah. An asymptotic thing like Solomonoff induction can still have all sorts of mathematical goodness: multiple surprisingly equivalent definitions, uniqueness/optimality properties, etc. It doesn’t have to be immediately practical to be worth studying. I hope LI can also end up like that.
by Vladimir Slepnev, 746 days ago, on: A cheating approach to the tiling agents problem

I just realized that A will not only approve itself as successor, but also approve some limited self-modifications, like removing some inefficiency in choosing B that provably doesn’t affect the choice of B. Though it doesn’t matter much, because A might as well delete all code for choosing B and appoint a quining B as successor. This suggests that the next version of the tiling agents problem should involve nontrivial self-improvement, not just self-reproduction. I have no idea how to formalize that, though.
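[The “quining B” move above leans on the standard trick of a program that contains and can reproduce its own source. As a minimal illustration, independent of any agent formalism, here is the classic two-line Python quine:]

```python
# A minimal quine: running this program prints exactly its own source.
# The trick: store a template containing a placeholder for itself (%r),
# then fill the template with its own text.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

[A quining successor would embed its own source the same way, so it can present itself for verification without any external description.]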
by Vladimir Slepnev, 751 days ago, on: Loebian cooperation in the tiling agents problem

Wei has proposed this program by email:

- Always say “yes” to chocolate.
- Accept Y as successor iff PA proves that Y always says “yes” to chocolate and Y accepts a subset of the successors that X accepts.

Figuring out the relationship between programs 5 and 6 (do they accept each other? are they equivalent?) is a really fun exercise. So far I’ve been able to find a program that’s accepted by 5 but not by 6. I won’t spoil the pleasure of figuring it out :-)
by Vladimir Slepnev, 754 days ago, on: Loebian cooperation in the tiling agents problem

It doesn’t mean computation steps. Losing in 1 step means you say “no” to chocolate, losing in 2 steps means you accept some program that says “no” to chocolate, and so on. Sorry, I thought that was the obvious interpretation; I’ll edit the post to make it clear.
by Stuart Armstrong, 754 days ago

Ah, thanks! That seems more sensible.
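[The recursive definition two comments up — lose in 1 step by saying “no”, lose in n steps by accepting someone who loses within n−1 — is easy to sketch over a finite toy universe of programs. This is my own illustrative model, not the formalism from the post: PA-provability is replaced by direct evaluation over a small fixed set of programs.]

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Program:
    name: str
    says_yes: bool                     # does it say "yes" to chocolate?
    accepted: frozenset = frozenset()  # names of the programs it accepts

def loses_within(x, n, universe):
    """Losing in 1 step: x says "no" to chocolate.
    Losing in n steps: x accepts some program that loses within n-1 steps."""
    if not x.says_yes:
        return True
    if n <= 1:
        return False
    return any(y.name in x.accepted and loses_within(y, n - 1, universe)
               for y in universe)

# Hypothetical example programs (names are mine, not from the post):
defector = Program("defector", says_yes=False)
naive    = Program("naive", says_yes=True,
                   accepted=frozenset({"defector", "naive"}))
careful  = Program("careful", says_yes=True,
                   accepted=frozenset({"careful"}))
universe = [defector, naive, careful]
```

[Here `defector` loses in 1 step, `naive` loses in 2 steps (it accepts `defector`), and `careful` never loses, since it only ever accepts itself.]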
by Vladimir Slepnev, 812 days ago, on: Two Major Obstacles for Logical Inductor Decision ...

I’ll just note that in a modal logic or halting oracle setting you don’t need the chicken rule, as we found in this old post: https://agentfoundations.org/item?id=4

So it seems like at least the first problem is about the approximation, not the thing being approximated.
by Sam Eisenstat, 771 days ago

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty naturally with logical induction. The motivation here is to work with decision problems where you can’t prove statements $$A = a \to U = u$$ for agent $$A$$, utility function $$U$$, action $$a$$, and utility value $$u$$, at least not with the amount of computing power provided, but where you want to use inductive generalizations instead. That isn’t necessary in this example, so it’s more of an illustration.

To say a bit more: if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign probability 1 to things that have already been proven, then they will work on the 5 and 10 problem in the post. It would be interesting if there were more of an analogy to explore between the provability oracle setting and the inductive setting, and if more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas.
by Vladimir Slepnev, 1188 days ago, on: The many counterfactuals of counterfactual mugging

Counterfactual mugging with a logical coin is a tricky problem. It might be easier to describe the problem with a “physical” coin first. We have two world programs, mutually quined:

1. The agent decides whether to pay the predictor 10 dollars. The predictor doesn’t decide anything.
2. The agent doesn’t decide anything. The predictor decides whether to pay the agent 100 dollars, depending on the agent’s decision in world 1.

By fiat, the agent cares about the two worlds equally, i.e. it maximizes the total sum of money it receives in both worlds. The usual UDT-ish solution can be crisply formulated in modal logic, PA or a bunch of other formalisms. Does that make sense?
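[Under the “cares about both worlds equally” stipulation, the payoff arithmetic is a one-liner to check. A toy sketch, with the payoff numbers from the comment and the quining structure abstracted into a single shared policy bit:]

```python
def total_payoff(pays: bool) -> int:
    world_1 = -10 if pays else 0   # world 1: agent may pay the predictor 10
    world_2 = 100 if pays else 0   # world 2: predictor pays 100 iff the agent
                                   # pays in world 1
    return world_1 + world_2       # by fiat, both worlds weighted equally

# The updateless policy maximizes the total across both worlds:
best_policy = max([True, False], key=total_payoff)
```

[Paying yields −10 + 100 = 90 versus 0 for refusing, so the agent that weighs both worlds equally pays.]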
by Scott Garrabrant, 1188 days ago

This makes sense. My main point is that the “care about the two worlds equally” part makes sense if it is part of the problem description, but otherwise we don’t know where that part comes from. My logical example was supposed to illustrate that sometimes you should not care about them equally.
by Vladimir Slepnev, 1236 days ago, on: Naturalistic Logical Updates

Nice work! I’m not too attached to the particular prior described in the “UDT 1.5” post, and I’m very happy that you’re trying to design something better to solve the same problem. That said, it’s a bit suspicious that you’re conditioning on provability rather than truth, which leads you to use GLS etc. Maybe I just don’t understand the “non-naturalistic” objection? Also, I don’t understand why you have to use both propositional coherence and partitions of truth. Can’t you replace partitions of truth with arbitrary propositional combinations of sentences?
by Abram Demski, 1198 days ago

The main reason to think it’s important to be “naturalistic” about the updates is reflective consistency. When there is negligible interaction between possible worlds, an updateless agent should behave like an updateful one. The updateless version of the agent would not endorse updating on $$\phi$$ rather than $$\Box \phi$$! Since it does not trust what its future self proves, it would self-modify to remove such an update if it were hard-wired. So I think updating on provability rather than truth is really the right thing.
by Vladimir Slepnev, 1328 days ago, on: Two Questions about Solomonoff Induction

Well, Laplace’s rule of succession is a computable prior that will almost certainly converge to your uncomputable probability value, and I think the difference in log scores from the “true” prior will be finite too. Since Laplace’s rule is included in the Solomonoff mixture, I suspect that things should work out nicely. I don’t have a proof, though.
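[Laplace’s rule itself is one line, and the convergence claim is easy to check numerically. A quick sketch — the bias 0.3 and the trial counts are my own illustrative choices:]

```python
from fractions import Fraction

def laplace(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: P(next success) = (k + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)

# With no data the rule gives the uniform guess 1/2; as data accumulates,
# the estimate converges to the empirical frequency (here 0.3).
estimates = [float(laplace(round(0.3 * n), n)) for n in (0, 10, 1000)]
```

[After 1000 trials with 300 successes the estimate is 301/1002 ≈ 0.3004, already within half a percent of the true frequency.]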