Intelligent Agent Foundations Forum
by Vadim Kosoy, 136 days ago

I think that we should expect evolution to give us a prior that is a good lossy compression of actual physics (where “actual physics” means those patterns of the universe that can be described within our computational complexity bounds). That is, on the one hand the prior should have low description complexity (otherwise it will be hard for evolution to find it), and on the other hand it should assign high probability to the true environment (in other words, the KL divergence of the true environment from the prior should be small). It should also be approximately learnable; otherwise, assigning high probability won’t translate into actually performing well.
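As a toy illustration of these two criteria (a minimal sketch; the scoring rule, the weight alpha, and all numbers are my own illustrative assumptions, not a formal definition):

```python
import math

def kl_divergence(p_true, prior):
    """KL(p_true || prior) in bits, over a finite set of outcomes."""
    return sum(p * math.log2(p / q) for p, q in zip(p_true, prior) if p > 0)

def prior_score(description_length_bits, p_true, prior, alpha=1.0):
    """Lower is better: a prior should be short to describe (easy for
    evolution to find) and should not lose much probability mass on the
    true environment (small KL divergence)."""
    return description_length_bits + alpha * kl_divergence(p_true, prior)

# Example: the true environment puts most of its mass on the first two outcomes.
p_true = [0.7, 0.2, 0.1]
broad_prior = [1/3, 1/3, 1/3]   # cheap to describe, mediocre fit
tuned_prior = [0.6, 0.3, 0.1]   # better fit, presumably longer to describe
print(prior_score(10, p_true, broad_prior))
print(prior_score(40, p_true, tuned_prior))
```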

The principles you outlined seem reasonable overall.

Note that the locality/dissipation/multiagent assumptions amount to a special case of “the environment is effectively reversible (from the perspective of the human species as a whole) as long as you don’t apply too much optimization power” (“optimization power” probably translates to divergence from some baseline policy, plus maybe computational complexity considerations). Now, as you noted before, actual macroscopic physics is not reversible, but it might still be effectively reversible if you have a reliable long-term source of negentropy (like the sun). Maybe we can also relax these assumptions slightly by allowing irreversible changes, as long as they are localized and the available space is sufficiently big.
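A rough sketch of the “not too much optimization power” condition (the use of KL divergence from a baseline policy and the budget threshold are illustrative assumptions, not part of any existing formalism):

```python
import math

def policy_divergence(policy, baseline):
    """Sum over states of KL(policy(.|s) || baseline(.|s))."""
    total = 0.0
    for state, action_probs in policy.items():
        base_probs = baseline[state]
        total += sum(p * math.log(p / base_probs[a])
                     for a, p in action_probs.items() if p > 0)
    return total

def effectively_reversible(policy, baseline, budget):
    """Treat the environment as effectively reversible only while the
    policy's divergence from the baseline stays below the budget."""
    return policy_divergence(policy, baseline) <= budget

baseline = {"s0": {"noop": 0.9, "act": 0.1}}
cautious = {"s0": {"noop": 0.8, "act": 0.2}}
print(effectively_reversible(cautious, baseline, budget=0.5))
```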

“If we construct AI systems, we will give them code (including a prior) that we expect to cause them to do something useful for us. In general, the agency of an agent’s creator should affect the agent’s beliefs” is essentially what DRL (delegative reinforcement learning) does: it allows transferring our knowledge to the AI without hard-coding it by hand.
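A minimal, hypothetical sketch of the delegation idea: when the agent’s remaining hypotheses disagree too much about an action’s value, it delegates the choice to the advisor (its creator), and thereby inherits the creator’s knowledge without it being hard-coded. All interfaces and the threshold below are illustrative, not DRL’s actual definition.

```python
def choose_action(candidate_actions, hypothesis_values, advisor_choice,
                  disagreement_threshold=0.1):
    """hypothesis_values: dict action -> list of value estimates,
    one per hypothesis still in the agent's prior."""
    # Pick the action with the best worst-case value estimate.
    best = max(candidate_actions, key=lambda a: min(hypothesis_values[a]))
    spread = max(max(v) - min(v) for v in hypothesis_values.values())
    # Too much disagreement between hypotheses: defer to the advisor.
    if spread > disagreement_threshold:
        return advisor_choice
    return best

values = {"left": [0.4, 0.5], "right": [0.1, 0.9]}  # hypotheses disagree on "right"
print(choose_action(["left", "right"], values, advisor_choice="left"))
```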

“When small changes have big and predictable effects (e.g. in a computer), there is often agentic optimization power towards the creation and maintenance of this system of effects, and in these cases it is possible for at least some agents to understand important things about how the system works” seems like it would allow us to go beyond effective reversibility, but I’m not sure how to formalize it or whether it’s a justified assumption. One way towards formalizing it is: the prior is such that studying the approximate communication class of the initial state allows determining the entire environment. However, this seems to point at a very broad class of approximately learnable priors without specifying a criterion for choosing among them.
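One direction of this formalization can be sketched concretely: take the states that communicate with the initial state (reachable from it and able to return), and ask whether that class covers every state, i.e. whether studying it can pin down the whole environment. The graph encoding below is an illustrative assumption.

```python
def communicating_class(transitions, initial_state):
    """transitions: dict state -> set of states reachable in one step
    (under some action). Returns the states that are reachable from
    initial_state and can reach it back."""
    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            s = stack.pop()
            for t in transitions.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    forward = reachable(initial_state)
    return {s for s in forward if initial_state in reachable(s)}

transitions = {"a": {"b"}, "b": {"a", "c"}, "c": {"c"}}  # "c" is absorbing
cls = communicating_class(transitions, "a")
print(cls)                      # {'a', 'b'}
print(cls == set(transitions))  # False: studying the class misses "c"
```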

Another principle we can try to use is the ubiquity of analytic functions. Analytic functions have the property that knowing the function on a bounded domain allows extrapolating it everywhere. This is different from allowing arbitrary computable functions, which may have “if” clauses, so that studying the function on a bounded domain is never enough to be sure about its behavior outside it. In particular, this line of inquiry seems relatively easy to formalize using continuous MDPs (although we run into the problem that finding the optimal policy is, in general, infeasible). Also, it might have something to do with the effectiveness of neural networks (although the popular ReLU activation function is not analytic).
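An illustrative sketch of the contrast (the particular functions, degrees, and intervals are arbitrary choices of mine): fitting a polynomial approximation of an analytic function from samples on a bounded interval extrapolates well outside it, whereas a function with an “if” clause looks identical on that interval but behaves differently elsewhere.

```python
import numpy as np

def analytic_f(x):
    return np.sin(x)

def if_clause_f(x):
    # Agrees with sin(x) on [-1, 1] but switches behavior outside it.
    return np.where(np.abs(x) <= 1.0, np.sin(x), 0.0)

# Fit a polynomial (truncated Taylor-like approximation) on [-1, 1] only.
xs = np.linspace(-1.0, 1.0, 200)
coeffs = np.polyfit(xs, analytic_f(xs), deg=9)

# Extrapolate to x = 2: close to sin(2) for the analytic function,
# while the "if" clause function is simply 0 there.
x_out = 2.0
print(np.polyval(coeffs, x_out), np.sin(x_out), if_clause_f(np.array(x_out)))
```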


