Intelligent Agent Foundations Forum
1. Funding for independent AI alignment research
link by Paul Christiano 326 days ago | discuss
2. Funding opportunity for AI alignment research
link by Paul Christiano 515 days ago | Vadim Kosoy likes this | 3 comments
3. The universal prior is malign
link by Paul Christiano 784 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
4. My recent posts
discussion post by Paul Christiano 785 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss
5. Control and security
post by Paul Christiano 830 days ago | Ryan Carey, Jessica Taylor and Vladimir Nesov like this | 7 comments

I used to think of AI security as largely unrelated to AI control, and my impression is that some people on this forum probably still do. I’ve recently shifted towards seeing control and security as basically the same, and thinking that security may often be a more appealing way to think and talk about control.

continue reading »
6. Time hierarchy theorems for distributional estimation problems
discussion post by Paul Christiano 1008 days ago | Vadim Kosoy and Patrick LaVictoire like this | 6 comments
7. Another toy model of the control problem
link by Paul Christiano 1090 days ago | Jessica Taylor likes this | discuss
8. My current take on logical uncertainty
link by Paul Christiano 1090 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
9. Active learning for opaque predictors
post by Paul Christiano 1116 days ago | Jessica Taylor likes this | discuss

Summary: I propose a simple open question that is directly relevant to the feasibility of my recent AI control proposals.

continue reading »
10. The steering problem
post by Paul Christiano 1487 days ago | Abram Demski likes this | discuss

Most work on AI safety starts with a broad, vague problem (“How can we make an AI do good things?”) and relatively quickly moves to a narrow, precise problem (e.g. “What kind of reasoning process trusts itself?”).

Precision facilitates progress, and many serious thinkers are skeptical of imprecision. But in narrowing the problem too far we do most of the work (and have most of the opportunity for error).

I am interested in more precise discussion of the big-picture problem of AI control. Such discussion could improve our understanding of AI control, help us choose the right narrow questions, and be a better starting point for engaging other researchers. To that end, consider the following problem:

The steering problem: Using black-box access to human-level cognitive abilities, can we write a program that is as useful as a well-motivated human with those abilities?

I recently wrote this document, which defines this problem much more precisely (in section 2) and considers a few possible approaches (in section 4). As usual, I appreciate thoughts and criticism. I apologize for the proliferation of nomenclature, but I couldn’t get by without a new name.
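
To make the black-box framing concrete, here is a minimal Python sketch of my own (not taken from the document): `HumanLevelPredictor` and `steering_program` are hypothetical names standing in for the black-box cognitive abilities and the program the steering problem asks us to write.

```python
from abc import ABC, abstractmethod


class HumanLevelPredictor(ABC):
    """Black box standing in for human-level cognitive abilities.

    The steering program may only query it; it cannot inspect or
    modify the internals.
    """

    @abstractmethod
    def answer(self, question: str) -> str:
        """Return the predictor's best answer to a natural-language question."""


def steering_program(box: HumanLevelPredictor, task: str) -> str:
    """What the steering problem asks for: using only black-box queries,
    be as useful as a well-motivated human with the same abilities.

    The body below is a naive placeholder, not a solution; the open
    question is what should go here.
    """
    # Placeholder: forward the task to the black box verbatim.  A
    # well-motivated human would do far more (decompose the task, check
    # their own work, ask clarifying questions, ...).
    return box.answer(f"How should the following task be accomplished? {task}")
```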

continue reading »
11. Model-free decisions
post by Paul Christiano 1513 days ago | Daniel Dewey, Jessica Taylor, Nate Soares and Stuart Armstrong like this | 3 comments

Much concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes by directly specifying what actions are good. I flesh out one possible alternative here.
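
As a rough sketch of the contrast (my own illustration; the post linked below fleshes out the actual alternative): a goal-directed agent scores predicted outcomes under a utility function, whereas the alternative rates actions directly against a specification of which actions are good.

```python
from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")
State = TypeVar("State")


def goal_directed_choice(
    actions: Sequence[Action],
    predict_outcome: Callable[[Action], State],
    utility: Callable[[State], float],
) -> Action:
    """Goal-oriented behavior: pick the action whose predicted outcome
    scores highest under a utility function over world states."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))


def directly_specified_choice(
    actions: Sequence[Action],
    action_goodness: Callable[[Action], float],
) -> Action:
    """The alternative gestured at above: rate actions directly
    ("how good is taking this action?") rather than optimizing over
    predicted outcomes."""
    return max(actions, key=action_goodness)
```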

(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on the writing or other low-level comments. Big-picture discussion should probably stay here.)

12. Stable self-improvement as a research problem
post by Paul Christiano 1530 days ago | Abram Demski, Benja Fallenstein, Daniel Dewey, Nate Soares and Stuart Armstrong like this | 6 comments

“Stable self-improvement” seems to be a primary focus of MIRI’s work. As I understand it, the problem is “How do we build an agent which rationally pursues some goal, is willing to modify itself, and with very high probability continues to pursue the same goal after modification?”

The key difficulty is that it is impossible for an agent to formally “trust” its own reasoning, i.e. to believe that “anything that I believe is true.” Indeed, even the natural concept of “truth” is logically problematic. But without such a notion of trust, why should an agent even believe that its own continued existence is valuable?
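
One way to make the obstacle concrete (a minimal sketch of my own, not from the post, with `proves` as a hypothetical stand-in for a theorem prover over the agent's own formal system): suppose the agent adopts a rewrite of itself only when it can prove that the rewrite still pursues its goal.

```python
def proves(statement: str) -> bool:
    """Hypothetical stub for a theorem prover over the agent's own formal
    system; implementing and trusting this is exactly the hard part."""
    raise NotImplementedError


def accept_successor(successor_source: str, goal: str) -> bool:
    """Adopt a self-modification only if goal preservation is provable.

    The obstacle sketched above: the agent cannot prove "anything I (or my
    successor) prove is true" within its own formal system, so a check like
    this can end up rejecting even an exact copy of the agent itself.
    """
    claim = (
        f"the program {successor_source!r} takes only actions that "
        f"pursue the goal {goal!r}"
    )
    return proves(claim)
```

Because the agent cannot formally trust its own proofs, a gate like this is too strong and blocks even benign self-modifications; that is the sense in which the natural notion of “trust” breaks down.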

I agree that there are open philosophical questions concerning reasoning under logical uncertainty, and that reflective reasoning highlights some of the difficulties. But I am not yet convinced that stable self-improvement is an especially important problem; I think it would be handled correctly by a human-level reasoner as a special case of decision-making under logical uncertainty. This suggests that (1) it will probably be resolved en route to human-level AI, and (2) it can probably be “safely” delegated to a human-level AI. I would prefer for energy to be used on other aspects of the AI safety problem.

continue reading »
