Intelligent Agent Foundations Forum
Three Oracle designs
post by Stuart Armstrong

A putative new idea for AI control; index here.

An initial draft looking at three ways of getting information out of Oracles: information that's useful and safe, in theory.

One thing I may need to do is find slightly better names for them ^_^

Good and safe uses of AI Oracles

Abstract:


An Oracle is a design for potentially high-powered artificial intelligences (AIs), in which the AI is made safe by restricting it to only answering questions. Unfortunately, most designs leave the Oracle motivated to manipulate humans with the contents of its answers. A second challenge is to get the AI to provide accurate and useful answers. This paper presents three Oracle designs that get around the manipulation and accuracy problems in different ways: the Counterfactually Unread Agent, the Verified Selective Agent, and the Virtual-world Time-bounded Agent. It demonstrates how each design is safe (given that humans stick to the protocols) and allows different types of questions and answers. Finally, it investigates what happens when the implementation is slightly imperfect, concluding that the first two agent designs are robust to this but the third is not.
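
To make the first design a bit more concrete, here is a minimal toy sketch of the Counterfactually Unread Agent in Python. This is only my informal reading of the name, not the definition from the draft: it assumes the Oracle is rewarded solely in episodes where its answer is randomly erased before any human reads it, and the erasure probability, function names, and scoring hook are all hypothetical.

```python
import random

# Toy sketch (an assumption, not the paper's definition): the Oracle is
# rewarded only in episodes where its answer is erased before any human
# can read it.

ERASURE_PROB = 0.01  # hypothetical: small chance the answer is never shown


def run_episode(oracle_answer, score_unread_outcome):
    """Return (reward_for_oracle, answer_released_to_humans_or_None)."""
    if random.random() < ERASURE_PROB:
        # Unread branch: the answer is destroyed, and the Oracle is scored on
        # how well it matches what happens in a world where nobody saw it.
        # Manipulating readers cannot pay off here: there are no readers.
        return score_unread_outcome(oracle_answer), None
    # Ordinary branch: humans read the answer, but the Oracle receives no
    # reward from this branch, so it gains nothing by shaping the answer
    # for the readers' reaction.
    return 0.0, oracle_answer
```

On this reading, the Oracle's expected reward depends only on the unread branch, so in theory its best policy is simply to answer accurately about the counterfactual world in which the answer stays unread.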

Images of the three designs:

Counterfactually Unread Agent:

Verified Selective Agent:

Virtual-world Time-bounded Agent:
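
A similar toy sketch for the Verified Selective Agent (the second design above). Again, this is only one plausible reading of the name on my part, not the draft's definition: the assumption is that the Oracle may only select its answer from a small, human-verified list, so the content of the answer itself can't carry a manipulative payload. The answer list, the question, and the stand-in policy are all hypothetical.

```python
# Toy sketch (an assumption, not the paper's definition): the Oracle only
# picks an entry from a small, human-vetted answer set.

VETTED_ANSWERS = ["yes", "no", "cannot be determined"]


def ask(oracle_policy, question):
    """oracle_policy maps a question string to an index into VETTED_ANSWERS."""
    idx = oracle_policy(question)
    if not (0 <= idx < len(VETTED_ANSWERS)):
        raise ValueError("Oracle tried to answer outside the verified set")
    return VETTED_ANSWERS[idx]


# Example: a trivial stand-in policy that always picks the last option.
print(ask(lambda q: 2, "Will the protocol stay safe if followed exactly?"))
```

On this reading, safety would come from the verified, low-bandwidth answer channel rather than from trusting the Oracle's motives.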


