Intelligent Agent Foundations Forum
Three Oracle designs
post by Stuart Armstrong 367 days ago | Patrick LaVictoire likes this

A putative new idea for AI control; index here.

An initial draft looking at three ways of getting information out of Oracles: information that’s useful and safe, in theory.

One thing I may need to do is find slightly better names for them ^_^

Good and safe uses of AI Oracles

Abstract:


An Oracle is a design for potentially high-powered artificial intelligences (AIs), where the AI is made safe by restricting it to only answering questions. Unfortunately, most designs motivate the Oracle to manipulate humans through the contents of its answers. A second challenge is getting the AI to provide accurate and useful answers. This paper presents three Oracle designs that get around the manipulation and accuracy problems in different ways: the Counterfactually Unread Agent, the Verified Selective Agent, and the Virtual-world Time-bounded Agent. It demonstrates how each design is safe (given that humans stick to the protocols) and which types of questions and answers each allows. Finally, it investigates what happens when the implementation is slightly imperfect, concluding that the first two agent designs are robust to this, but the third is not.

Images of the three designs:

Counterfactually Unread Agent:

Verified Selective Agent:

Virtual-world Time-bounded Agent:


