Intelligent Agent Foundations Forum
One weird trick to turn maximisers into minimisers
discussion post by Stuart Armstrong 180 days ago

A putative new idea for AI control; index here.

A simple and easy design for a \(u\)-maximising agent that turns into a \(u\)-minimising one.

Let \(X\) be some Boolean random variable outside the agent’s control, whose value will be determined at some future time \(t\) (by a cosmic event, maybe?). Set it up so that \(P(X=1)=\epsilon\), and for a given utility \(u\) consider the utility:

  • \(u^\# = (2/\epsilon)Xu - u\).

Before \(t\), the expected value of \((2/\epsilon)X\) is \((2/\epsilon)\epsilon = 2\), so in expectation \(u^\# = 2u - u = u\). Hence the agent is a \(u\)-maximiser. After \(t\), the most likely outcome is \(X=0\), so a little evidence to that effect suffices: once the agent’s credence in \(X=1\) drops below \(\epsilon/2\), the expected value of \(u^\#\) becomes a negative multiple of \(u\), making \(u^\#\)-maximisation into \(u\)-minimisation.
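To make the sign flip concrete, here is a minimal sketch (the function name and the particular numbers are mine, chosen for illustration) of the expected value of \(u^\#\) as a function of the agent’s credence \(q = P(X=1)\):

```python
def expected_u_sharp(u, eps, q):
    """Expected value of u# = (2/eps)*X*u - u, where q is the agent's
    current credence that X = 1 (X is independent of the agent's actions)."""
    return ((2.0 / eps) * q - 1.0) * u

eps, u = 0.01, 5.0

# Before t: credence equals the prior, q = eps, so E[u#] = (2 - 1)*u = u.
print(expected_u_sharp(u, eps, eps))       # ~5.0: still a u-maximiser

# After t, given evidence for X = 0: once q < eps/2, E[u#] is a
# negative multiple of u, so maximising u# means minimising u.
print(expected_u_sharp(u, eps, eps / 10))  # ~-4.0: now a u-minimiser
```

The flip happens exactly at \(q = \epsilon/2\), where the expected value of \(u^\#\) is zero.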

This isn’t perfect corrigibility — the agent would be willing to sacrifice a bit of \(u\)-value (before \(t\)) in order to maintain its flexibility after \(t\). To combat this effect, we could instead use:

  • \(u^\# = \Omega(2/\epsilon)Xu - u\).

If \(\Omega\) is large, then the agent is willing to pay very little \(u\)-value to maintain flexibility. However, the amount of evidence for \(X=0\) that it needs to become a \(u\)-minimiser also grows with \(\Omega\): its credence in \(X=1\) must now fall below \(\epsilon/(2\Omega)\), so \(X\) had better be a clear and convincing event.
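Under the same illustrative assumptions as before, a sketch of the scaled version shows how the flip threshold shrinks with \(\Omega\): evidence that would flip the \(\Omega = 1\) agent leaves the scaled agent a maximiser.

```python
def expected_u_sharp_scaled(u, eps, omega, q):
    """Expected value of u# = Omega*(2/eps)*X*u - u at credence q = P(X=1)."""
    return ((2.0 * omega / eps) * q - 1.0) * u

eps, u, omega = 0.01, 5.0, 100.0

# The flip threshold is now q = eps / (2*omega). Credence q = eps/2,
# which flipped the Omega = 1 agent, leaves this one a u-maximiser:
print(expected_u_sharp_scaled(u, eps, omega, eps / 2))            # positive

# Only much stronger evidence, pushing q below eps/(2*omega), flips it:
print(expected_u_sharp_scaled(u, eps, omega, eps / (4 * omega)))  # negative
```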


