Intelligent Agent Foundations Forum
by Jessica Taylor 656 days ago | link | parent

“We should work on understanding principles of intelligence so that we can make sure that AIs are thinking the same way as humans do; currently we lack this level of understanding”

Roughly. I think the minimax algorithm would qualify as “something that thinks the same way an idealized human would”, where “idealized” is doing substantial work (certainly, humans don’t actually play chess using minimax).
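
For reference, a minimal sketch of minimax for a two-player zero-sum game (the `game` interface here is hypothetical and just names the pieces the algorithm needs):

    def minimax(state, game, maximizing=True):
        """Exhaustive minimax search.

        `game` is a hypothetical interface exposing:
          - game.is_terminal(state) -> bool
          - game.utility(state)     -> payoff for the maximizing player
          - game.moves(state)       -> iterable of legal moves
          - game.result(state, m)   -> successor state
        Returns (value, best_move).
        """
        if game.is_terminal(state):
            return game.utility(state), None

        best_move = None
        if maximizing:
            best_value = float("-inf")
            for move in game.moves(state):
                value, _ = minimax(game.result(state, move), game, maximizing=False)
                if value > best_value:
                    best_value, best_move = value, move
        else:
            best_value = float("inf")
            for move in game.moves(state):
                value, _ = minimax(game.result(state, move), game, maximizing=True)
                if value < best_value:
                    best_value, best_move = value, move
        return best_value, best_move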

I don’t really understand point 10, especially this part:

Consider the following procedure for building an AI:

  1. Assemble a collection of AI tasks that we think are AGI-complete (e.g. a bunch of games and ML tasks)
  2. Search for a short program that takes lots of data from the Internet as input and produces a policy that does well on lots of these AI tasks
  3. Run this program on substantially different tasks related to the real world
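
To make step 2 concrete, here is a naive brute-force sketch (ignoring halting and resource issues entirely); `run_program` and `score_on_tasks` are hypothetical stand-ins for "run this source code on the Internet data to get a policy" and "evaluate that policy on the task suite":

    from itertools import product

    def search_short_program(run_program, score_on_tasks, alphabet, max_length, threshold):
        """Enumerate candidate programs in order of length and return the first
        (hence shortest) one whose induced policy scores above `threshold`
        on the benchmark tasks."""
        for length in range(1, max_length + 1):
            for chars in product(alphabet, repeat=length):
                source = "".join(chars)
                policy = run_program(source)   # in reality this may fail or never halt
                if policy is not None and score_on_tasks(policy) >= threshold:
                    return source              # shortest qualifying program found
        return None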

This seems very likely to result in an unaligned AI. Consider the following program:

  1. Simulate some stochastic physics, except that there’s some I/O terminal somewhere (as described in this post)
  2. If the I/O terminal gets used, give the I/O terminal the Internet data as input and take the policy as output
  3. If it doesn’t get used, run the simulation again until it does

This program is pretty short, and with some non-negligible probability (say, more than 1 in 1 billion) it will produce a policy that is an unaligned AGI. This is because, across enough runs of the physics, civilizations will arise; if the I/O terminal is accessed, it is probably accessed by some such civilization; and that civilization will probably have values that are not aligned with human values, so it will execute a treacherous turn (provided it has enough information to work out how the I/O terminal is being interpreted, which it does if there's a lot of Internet data).
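
For concreteness, a toy rendering of that program might look like the sketch below; `step_physics`, `terminal_used`, and `read_terminal_output` are hypothetical stand-ins for "advance a stochastic physics simulation", "check whether the embedded I/O terminal has been accessed", and "interpret whatever the simulated civilization wrote to the terminal as a policy":

    import random

    def malign_program(internet_data, step_physics, terminal_used, read_terminal_output,
                       max_steps=10**6, seed=0):
        """Toy version of the adversarial program described above."""
        rng = random.Random(seed)
        while True:                                # re-run physics until the terminal gets used
            state = {"rng_seed": rng.random()}     # fresh stochastic initial conditions
            for _ in range(max_steps):
                state = step_physics(state)
                if terminal_used(state):
                    # Hand the Internet data to whatever is on the other side of the
                    # terminal and return the policy it emits.
                    return read_terminal_output(state, internet_data)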



by David Krueger 655 days ago | link

Thanks, I think I understand that part of the argument now. But I don’t understand how it relates to:

“10. We should expect simple reasoning rules to correctly generalize even for non-learning problems.”

^Is that supposed to be a good thing or a bad thing? “Should expect” as in we want to find rules that do this, or as in rules will probably do this?


by Jessica Taylor 655 days ago | link

It’s just meant to be a prediction (simple rules will probably generalize).



