Intelligent Agent Foundations Forum
by Jessica Taylor 346 days ago

“We should work on understanding principles of intelligence so that we can make sure that AIs are thinking the same way as humans do; currently we lack this level of understanding”

Roughly. I think the minimax algorithm would qualify as “something that thinks the same way an idealized human would”, where “idealized” is doing substantial work (certainly, humans don’t actually play chess using minimax).
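
For concreteness, here is a minimal sketch of minimax for a two-player zero-sum game; the `game` object (with `is_terminal`, `utility`, `result`, `legal_moves`) is a hypothetical interface, not any particular library, since the point is only the exhaustive look-ahead, not any particular game.

    # Minimal minimax sketch for a two-player zero-sum game.
    # The `game` interface is a hypothetical stand-in.
    def minimax(state, game, maximizing):
        """Value of `state` under optimal play by both sides."""
        if game.is_terminal(state):
            return game.utility(state)  # payoff to the maximizing player
        values = [
            minimax(game.result(state, move), game, not maximizing)
            for move in game.legal_moves(state)
        ]
        return max(values) if maximizing else min(values)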

I don’t really understand point 10, especially this part:

Consider the following procedure for building an AI:

  1. Assemble a collection of AI tasks that we think are AGI-complete (e.g. a bunch of games and ML tasks)
  2. Search for a short program that takes lots of data from the Internet as input and produces a policy that does well on lots of these AI tasks (see the sketch after this list)
  3. Run this program on substantially different tasks related to the real world
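
A very rough rendering of steps 1–2, purely to pin down what "search for a short program" means here; every helper below is a hypothetical stub for machinery the thought experiment leaves unspecified, and the search is hopelessly intractable as written.

    # Schematic only: the helpers are stubs, not real APIs.
    def enumerate_programs_by_length():
        """Yield candidate programs, shortest first (stub)."""
        raise NotImplementedError("stand-in for a program enumerator")

    def run_program(program, data):
        """Execute a candidate program on the Internet data, returning a policy (stub)."""
        raise NotImplementedError

    def score_on(task, policy):
        """Evaluate the policy on one benchmark task (stub)."""
        raise NotImplementedError

    def find_short_program(internet_data, tasks, threshold):
        """Steps 1-2: return the first short program whose policy clears
        `threshold` on every AGI-complete benchmark task; step 3 would then
        run that program on substantially different real-world tasks."""
        for program in enumerate_programs_by_length():
            policy = run_program(program, internet_data)
            if all(score_on(task, policy) >= threshold for task in tasks):
                return program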

This seems very likely to result in an unaligned AI. Consider the following program:

  1. Simulate some stochastic physics, except that there’s some I/O terminal somewhere (as described in this post)
  2. If the I/O terminal gets used, feed it the Internet data as input and take the policy as output
  3. If it doesn’t get used, run the simulation again until it does

This program is pretty short, and with some non-negligible probability (say, more than 1 in 1 billion) it will produce a policy that is an unaligned AGI. The reasoning: across enough runs of physics, civilizations will arise; if the I/O terminal is ever accessed, it is probably accessed by some such civilization; and that civilization will probably have values that are not aligned with human values, so it will execute a treacherous turn (provided it has enough information to work out how the I/O terminal's output will be interpreted, which it does if there is a lot of Internet data).
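
To make the "pretty short" claim concrete, here is the construction written out as schematic code; the simulator and terminal are stubs (nobody could actually run this), and the only point is that the whole control flow has a very short description.

    # Schematic only: the simulator is a stub; what matters is that this
    # control flow has a very short description length.
    def simulate_physics_with_terminal(seed):
        """Run one stochastic history of a simulated universe containing an
        I/O terminal; return the terminal if it was ever used, else None (stub)."""
        raise NotImplementedError

    def adversarial_short_program(internet_data):
        seed = 0
        while True:
            terminal = simulate_physics_with_terminal(seed)   # step 1
            if terminal is not None:                          # step 2
                terminal.write(internet_data)   # whoever is inside sees the data...
                return terminal.read_policy()   # ...and chooses the output policy
            seed += 1                           # step 3: rerun until the terminal is used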



by David Krueger 345 days ago

Thanks, I think I understand that part of the argument now. But I don’t understand how it relates to:

“10. We should expect simple reasoning rules to correctly generalize even for non-learning problems.”

^Is that supposed to be a good thing or a bad thing? “Should expect” as in we want to find rules that do this, or as in rules will probably do this?


by Jessica Taylor 345 days ago

It’s just meant to be a prediction (simple rules will probably generalize).



