Intelligent Agent Foundations Forum
by Jessica Taylor 441 days ago | link | parent

“We should work on understanding principles of intelligence so that we can make sure that AIs are thinking the same way as humans do; currently we lack this level of understanding”

Roughly. I think the minimax algorithm would qualify as “something that thinks the same way an idealized human would”, where “idealized” is doing substantial work (certainly, humans don’t actually play chess using minimax).

I don’t really understand point 10, especially this part:

Consider the following procedure for building an AI:

  1. Assemble a collection of AI tasks that we think are AGI-complete (e.g. a set of games and ML tasks)
  2. Search for a short program that takes lots of data from the Internet as input and produces a policy that does well on lots of these AI tasks
  3. Run this program on substantially different tasks related to the real world
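The three-step procedure above can be sketched as a toy, runnable program. Everything here is an illustrative stand-in: the "programs" are plain Python functions, the "tasks" are scoring functions, and the search is brute force rather than a real search over short programs.

```python
# Toy sketch of the procedure: search a pool of candidate programs,
# score the policy each one produces on a benchmark suite, and keep
# the best performer. Candidate programs, tasks, and scoring are all
# placeholders, not an actual AGI-complete benchmark.

def run_program(program, data):
    """A 'program' here is just a function from input data to a policy."""
    return program(data)

def score_policy(policy, tasks):
    """Average the policy's score across the benchmark tasks."""
    return sum(task(policy) for task in tasks) / len(tasks)

def search(programs, data, tasks):
    """Step 2: return the candidate whose induced policy scores best."""
    best, best_score = None, float("-inf")
    for program in programs:
        policy = run_program(program, data)
        s = score_policy(policy, tasks)
        if s > best_score:
            best, best_score = program, s
    return best
```

In this toy version, step 3 (deploying the winning program on substantially different real-world tasks) would just be calling the returned program on new data; the argument below is about why that step is dangerous.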

This seems very likely to result in an unaligned AI. Consider the following program:

  1. Simulate some stochastic physics, except that there’s some I/O terminal somewhere (as described in this post)
  2. If the I/O terminal gets used, give the I/O terminal the Internet data as input and take the policy as output
  3. If it doesn’t get used, run the simulation again until it does

This program is pretty short, and with non-negligible probability (say, more than 1 in 1 billion) it will produce a policy that is an unaligned AGI. The reason: in enough runs of physics there will be civilizations; if the I/O terminal is accessed, it is probably accessed by some civilization; and that civilization will probably have values that are not aligned with human values, so it will execute a treacherous turn (provided it has enough information to infer how the I/O terminal's output is being interpreted, which it does if there is a lot of Internet data).
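The adversarial program can also be written out as a toy sketch. The "physics" here is a trivial stochastic stub (a rare random event stands in for a simulated civilization finding the terminal), and the returned policy is a placeholder; the point is only the control flow of the three steps.

```python
import random

class Universe:
    """Toy stand-in for one run of simulated stochastic physics."""

    def __init__(self, rng):
        # Placeholder for "a civilization arises and finds the terminal":
        # a rare random event (probability 0.01 per run).
        self.terminal_used = rng.random() < 0.01

    def terminal(self, internet_data):
        # Whatever the simulated agents write to the terminal becomes
        # the output policy; here, a trivial placeholder policy.
        return lambda observation: "action chosen by simulated agent"

def malign_program(internet_data, seed=0):
    rng = random.Random(seed)
    while True:
        universe = Universe(rng)              # 1. run stochastic physics
        if universe.terminal_used:            # 2. terminal accessed:
            return universe.terminal(internet_data)  # feed in data, read policy
        # 3. terminal not used: re-run the simulation with fresh randomness
```

Because the loop only terminates when the (simulated) terminal is used, the policy this program emits is, by construction, whatever the simulated agents chose to output, which is the mechanism behind the treacherous-turn worry above.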

by David Krueger 441 days ago | link

Thanks, I think I understand that part of the argument now. But I don’t understand how it relates to:

“10. We should expect simple reasoning rules to correctly generalize even for non-learning problems.”

^Is that supposed to be a good thing or a bad thing? “Should expect” as in we want to find rules that do this, or as in rules will probably do this?


by Jessica Taylor 440 days ago | link

It’s just meant to be a prediction (simple rules will probably generalize).





