Intelligent Agent Foundations Forum
Online Learning 1: Bias-detecting online learners
post by Ryan Carey 8 days ago | Vadim Kosoy and Jessica Taylor like this | 3 comments

Note: This describes an idea of Jessica Taylor’s, and is the first of several posts about aspects of online learning.

Index of some decision theory posts
discussion post by Tsvi Benson-Tilsen 8 days ago | Ryan Carey, Jack Gallagher, Jessica Taylor and Scott Garrabrant like this | discuss
Logical inductor limits are dense under pointwise convergence
post by Sam Eisenstat 9 days ago | Abram Demski, Patrick LaVictoire, Scott Garrabrant and Tsvi Benson-Tilsen like this | discuss

Logical inductors [1] are very complex objects, and even their limits are hard to get a handle on. In this post, I investigate the topological properties of the set of all limits of logical inductors.

The set of Logical Inductors is not Convex
post by Scott Garrabrant 18 days ago | Sam Eisenstat, Abram Demski and Patrick LaVictoire like this | 1 comment

Sam Eisenstat asked the following interesting question: Given two logical inductors over the same deductive process, is every (rational) convex combination of them also a logical inductor? Surprisingly, the answer is no! Here is my counterexample.
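
Spelled out (this formalization is my reading of the question, not taken verbatim from the post): for logical inductors \(\{\mathbb{P}_n\}\) and \(\{\mathbb{Q}_n\}\) over the same deductive process and a rational \(\lambda \in [0,1]\), the mixture is defined pricewise,

\[ (\lambda \mathbb{P} + (1-\lambda)\mathbb{Q})_n(\phi) \;=\; \lambda\,\mathbb{P}_n(\phi) + (1-\lambda)\,\mathbb{Q}_n(\phi), \]

and the counterexample exhibits a pair of inductors for which some such mixture is exploitable by an efficient trader, even though each component on its own is not.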

Logical Inductors contain Logical Inductors over other complexity classes
post by Scott Garrabrant 18 days ago | Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

In the Logical Induction paper, we give a definition of logical inductors over polynomial-time traders. It is clear from our definition that the use of polynomial time is rather arbitrary, and we could define e.g. an exponential-time logical inductor. However, it may be less clear that logical inductors over one complexity class actually contain, within them, logical inductors over other complexity classes.

Learning doesn't solve philosophy of ethics
discussion post by Stuart Armstrong 19 days ago | discuss
Model of human (ir)rationality
post by Stuart Armstrong 19 days ago | discuss

A putative new idea for AI control; index here.

This post is just an initial foray into modelling human irrationality, for the purpose of successful value learning. Its purpose is not to be a full model, but to have enough detail that various common situations can be successfully modelled. The important thing is to model humans in ways that humans can understand (since it's our definition that determines what's a bias and what's a preference in humans).

Heroin model: AI "manipulates" "unmanipulatable" reward
post by Stuart Armstrong 23 days ago | 9 comments

A putative new idea for AI control; index here.

A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI with a prior over human preferences that treats them as an unchangeable historical fact, yet which will manipulate human preferences in practice.

Logical Inductors that trust their limits
post by Scott Garrabrant 24 days ago | Jack Gallagher, Jessica Taylor and Patrick LaVictoire like this | 2 comments

Here is another open question related to Logical Inductors. I have not thought about it very long, so it might be easy.

Does there exist a logical inductor \(\{\mathbb P_n\}\) over PA such that for all \(\phi\):

  1. PA proves that \(\mathbb P_\infty(\phi)\) exists and is in \([0,1]\), and

  2. \(\mathbb{E}_n(\mathbb{P}_\infty(\phi))\eqsim_n\mathbb{P}_n(\phi)\)?
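
(Here \(\eqsim_n\) is the asymptotic-equivalence notation of the Logical Induction paper: two sequences are equivalent when their difference vanishes in the limit,

\[ a_n \eqsim_n b_n \iff \lim_{n\to\infty} \left( a_n - b_n \right) = 0, \]

so condition 2 asks that the inductor's current expectation of its own limiting probability track its current probability.)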

Stratified learning and action
post by Stuart Armstrong 30 days ago | discuss

A putative new idea for AI control; index here.

(C)IRL is not solely a learning process
post by Stuart Armstrong 30 days ago | 29 comments

A putative new idea for AI control; index here.

I feel Inverse Reinforcement Learning (IRL) and Cooperative Inverse Reinforcement Learning (CIRL) are very good ideas, and will likely be essential for safe AI if we can’t come up with some sort of sustainable low impact, modular, or Oracle design. But IRL and CIRL have a weakness. In a nutshell:

  1. The models (C)IRL uses for humans are underspecified.
  2. This should cause CIRL to have motivated and manipulative learning.
  3. Even without that, (C)IRL can end up fitting a terrible model to humans.
  4. To solve those issues, (C)IRL will need to make creative modelling decisions that go beyond (standard) learning.
Learning values versus learning knowledge
discussion post by Stuart Armstrong 30 days ago | 5 comments
Universal Inductors
post by Scott Garrabrant 31 days ago | Sam Eisenstat, Jack Gallagher, Benja Fallenstein, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

Now that the Logical Induction paper is out, I am directing my attention towards decision theory. The approach I currently think will be most fruitful is attempting to make a logically updateless version of Wei Dai’s Updateless Decision Theory. Abram Demski has posted on here about this, but I think Logical Induction provides a new angle with which we can attack the problem. This post will present an alternate way of viewing Logical Induction which I think will be especially helpful for building a logical UDT. (The Logical Induction paper is a prerequisite for this post.)

IRL is hard
post by Vadim Kosoy 31 days ago | 6 comments

We show that, assuming the existence of public-key cryptography, there is an environment in which Inverse Reinforcement Learning is computationally intractable, even though the “teacher” agent, the environment and the utility functions are computable in polynomial time and there is only 1 bit of information to learn.

Oracle design as de-black-boxer
discussion post by Stuart Armstrong 42 days ago | discuss
Causal graphs and counterfactuals
discussion post by Stuart Armstrong 45 days ago | 2 comments
Simplified explanation of stratification
post by Stuart Armstrong 45 days ago | Patrick LaVictoire likes this | 5 comments

A putative new idea for AI control; index here.

I’ve previously talked about stratified indifference/learning. In this short post, I’ll try to present the idea as simply and clearly as possible.

Corrigibility through stratified indifference and learning
discussion post by Stuart Armstrong 56 days ago | 9 comments
Modeling the capabilities of advanced AI systems as episodic reinforcement learning
post by Jessica Taylor 57 days ago | Patrick LaVictoire likes this | 6 comments

Here I’ll summarize the main abstraction I use for thinking about future AI systems. This is essentially the same model that Paul uses. I’m not actually introducing any new ideas in this post; mostly this is intended to summarize my current views.

Can we hybridize Absent-Minded Driver with Death in Damascus?
discussion post by Eliezer Yudkowsky 74 days ago | Patrick LaVictoire likes this | 1 comment
Learning (meta-)preferences
post by Stuart Armstrong 79 days ago | Patrick LaVictoire likes this | 2 comments

A putative new idea for AI control; index here.

There are various methods, such as Cooperative Inverse Reinforcement Learning (CIRL), that aim to have an AI deduce human preferences in some fashion.

The problem is that humans are not rational (citation certainly not needed). Worse, they are irrational in ways that seriously complicate the task of fitting a reward or utility function to them. I presented one problem this entails in a previous post, which discussed the issues that emerge when an AI can influence a human’s preferences through the way it presents the issues.

What does an imperfect agent want?
post by Stuart Armstrong 79 days ago | Patrick LaVictoire likes this | discuss

A putative new idea for AI control; index here.

I’ll roughly divide ways of establishing human preferences into four categories:

  1. Assume true
  2. Best fit
  3. Proxy measures
  4. Modelled irrationality
Three Oracle designs
post by Stuart Armstrong 86 days ago | Patrick LaVictoire likes this | discuss

A putative new idea for AI control; index here.

An initial draft looking at three ways of getting information out of Oracles: information that’s useful and safe, in theory.

One thing I may need to do is find slightly better names for them ^_^

Good and safe uses of AI Oracles

Abstract:

Abstract model of human bias
post by Stuart Armstrong 101 days ago | 5 comments

A putative new idea for AI control; index here.

Any suggestions for refining this model are welcome!

Somewhat inspired by the previous post, this is a model of human bias that can be used to test theories that want to compute the “true” human preferences. The basic idea is to formalise the question:

  • If the AI can make the human give any answer to any question, can it figure out what humans really want?
When the AI closes a door, it opens a window
post by Stuart Armstrong 101 days ago | discuss

A putative new idea for AI control; index here.

Some methods, such as Cooperative Inverse Reinforcement Learning, have the AI assume that humans have access to a true reward function, which the AI will then attempt to maximise. This post is an attempt to clarify a specific potential problem with these methods; it is related to the third problem described here, but hopefully makes it clearer.
