Intelligent Agent Foundations Forumsign up / log in
Using lying to detect human values
link by Stuart Armstrong 99 days ago | discuss
Intuitive examples of reward function learning?
link by Stuart Armstrong 108 days ago | discuss
Funding for independent AI alignment research
link by Paul Christiano 111 days ago | discuss
Beyond algorithmic equivalence: self-modelling
link by Stuart Armstrong 113 days ago | discuss
Beyond algorithmic equivalence: algorithmic noise
link by Stuart Armstrong 113 days ago | discuss
Goodhart Taxonomy
link by Scott Garrabrant 174 days ago | discuss
Announcing the AI Alignment Prize
link by Vladimir Slepnev 230 days ago | Vadim Kosoy likes this | discuss
Metamathematics and probability
link by Alex Mennen 273 days ago | Abram Demski likes this | discuss
Funding opportunity for AI alignment research
link by Paul Christiano 299 days ago | Vadim Kosoy likes this | 3 comments
Open Problems Regarding Counterfactuals: An Introduction For Beginners
link by Alex Appel 340 days ago | Vadim Kosoy, Tsvi Benson-Tilsen, Vladimir Nesov and Wei Dai like this | 2 comments
Some Criticisms of the Logical Induction paper
link by Tarn Somervell Fletcher 359 days ago | Alex Mennen, Sam Eisenstat and Scott Garrabrant like this | 10 comments
Where's the first benign agent?
link by Jacob Kopczynski 433 days ago | Patrick LaVictoire and Paul Christiano like this | 15 comments
Neural nets designing neural nets
link by Stuart Armstrong 520 days ago | Vadim Kosoy likes this | discuss
The universal prior is malign
link by Paul Christiano 569 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
(Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 583 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments
Asymptotic Decision Theory
link by Jack Gallagher 615 days ago | Abram Demski, Jessica Taylor, Patrick LaVictoire, Paul Christiano and Tsvi Benson-Tilsen like this | 2 comments
Variations of the Garrabrant-inductor
link by Sune Kristian Jakobsen 637 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor, Nate Soares and Scott Garrabrant like this | 1 comment
Two Agent Mild Optimization
link by Norman Perlmutter 690 days ago | Abram Demski and Jessica Taylor like this | discuss
A Layman's Explanation of "Safely Interruptible Agents"
link by Zach Weems 691 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Improbable Oversight, An Attempt at Informed Oversight
link by William Saunders 699 days ago | Jessica Taylor and Patrick LaVictoire like this | 8 comments
A new proposal for logical counterfactuals
link by Jack Gallagher 715 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 3 comments
An Alternative Setting for Resource-Bounded Lob's Theorem
link by Siddharth Bhaskar 716 days ago | Patrick LaVictoire and Scott Garrabrant like this | discuss
Working on a series of safety environments for OpenAI gym. Would love comments and ideas.
link by Rafael Cosman 740 days ago | Daniel Dewey, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss
every function can be computable
link by Ramana Kumar 775 days ago | Patrick LaVictoire likes this | discuss
Goal completion prior art: feature construction
link by Stuart Armstrong 801 days ago | discuss
Older

NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

I found an improved version
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

I misunderstood your
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 0 likes

Caught a flaw with this
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

As you say, this isn't a
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 1 like

Note: I currently think that
by Jessica Taylor on Predicting HCH using expert advice | 0 likes

Counterfactual mugging
by Jessica Taylor on Doubts about Updatelessness | 0 likes

What do you mean by "in full
by David Krueger on Doubts about Updatelessness | 0 likes

It seems relatively plausible
by Paul Christiano on Maximally efficient agents will probably have an a... | 1 like

I think that in that case,
by Alex Appel on Smoking Lesion Steelman | 1 like

Two minor comments. First,
by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
by Jessica Taylor on Musings on Exploration | 0 likes

A few comments. Traps are
by Vadim Kosoy on Musings on Exploration | 1 like

I'm not convinced exploration
by Abram Demski on Musings on Exploration | 0 likes

Update: This isn't really an
by Alex Appel on A Difficulty With Density-Zero Exploration | 0 likes

RSS

Privacy & Terms