Intelligent Agent Foundations Forum
Using lying to detect human values
link by Stuart Armstrong 39 days ago | discuss
Intuitive examples of reward function learning?
link by Stuart Armstrong 47 days ago | discuss
Funding for independent AI alignment research
link by Paul Christiano 50 days ago | discuss
Beyond algorithmic equivalence: self-modelling
link by Stuart Armstrong 53 days ago | discuss
Beyond algorithmic equivalence: algorithmic noise
link by Stuart Armstrong 53 days ago | discuss
Using the universal prior for logical uncertainty
link by Vladimir Slepnev 53 days ago | discuss
Goodhart Taxonomy
link by Scott Garrabrant 113 days ago | discuss
Announcing the AI Alignment Prize
link by Vladimir Slepnev 170 days ago | Vadim Kosoy likes this | discuss
Metamathematics and probability
link by Alex Mennen 213 days ago | Abram Demski likes this | discuss
Funding opportunity for AI alignment research
link by Paul Christiano 239 days ago | Vadim Kosoy likes this | 3 comments
Open Problems Regarding Counterfactuals: An Introduction For Beginners
link by Alex Appel 279 days ago | Vadim Kosoy, Tsvi Benson-Tilsen, Vladimir Nesov and Wei Dai like this | 2 comments
Some Criticisms of the Logical Induction paper
link by Tarn Somervell Fletcher 298 days ago | Alex Mennen, Sam Eisenstat and Scott Garrabrant like this | 10 comments
Where's the first benign agent?
link by Jacob Kopczynski 373 days ago | Patrick LaVictoire and Paul Christiano like this | 15 comments
Neural nets designing neural nets
link by Stuart Armstrong 460 days ago | Vadim Kosoy likes this | discuss
The universal prior is malign
link by Paul Christiano 508 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
(Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 523 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments
Asymptotic Decision Theory
link by Jack Gallagher 555 days ago | Abram Demski, Jessica Taylor, Patrick LaVictoire, Paul Christiano and Tsvi Benson-Tilsen like this | 2 comments
Variations of the Garrabrant-inductor
link by Sune Kristian Jakobsen 576 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor, Nate Soares and Scott Garrabrant like this | 1 comment
Two Agent Mild Optimization
link by Norman Perlmutter 629 days ago | Abram Demski and Jessica Taylor like this | discuss
A Layman's Explanation of "Safely Interruptible Agents"
link by Zach Weems 630 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Improbable Oversight, An Attempt at Informed Oversight
link by William Saunders 638 days ago | Jessica Taylor and Patrick LaVictoire like this | 8 comments
A new proposal for logical counterfactuals
link by Jack Gallagher 654 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 3 comments
An Alternative Setting for Resource-Bounded Löb's Theorem
link by Siddharth Bhaskar 655 days ago | Patrick LaVictoire and Scott Garrabrant like this | discuss
Working on a series of safety environments for OpenAI gym. Would love comments and ideas.
link by Rafael Cosman 680 days ago | Daniel Dewey, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss
every function can be computable
link by Ramana Kumar 715 days ago | Patrick LaVictoire likes this | discuss

RECENT COMMENTS

I think that in that case,
by Alex Appel on Smoking Lesion Steelman | 1 like

Two minor comments. First,
by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
by Jessica Taylor on Musings on Exploration | 0 likes

A few comments. Traps are
by Vadim Kosoy on Musings on Exploration | 1 like

I'm not convinced exploration
by Abram Demski on Musings on Exploration | 0 likes

Update: This isn't really an
by Alex Appel on A Difficulty With Density-Zero Exploration | 0 likes

If you drop the
by Alex Appel on Distributed Cooperation | 1 like

Cool! I'm happy to see this
by Abram Demski on Distributed Cooperation | 0 likes

Caveat: The version of EDT
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
by Sam Eisenstat on An Untrollable Mathematician | 1 like