Intelligent Agent Foundations Forumsign up / log in
Goodhart Taxonomy
link by Scott Garrabrant 53 days ago | discuss
Announcing the AI Alignment Prize
link by Vladimir Slepnev 109 days ago | Vadim Kosoy likes this | discuss
Metamathematics and probability
link by Alex Mennen 152 days ago | Abram Demski likes this | discuss
Funding opportunity for AI alignment research
link by Paul Christiano 178 days ago | Vadim Kosoy likes this | 3 comments
Open Problems Regarding Counterfactuals: An Introduction For Beginners
link by Alex Appel 218 days ago | Vadim Kosoy, Tsvi Benson-Tilsen, Vladimir Nesov and Wei Dai like this | 2 comments
Some Criticisms of the Logical Induction paper
link by Tarn Somervell Fletcher 237 days ago | Alex Mennen, Sam Eisenstat and Scott Garrabrant like this | 10 comments
Where's the first benign agent?
link by Jacob Kopczynski 312 days ago | Patrick LaVictoire and Paul Christiano like this | 15 comments
Neural nets designing neural nets
link by Stuart Armstrong 399 days ago | Vadim Kosoy likes this | discuss
The universal prior is malign
link by Paul Christiano 447 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
(Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 462 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments
Asymptotic Decision Theory
link by Jack Gallagher 494 days ago | Abram Demski, Jessica Taylor, Patrick LaVictoire, Paul Christiano and Tsvi Benson-Tilsen like this | 2 comments
Variations of the Garrabrant-inductor
link by Sune Kristian Jakobsen 516 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor, Nate Soares and Scott Garrabrant like this | 1 comment
Two Agent Mild Optimization
link by Norman Perlmutter 568 days ago | Abram Demski and Jessica Taylor like this | discuss
A Layman's Explanation of "Safely Interruptible Agents"
link by Zach Weems 569 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Improbable Oversight, An Attempt at Informed Oversight
link by William Saunders 578 days ago | Jessica Taylor and Patrick LaVictoire like this | 8 comments
A new proposal for logical counterfactuals
link by Jack Gallagher 593 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 3 comments
An Alternative Setting for Resource-Bounded Lob's Theorem
link by Siddharth Bhaskar 595 days ago | Patrick LaVictoire and Scott Garrabrant like this | discuss
Working on a series of safety environments for OpenAI gym. Would love comments and ideas.
link by Rafael Cosman 619 days ago | Daniel Dewey, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss
every function can be computable
link by Ramana Kumar 654 days ago | Patrick LaVictoire likes this | discuss
Goal completion prior art: feature construction
link by Stuart Armstrong 680 days ago | discuss
An approach to the Agent Simulates Predictor problem
link by Alex Mennen 683 days ago | Vadim Kosoy, Abram Demski, Gary Drescher, Jessica Taylor and Patrick LaVictoire like this | 11 comments
Analysis of Algorithms and Partial Algorithms
link by Andrew MacFie 748 days ago | Patrick LaVictoire and Scott Garrabrant like this | 3 comments
Another toy model of the control problem
link by Paul Christiano 753 days ago | Jessica Taylor likes this | discuss
My current take on logical uncertainty
link by Paul Christiano 753 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Some work on connecting UDT and Reinforcement Learning
link by David Krueger 796 days ago | Patrick LaVictoire and Paul Christiano like this | 5 comments
Older

NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes

RSS

Privacy & Terms