Intelligent Agent Foundations Forum
Announcing the AI Alignment Prize
link by Vladimir Slepnev 20 days ago | Vadim Kosoy likes this | discuss
Metamathematics and probability
link by Alex Mennen 63 days ago | Abram Demski likes this | discuss
Funding opportunity for AI alignment research
link by Paul Christiano 89 days ago | Vadim Kosoy likes this | 3 comments
Open Problems Regarding Counterfactuals: An Introduction For Beginners
link by Alex Appel 129 days ago | Vadim Kosoy, Tsvi Benson-Tilsen, Vladimir Nesov and Wei Dai like this | 2 comments
Some Criticisms of the Logical Induction paper
link by Tarn Somervell Fletcher 148 days ago | Alex Mennen, Sam Eisenstat and Scott Garrabrant like this | 10 comments
Where's the first benign agent?
link by Jacob Kopczynski 223 days ago | Patrick LaVictoire and Paul Christiano like this | 15 comments
Neural nets designing neural nets
link by Stuart Armstrong 310 days ago | Vadim Kosoy likes this | discuss
The universal prior is malign
link by Paul Christiano 359 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
(Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 373 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments
Asymptotic Decision Theory
link by Jack Gallagher 405 days ago | Abram Demski, Jessica Taylor, Patrick LaVictoire, Paul Christiano and Tsvi Benson-Tilsen like this | 2 comments
Variations of the Garrabrant-inductor
link by Sune Kristian Jakobsen 427 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor, Nate Soares and Scott Garrabrant like this | 1 comment
Two Agent Mild Optimization
link by Norman Perlmutter 479 days ago | Abram Demski and Jessica Taylor like this | discuss
A Layman's Explanation of "Safely Interruptible Agents"
link by Zach Weems 481 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Improbable Oversight, An Attempt at Informed Oversight
link by William Saunders 489 days ago | Jessica Taylor and Patrick LaVictoire like this | 8 comments
A new proposal for logical counterfactuals
link by Jack Gallagher 505 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 3 comments
An Alternative Setting for Resource-Bounded Lob's Theorem
link by Siddharth Bhaskar 506 days ago | Patrick LaVictoire and Scott Garrabrant like this | discuss
Working on a series of safety environments for OpenAI gym. Would love comments and ideas.
link by Rafael Cosman 530 days ago | Daniel Dewey, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss
every function can be computable
link by Ramana Kumar 565 days ago | Patrick LaVictoire likes this | discuss
Goal completion prior art: feature construction
link by Stuart Armstrong 591 days ago | discuss
An approach to the Agent Simulates Predictor problem
link by Alex Mennen 594 days ago | Vadim Kosoy, Abram Demski, Gary Drescher, Jessica Taylor and Patrick LaVictoire like this | 11 comments
Analysis of Algorithms and Partial Algorithms
link by Andrew MacFie 659 days ago | Patrick LaVictoire and Scott Garrabrant like this | 3 comments
Another toy model of the control problem
link by Paul Christiano 664 days ago | Jessica Taylor likes this | discuss
My current take on logical uncertainty
link by Paul Christiano 665 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss
Some work on connecting UDT and Reinforcement Learning
link by David Krueger 708 days ago | Patrick LaVictoire and Paul Christiano like this | 5 comments
Sequential Extensions of Causal and Evidential Decision Theory
link by Tom Everitt 771 days ago | Kaya Stechly and Patrick LaVictoire like this | discuss