| Comment Author | Post | Deleted By User | Deleted Date | Deleted Public | Reason |
|---|---|---|---|---|---|
| | How We Picture Bayesian Agents | LawrenceC | | false | Whoops, Gwern already mentioned this work, my bad. |
| | LLMs for Alignment Research: a safety priority? | lukehmiles | | false | |
| | Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small | leogao | | false | |
| | Discussion: Challenges with Unsupervised LLM Knowledge Discovery | Clément Dumas | | true | Sorry I didn't understand you were confused because of the visualization |
| | Evaluating the historical value misspecification argument | Daniel Kokotajlo | | true | Accidental duplicate |
| | Evaluating the historical value misspecification argument | Daniel Kokotajlo | | true | Accidental duplicate |
| | TurnTrout's shortform feed | Ben Pace | | false | |
| | TurnTrout's shortform feed | Ben Pace | | false | |
| | Coup probes: Catching catastrophes with probes trained off-policy | Fabien Roger | | false | |
| | Preventing Language Models from hiding their reasoning | Fabien Roger | | true | |

| Author | Post | Banned Users |
|---|---|---|
| | Asymptotically Unambitious AGI | |

| ID | Banned From Frontpage | Banned from Personal Posts |
|---|---|---|

| User | Ended at | Type |
|---|---|---|
| | | allComments |
| | | allComments |
| | | allComments |
| | | allComments |
| | | allComments |
| | | allComments |
| | | allComments |
| | | allPosts |
| | | allComments |