(This post resulted from a conversation with Wei Dai.)
Formalizing the tiling agents problem is very delicate. In this post I’ll show a toy problem and a solution to it, which arguably meets all the desiderata stated before, but only by cheating in a new and unusual way.
Here’s a summary of the toy problem: we ask an agent to solve a difficult math question and also design a successor agent. Then the successor must solve another math question and design its own successor, and so on. The questions get harder each time, so they can’t all be solved in advance, and each of them requires believing in Peano arithmetic (PA). This goes on for a fixed number of rounds, and the final reward is the number of correct answers.
Moreover, we will demand that the agent must handle both subtasks (solving the math question and designing the successor) using the same logic. Finally, we will demand that the agent be able to reproduce itself on each round, not just design a custommade successor that solves the math question with PA and reproduces itself by quining.
To nail down the formalization, we will use good old programs with proof limits, not oracles or modal formulas. The universe program U will contain the agent program A as a subprogram, and A will know the full source code of U (including A) by diagonalization. At each round, the agent will receive the text of a math question as input, and return a pair consisting of the answer to the math question and the source code of the successor agent.
The math questions will be of the form “you’re allowed to use N_i computation steps, here’s the source code of a program P_i, what does it return?” The programs P_i will be chosen so that N_i steps isn’t enough to run them to completion, but is more than enough to find a PA proof that P_i returns some particular value.
To give the agent some assurance of the above, the universe program will use the following transparent procedure to evaluate the agent’s answer:
Run P_i to completion, and compare the result with the agent’s answer.
Verify that there exist PA proofs about P_i that can be found in much fewer than N_i computation steps. Ignore the answers found in the proofs.
Increment the agent’s utility if (1) succeeded or (2) failed.
We will also assume that all math questions are so difficult, and all N_i are so large, that doing any fixed work up front is a rounding error by comparison.
The above sounds like a fair formulation of the tiling agents problem, right? The Vingean principle is satisfied because the initial agent doesn’t have enough time to solve all math questions up front. The tiling principle is satisfied because we demand a solution that will reproduce itself at each round. The naturalistic principle is satisfied because we demand a solution that will treat both subtasks uniformly (solve the math question and design a successor).
Now I will show a solution that seems to meet all desiderata. It will be based on the theory PA+Con(PA), which is the weakest we can do, because the problem requires creating successor agents that trust PA. But you will still say that I’m cheating.
Let the agent A be this program:
“Ignore my input at first. Using a small number of computation steps, and using PA+Con(PA) as the theory, search for programs B such that, if all instances of A internally delegate all inputoutput to B, then U will provably return some particular utility. Then internally delegate to the B that leads to the highest utility found.”
Then A will find a short proof that delegating to the following program B leads to high utility (winning all rounds):
“Accept P_i and N_i as input. Using slightly fewer than N_i computation steps, and using PA as the theory, find the first proof of P_i returning anything. Then return the pair of that value and A’s source code.”
(To an outside observer, who doesn’t know about A’s private rituals, that looks like A correctly solving the math question and designating itself as successor. Also note that A’s algorithm has no mention of this particular B, nor of the two subtasks that B must solve.)
The proof is simple. If A internally delegates to B on the first round, then it will delegate to the same B on all subsequent rounds, because it ignores the input while choosing B. That fact is easily provable in PA. Then B goes on to handle the inputs on different rounds and wins all of them, which is easily provable in PA+Con(PA) due to the way the universe program verifies the answers. At each round B designates A as successor, which rubberstamps B again, and the cycle continues.
You could say that A fails the Vingean principle, but that wouldn’t be fair. After all, A indeed can’t solve all questions in advance, because it doesn’t have enough time. It only solves the question “which B to run internally?” in advance, giving the same answer redundantly on each round, but that’s just an implementation detail. If you didn’t know how A worked internally, you wouldn’t object to it.
Of course it’s not guaranteed that A will reproduce itself. It knows that reproducing itself will lead to high utility, and we happen to know that no higher utility will be found, but the same utility could be achieved in other ways (e.g. by choosing a B that reproduces itself directly, instead of going through A). But that’s not a strong objection, because given the right tiebreaking, A will indeed reproduce itself.
Moreover, the approach of A looks quite general, without anything specific to this problem. It can also deal with all the usual UDT test cases, like Newcomb’s Problem, as well as all imaginable tiling problems that are amenable to quining. So someone optimistic could say that A is a version of UDT that solves the tiling agents problem!
That would be unsatisfactory though. It looks more like A has found a trick that leads to quining in a roundabout way. I feel that the people working on tiling agents wouldn’t accept my solution, but I can’t quite figure out which new principle would rule it out. Perhaps some kind of “no metaquining principle”? Do humans obey that principle, or do they have intuitions like “solve decision theory in the abstract and delegate to that” which sound suspiciously like my agent? It’s tricky and my thinking is very confused right now.
What do you think?
