This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.
Brouwer’s fixed theorem says that a continuous function from a compact convex subset of \(\mathbb{R}^n\) to itself has fixed point. Kakutani’s fixed point is similar, but instead of continuous functions, it uses Kakutani functions, which are set valued functions with closed graph which are point wise nonempty and convex.
When I think about Kakutani functions, I usually think about them as limits of continuous functions. For example, consider the kakutani function \(f\) from \([1,1]\) to itself which sends negative inputs to \(1\), positive inputs to \(1\), and sends \(0\) to the entire interval. You can view \(f\) as a the limit of a sequence of functions \(f_n\) sends \(x\) to \(\min(\max(n\cdot x,1),1)\). This is not a point wise limit, since if it was \(0\) would be sent to 0, rather than the entire interval. Instead, it is a limit in the Hausdorff metric between the graphs of the functions.
Given two compact nonempty subsets \(X\) and \(Y\) of \(\mathbb{R}^n\), the Hausdorff distance between \(X\) and \(Y\) is the maximum over all points in \(X\) or \(Y\) of the Euclidean distance between that point and the closest point in the other set. Since \(X\) and \(Y\) are compact, this maximum is achieved.
Given compact convex subsets \(X\) and \(Y\) of \(\mathbb{R}^n\), we say that a sequence of continuous functions \(f_n\) from \(X\) to \(Y\) converges in graph Hausdorff distance to the closed graph set valued function \(f\) if the graphs of \(f_n\) viewed as subsets of \(X\times Y\) converges to the graph of \(f\) in Hausdorff distance.
We say that a closed graph function \(f\) from \(X\) to \(Y\) is a continuous graph limit of if there exists a sequence of continuous functions which converges to \(f\) in graph Hausdorff distance.
Theorem 1: Every continuous graph limit \(f\) from a compact convex subset of \(\mathbb{R}^n\) to itself has a fixed point (a point contained in its image).
Proof: \(f\) is a limit of continuous functions \(f_n\) each of which has a fixed point by Brouwer. Choose one fixed point from each \(f_n\) to get a sequence of points which has a convergent subsequence by Bolzano–Weierstrass. Let \(x\) be the limit of this convergent subsequence. If \(f(x)\) did not contain \(x\), then \((x,x)\) would not be in the graph of \(f\). Since \(f\) is closed graph, a ball around \((x,x)\) would not be in the graph of \(f\), which contradicts the fact that \(\{f_n\}\) must contain functions with graphs arbitrarily close to the graph of \(f\) with fixed points arbitrarily close to \(x\) and thus points in their graphs arbitrarily close to \((x,x)\). \(\square\)
This theorem is not equivalent to Kakutani’s fixed point theorem. There exist continuous graph limits which are not point wise convex (but only in more than one dimension). For example the function from \([1,1]^2\) to \([1,1]^2\) which sends every point to the circle of points distance 1 from \(0\) is not Kakutani, but is a continuous graph limit. It is the limit of functions \(f_n\) given by \((x,y)\mapsto (\cos(n\cdot x),\sin(n\cdot x))\).
However, this theorem is strictly stronger that Kakutani’s fixed point theorem (although sometimes harder to use, since showing a function is Kakutani might be easier than showing it is a continuous graph limit)
Theorem 2: Given compact convex subsets \(X\) and \(Y\) of \(\mathbb{R}^n\), every Kakutani function \(f\) from \(X\) to \(Y\) is a continuous graph limit.
Proof: We define a function \(f_n\) as follows. Take a finite set \(S\) of radius \(1/n\) open balls in \(X\times Y\) such that each ball intersects the graph of \(f\), the \(X\) coordinates of the centers of all the balls are distinct, and the balls cover the entire graph of \(f\). This induces a covering \(S_X\) of \(X\) by radius \(1/n\) open balls by taking balls centered at the \(X\) coordinates of the centers of \(S\). We continuously map each point in \(X\) to a weighted average of balls in \(S_X\) as follows. If a point is the center of some ball, it is sent to 100% that ball. Otherwise, it is sent to a combination of all the balls in which it is contained with weight proportional to (the reciprocal of the distance to the center of that ball) minus \(n\). This gives a function \(f_n\) from \(X\) to \(Y\) by mapping each point to the weighted average of the \(Y\) coordinates of the centers of the balls in \(S\) with weights equal to the weights of the corresponding balls in \(S_X\) above. One can verify that \(f_n\) is continuous.
Observe that the graph of \(f_n\) contains the centers of all balls in \(S\). Thus, \(f_n\) contains points within \(1/n\) of every point is the graph of \(f\). Thus, if \(f_n\) did not converge to \(f\), it must be because infinitely many \(f_n\) contain points a distance from the graph of \(f\) bounded away from \(0\). Consider a convergent subsequence of these points. This gives a point \((x,y)\) not in the graph of \(f\) and a subsequence of the \(f_n\) with points in their graphs converging to \((x,y)\).
Let \(d\) be half the distance between \(y\) and \(f(x)\), and consider the set \(T\) of all points in \(Y\) at most \(d\) from the nearest point in \(f(x)\). Note that \(T\) is convex. Note that all \((x^\prime,y^\prime)\) in the graph of \(f\) with \(x^\prime\) sufficiently close to \(x\) must have \(y^\prime\) in \(T\), since otherwise there would be \((x^\prime,y^\prime)\) with \(x^\prime\) converging to \(x\) which must have a convergent subsequence with \(y^\prime\) converging to a point not in \(f(x)\), contradicting the fact that \(f\) has closed graph.
However \(f_n\) must send all points within distance \(\varepsilon\) of \(x\) to a point in the convex hull of the images under \(f\) of points within \(\varepsilon+1/n\) of \(x\). But, we showed that for \(\varepsilon\) sufficiently small and \(1/n\) sufficiently large all of these points must be in \(T\). Therefore, for all sufficiently large \(n\), \(f_n\) must send all points within \(\varepsilon\) of \(x\) to points in \(T\), which are bounded away from \(y\), contradicting the assumption that points in the \(f_n\) converge to \((x,y)\). \(\square\)
Corollary: Kakutani’s fixed point theorem
We have proven (a strengthening of) Kakutani’s Fixed Point Theorem from Brouwer’s fixed point theorem, and given a way to think about Kakutani functions as just limits of graphs of continuous functions, and thus have better intuitions about what (a superset of) Kakutani functions look like. We will now take this further and think about Kakutani as a consequence of an analogue of Brouwer using Hyperreal infinitesimal numbers. (I am going to be informal now. I am not going to use standard notation here. I am not going to make sense to people who don’t already know something about nonstandard analysis. Sorry.)
Given a compact convex subset \(X\) of \(\mathbb{R}^n\), we can define \(*X\) to be the set of all equivalence classes of infinite sequences of elements of \(X\), where two sequences are equivalent if they agree on a set that matters according to some ultrafilter \(U\) on \(\mathbb{N}\). A function \(*f\) from \(*X\) to itself is defined by a sequence of functions from \(X\) to itself \(\{f_n\}\), where you apply the functions pointwise. I will call a function hypercontinuous if each of the component functions is continuous. (I am not sure what this is actually called.) Each point in \(*X\) has a standard part, which is a point in \(X\), which is the unique point such that a subset of components that matter converges to \(X\).
Claim: Every hyper continuous function \(*f\) from \(*X\) to itself has a fixed point.
Proof: Just take the sequence of the fixed points of the individual component functions.\(\square\)
Claim: Continuous graph limits \(f\) from \(X\) to \(Y\) are exactly those closed graph setvalued functions such that there exists a hypercontinuous function \(*f:*X\rightarrow *Y\) such that \(y\in f(x)\) if and only if there exist points \(*x\in*X\) and \(*y\in*Y\) with standard parts \(x\) and \(y\) respectively such that \(f(*x)=*y\)
Proof: Use the same sequence of functions with graphs converging to that of \(f\) as the sequence of functions defining \(*f\).\(\square\)
Thus, we can view continuous graph limits (and thus Kakutani functions) as something you get when looking at just the standard part of a hypercontinuous function from the hyper version of \(X\) to itself. The fixed point will fix everything, including the infinitesimal parts, and we do not have to deal with any setvalued functions.
For example, consider our original function \(f\) from \([1,1]\) to itself which sends negative inputs to \(1\), positive inputs to \(1\), and sends \(0\) to the entire interval. We can view this as a function \(*f\) involving infinitesimals where everything with positive real part is sent to something with real part \(1\), everything with negative real part is sent to something with real part \(1\), and the infinitesimal numbers very close to 0 are sent to something inbetween. If we use the sequence of functions from above and let the infinitesimal \(\varepsilon\) be \(\{1/n\}\), then zooming in on the inputs between \(\varepsilon\) and \(\varepsilon\), \(*f\) will just be a steep linear function with slope \(1/\varepsilon\).
Now to be even more vague and connect things back up with agent foundations, perhaps this can give some good intuitions about what is happening with reflective oracles and probabilistic truth predicates. The oracle/truth predicate is effectively “zooming in” on the area around a specific probability, and when you stack oracle calls or truth predicates within each other, you can zoom in further. The fact that the probabilistic truth predicate does not know that it is reflectively consistent, can be viewed as it not believing a sentence akin to “If I assign probability less that \(\varepsilon\) to \(\phi\), then I also assign probability less that \(\varepsilon^2\) to \(\phi\),” which seems very reasonable. It also makes reflective oracles and the probabilistic truth predicates look more similar to other approaches to the same problem that are more hierarchy forming solutions to the same problem like normal halting oracles. Here the hierarchy comes from zooming in further and further on the infinitesimal in the Kakutani function.
