Math in Elections Part 1 — Every Voting System is Flawed.

Hi everyone!  Are you feeling a little tired of the same-old political discussions?  Wishing there could be more math involved?  If so, I’ve got a blog series for you!

In light of the upcoming elections, I thought it’d be a good time to write on The Math of Elections.  In this post I will talk about the Gibbard-Satterthwaite theorem.  Tomorrow I will write about how this theorem illuminates the problems that exist in our primary system.  Following that I will talk about Kings, suggest alternatives to our primary system, discuss the electoral college, analyze whether voting matters, and lastly, on election day, will analyze the complexity of vote counting.  Let’s get started.

The Gibbard-Satterthwaite theorem deals with elections where there are at least 3 candidates running for a single position.  Observe that most elections have these characteristics — for instance most US elections begin with primaries that feature at least three candidates.  Most elections that take place within your community or among your friends would also qualify.  And even in our largest nation-wide general elections, many races feature an independent or third party candidate who will receive a notable number of votes.

Let’s assume that every member of the electorate explicitly ranks their preferences.  They may prefer the Democrats’ candidate most, then the Libertarian, then Herman Cain, then the Republican, then the Green Partier, etc.  From this, we are going to generate society’s preference list in some way, with the most preferred candidate being declared the winner.  The goal, of course, would be to find a way to do so that is fair.  One procedure could be to ignore how people voted, simply place the candidates’ names in alphabetical order, and declare that to be society’s preference ranking.  But, of course, that is not fair.

What about a system like our own?  Our current primary system, for example, takes each voter’s preference list, ignores everything but their top choice, and gives one vote to the candidate who was that top choice.  Then the candidate with the most votes wins.
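This procedure is easy to write down explicitly. Here is a minimal sketch in Python (the ballots are made up for illustration): plurality looks at only the top entry of each ranked ballot and ignores the rest.

```python
from collections import Counter

def plurality_winner(ballots):
    """Each ballot ranks candidates from most to least preferred.
    Plurality counts only the top choice on each ballot."""
    top_choices = Counter(ballot[0] for ballot in ballots)
    # The candidate with the most first-place votes wins (ties broken arbitrarily).
    return top_choices.most_common(1)[0][0]

ballots = [
    ["Democrat", "Libertarian", "Republican"],
    ["Republican", "Democrat", "Libertarian"],
    ["Republican", "Libertarian", "Democrat"],
]
print(plurality_winner(ballots))  # Republican
```

Note how much information is discarded: the second and third choices never enter the tally at all.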

Is this system fair?  How can we tell?  And if it isn’t, does a fair system even exist?

Well, unfortunately for society, the Gibbard-Satterthwaite theorem roughly says that no voting system with 3 or more candidates, decided by these voter preference lists, is fair.  But first, what makes a voting system “fair”?  I believe that almost everyone would agree that a fair voting system should have the following properties.

Non-dictatorship Property.  For starters, a dictatorship does not seem fair.  So let’s say that the first condition we demand is that there is no dictator; i.e. the result of the election cannot simply mimic the preferences of a single voter.

In Iran, for instance, it appears that many citizens voted for president, but the only vote that mattered was Ahmadinejad’s vote for himself.

Winnability Property.  Second, it must be possible for any candidate to win.  A system where the winner is decided by picking the person whose name comes first in alphabetical order does not seem fair.  We gotta give Zygarri Zzyzzmjac a fighting chance!

Non-manipulability Property.  Finally, it should not be to one’s advantage to engage in tactical voting.  That is, it should never be possible for a voter to secure an outcome they honestly prefer by submitting false preferences.

So, for example, suppose that at the moment the Societal Ranking Procedure (whatever it may be) reports that society likes the Republican candidate the most, then the Democrat, then the Green Partier, then Herman Cain, then the Libertarian.  If I personally have the ranking Republican, then Democrat, then Herman Cain, then Green, then Libertarian, but I decide to interchange Democrat and Libertarian, or Libertarian and Herman Cain, then that should not change the fact that society prefers the Republican to the Democrat.  It would make no sense if suddenly the Democrats’ candidate jumped into first place.  More to the point: no one changed their preferences with respect to the Democrat and the Republican, so it does not make sense that society should have changed its preference either.

So those are three conditions which all seem fairly reasonable to ask of a voting system.  However, Gibbard and Satterthwaite showed that these three conditions cannot all hold simultaneously.  Given any voting system with at least 3 candidates, where the winner is deterministically chosen as a function of these voter preference lists, either your system has a dictator, or some candidate is incapable of winning, or there is a way to manipulate the system by being dishonest.  How sad.
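Plurality itself gives a concrete instance of manipulability. In this toy example (the ballots are made up), the two voters who honestly rank C first also prefer B to A; by falsely reporting B as their top choice, they flip the winner from A to B, an outcome they honestly prefer.

```python
from collections import Counter

def plurality_winner(ballots):
    tally = Counter(b[0] for b in ballots)
    return tally.most_common(1)[0][0]

# 4 voters: A > B > C, 3 voters: B > C > A, 2 voters: C > B > A
honest = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
print(plurality_winner(honest))  # A (first-place tallies 4-3-2)

# The two C > B > A voters bury their true favorite and report B first:
tactical = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["B", "C", "A"]] * 2
print(plurality_winner(tactical))  # B (tallies 4-5): better by their true rankings
```

So being dishonest paid off, which is exactly the failure the theorem says some candidate profile must exhibit in any non-dictatorial system where everyone can win.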

I will omit the proof of this theorem in order to keep this series accessible to a large audience.  Possibly some day I will come back and write a post proving it.  But that day is not today.

This series will continue tomorrow when I will talk more about this theorem in the context of one unfair electoral system: the US primaries.

Posted in Math in Elections, Math in the "Real World" | 4 Comments

Moving Around a Sphere – Hilbert Spaces Part 3

Well, it’s been a long time since I’ve written a post, but I have a bit of time tonight and I really don’t feel like doing homework, so let’s talk about some mathematics! Last time I wrote, I promised some discussion of the unit ball in infinite dimensional Hilbert spaces. Many of the definitions and even some of the facts I’ll state carry over to arbitrary Banach spaces, but some of the proofs are a bit easier with the inner product structure, so I’ll stick to the nicer Hilbert space case. Here we go:

Last time I remarked that if H is a finite dimensional Hilbert space, then the unit ball of H is compact in the norm topology. The proof of this is a bit messy, so I won’t give it, but I’ll sketch the basic idea, which is straightforward: in \mathbb{R}^n this is obviously true, since the unit ball is closed and bounded. Given any finite dimensional Hilbert space, it is isometrically isomorphic to \mathbb{R}^{\dim{H}}. Under this isomorphism, the unit ball of \mathbb{R}^{\dim{H}} maps onto the unit ball of H, and so the unit ball of H is the continuous image of a compact set, so is itself compact. The messy part is writing down the correct isomorphism, but that’s left to the reader. It’s a fun exercise to use this fact to prove that any two norms on a finite dimensional Hilbert space are equivalent: i.e. if ||\cdot||_1,||\cdot||_2 are two norms on H, there exist positive constants A,B so that A||x||_1\leq ||x||_2\leq B||x||_1 for all x\in H, and as an easy corollary to this we see that any two norms generate the same topology on H.
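This equivalence is easy to sanity-check numerically. A sketch with numpy, using the 1- and 2-norms on \mathbb{R}^n, for which the standard explicit constants are A=1/\sqrt{n} and B=1 (i.e. ||x||_2\leq ||x||_1\leq\sqrt{n}\,||x||_2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
for _ in range(1000):
    x = rng.standard_normal(n)
    one = np.abs(x).sum()          # ||x||_1
    two = np.sqrt((x ** 2).sum())  # ||x||_2
    # Equivalence with explicit constants: (1/sqrt(n)) ||x||_1 <= ||x||_2 <= ||x||_1
    assert one / np.sqrt(n) <= two + 1e-9 and two <= one + 1e-9
print("norm equivalence holds on all samples")
```

Of course random sampling proves nothing; the point is just to see the two constants doing their job.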

Now let’s suppose H is infinite dimensional. I’ll let H be an arbitrary infinite dimensional Hilbert space, but if you want a concrete example, just take H=l^2(\mathbb{N}), the set of square summable sequences of complex (or real) numbers with the natural inner product. Let \{e_\alpha\}_{\alpha\in J} be an orthonormal basis for H. For l^2, the easiest example is to take e_i=(0,0,\dots,1,0,\dots), where the 1 occurs in the i-th slot. For arbitrary H, let \{e_i\}_1^\infty be a countable subset of the orthonormal basis. Now, for i\neq j, we have

||e_i-e_j||^2=\langle e_i-e_j,e_i-e_j\rangle=||e_i||^2+||e_j||^2=2

so the set \{e_i\}_1^\infty is a \sqrt{2}-separated sequence sitting in the unit ball, and so it has no convergent subsequence. For metric spaces, sequential compactness and compactness are equivalent, so this shows the unit ball is not compact.
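One can see this \sqrt{2}-separation concretely in a finite truncation (a numpy sketch using the standard basis vectors of l^2 restricted to the first few coordinates):

```python
import numpy as np

dim = 10
E = np.eye(dim)  # rows are the orthonormal vectors e_1, ..., e_10

# Every pair of distinct basis vectors is exactly sqrt(2) apart,
# so no subsequence of (e_i) can be Cauchy, hence none converges.
dists = {round(float(np.linalg.norm(E[i] - E[j])), 9)
         for i in range(dim) for j in range(i)}
print(dists)  # {1.414213562}
```

In the infinite dimensional space there are infinitely many such vectors, all mutually \sqrt{2} apart, which is what kills compactness.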

The fact that the unit ball of a Hilbert space is not compact is unfortunate. We like compact spaces; they tend to make life easier! One thing we might try to do is define a topology on H where the unit ball is compact. In fact, we can do something similar to this for any Banach space. Actually, we have to work in the dual space for an arbitrary Banach space, but since Hilbert spaces are naturally (anti)isomorphic to their dual via the Riesz representation theorem, we can think of it as a topology on H. For an arbitrary Banach space, this topology will be called the weak* topology (read “weak star”) on X^*. In this post, I’ll define the so-called weak topology on X. In the Hilbert space case, these two will coincide, but for an arbitrary Banach space they are different topologies on two distinct spaces. Finally, I’ll prove a cool fact about the weak topology on a Hilbert space. My next post will contain the definition of the weak* topology and Alaoglu’s theorem, which states that the unit ball in X^* is weak* compact.

Definition: Let X be a Banach space. The weak topology on X is defined as follows: a net \{x_\alpha\} converges to x if and only if for all f\in X^*, the net f(x_\alpha) converges to f(x).

Remark: The weak topology depends on the norm on X; it’s not intrinsic to the set X. We use the norm to define the continuous linear functionals that make up X^*, and in turn use these to define the weak topology on X. Also, note that we could have defined the weak topology to be the topology generated by X^*, i.e. the weakest (smallest, coarsest) topology on X so that every f\in X^* is continuous. In particular, since the norm topology on X satisfies this condition, it follows immediately that the weak topology on X is in fact weaker than the norm topology: i.e. ||x_n-x||\to 0 implies x_n converges to x in the weak topology (in this case we often say x_n converges to x weakly).

Now, moving back to the world of Hilbert spaces, what does it mean for a sequence (technically a net, but we’ll just think sequences for the sake of simplicity) to converge weakly in H? Well, recall that by the Riesz Representation theorem, any bounded linear functional f\in H^* is given by f(x)=\langle x,y_f\rangle for some fixed vector y_f\in H (and conversely, the map x\mapsto\langle x,y\rangle defines a bounded linear functional on H). Translating this to weak convergence, we have x_n\to x weakly in H if and only if for all y\in H, we have \langle x_n,y\rangle\to \langle x,y\rangle. Thus for Hilbert spaces, we can think of weak convergence as “convergence of inner products”. One might ask if this is actually a genuinely weaker topology than the norm topology on H. That is, can we find a sequence x_n that converges weakly in H, but does not converge in the norm topology? I’ll end my post by proving the following cool

Proposition: Let \{e_i\}_1^\infty be an orthonormal set in H. Then \{e_i\} converges weakly to 0. Moreover, \{e_n\}_1^\infty does not converge in the norm topology on H.

Proof: The statement about norm convergence was shown above, so we need only prove the statement about weak convergence. To this end, let y\in H. By Bessel’s inequality (or, if you want, extend \{e_i\} to an ONB and apply Parseval’s identity), we have

\sum_{i=1}^\infty |\langle y,e_i\rangle|^2\leq ||y||^2<\infty

and so in particular the infinite sum converges. But this is just a sum of nonnegative real numbers, so we must have that the terms of the sequence go to zero as i\to \infty. But this is precisely the statement that \lim_{i\to\infty} |\langle y,e_i\rangle|^2=0. It is left to the reader to verify that this implies \langle e_i,y\rangle\to 0 as i\to\infty, i.e. e_i\to 0 weakly in H.
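The contrast between the two modes of convergence shows up numerically too. A sketch truncating l^2 to many coordinates, with a made-up square-summable vector y: the inner products \langle y,e_i\rangle decay to 0 even though ||e_i||=1 for every i.

```python
import numpy as np

N = 10_000
y = 1.0 / np.arange(1, N + 1)   # a square-summable sequence, y_i = 1/i

# e_i is the i-th standard basis vector, so <y, e_i> is just the i-th entry of y.
inner_products = y              # <y, e_i> = y_i, tending to 0
norms = np.ones(N)              # ||e_i|| = 1 for every i

print(inner_products[9], inner_products[9999])  # 0.1 0.0001
print(norms[9999])                              # 1.0
```

So the sequence (e_i) "escapes to infinity" weakly while staying glued to the unit sphere in norm.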

Posted in Analysis, Set theory | Leave a comment

AC, part III: Choice in context

Wow, it’s been a long time since my last post. Sorry about that. I intend (though I’ll make no promises) to post a bit more regularly now.

This final post in my ac series was advertised to be a post about surprising uses of choice, but it’s grown to be a general discussion of ac.

— 1. Important principles weaker than full ac —

To start developing a picture of how ac relates to other mathematical assertions, we need to describe some so-called choice principles that are not provable in zf but are weaker than ac. (See the previous two posts in this series (I and II) for some assertions equivalent to ac.)

One of the most natural ways to weaken ac is to limit the number of choices you’re making or the size of the bins you’re choosing from. By this sort of weakening we come to one of the mildest choice principles, the Axiom of Countable Choice:

Definition 1. The Axiom of Countable Choice (acω) asserts that every countable family of nonempty sets has a choice function. (That is, if {\emptyset\notin X} and {X} is countable, then there is a function that assigns each member {x} of {X} a member of {x}.)

acω is substantially weaker than ac, but it’s sufficient to forbid many types of unpleasant sets. For example, it is a theorem of zf+acω that every infinite set has a countable subset, though this isn’t provable in zf alone (that is, there is a model of zf in which there is an infinite set with no countable subset). Many elementary facts about the real numbers can be established in zf+acω; for example, only zf+acω is necessary to prove that a countable union of countable sets is countable (though, again, some form of choice is necessary to prove this). In particular, in the presence of acω, {{\mathbb R}} is not a countable union of countable sets. (Apparently the assertion that a countable union of countable sets is countable is actually strictly weaker than acω. I think I once found a reference for this.)

We need slightly more firepower to do real analysis properly, though. Proofs in (baby) real analysis often require the construction of a sequence, which is typically built inductively by choosing the {n}th term from, e.g., a ball of radius {1/2^n}. The proof of the Baire Category Theorem is a good example of this style of argument, which uses the Axiom of Dependent Choice:

Definition 2. The Axiom of Dependent Choice (dc) is the following assertion: If {R} is a relation on a nonempty set {X} such that for every {x\in X} there is a {y\in X} such that {xRy}, then there is a sequence {(x_n : n\in{\mathbb N})} of members of {X} such that {x_nRx_{n+1}} for every {n\in{\mathbb N}}.

Apparently this paper gives a proof that the Baire Category Theorem is equivalent (in zf, of course) to dc.

It is straightforward to show that dc implies acω (exercise!). There are many choice principles (like dc and acω) that guarantee only the existence of choice functions for sets satisfying certain size restrictions. (acω restricts the size of the family; other useful choice principles restrict the size of each member of the family.) dc is sufficient to prove the results of basic analysis and is particularly appealing because it is too weak to prove that nasty things like nonmeasurable sets and Banach–Tarski decompositions of the sphere exist. (See section 2 below.)

Notice that acω and dc are local principles, in the sense that they are chiefly concerned with sets of cardinality near {\aleph_0} (this is less true of dc). For this reason, it shouldn’t be surprising that each of them is strictly weaker than full ac. Applications outside analysis often require global principles (for sets or spaces of arbitrary cardinality, e.g.), and a popular global choice principle weaker than full ac is the Ultrafilter Lemma.

Definition 3. The Ultrafilter Lemma (ul) asserts that every filter on a set can be extended to an ultrafilter.

The Prime Ideal Theorem (pit) asserts that every boolean algebra has a prime ideal, an ideal {I} such that {x} or {-x} is a member of {I} for every {x}.

Conveniently, the ul is equivalent to other interesting choice principles.

Theorem 4. In zf, the following are equivalent:

  1. The Ultrafilter Lemma;
  2. The Prime Ideal Theorem;
  3. The Compactness Theorem for first-order logic.

The proof of this theorem isn’t too difficult, so I won’t provide it here. We are taking on faith that the ul is strictly weaker than ac, but it would be nice to have a sense of just how much weaker it is. We have several different equivalent formulations of ac, each about a different sort of mathematical object:

Theorem 5. In zf, the following are equivalent:

  1. ac;
  2. Every set can be wellordered;
  3. Tychonoff’s Theorem: a product of compact spaces is compact;
  4. Every vector space has a basis.

(By the way, Exercise: prove that Tychonoff’s theorem implies ac. The proof I know of is short but requires a trick.)

By transferring choice principles over to new contexts (like topology) using the ideas from the proof that the assertions above are equivalent, we can provide a satisfying picture of how much weaker the ul is than ac.

Theorem 6. In zf, the following are equivalent:

  1. the ul;
  2. Every product of compact Hausdorff spaces is compact;
  3. Every product of copies of the two-point discrete space {2=\{0,1\}} is compact.

[Edit (Nov 10 2012): the ul is not equivalent to the assertion that every set can be linearly ordered or ac with finite `bins’; in fact,

ul \Longrightarrow every set can be linearly ordered \Longrightarrow ac with finite bins,

but none of those implications is reversible. It has recently come to my attention (I believe it’s proved in this paper) that the ul is equivalent to Tychonoff’s Theorem for sober spaces.]

Exercise: (Since so many of our authors and readers are analysts) The pit implies Hahn–Banach. (This is apparently difficult, so comment if you think you have a solution. But don’t give it away; I haven’t thought about it much yet.)

— 2. Math(s) without choice —

I’ll say a few words about two results whose proofs don’t require choice, the Cantor–Schroeder–Bernstein Theorem and Ramsey’s Theorem.

That the csb doesn’t require choice is surprising; its conclusion asserts the existence of a function constructed rather mysteriously from the functions in its hypothesis. Of course, it’s easy to say ‘Obviously it doesn’t require choice: here’s a proof that doesn’t use it!’, but when you suppress the intuition you gain by understanding the proof of the csb, the surprise returns. One reason to expect this surprise is that the easiest proof of the csb in zfc (namely, reduce to the case when your sets are wellordered by appealing to ac) makes unavoidable use of choice. And csb provides a mysterious function, which is what ac usually does.

The dual csb theorem, which asserts that if there are surjections {X\rightarrow Y} and {Y\rightarrow X} then there is a bijection {X\rightarrow Y} (and is a theorem of zfc), is not provable in zf. Apparently it is an open question whether the dual csb implies ac.

Maybe it’s not so surprising that Ramsey’s Theorem doesn’t require ac, but it can be made into a nontrivial choice principle by recasting it as ‘For every {2}-coloring of the {n}-subsets of an infinite set {X}, there is an infinite monochromatic subset {Y} of {X}.’ (‘But can’t you just prove this from the normal statement of Ramsey’s Theorem by restricting to a countable subset of {X}?’, you ask? Remember that it is not provable in zf that every infinite set has a countable subset!)

One final note: Solovay proved that it is consistent with zf+dc (granting the consistency of an inaccessible cardinal) that every subset of {{\mathbb R}} is (Lebesgue-)measurable. So we can do analysis somewhat familiarly in a setting where all of the standard theorems (Heine–Borel, etc.) are true, but pathological nasties like nonmeasurable sets don’t exist. What does such analysis look like? I don’t know, but I’d like to.

— 3. Obviously true? —

The temptation at this point, I think, is to point out theorems we are unable to prove without full ac, and then argue that any theory strictly weaker than zfc provides a somehow deficient concept of set, since it is unable to prove those theorems. My objection to this attitude is that there is a lot of choiceless mathematics to be understood, along with alternative extensions of zf (e.g., by the Axiom of Determinacy) that are capable of proving quite powerful, interesting theorems. We simply gain nothing by asserting (or believing, or whatever) that some set of axioms is the gods-given ‘correct’ one and that we should blindly stick to it (in the same way, more absurdly, that a commutative algebraist gains nothing by asserting with religious fervor that noncommutative rings simply don’t exist, in whatever metaphysical sense). Anyway, this is a standard platonism-versus-pluralism issue, and philosophers of mathematics spend a lot of time thinking about it.

— 4. The ‘magic wand’ factor —

Another reason to be careful about (though not necessarily avoid altogether) use of ac is that it is notorious for clouding the ideas in proofs. For example, consider the standard proof in zfc that every countable union of countable sets is countable: Take countably many countable sets {A_n}, choose (by acω) an enumeration of each one, and zigzag through the enumerations just as in the proof that {{\mathbb N}\times{\mathbb N}} is countable. Contrast that statement and proof to the following:

Theorem 7. The counted union of counted sets is counted. That is, if {A_1,A_2,\dots} are countably many sets, and for each {n} the function {f_n\colon A_n\rightarrow {\mathbb N}} is a bijection, then there is a bijection {g\colon \bigcup_{n\in{\mathbb N}} A_n\rightarrow {\mathbb N}} (and we can write down a formula for it in terms of the {f_n}).

And, of course, the proof is the same as above, except that the enumerations are already chosen for us. My point is that the standard statement and proof hide the real idea, which is the construction of {g} from the {f_n}. If you want the stronger result, just take this theorem and snort the necessary amount of choice. Separating the choice-snorting from the real work gives us a deeper understanding of the result and its proof. Assuming ac carelessly is in the same spirit—though hardly as ridiculous—as assuming {0=1} would be: you can prove many things, but without understanding your subject as deeply as you would otherwise. (I should mention that I owe this example, related insight, and the expression ‘snorting choice’ to Thomas Forster.)
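The construction of {g} from the {f_n} can even be written down as a program. A Python sketch, under the simplifying assumption that the {A_n} are pairwise disjoint (so the zigzag hits each element exactly once); here each enumeration is represented lazily by a function k ↦ (k-th element of A_n):

```python
from itertools import count, islice

def zigzag_union(enumerations):
    """Lazily enumerate the union of countably many counted sets.

    `enumerations` maps n to a function k -> k-th element of A_n.
    Assumes the A_n are pairwise disjoint, so the zigzag visits each
    element of the union exactly once (a bijection with N, read off
    by position in the stream)."""
    def gen():
        for d in count():            # diagonal d visits pairs (n, k) with n + k = d
            for n in range(d + 1):
                yield enumerations(n)(d - n)
    return gen()

# Example: A_n = {(n, 0), (n, 1), ...} -- pairwise disjoint counted sets.
first_ten = list(islice(zigzag_union(lambda n: lambda k: (n, k)), 10))
print(first_ten)  # starts [(0, 0), (0, 1), (1, 0), (0, 2), ...]
```

No choice is snorted anywhere: the enumerations arrive as data, and g is a formula in them, which is exactly the point of the choiceless statement.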

Recently I encountered another similar example, which I’ll quickly share. I mentioned earlier that the Baire Category Theorem is a theorem of zf+dc. The BCT for separable completely metrizable spaces is actually a theorem of zf. (This isn’t hard to prove, so you should do it if you’re interested. The point is that the countable dense set allows you to make your choices canonical, since you can always take from the countable dense set the lowest-indexed point that does what you need it to do. I think this is a standard trick for proving things about topological spaces without ac, but I don’t know of another example.) I don’t know of a way to get to full BCT from BCT for separable spaces by invoking dc, though, so this isn’t as good an example as the previous one.

— 5. Coming attractions —

I’ve recently gotten excited about infinitary combinatorics, so that will probably be the topic of my next major series. Combinatorial set theory should also show up soon, as will some set-theoretic topology.

I’ll almost certainly write about toposes [sic!] someday, too, but not until after I’ve written a post or two about general category theory.

In other news, Jay has promised to write a post in the next ten days, so stay tuned! P.S. This post is the first I’ve written using Luca Trevisan’s fantastic latex2wp. I recommend it to all math WordPress bloggers! (Quick test: Does WordPress render Erdős correctly? Yes.) Blue, no green!

Two standard references, from which most of this material comes:

Jech, Thomas J. The Axiom of Choice. 2008 Dover edition, originally published as: Studies in Logic and the Foundations of Mathematics, Vol. 75. North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973. xi+202 pp. (MR0396271)

Howard, Paul; Rubin, Jean E. Consequences of the axiom of choice. Mathematical Surveys and Monographs, 59. American Mathematical Society, Providence, RI, 1998. viii+432 pp. (MR1637107)

Posted in AC, Logic, Set theory | 6 Comments

Hilbert Spaces – Part 2

Recall that in the finite dimensional world, two inner product spaces are isometrically isomorphic if and only if they have the same dimension. In the last post, I mentioned that, by using the appropriate definition of a basis and dimension, one can make the same statement about infinite dimensional (complete) inner product spaces, commonly known as Hilbert spaces. This post will be devoted to proving this fact.

First, let’s recall the definition of an orthonormal basis from last time. Recall that a set of vectors \{e_\alpha\}_{\alpha\in A} is orthonormal if \langle e_\alpha,e_\beta\rangle=\delta_{\alpha,\beta}. The set is called an orthonormal basis if, in addition, it satisfies one of the following equivalent properties:

  1. For all x\in H, we have x=\sum_{\alpha\in A} \langle x,e_\alpha\rangle e_\alpha, where the sum on the right has only countably many non-zero terms and converges in the norm topology no matter how the terms are ordered.
  2. If \langle x,e_\alpha\rangle=0 for all \alpha\in A, then x=0.
  3. Given any x\in H, we have ||x||^2=\sum_{\alpha\in A} |\langle x,e_\alpha\rangle|^2.

In the last post on Hilbert spaces, I proved that every Hilbert space has an orthonormal basis by considering a maximal orthonormal set in H (and so naturally had to invoke Zorn’s lemma) and showed that such a set must be a basis. The following proof is adapted from J.B. Conway’s A Course in Functional Analysis.

Proposition/Definition If E and F are orthonormal bases for the same Hilbert space H, then |E|=|F|. The cardinal number |E| is called the dimension of the Hilbert space H and is denoted by \dim H.
   Proof: The case where either |E| or |F| is finite is trivial, since any orthonormal spanning set in a finite dimensional space is a (Hamel) basis, and so |E| is the vector space dimension of H. Thus we consider only the case where |E| and |F| are infinite. For fixed e\in E, we define F_e=\{f\in F:\langle e,f\rangle\neq0\}. By definition of an ONB, F_e is countable. Given any f\in F, there must exist some e\in E with \langle e,f\rangle\neq 0, for otherwise f=\sum_{e\in E}\langle e,f\rangle e=0. But ||0||=0\neq1=||f||, so such an e must exist. Thus F=\bigcup_{e\in E} F_e. But then |F|\leq |E|\cdot \aleph_0=|E| since E is infinite. The proof of the reverse inequality is symmetric, hence |F|=|E|.

Recall that if H and K are two Hilbert spaces, then a linear map T:H\to K is called an isometry if \langle x,y\rangle_H=\langle Tx,Ty\rangle_K for all x,y\in H. Note that an isometry is bounded with norm one, since for any x\in H, we have ||Tx||^2=\langle Tx,Tx\rangle=\langle x,x\rangle=||x||^2. Moreover, if Tx=0, then ||x||=||Tx||=0, so x=0, hence an isometry is always injective and so is an isomorphism onto its range. If T is surjective, then we say T is an isometric isomorphism between H and K and the spaces H and K are isometrically isomorphic and we write H\simeq K. With these definitions in hand, we can prove the following important theorem:

Theorem Two Hilbert spaces H and K are isometrically isomorphic if and only if they have the same dimension.

To prove the theorem, I’ll follow a standard technique of finding a sort of canonical object and showing that any Hilbert space is isomorphic to this object which behaves the way we want it to behave.

Lemma Let A be any set and define

\displaystyle l^2(A):=\{f:A\to\mathbb{C}:\sum_{a\in A}|f(a)|^2<\infty\},\hspace{3mm} \langle f,g\rangle=\sum_{a\in A}f(a)\overline{g(a)}

Then l^2(A) is a Hilbert space under the given inner product and l^2(A)\simeq l^2(B) if and only if |A|=|B|.
   Proof sketch: It is relatively straightforward to show that l^2(A) is a Hilbert space (see, for example, Folland or Conway). For the isomorphism, given a\in A, define f_a by f_a(a)=1 and f_a(b)=0 for b\neq a. It is clear f_a\in l^2(A) and that the set \{f_a\}_{a\in A} is an orthonormal set in l^2(A). To show that it is a basis, suppose that \langle x,f_a\rangle=0 for all a\in A. By definition of f_a, 0=\langle x,f_a\rangle=x(a). Since this holds for all a\in A, x=0, and so \{f_a\}_{a\in A} is indeed an ONB for l^2(A). Define \{g_b\}_{b\in B} similarly. If an isomorphism T:l^2(A)\to l^2(B) exists, then \{Tf_a\}_{a\in A} is an orthonormal basis for l^2(B), and so by the well-definedness of dimension, this basis has the same cardinality as \{g_b\}_{b\in B}. But the former set is indexed by A and the latter by B, which implies |A|=|B|. For the converse, use Folland’s Proposition 5.30 to get the desired map.

I remark that the space l^2(A) is best understood when |A|=n<\infty. In this case, l^2(A) is simply the set of all n-tuples that are square summable. In other words, l^2(A)=\mathbb{C}^n with the Euclidean inner product. If A=\mathbb{N}, then l^2(\mathbb{N}) is the set of all square-summable sequences, often denoted simply by l^2, and is the natural extension of \mathbb{C}^n to countably infinitely many entries. Finally, we have:

Proof of Theorem: Suppose H\simeq K and let T:H\to K be an isometric isomorphism. Let E be an orthonormal basis for H and let F=\{Te\}_{e\in E}. It is clear that F is an orthonormal set in K. If k\in K is such that \langle k,Te\rangle=0 for all e\in E, then write k=Th for some h\in H. Then for all e\in E, we have

\langle h,e\rangle=\langle Th,Te\rangle=\langle k,Te\rangle=0

so since E is an ONB for H, it follows that h=0, and so k=Th=0, and so indeed F is an ONB for K. Moreover, clearly |E|=|F| and so \dim K=|F|=|E|=\dim H. Conversely, suppose \dim H=\dim K; that is, if E and F are orthonormal bases for H and K respectively, then |E|=|F|, and so by the lemma l^2(E)\simeq l^2(F). As isometric isomorphism is a transitive relation (check this), it suffices to show that H\simeq l^2(E) (and similarly that K\simeq l^2(F)). For h\in H, define f_h(e)=\langle h,e\rangle. Then

\langle f_h,f_h\rangle=\sum_{e\in E} \langle h,e\rangle\overline{\langle h,e\rangle}=\sum_{e\in E}|\langle h,e\rangle|^2=||h||^2=\langle h,h\rangle

where I used property three (Parseval’s identity) in the definition of an ONB. Thus the map h\mapsto f_h is an isometry. It’s clearly linear, so it remains to show the correspondence is surjective. I’ll leave it as an exercise to verify that the set of f\in l^2(E) such that f(e)=0 for all but finitely many e\in E is dense in l^2(E). If f is such a function, let e_1,\dots,e_n be the collection of e\in E such that f(e)\neq 0. Then if we let h=\sum_1^n f(e_i)e_i, we have that f=f_h, and so h\mapsto f_h has dense range. Note that if T:H\to K is any isometry of Hilbert spaces and Tx_n\to y, then ||x_n-x_m||=||Tx_n-Tx_m||\to0, so x_n is Cauchy, hence converges to some x\in H (as H is complete). As T is continuous, Tx_n\to Tx, so y=Tx, and it follows that T has closed range. It follows from these two facts that the map h\mapsto f_h is surjective, and this completes the proof.
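The isometry h\mapsto f_h can be checked numerically in finite dimensions (a numpy sketch; the orthonormal basis here is generated randomly via a QR factorization, and everything is over \mathbb{R} for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# QR of a random matrix: the columns of Q form an orthonormal basis of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

h = rng.standard_normal(n)
f_h = Q.T @ h   # the coefficient "function" e -> <h, e> on the basis

# Parseval: the coefficient map preserves the norm, hence is an isometry.
print(np.isclose(np.linalg.norm(f_h), np.linalg.norm(h)))  # True
```

This is exactly Parseval's identity in coordinates: the map sending h to its coefficient vector against an ONB is an orthogonal (in general, unitary) transformation.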

That’s all I want to say about dimension and orthonormal bases for Hilbert spaces. Up until now I’ve focused on showing the similarities between infinite dimensional Hilbert spaces and their familiar finite dimensional counterparts. In my next post in this series, I’ll switch gears and talk about how these spaces are different. It’s not terribly difficult to show that, in finite dimensions, the unit ball, i.e. the set \{x\in H:||x||\leq1\}, is a compact subset of the Hilbert space H (I say Hilbert space since any finite dimensional Banach space is isomorphic to \mathbb{R}^n, which can be given the structure of a Hilbert space). I’ll show that in infinite dimensions, the unit ball is NEVER compact in the norm topology on H. After that, I’ll introduce some different topologies one can consider on H, describe convergence in these topologies, and work towards proving a very cool result known as Alaoglu’s theorem, which states that the unit ball of a Hilbert space is compact in a special topology known as the weak-star topology on H. These results have some natural generalizations to Banach spaces (and even more general objects known as topological vector spaces), and once we prove Alaoglu’s theorem, I’ll use it to show that a very familiar Banach space cannot be isomorphic to the dual space of any other Banach space. Until then, best of luck with quals/finals/start of summer classes.

Posted in Analysis, Linear Algebra | 1 Comment

Littlewood’s Three Principles (3/3)

So far we’ve proven Littlewood’s first and third principles. In this last installment, we’ll prove his second principle, that “every function is nearly continuous.” This result is better known as Lusin’s Theorem.

Continue reading

Posted in Analysis | 2 Comments

Littlewood’s Three Principles (2/3)

Littlewood’s three principles provide useful intuition for those first learning measure theory:

“The extent of knowledge [of real analysis] required is nothing like as great as is sometimes supposed. There are three principles, roughly expressible in the following terms:

  1. Every set is nearly a finite sum of intervals.
  2. Every function is nearly continuous.
  3. Every convergent sequence is nearly uniformly convergent.”

– John Littlewood

In my last post, I furnished a proof of the first statement. I should mention in passing that the principle is true in \mathbb{R}^n as well: every set (of finite measure) is nearly a finite union of cubes (the only change is that you cannot assume that any open set is the disjoint union of open cubes). Today I’ll go out of order and prove the third principle. The reason for this order will be made clear Friday, when I’ll employ the third principle to prove the second one.

The third principle states that “every convergent sequence is nearly uniformly convergent.” Let’s recall the difference between these two modes of convergence. To do so, I’ll employ an analogy.

Let’s fix a sequence \{f_n\}_{n=1}^{\infty} of complex-valued functions on a measure space X, and let’s play the following game. I have two bags, labeled X and \mathbb{R}^+, with the first bag containing every element of X and the second bag containing every positive real number. You have one bag, labeled \mathbb{N}, which contains every natural number. At the beginning of the game, you name a complex-valued function f. Also, since I’m no Hercules, we’ll assume X has finite measure, i.e. \mu(X)<\infty.

In the first turn, I pull an \epsilon>0 out of the bag labeled \mathbb{R}^+. If you can find me an N\in\mathbb{N} large enough so that f_n(x) is within \epsilon of f(x) for all x\in X whenever n\ge N, then you win the first round.

Then, in the second turn, I pull an \epsilon>0 out of the bag labeled \mathbb{R}^+, but this time I also pull an x out of the bag labeled X. If you can find me an N\in\mathbb{N} large enough so that f_n(x) is within \epsilon of f(x) for this specific x whenever n\ge N, then you win the second round.

If you can find a function f that always guarantees a first-round win, no matter what \epsilon>0 I choose, then we say f_n converges uniformly to f. We call this uniform convergence, since you don’t need to know where I am in space in order to guarantee that f_n(x) is close to f(x) – i.e. their distance is uniformly bounded. However, if you can never find such an f, then I win.

If you can find a function f that always guarantees a second-round win, no matter what \epsilon>0 and x\in X I choose, then we say f_n converges pointwise to f, since you have to be “wise to the point” I’ve selected in order to guarantee that f_n(x) is close to f(x) – i.e. distance is relative to your position in space. If your second-round win is guaranteed “almost surely” (this is a precise term from probability, actually), that is, if the set of x for which you can’t win has zero measure in X, then we say f_n converges pointwise almost everywhere to f.

Now here’s the tl;dr: We say that \{f_n\}_{n=1}^{\infty} converges to a function f

  • pointwise if, for all x\in X and \epsilon>0, there is an N\in \mathbb{N} (dependent on both x and \epsilon) so that \left|{f_n(x)-f(x)}\right|<\epsilon provided n\ge N.
  • uniformly if, for all x\in X and \epsilon>0, there is an N\in \mathbb{N} (dependent only on \epsilon) so that \left|{f_n(x)-f(x)}\right|<\epsilon provided n\ge N.

Example. Suppose I give you the collection of functions f_n: [0,1]\to \mathbb{R} given by f_n(x)=x^n. If you graph these functions, you’ll see that raising the exponent smashes the function toward zero, except at x=1, where every f_n equals 1, i.e. f_n converges pointwise to the function

f(x)=\left\{\begin{array}{ll}0\text{ if }x\in [0,1)\\ 1\text{ if }x=1\end{array} \right.

To see why, suppose I give you x\in (0,1) and \epsilon>0. If \epsilon>1, pick whatever you’d like in response. Otherwise, you just need to solve the inequality x^n<\epsilon. So N must be at least \frac{\ln(\epsilon)}{\ln(x)} (which is positive since \epsilon,x<1). The cases x=0,1 are trivial, and so you win in round two. But you can never win in round one! Why? For each \epsilon\in(0,1), we see that x^n\ge \epsilon for all x\in [\epsilon^\frac{1}{n},1). So no matter what N you choose, unless you know the position of x, you can’t get f_n(x) uniformly within \epsilon of f.
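The round-two/round-one asymmetry can be replayed numerically. This is an illustrative Python sketch under my own naming (`f_n`, `bad_x` are mine): the winning N in round two is any integer above \ln(\epsilon)/\ln(x), while in round one the point x=\epsilon^{1/n} defeats every N.

```python
import math

# Illustrative sketch of the f_n(x) = x^n example; names are mine.
def f_n(x, n):
    return x ** n

# Round two (pointwise): for fixed x in (0,1) and epsilon in (0,1), any
# integer N above ln(epsilon)/ln(x) wins the round.
x, eps = 0.9, 0.01
N = math.ceil(math.log(eps) / math.log(x))
assert all(f_n(x, m) < eps for m in range(N + 1, N + 100))

# Round one (uniform): no N can win, since x = eps**(1/n) gives x^n = eps
# exactly, no matter how large n is.
n = 10 ** 6
bad_x = eps ** (1.0 / n)
print(bad_x, f_n(bad_x, n))  # bad_x is very close to 1, yet f_n(bad_x, n) = eps
```

The point of `bad_x` is exactly the set [\epsilon^{1/n},1) from the paragraph above: it creeps toward 1 as n grows, but never empties out.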

Now, suppose you get sick and tired of losing in the first round. So you demand a rule change: at the beginning of each game, you can get rid of as many x‘s as you want out of my bag labeled X. But, still wanting to make the game attractive to me, you stipulate that I can limit the proportion of x‘s you take out of my X bag, as small as I’d like (provided it’s positive). After all, the problematic set in the example above gets arbitrarily small as n grows, so hopefully you can pull off a win if you can cut that piece out. Egorov’s Theorem shows that, given these modified rules, you can always guarantee a win – in the first turn.

Theorem (Egorov’s Theorem) Suppose \mu(X)<\infty and \{f_n\}_{n=1}^{\infty} is a sequence of complex-valued functions on X that converges pointwise to f almost everywhere. Then for every \epsilon>0 there exists E\subseteq X such that \mu(E)<\epsilon and f_n\to f uniformly on E^c.

Proof. Assume without loss of generality that f_n\to f everywhere on X (otherwise we can redefine each f_n appropriately). For each k,n\in \mathbb{N}, let

E_n(k)=\bigcup_{m=n}^{\infty}\{x: \left|{f_m(x)-f(x)}\right|\ge k^{-1}\}.

So E_n(k) is the set of all x\in X for which f_m(x) is at least \frac{1}{k} away from f(x) for some m\ge n. Fix a k\in \mathbb{N}. The intersection \cap_{n=1}^{\infty}E_n(k) consists of all x for which \left|{f_n(x)-f(x)}\right|\ge \frac{1}{k} for infinitely many n\in \mathbb{N}. But f_n\to f everywhere, and so \left|{f_n(x)-f(x)}\right|\to 0 for all x\in X. So \cap_{n=1}^{\infty}E_n(k)=\emptyset. Moreover, it is easy to see that E_{n+1}(k)\subseteq E_{n}(k) for all n\in \mathbb{N}.

Since \mu(X)<\infty, the continuity of \mu from above shows that \mu(E_n(k))\to 0 as n\to \infty. Let \epsilon>0. For each k\in \mathbb{N}, choose n_k so large that \mu(E_{n_k}(k))<\epsilon 2^{-k} and let E=\bigcup_{k=1}^{\infty}E_{n_{k}}(k). Then

\mu(E)\le \sum_{k=1}^{\infty}\mu(E_{n_{k}}(k))<\epsilon\sum_{k=1}^{\infty}2^{-k}=\epsilon.

Now, for all x\not\in E and every k\in\mathbb{N}, we have that \left|{f_n(x)-f(x)}\right|<k^{-1} for all n\ge n_k. So f_n\to f uniformly on E^c, as desired.

I should mention that one may relax the condition that \mu(X)<\infty and instead stipulate that \left|{f_n}\right|\le g for all n and some g\in L^1(\mu); this version follows from the finite-measure case via the triangle inequality and a simple inclusion argument.

Next time (Friday), I’ll prove the second principle, known better as Lusin’s Theorem. Thanks for reading.

Posted in Analysis | 2 Comments

Littlewood’s Three Principles (1/3)

When I first studied measure theory, I felt like I could never quite wrap my head around what a measurable set or function looked like. After all, the Borel hierarchy is huge, and what the hell does an F_{\sigma\delta\sigma\sigma\delta\sigma} set look like anyway? In his 1944 text “Lectures on the Theory of Functions”, John Littlewood, a professor of mathematics at Cambridge, sought to alleviate similar confusions.

Continue reading

Posted in Analysis | 4 Comments