If you are like me, you’ve probably got lots to do for final exams and the like as the semester (trimester, quarter, whatever your school uses) comes to a close. In an effort to take a bit of a break, I would like to present you with a cute little result I came across in my studies. It’s a nice number theory fact with a slick combinatorial proof. I found the result in Hatcher’s book on algebraic topology, and I’ll present an adapted version of his proof (though I’m sure it appears in many other books).

First we need to remember a little about p-adic representations of integers. If n is any integer (technically we should assume n\geq0, but it doesn’t really matter), and if p is a prime number, then n can be written uniquely as n=\sum_{l} n_lp^l, where the n_l are integers with 0\leq n_l<p. This is easy to prove using the fundamental theorem of arithmetic and the division algorithm. Without further ado, I give you the following

Proposition: Let p be a prime number and n,k\in\mathbb{N}. Then

\binom{n}{k}\equiv\prod_l\binom{n_l}{k_l}\pmod p

where n=\sum_l n_lp^l and k=\sum_l k_lp^l are the p-adic representations of n and k.
Proof: Recall that in the polynomial ring \mathbb{Z}_p[x] we have the identity (1+x)^{p}=1+x^{p} (since p divides \binom{p}{i} for 0<i<p), and hence, by induction, (1+x)^{p^j}=1+x^{p^j}. Working modulo p, we have

(1+x)^n=(1+x)^{n_0}(1+x^p)^{n_1}(1+x^{p^2})^{n_2}\dots

=\left(1+\binom{n_0}{1}x+\binom{n_0}{2}x^2+\dots+\binom{n_0}{p-1}x^{p-1}\right)\times

\left(1+\binom{n_1}{1}x^p+\binom{n_1}{2}x^{2p}+\dots+\binom{n_1}{p-1}x^{(p-1)p}\right)\times\dots

On one hand, we know that the coefficient of x^k in this expansion is \binom{n}{k}. On the other hand, if we expand the product on the right, no two ways of choosing one term from each factor produce the same power of x: the terms in the i’th factor have exponents of the form j_ip^i with 0\leq j_i<p, and by the uniqueness of p-adic representations the sum \sum_i j_ip^i determines the digits j_i. Thus, to obtain x^k we must choose from the i’th factor the term with exponent k_ip^i, where k=\sum_i k_ip^i is the p-adic representation of k. The coefficient of that term is precisely \binom{n_i}{k_i}, and so taking the product over all factors gives the result.
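If you’d like a quick numerical sanity check of the proposition, here is a minimal Python sketch (the function name lucas_binom_mod is my own) that multiplies the digit-wise binomial coefficients and compares the answer with a direct computation:

```python
from math import comb

def lucas_binom_mod(n, k, p):
    """Compute binom(n, k) mod p one base-p digit at a time, as in the proposition."""
    result = 1
    while n > 0 or k > 0:
        n_l, k_l = n % p, k % p                    # the l'th digits of n and k
        result = (result * comb(n_l, k_l)) % p     # comb(a, b) is 0 when b > a
        n //= p
        k //= p
    return result

# Compare with a direct computation for a small range of cases.
p = 5
assert all(lucas_binom_mod(n, k, p) == comb(n, k) % p
           for n in range(100) for k in range(n + 1))
```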

And that’s it! I hope you enjoyed it, and now I’ve got to get back to work!


Math in Elections Part 7 — The Complexity of Declaring a Victor.

Alright, folks.  Today’s the day.  Today the country votes on who will represent us in government.  But more importantly for the news organizations, today is the day when they get to hype the heck out of every exit poll, every demographically significant neighborhood, every weather report, and, of course, every new set of raw vote data.

And if the polls thus far are any indication, it will be late into the evening or early into tomorrow morning before we learn the results of the day’s biggest race: Ohio (AKA: the presidency).

So how are votes counted and is that indeed the most efficient way possible?  What is the complexity of determining the winner of each state?

In a presidential election, each state will have many people on the ballot.  Of course Obama and Romney will be there, but Libertarian Gary Johnson will also be on the ballot in every state and in DC, except possibly Michigan and Oklahoma, where his ballot access has been challenged.  The Green Party’s Jill Stein will be on the ballot in 38 states and in DC.  The Constitution Party and Justice Party will also be on the ballot in a handful of states.  And in each state, many more may be on just for that specific state.  But more importantly, most states have a write-in option where you can write in a vote for anyone or anything you like, such as yourself or Daffy Duck or Whateversuitsyourboat or Rex Burkhead or Herman Cain.

Nebraska, for instance, has a write-in option.  And any name is acceptable for submission.  Meaning there is no (finite) bound on the number of possible write-in submissions.

Moreover, let’s assume for the sake of this problem that, although there may be a large collection of write-in submissions for who-knows-whom, in the end one of the candidates will get at least 50% of the vote.  (This is not a strong assumption for many elections since oftentimes, unless a candidate gets 50% of the vote, there is a runoff election between the two leading vote-getters anyways.  And hence there would be another election where getting 50% is actually required.)  But for the sake of the algorithm, we cannot and should not narrow the winner down to a finite list of candidates we think might win.  We may not immediately say that Obama or Romney will win, because that would put an immediate bias against Whateversuitsyourboat, which is just not right.

Lastly, for this problem we may even assume more broadly that the tally is “on-line”; meaning the votes are coming in one at a time, and we are viewing them only as they come in.  So we don’t receive them as a pre-arranged set or know how many will eventually be cast.

Ok, so that’s the setup.  Now here’s the question:  How many operations must be done in order to determine which candidate got 50% of the vote?  Can we find a strategy that is fast and is not dependent on the number of candidates submitted?

Say the candidates are enumerated 1 through k, but we don’t know how large k is.  One option is to create an array.  Each time a candidate’s name comes in, we search the array to determine if (s)he is already listed.  If so, we increase his/her tally by one.  If not, then we create a new spot for that candidate and set his/her initial tally equal to 1.  And then, at the end of all of this, once every vote has come in, we search through this list and find the largest value.

However, this process is not as fast as possible, and the last step is highly dependent on k.  If k is large, then determining the winner at the end could take many operations.
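In Python, the array-based approach might look like the following minimal sketch (I’m using a dictionary in place of the array, and the names are mine):

```python
def naive_winner(votes):
    """Tally every candidate seen, then scan for the largest count."""
    tally = {}
    for name in votes:                      # one update per incoming vote
        tally[name] = tally.get(name, 0) + 1
    # This final scan runs over all k distinct names that were submitted.
    return max(tally, key=tally.get)
```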

I claim there is an algorithm that will deterministically take the, say, n votes off the line and spit out the winner in at most n operations.

Note that our last algorithm did not use the assumption that one candidate will get 50% of the vote, so we definitely want to incorporate that somewhere.  Also note that in the end, we do not care to know the exact vote tallies of the losers, or even of the winner.  So keeping track of them all seems like more work than necessary.  This is a fun problem to think about, so I encourage you to spend some time mulling it over on your way to the polling booth today, before you come back and read the solution.  But if you wish to know the answer right off the bat, well that’s ok too.  Here it is.

As your votes come in, put them in a list.  Proceed through the list keeping track of two things: one candidate’s name and one number.  Say the sequence started like this:

A \ \ A \ \ A \ \ C \ \ B \ \ B \ \ B \ \ C \ \ C \ \ C \ \ B \ \ C \ \ C

So after thirteen votes, 3 candidates have thus far been recorded.  After the first vote, the pair is (A : 1). This is because the first vote was for A, but more importantly it means that since the beginning (when no one had a majority), A has had the majority of the votes, and he is in fact 1 vote above the majority (one more vote against him and he would lose the majority).  After step two the pair is (A : 2), and after the third step it is (A : 3).  So, again, since the starting position when no one was winning, A has had the majority and it would take three non-A votes for him to lose it.

The first non-A vote happens next.  After the C the count is at (A : 2), then it goes to (A : 1) after the first B.  Note: another way to read the count at this point is that the number of A votes is one larger than the number of all other votes combined.  Proceeding from here, after the second B it is (? : 0), meaning at this point no candidate has a majority of the votes.  After the next step, though, we are going to dictate that the count be (B : 1).  Note that this pair does not mean that B is one vote above the majority in the entire string thus far.  It only means that ever since the (? : 0) point when no candidate had a majority, B is winning.  Continuing in this way, the counts starting from here are

(B : 1) \to (? : 0) \to (C : 1) \to (C : 2) \to (C : 1) \to (C : 2) \to (C : 3).

Our conclusion from this is that C must have won.  But why?  We noted that this (C : 3) only means that since the last no-winner point, C has been up by 3 against the majority.  It does not mean, however, that C is 3 above the majority among all votes cast so far.  How do we know that we didn’t run up the tally on A and B early on, causing one of those to be ahead of C?

Here’s one way to see it.  Note that certainly A is not the winner, since when he lost the majority, he never gained it back.  B is not the winner for the same reason.  But wait!  We supposed that one of the candidates was going to get at least 50% of the vote.  Therefore, with that supposition and us ruling out A and B as winners, the winner must be C.

And how many steps did our algorithm take?  Only one step for each vote.  So with n votes, we require a mere n steps to determine the winner.
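The procedure just described is essentially the classical Boyer–Moore majority vote algorithm. Here is a minimal Python sketch of it (the function name is mine); note that it relies on some candidate holding a strict majority of the votes:

```python
def majority_winner(votes):
    """One pass, one remembered name, one counter."""
    leader, count = None, 0
    for name in votes:
        if count == 0:            # the (? : 0) state: no one holds the majority
            leader, count = name, 1
        elif name == leader:      # a vote for the current leader
            count += 1
        else:                     # a vote against the current leader
            count -= 1
    return leader

print(majority_winner(list("AAACBBBCCCBCC")))   # prints 'C', as in the example
```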

Well, that’s it for this series.  I intend to post more often, but you know how that goes.  I stopped mid-series last spring.  Maybe I will pick that up again.  Or maybe I won’t.  Who knows.  Luckily Hotovy is back with some good analysis posts, Zach claims that he’ll post more this year, and Adam is planning on writing on the Nobel Prize in Economics soon.  Lots to look forward to!  Of course, even with our joined forces we still can’t even begin to compete with the posting rate over at Andy Soffer’s wonderful blog. But hey, we’re only human.

Until next time!  Now go watch CNN, all you political junkies!


Math in Elections Part 5 — There are Alternatives!

“There have been 45 presidential elections since 1828. In at least five, the race went to the second-most-popular candidate because of a spoiler. That’s an 11 percent rate of catastrophic failure. Were the plurality vote a car or an airliner, it would be recognized for what it is — a defective consumer product, unsafe at any speed.”

— William Poundstone, Gaming the Vote

There is an argument to be made that Mr. Poundstone is right that the US voting system has failed profoundly at the biggest level a very significant number of times.  You can also make the argument that if you increase your sample size to include elections from all around the world with a similar system to our own, the system fails with even greater likelihood.  But I won’t dwell on arguing this point.  We have talked about the issues with our primary and general election systems.  Now let’s talk about some alternatives.

In this post I want to talk about some alternative voting systems when there are several influential candidates in the field.  This includes the presidential primaries, PTA elections, governing board elections at your church, and all other emotional affairs where a voter could be drawn to tactical voting if necessary.

— State-Based Alternatives — 

Let’s start with how we elect our president.  First, if we still want to follow the states-focused path of America’s (non-mathematician) founding fathers, then one state-based alternative is a weighted popular vote, where the weight of a vote is determined by which state it comes from.  One way to implement this is to let each state still have its fixed number of contributing electoral votes, but the votes are awarded proportionally to the candidates.  That still allows each American voter to be heard, yet would cap the influence of a state, since each state would have the same number of contributing electoral votes regardless of how many of its voters turned out on election day.  I don’t think this is a great replacement, but I do think it is an improvement.
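For concreteness, here is one way such a proportional split might be computed; the largest-remainder rounding rule (and every name and number below) is my own choice, since there are several reasonable ways to handle the rounding:

```python
def allocate_proportionally(vote_counts, electoral_votes):
    """Split a state's electoral votes in proportion to its popular vote."""
    total = sum(vote_counts.values())
    quotas = {c: electoral_votes * v / total for c, v in vote_counts.items()}
    seats = {c: int(q) for c, q in quotas.items()}       # round everyone down first
    leftover = electoral_votes - sum(seats.values())
    # Hand the remaining votes to the candidates with the largest fractional parts.
    for c in sorted(quotas, key=lambda c: quotas[c] - seats[c], reverse=True)[:leftover]:
        seats[c] += 1
    return seats

# A hypothetical 5-electoral-vote state splitting 52% / 41% / 7%:
print(allocate_proportionally({"A": 520_000, "B": 410_000, "C": 70_000}, 5))
# {'A': 3, 'B': 2, 'C': 0}
```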

I will note that two states, the first being The Great State Of Nebraska, and the second being Maine, have a system similar to this.  They award their electoral votes by congressional district, not as a whole state.  This allowed Obama to win one of the five electoral votes from the very red state of Nebraska (the second most consistently red state in the union over the past half century).  This was the furthest Obama reached into red territory in 2008.  The problem with this process, though, is seen in the Nebraska GOP’s response to Obama winning this electoral vote.  They responded by redrawing district lines.  They shifted Democratic neighborhoods out of the second district, which Obama won, and into the more Republican first district, which was able to safely absorb these voters without risk of turning blue.

This process of strategically bunching together voters of the opposing party in order to contain their influence is called gerrymandering.  It leads to ridiculously shaped districts like North Carolina’s 12th congressional district, seen below.

State lines, however, will not be changed.  Therefore to avoid these blatantly non-democratic procedures, I believe that it is best to divide up the electoral votes dependent upon state vote tallies, not the tallies from these fluid congressional districts.

Govtrack.us has a nice feature where you can scroll through time to see how congressional districts have changed.  It’s pretty ridiculous.

A final method that maintains the state-focused viewpoint would be to have a dual system where half the vote is determined by some state-based system (like the electoral college, but ideally a better one), and the other half by a popular vote.  Our Congress is divided into two houses, one with each representative speaking for roughly the same number of citizens, and the other with a pair of senators representing a single state.  So it would be consistent with the founders’ vision in that regard to have such a balance between the people and the states when electing our president.

 

Now I want to talk about two voting systems that can be used in any election setting.  They’d work well for the primaries, for instance.  These are Instant Runoff Voting (IRV), and Range Voting (RV).  I prefer the latter to the former.  And so I will save that for my grand finale.  Let’s start with IRV.

— Instant Runoff Voting — 

The idea behind IRV is straightforward.  Each voter gets a list of all the candidates and ranks any of them that (s)he wishes to.  Like so.

We all agree that if a candidate is the top choice for over half of all citizens, then that candidate deserves to be elected.  Therefore, if a candidate is the number 1 choice for over half of the voters, then IRV dictates that that candidate is the winner.  However, if there is no such candidate, then we might agree that there should be some sort of runoff election.  IRV is then designed to immediately simulate a runoff election by people’s other preferences.

After the first round, we create a ranking of all of the candidates, as determined by the number of first place votes they received.  Since we have all this voter data provided to us, let’s talk about what is the best possible way to simulate a runoff election.  It might not make sense to eliminate everyone except the top two (like many runoff elections), since possibly all of the top three got very similar results, and maybe even the third-place candidate is ranked as people’s number 2 choice far more often than the first or second finishing candidates.

So let’s be safe and only eliminate from the race the candidate that finished in last place.  Then, let’s redistribute his/her votes according to the next preference on each ballot that placed him/her in first place.  Then we check again whether we have a candidate who has surpassed 50% of the vote.  If not, we repeat.  The flow chart looks like this:

We continue this process until we have a winner.  The algorithm (with some tiebreak rules) will always give us a single winner.
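Here is a rough Python sketch of the elimination loop just described (all names are mine, and the tie-break behavior is an arbitrary simplification; a real implementation needs explicit tie-break rules):

```python
from collections import Counter

def irv_winner(ballots):
    """Each ballot lists candidates from most to least preferred."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot toward its highest-ranked remaining candidate.
        counts = Counter(next(c for c in ballot if c in remaining)
                         for ballot in ballots
                         if any(c in remaining for c in ballot))
        top, top_votes = counts.most_common(1)[0]
        if 2 * top_votes > sum(counts.values()):
            return top                                   # someone cleared 50%
        remaining.remove(min(counts, key=counts.get))    # drop the last-place candidate

ballots = [["A", "B"], ["A", "B"], ["B", "C"], ["B", "C"],
           ["C", "B"], ["C", "B"], ["C", "A"]]
print(irv_winner(ballots))   # C leads on first-place votes, but B wins after A is eliminated
```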

This system has many advantages.  Its criterion for declaring a winner is a pretty good one.  And holding many runoff elections (but without paying to keep running them, losing voter turnout, etc.), and only eliminating the worst candidate each time, seems like a good idea.  One possible point for discussion is whether having the fewest first-place votes really qualifies you as “the worst”.  There could very well be many other more sensible metrics for determining which candidate is the worst representation of society.

There are, however, some unfortunate consequences of IRV.  The way votes are redistributed allows for some pathological behavior.  Here are some examples.

  • It is possible for a candidate to be capable of beating every other candidate in a head-to-head election, but yet fail to win IRV.
  • If you were to divide the electorate into two groups, and run the election on both groups, it is possible for both halves to elect the same candidate but yet the election being run on the entire electorate would select a different candidate.  (All preferential voting systems are susceptible to this pathology.)
  • It is possible that placing a candidate higher on your preference list will actually hurt his/her chances of winning.  One way this can manifest itself is that it can change who the “last place” candidate is, which could change which ballots are redistributed.
  • IRV is susceptible to strategic nomination, meaning that adding a candidate to the race who is incapable of winning can still alter the outcome.

IRV is currently being run in many places.  Many other countries use it as their primary voting system.  But it is also being used in America.  Many cities elect their mayor through IRV.  Oakland’s 2010 mayoral election ended in one semi-pathological instance of IRV (although there are many more examples abroad).  Ms. Jean Quan was elected mayor, yet she was initially far from first place based on first-place votes.  It took 10 runoff rounds to whittle down the field of 11 candidates until finally Ms. Quan surpassed 50% of the votes and was crowned the winner.  Ms. Quan had not led in any round before the last one.  In fact, after round 9 she had earned just 30.94% of the vote, and in the final round just barely crossed the finish line, ending up with 50.96% of the vote.  The table below summarizes the results; you can see them in better resolution here.

These facts don’t necessarily mean that the system was unfair or that she did not deserve to get elected, but the complexity of the decision process suggests that some very slight, seemingly reasonable and inconsequential adjustments to the voting system or candidate field could have generated a different result.  And therefore this and the pathological examples above suggest that although IRV is much better than our primary system, maybe it is still not best.

 

And if you suspect that we can do better, then, in my opinion, you’d be right.  I will conclude this post with one more (but almost surely still not the best) voting system.

— Range Voting —

For range voting, each voter is given a table containing each candidate’s name with the numbers 0 through 9 and NO OPINION following each one.  The voter must shade in how much (s)he supports that candidate.  A 0 means that you do not like that candidate at all, a 9 means you really like them.  I will leave it as an exercise to determine what NO OPINION means.  If a candidate is not given a rating, it counts as a NO OPINION. Below is an example of a completed ballot.

So how is a winner decided?  Among all numerical scores that a given candidate received, those scores are averaged.  The candidate with the highest average score is the winner.
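A minimal Python sketch of that rule (the ballot format and names are my own) might look like this:

```python
def range_voting_winner(ballots):
    """Each ballot maps candidate -> score in 0..9; omitting a candidate means NO OPINION."""
    totals, counts = {}, {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
            counts[candidate] = counts.get(candidate, 0) + 1
    # Highest average among the ballots that actually rated the candidate.
    return max(totals, key=lambda c: totals[c] / counts[c])

ballots = [{"A": 9, "B": 4}, {"A": 7, "B": 6, "C": 9}, {"B": 5, "C": 8}]
print(range_voting_winner(ballots))   # averages: A = 8.0, B = 5.0, C = 8.5, so C wins
```

Notice that C wins here while being rated by only two of the three voters, which is exactly the concern raised below.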

Simple enough.  The astute reader may have immediately realized a problem with this system.  What about the random Joe Schmo that no one has ever heard of?  Won’t nearly everyone rate him NO OPINION except himself and his beer-drinking buddies?  Won’t that allow him to get a high average for no good reason?  With the minor candidates getting rated far less often than others, won’t there be an imbalance where some candidate’s friends and family have a huge influence on his or her average?  Well, in theory, yes.  But I believe, in practice, no.  If you strongly want your candidate to win (which most people do), there is a good chance you’ll put a 0 down for everyone else, just to make sure.  In practice, I don’t think this concern would ever come about, especially in any notable race.

So what does this system have going for it?  Well, first of all, it reduces the “spoiler effect”, where similar candidates split their ideological vote, giving the win to a candidate who is not the most popular.  There have been many instances of this in past presidential elections, and certainly many, many more in lower-level American elections and in elections abroad.

Also, observe that in this system, voters do not submit a preference list.  They instead say how much they like or dislike each candidate.  And several candidates can be rated at the same level.  Note that this disqualifies us from applying the Gibbard-Satterthwaite theorem.  That theorem requires preference lists.  Well, fine, but the issue was not whether we could apply the theorem, the problem was the result of the theorem.  So must RV have a dictator, could a candidate be incapable of winning, or is the system susceptible to tactical voting?  Nope on all three counts!  The bad things we said we wanted to avoid are indeed avoided!

For these reasons, it is my belief that RV, or some variant of RV (such as one dealing with the NO OPINION option in a different way), would make a very good voting system.  If nothing else, it is worth being discussed on the national level to determine whether it is indeed as much better a system as my first impressions suggest.

 

Well, that’s it for this post.  Turned out to be a longer post than I intended.  So thanks if you stuck with it all the way until here!  Stop by tomorrow to hear me tell you just how insignificant you are!

 


To Converge or Not to Converge – Hilbert Spaces Part 4/Banach Spaces 1

Last time I introduced the notion of the weak topology on a Banach space X. I defined the topology in terms of net convergence, but remarked that this was equivalent to the weak topology generated by X^*. The purpose of this post will be to introduce a bunch of different topologies we can put on a Banach space, its dual, and even the set of linear operators on the Banach space. Before I do this however, I want to spend a bit of time reviewing the notion of the weak topology on a space generated by a collection of functions.

Definition: Let Y be a topological space and X be a set. If \mathcal{F}=\{f_\alpha:X\to Y\} is a collection of functions, then the weak topology generated by \mathcal{F} is the smallest topology on X so that each f_\alpha\in \mathcal{F} is continuous.

I’ll leave it as an exercise to show that this is equivalent to the topology generated by sets of the form f_\alpha^{-1}(U), where U\subseteq Y is open in the topology on Y. Now, suppose X has the weak topology generated by \mathcal{F}. In analysis, it’s often useful to describe a topology in terms of convergent sequences, or in a more general setting in terms of convergent nets. I’ll now establish the following fact:

Proposition: Let X,Y,\mathcal{F} be as in the above definition and endow X with the weak topology generated by \mathcal{F}. Then a net \{x_\beta\} in X converges to x\in X if and only if for every f_\alpha\in\mathcal{F}, the net \{f_\alpha(x_\beta)\} converges to f_\alpha(x) in Y.

Proof: If \{x_\beta\} converges to x, then by the continuity of each f_\alpha we get the result. Conversely, suppose \{x_\beta\} does not converge to x\in X. I’ll exhibit some f_\alpha\in\mathcal{F} so that f_\alpha(x_\beta) does not converge to f_\alpha(x). Since x_\beta does not converge to x, there is some basic open set U containing x so that x_\beta is not eventually in U. But now, by the comment above, U=\bigcap_{i=1}^n f_i^{-1}(V_i), where each f_i\in\mathcal{F} and each V_i\subseteq Y is open. Now, since this is a finite intersection, the condition that x_\beta is not eventually in U implies that there must exist some i so that x_\beta is not eventually in f_i^{-1}(V_i). But this shows that f_i(x_\beta) is not eventually in V_i, which is a neighborhood of f_i(x), hence f_i(x_\beta) does not converge to f_i(x).

Okay, so that was a bit of a technical, not-very-enlightening proof, but at least it’s there. Now for an example: the most familiar example of a weak topology is the product topology on a Cartesian product of sets, generated by the natural projection maps \pi_\alpha:\prod_{j\in J} X_j\to X_\alpha. Typically in point-set topology we are concerned with the first characterization of the product topology, namely that it is generated by finite intersections of cylinders \pi_\alpha^{-1}(U), for U open in X_\alpha. However, let’s see what happens if we consider this in terms of nets. For a finite product, this says a net (x_\alpha,y_\alpha) converges to (x,y) if and only if \pi_1(x_\alpha,y_\alpha)=x_\alpha\to x and similarly y_\alpha\to y.

But what happens if we view X\times Y as the set of all functions f from \{1,2\} to X\cup Y with f(1)\in X and f(2)\in Y? Well, this says that f_\alpha\to f if and only if f_\alpha(1)\to f(1) and f_\alpha(2)\to f(2); in other words, it’s a type of pointwise convergence. Now suppose X is any topological space, and recall that C(X) denotes the continuous functions f:X\to \mathbb{R} (or \mathbb{C}, either works). We can view C(X) as a subspace of the set of all functions from X to \mathbb{R}, i.e. \mathbb{R}^X=\prod_{x\in X}\mathbb{R}. If we give this space the product topology, then a net f_\alpha in C(X) converges to f (which need not be in C(X)!) if and only if \pi_x(f_\alpha)\to\pi_x(f) for each x\in X. But under the natural identification of functions with products of sets, this says that f_\alpha\to f if and only if f_\alpha(x)\to f(x) for all x\in X. Thus the product topology is often referred to as the topology of pointwise convergence. Clearly C(X) need not be a closed subspace of \mathbb{R}^X, since, for example, it’s easy to construct continuous functions on \mathbb{R} that converge pointwise to a discontinuous function.
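As a quick numerical illustration of that last remark (the example f_n(x)=\arctan(nx) is my choice, not one from the post):

```python
import numpy as np

# Each f_n(x) = arctan(n*x) is continuous on R, but pointwise the sequence
# converges to (pi/2)*sign(x), which jumps at x = 0.  So C(R) is not closed
# in R^R with the product (pointwise-convergence) topology.
xs = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
for n in [1, 10, 100, 10000]:
    print(n, np.round(np.arctan(n * xs), 4))
```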

Now let’s get back to Banach spaces. I’ve already defined the weak topology in a previous post, so now I’ll define the weak* topology on the dual of a Banach space. Of course, since X^* is itself a Banach space, we can consider the weak topology on X^*, i.e. the topology generated by X^{**}. However, a more natural topology to consider is the so-called weak* topology. Recall that we can naturally embed X in X^{**} by x\mapsto \hat x, where \hat x(f)=f(x), i.e. \hat x is the so-called evaluation functional. The weak* topology on X^* is the weak topology generated by the set \{\hat x:x\in X\}. In terms of nets then, a net \{f_\alpha\} in X^* converges to f if and only if for all x\in X, we have \hat x(f_\alpha)\to \hat x(f), i.e. if and only if f_\alpha(x)\to f(x) for all x\in X. In other words, this is precisely the topology of pointwise convergence on X^*! In general, the weak* topology is weaker than the weak topology on X^*, though of course if X is reflexive, then the two coincide. As I hinted at in my previous post, the weak* topology is important because in this topology, the unit ball of X^* (i.e. the set of all functionals with operator norm less than or equal to 1) is a compact set. I’ll state and prove this below, but first I just want to mention topologies on operators.

There are a few natural topologies to put on B(X), the collection of bounded (i.e. continuous) operators T:X\to X. As usual, we have the norm topology (using the operator norm), and the topology of pointwise convergence, which is often called the strong operator topology. We can also consider the weak operator topology, which is best defined in terms of nets. We say A_\alpha converges to A in the weak operator topology if and only if for all x\in X, we have that A_\alpha x converges weakly to Ax, so the weak operator topology might best be described as the topology of weak pointwise convergence. Now, without further ado, I give you (with proof adapted from Folland)

Alaoglu’s Theorem: Let X be a Banach space. Then the unit ball of X^* is compact in the weak* topology.

Proof: Let B denote the unit ball of X^*. Given x\in X, define D_x=\{z\in\mathbb{C}:|z|\leq ||x||\}, i.e. to each x\in X we associate a scaled copy of the closed complex unit disk. Let D=\prod_{x\in X} D_x and note that D is compact by Tychonoff’s theorem. Now, D is the collection of all complex valued functions \psi on X so that |\psi(x)|\leq||x|| for all x\in X, and so B is simply the subset of D whose elements are linear. Moreover, as noted above, the product topology on D is the topology of pointwise convergence, which implies that the topology B inherits as a subspace of D is also the topology of pointwise convergence, and as I remarked this is the same as the weak* topology. Thus it suffices to prove that B is a compact subset of D, and since D is compact, it suffices to prove that B is closed. If f_\alpha is a net in B converging to f\in D, all that remains to show is that f is linear. But for any x,y\in X and a,b\in\mathbb{C}, we have

f(ax+by)=\lim f_\alpha(ax+by)=\lim\left(af_\alpha(x)+bf_\alpha(y)\right)=af(x)+bf(y)

and the result is proven.

As you can see, once the proper identifications are made (and Tychonoff’s theorem is applied), this is actually a very clean proof. I’ll end my post, as usual, with an application of this theorem to prove a result that I think is pretty cool. The following is an exercise from Conway that shows, if we want, we can always view Banach spaces as continuous functions on some compact topological space.

Proposition: Let X be a Banach space. Then there exists a compact space K so that X is isometrically isomorphic to a closed subspace of C(K) equipped with the supremum norm.

Proof: Let K=B be the unit ball of X^* with the weak* topology, and identify X with its natural image \hat X in X^{**}, viewed as functions on K simply by restriction of domain. I claim that \hat X\subset C(K) (it’s not immediately clear that \hat x remains a continuous map when X^* is given the weak* topology instead of the norm topology). Fix x\in X and suppose f_\alpha is a net in K converging to f. In the weak* topology, this simply says f_\alpha\to f pointwise, so in particular f_\alpha(x)\to f(x). But this is precisely the statement that \hat x(f_\alpha)\to\hat x(f), so indeed \hat x\in C(K). Clearly the map x\mapsto \hat x is injective, and we have

||\hat x||_K=\sup_{f\in B}|\hat x(f)|=||\hat x||_{X^{**}}=||x||_{X}

by definition of the operator norm on X^{**} and using the fact that x\mapsto \hat x preserves the norm. This shows that x\mapsto \hat x is an isometry (and in particular its inverse is bounded, so it is an isometric isomorphism onto its image). The last thing to prove is that \hat X is actually a closed subspace of C(K). Suppose \hat x_\alpha is a net in \hat X converging in the supremum norm to \Psi\in C(K). Then x_\alpha is a Cauchy net in X, so it converges, say to x (it’s a bit of an exercise, but one can show that in a first countable topological vector space, i.e. a vector space with a first countable topology in which addition and scalar multiplication are continuous, which certainly applies to normed spaces, if every Cauchy sequence converges then every Cauchy net converges as well). But then for every f\in K, we have

\Psi(f)=\lim \hat x_\alpha(f)=\lim f(x_\alpha)=f(\lim x_\alpha)=f(x)=\hat x(f)

so indeed \Psi\in\hat X and the proof is complete.

Actually, one can make the result a bit better. The Banach-Mazur theorem states that if X is a real separable Banach space, then we can take K in the above proposition to be the interval [0,1] and use real valued functions instead of complex valued functions. That’s pretty cool! So, as you can see, the idea of weak topologies is useful and in some sense quite natural when considering normed spaces. Anyways, that’s about all I’ve got for this post, hopefully you enjoyed it.


Math in Elections Part 4 — Democracy and The Electoral College?

So far in this series we have seen why the Gibbard-Satterthwaite theorem implies that the primaries are unfair. I had said that I was going to talk about alternative voting systems today. But I changed my mind. I’ll instead do that tomorrow. Today let’s move on to the real deal: The General Election.

Most US general elections are not nearly as bad as the primaries because essentially the Gibbard-Satterthwaite theorem does not apply. One of the assumptions of Gibbard-Satterthwaite is that there be at least three candidates. In the primaries this is certainly true, but in most US general elections, at least 97% of votes cast are for either the Republican or the Democrat nominee. [Grammar break: “Democrat nominee”? “democratic nominee”? “Democrat’s nominee”? “republican” seems naturally to be an adjective while “Democrat” a noun! It makes it confusing!] Ok, back to math. Now, because the two major parties do not get 100% of the votes, the theorem still does technically apply. And the issues that we discussed last time can creep up.

At this moment there certainly are a number of congressional races which have three viable candidates, and hence the situation once again resembles the primaries. And even a presidential election can run into problems. Florida in 2000 is a classic example, and historically there have been many others. That 3% can certainly make a huge difference in a close race if the 3% are primarily being taken away from one of the two major candidates.

But let’s ignore these issues since at best they simply reduce the question to one resembling the primaries, which we have already discussed.  Let’s assume that the presidential race is a 2-candidate race and focus our attention on the electoral college.  As a preliminary statement, technically the members of each state’s electoral college are not bound to cast their votes for the winner of that state’s popular vote.  But we will also ignore that since nowadays that rule is nothing more than a formality and every time (except a possible rare example here or there, like Florida 2000, once again) they do decide with the people.

First of all, one advantage of the electoral college is that it maintains, to some extent, the notion that our nation is a collection of semi-autonomous states which blah blah blah.. Ok, fine, the founders liked states. But the question I want to discuss here is whether the electoral system is democratic. Americans are usually so proud to claim that we are a democratic nation. But in terms of electing our nation’s president, are we really?

Democracy is usually defined as a system of government where each person has equal say. Of course the founders excluded all sorts of people from this “each person” clause, but whatever. Eventually “land-owning, tax-paying, literate, christian, sufficiently old, non-criminal, white male American citizen” was reduced to “18 year old American citizen,” and Americans (probably at least as loudly as the land-owning, tax-paying, literate, christian, sufficiently old, non-criminal white male American citizens did back in 1800) proudly and loudly proclaim that we have a democracy.

So how do we measure whether “each person has equal say”?  I propose the following reformulation: “each person’s vote has equal probability of deciding the election.”  If everyone votes, but we dictate that the voters whose last names begin with F, O, or W get to have their votes count 100 times more than everyone else’s, then most everyone would agree that not every person has an equal say.  Likewise if each voter’s vote were weighted based on how much land they own or their knowledge of biblical scripture.

But saying “each person’s vote has equal probability of deciding the election” is nice since, first, it is easy to analyze mathematically, but also since it does seem to be a very reasonable way to, in my view, precisely quantify the more philosophical “each person has equal say” statement. It essentially means that each person’s vote is not only counted, but is worth the same in the count.

So I am going to assume from here on that Americans do indeed want to democratically elect their president and that this desire mathematically means that each person wants their vote to have the same probability of deciding the election as any other person’s vote.

With these assumptions, the electoral college is far from the right system. Consider a school with 35 teachers. For convenience we will refer to this school as Lincoln Southeast High School. Say the principal of Lincoln Southeast High School is holding a meeting in which the teachers must vote on some minor school issue. He plans to take a vote on it and democratically decide with the majority. Suppose 20 teachers support the proposal. Then we all agree that the issue should get passed.

But what if instead he first randomly puts the teachers into groups of 5, has each group vote on the proposal, and then accepts the proposal if and only if at least 4 of the 7 groups collectively voted for it?  Well, in this case it is possible that although 20 teachers support it, the proposal will get rejected.  This would happen if too many of the supporters of the proposal were clumped together.  Then some groups of 5 may narrowly decide to reject the proposal by a 3 to 2 count, while others unanimously accept it.  But the margins don’t matter: a 5-0 vote for the proposal in one group is worth just as much as a 3-2 vote against it in another.  One arrangement that realizes this is checked in the sketch below.
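Here is one concrete arrangement (the numbers are my own, chosen to realize the failure described above):

```python
# 35 teachers: 20 support the proposal ('S'), 15 oppose it ('O').
groups = [
    ["S"] * 5, ["S"] * 5, ["S"] * 5,   # three unanimous yes groups
    ["S", "S", "O", "O", "O"],         # narrow no
    ["S", "S", "O", "O", "O"],         # narrow no
    ["S", "O", "O", "O", "O"],         # no
    ["O"] * 5,                         # unanimous no
]
supporters = sum(g.count("S") for g in groups)
groups_won = sum(1 for g in groups if g.count("S") > len(g) // 2)
print(supporters, groups_won)   # 20 supporters overall, but only 3 of 7 groups -> rejected
```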

But it could get even worse.  What if the principal grouped people together by department?  The science teachers would be put into one group, math in another, special education in a third, etc.  Putting similar people together increases further the chance that some groups will vote in unison.

Similarly, grouping people together based on their geographic location (which state they live in) oftentimes groups like-minded people together and makes it possible for a candidate to win the popular vote but lose the election.

But moreover, it causes certain states to (essentially) be a guaranteed win for the Republicans or a guaranteed win for the Democrats.  A (Republican or Democrat) voter in Texas, therefore, has no chance of his/her vote deciding the election, since his/her vote does not even have a chance of flipping that state.  And without any chance of being able to flip the state, the vote is therefore moot.  A voter in Ohio or Nevada, though, because of either the criticalness and swinginess of the state or its unique significance in the electoral math, has a much greater chance of his/her vote being the deciding vote in the election.

And that’s even without mentioning that the “number of congressmen” rule for deciding how many electoral votes a state gets causes another imbalance in the value of a voter, dependent on which state he/she claims residency in.

Nate Silver, a statistician and writer of the wonderful fivethirtyeight blog, has estimated the relative likelihood of an individual voter casting the vote that decides Tuesday’s election.  He concluded that an Ohioan, for instance, has at least a couple hundred times more sway than a Texas voter (and likely even an order of magnitude or two above that, but Silver is not too specific about how crappy a position the Texan is in.  He only lower-bounds the crappiness).

So there you have it. Whether you think that Majority Rules is democratic, or whether some other, more general system is acceptable provided that “each person has equal say”, either way the electoral college ain’t the one you’re looking for.

Now, in closing I will comment that I do not want the electoral college to go away. The election would be less interesting if it were simply a popular vote, I think. Moreover, roughly a third of all Americans live in a “swing state”. So it’s not that there is not a fairly large representative sample of Americans within those states. A poll of 4,000 people is considered very accurate. With at least 40,000,000 voters in the swing states, it in some sense does amount to an awesomely good poll (although a somewhat geographically biased one..). Moreover, the country is too large for a presidential ticket duo to visit every nook and cranny of it. But when you restrict it down to 10 states, maybe those citizens are actually able to attend rallies and town halls with the candidates if they want, receive and digest the onslaught of information, and consequently make a more (but possibly less) educated vote. And the current swing states do tend to have a nice mixture of all types of America in them.

So I am not necessarily saying that the electoral college isn’t fun to have, interesting to see unfold, or pretty often determines the will of the people. But regardless, it is not democratic.

It might also be worth noting that asserting that I know a priori how Texas is going to vote, and concluding the value of a Texan’s vote as a result, may indeed be something that is worth discussing and not just carelessly assuming. Now, I personally think it’s fine to assert what I did, but at least one intelligent person I know has disagreed with me. Whateversuitsyourboat author Zach Norwood is indeed such a person. Perhaps in the next few days Zach will state his case in the comment section below.  But if not, feel free to put extra thought into the validity of that assertion yourself.

Tomorrow I will write about alternative voting systems. Then I will write about how much your vote matters. And finally, on election day, I will conclude this series with a post on the complexity of vote counting.  See you then.


Math in Elections Part 3 — Delegates, Conventions and Kings.

Last time we talked about some problems with the primaries.  I wanted to mention one issue that can arise at the very end of the primary process because of a seemingly odd combination of convention rules.  The problem presents itself when the delegates convene to cast their votes to officially nominate their party’s candidate.

First consider the following very simplified election where there are three candidates: HotPants, ‘Zam and Z; and three voters:  Jaybert, GloryProne and Nick(i)Name.  Each voter ranked each of the candidates with a 1 for their favorite, 2 for eh, 3 for their worst enemy.  Here are the results:

So at this point, if everyone voted (assuming that we are using any semi-reasonable winner-declaring system), this race would of course end in a three-way tie.  But what if, after some snooping around UCSD, Nick(i)Name learns how GloryProne and Jaybert are planning to vote?  Then, although she is unable to get ‘Zam, the true Man that she really wanted, she does have “King (Queen) Maker” powers; i.e., she is able to choose the winner from among Z and HotPants.  Since Z is her worst enemy, she settles for HotPants (but what’s new, am I right? [jk, HotPants!]).

So the question is, when is there a King Maker?  And are King Makers necessarily a bad thing?

Assume there are three candidates in some heated election.  The election resulted in candidates A and B each getting 40% of the vote, while candidate C received 20% of the vote.  Assume that in order to win, one of the candidates must get 50% + 1 of the vote.  Candidate C has no chance to win in any scenario.  But what if the rules allow candidate C to give his votes to any other candidate?  This may seem like an odd rule, but let’s go ahead with the assumption.

In such a scenario, candidate C now has the ability to effectively choose the winner; C is the King Maker.  In some sense this makes C the “dictator” (as defined in Post 1) in a “new election”, in the following sense.  We can view the first election as resulting in no winner, and provided neither candidate A nor B is willing to concede, there is essentially a second election where there are two candidates (A and B) and just one voter that matters (C).  We have already said that such a system is unfair in the first post of this series.

Now, there are some subtleties going on that make it a little different than the original dictatorial scenario.  If candidate C simply votes for the candidate most similar to himself, then it would seem like the system more or less did choose a winner that is most representative of the people.  But no one expects this last step to be so pure.  You’d have to expect there to be some back room deal made and, because of that, this system seems rather suspicious.

Now, it may come as a surprise that what I just described is exactly how the presidential primaries work.  Each state holds a primary or a caucus.  And based on the results, some candidates are awarded delegates for the concluding party convention.  A candidate must get 50% + 1 of the delegates to be declared the party’s nominee.  However, here’s the catch: a candidate more or less has complete control of how his/her delegates vote.  Of course you would usually expect that candidate to instruct the delegates to vote for himself/herself.  But in a situation like the one described above, a candidate with no chance of winning among a field of candidates all of which failed to reach the 50% mark certainly could play King Maker if the number of their delegates is enough to push at least two of the other candidates past the 50% mark.

This was precisely Ron Paul’s goal this past year.  To collect enough delegates to be a player in the end, and hope that the field narrows down to Romney vs. A Single Conservative Alternative, in which case such a contested-convention scenario has a chance to arise.

The plan did not unfold as he hoped.  But for awhile it was a slightly reasonable possibility.  And so, indeed, this is a possible consequence of our poorly set up primary system that could, in some election cycle, cause many Americans to feel very mistreated by this non-democratic process.


Math in Elections Part 2 — A Primary Example.

What’s wrong with our current system?  Well, a lot.

But before I get you all depressed, I will mention one thing that our system theoretically does have going for it.

In the last post we mentioned how the Gibbard-Satterthwaite theorem shows that a three+ party system opens up the possibility for significant issues to arise.  However, since America at the moment has just two parties which are major players on the national stage, most general elections are nearly a two-candidate contest.  And consequently at least these elections are likely to avoid the potential unfairness issues that I outlined last time.

Ok, now let’s get sad.  In this post we’ll talk about the Republican and Democratic parties’ primaries.  Only voicing your opinion about one candidate has its drawbacks.  A primary is a classic sample case for the Gibbard-Satterthwaite theorem.  In a primary, similar candidates split the vote among the supporters of their ideology.  Others do not vote for their true preference because they view their candidate as being unable to win the primary and instead vote for another to avoid “wasting their vote”.

For instance, say there was a candidate that was ranked highly on a minority of people’s preference lists, but extremely low on others.  For convenience, let’s call this person, oh I don’t know, let’s go with “Mitt Romney”.  The other candidates, being more similar, split the vote among themselves, allowing Romney to become the presumptive nominee before he ever won a majority of the votes in any state.

If one candidate was ranked 1st on 30% of people’s preference lists but in last place at 7th on the rest, while another candidate is 1st on 25% of people’s preference lists and 2nd on all the others’, then it might seem reasonable that this latter candidate is actually more representative of society.  But in a procedure where each voter names only their top choice, this is not considered.

So which of Gibbard-Satterthwaite’s voting conditions do the primaries break?  Based on the constant flow of campaign trail gaffes among the conservatives and Jon Huntsman’s über-low polling numbers, you might suspect that most of the candidates must be working hard to lose the winnability property.  Or maybe you thought that Sheldon Adelson would attempt to satisfy the dictatorial property.  And yet, at least in theory, neither of those were it.  It was the manipulability property that was actually being broken.

How does this apply to the primaries?  Well, say in some primary that Santorum beat Romney by a small margin (which happened in Iowa this year).  Then, everyone whose voter preferences were Cain then Romney then Santorum could have swung the election by instead voting for Romney then Cain then Santorum.  If so, then although not a single person changed their minds about whether they prefer Santorum to Romney or vice versa, still our voting procedure decided that society did change its mind about preferring Santorum to Romney.

By allowing voters to give the least amount of information possible (only their top choice), and by making it sensible to not vote for who you really prefer, we have indeed set up a very poor voting system.

So what would it look like if we allowed for a more complex voting system where each voter submitted their entire preference ranking?  Well, no matter how we did it, Gibbard and Satterthwaite showed that it would still be unfair in some way.  But it would still almost certainly be better than our current primary system.  Any reasonable one would reduce vote splitting, would improve the landscape for centrists, and likely, at least by some metric, help pick representatives whose views better match those of the electorate.  More on this next time.

In closing, I will give a couple more explicit examples of tactical voting altering or attempting to alter an election.  In this last election cycle Rick Santorum robocalled Michigan Democrats reminding them that they are able to vote in the primary, and many did.  The reasoning from the Democrats’ perspective is that Santorum would be much easier to beat in the general election than Romney.  So liberals showed up to the polls and cast votes for the more conservative candidate, the candidate that least resembled their viewpoints.  In an interview on NPR, one woman said it sent shivers down her spine to cast a vote for Santorum, who she disliked much more than Romney, but she did it because in the end it was what was best for Obama.

Now, although it exhibits the fault in the system, Romney did win in the end.  However, the 2003 California gubernatorial recall election is a very similar example where it is alleged that the result of the election was actually tipped due to tactical voting.  I’ll let you read up on that on your own, though.

Tomorrow I will discuss one more problem with the primary system.  In particular, one involving the delegates and the party conventions.  After that I will suggest some alternatives to our primary system.


Math in Elections Part 1 — Every Voting System is Flawed.

Hi everyone!  Are you feeling a little tired of the same-old political discussions?  Wishing there could be more math involved?  If so, I’ve got a blog series for you!

In light of the upcoming elections, I thought it’d be a good time to write on The Math of Elections.  In this post I will talk about the Gibbard-Satterthwaite theorem.  Tomorrow I will write about how this theorem illuminates the problems that exist in our primary system.  Following that I will talk about Kings, suggest alternatives to our primary system, discuss the electoral college, analyze whether voting matters, and lastly, on election day, will analyze the complexity of vote counting.  Let’s get started.

The Gibbard-Satterthwaite theorem deals with elections where there are at least 3 candidates running for a single position.  Observe that most elections have these characteristics — for instance most US elections begin with primaries that feature at least three candidates.  Most elections that take place within your community or among your friends would also qualify.  And even in our largest nation-wide general elections, many races feature an independent or third party candidate who will receive a notable number of votes.

Let’s assume that every member of the electorate explicitly ranks their preferences.  They may prefer the Democrats’ candidate most, then the Libertarian, then Herman Cain, then the Republican, then the Green Partier, etc.  From this, we are going to generate society’s preference list in some way, with the most preferred candidate being declared the winner.  The goal, of course, would be to find a way to do so that is fair.  One procedure could be to ignore how people voted and simply place the candidates’ names in alphabetical order, and declare that to be society’s preference ranking.  But, of course, that is not fair.

What about a system like our own?  Our current primary system, for example, takes each voter’s preference list, ignores everything but their top choice, and gives one vote to the candidate who was that top choice.  Then the candidate with the most votes wins.

Is this system fair?  How can we tell?  And if it isn’t, does a fair system even exist?

Well, unfortunately for society, the Gibbard-Satterthwaite theorem roughly says that no voting system with 3 or more candidates, decided by these voter preference lists, is fair.  First, let’s talk about what makes a voting system “fair.”  I believe that most everyone would agree that the following properties are ones that a fair voting system should have.

Dictatorial Property.  For starters, a dictatorship does not seem fair.  So let’s say that the first condition that we demand is that there is no dictator;  i.e. the result of the election cannot simply mimic the preference of a single voter.

In Iran, for instance, it appears that many citizens voted for president, but the only vote that mattered was Ahmadinejad’s vote for himself.

Winnability Property.  Second, it must be possible for any candidate to win.  A system where the winner is decided by picking the person whose name comes first in alphabetical order does not seem fair.  We gotta give Zygarri Zzyzzmjac a fighting chance!

Manipulability Property.  Finally, it should not be to one’s advantage to engage in tactical voting.  Meaning, it should never be possible for someone to submit false preferences because doing so will help their true preference win.

So, for example, suppose that at the moment the Societal Ranking Procedure (whatever it may be) reports that society likes the Republican candidate the most, then the Democrat, then the Green Partier, then Herman Cain, then the Libertarian.  If I personally have the ranking of Republican then Democrat then Herman Cain then Green then Libertarian, but I decide to interchange Democrat and Libertarian or Libertarian and Herman Cain, then that should not change the fact that society prefers the Republican to the Democrat.  It would make no sense if suddenly the Democrats’ candidate jumped into first place.  More to the point: no one changed their preferences with respect to the Democrats and the Republicans, so it does not make sense that society should have changed its preference either.

So those are three conditions which all seem fairly reasonable ones to ask of a voting system.  However, Gibbard and Satterthwaite showed that these three conditions cannot all hold simultaneously.  Given any voting system with at least 3 candidates, where the winner is deterministically chosen as a function of these voter preference lists, either your system has a dictator, or some candidate is incapable of winning, or there is a way to manipulate the system by being dishonest.  How sad.

I will omit the proof of this theorem in order to keep this series accessible to a large audience.  Possibly some day I will come back and write a post proving it.  But that day is not today.

This series will continue tomorrow when I will talk more about this theorem in the context of one unfair electoral system: the US primaries.


Moving Around a Sphere – Hilbert Spaces Part 3

Well, it’s been a long time since I’ve written a post, but I have a bit of time and I really don’t feel like doing homework tonight, so let’s talk about some mathematics! Last time I wrote, I promised some discussion of the unit ball in infinite dimensional Hilbert spaces. Many of the definitions and even some of the facts I’ll state carry over to arbitrary Banach spaces, but some of the proofs are a bit easier with the inner product structure, so I’ll stick to the nicer Hilbert space case. Here we go:

Last time I remarked that if H is a finite dimensional Hilbert space, then the unit ball of H is compact in the norm topology. The proof of this is a bit messy, so I won’t give it, but I’ll sketch the basic idea, which is straightforward: in \mathbb{R}^n this is true by the Heine–Borel theorem, since the unit ball is closed and bounded. Any finite dimensional Hilbert space is isometrically isomorphic to \mathbb{R}^{\dim{H}} (or \mathbb{C}^{\dim{H}} in the complex case, which changes nothing below). Under this isomorphism, the unit ball of \mathbb{R}^{\dim{H}} maps onto the unit ball of H, and so the unit ball of H is the continuous image of a compact set, hence itself compact. The messy part is writing down the correct isomorphism, but that’s left to the reader. It’s a fun exercise to use this fact to prove that any two norms on a finite dimensional Hilbert space are equivalent: i.e. if ||\cdot||_1,||\cdot||_2 are two norms on H, there exist positive constants A,B so that A||x||_1\leq ||x||_2\leq B||x||_1 for all x\in H. As an easy corollary we see that any two norms generate the same topology on H.
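(A quick numerical aside of my own, not part of the original argument: for the two most familiar norms on \mathbb{R}^n the equivalence constants can be written down explicitly, namely ||x||_2\leq ||x||_1\leq\sqrt{n}\,||x||_2, and the claim is easy to spot-check.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
for _ in range(1000):
    x = rng.standard_normal(n)
    ratio = np.linalg.norm(x, 1) / np.linalg.norm(x, 2)
    # Classical equivalence constants between the 1-norm and the 2-norm on R^n.
    assert 1.0 - 1e-12 <= ratio <= np.sqrt(n) + 1e-12
print("all sampled ratios lie in [1, sqrt(n)], as the equivalence predicts")
```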

Now let’s suppose H is infinite dimensional. I’ll work with an arbitrary infinite dimensional H, but if you want a concrete example, just take H=l^2(\mathbb{N}), the set of square-summable sequences of complex (or real) numbers with the natural inner product. Let \{e_\alpha\}_{\alpha\in J} be an orthonormal basis for H. For l^2, the easiest example is to take e_i=(0,0,\dots,1,0,\dots), where the 1 occurs in the i‘th slot. For arbitrary H, let \{e_i\}_1^\infty be a countable subset of the orthonormal basis. Now, for i\neq j, we have

||e_i-e_j||^2=\langle e_i-e_j,e_i-e_j\rangle=||e_i||^2-2\mathrm{Re}\langle e_i,e_j\rangle+||e_j||^2=2

since each e_i is a unit vector and the cross term vanishes by orthogonality. So the set \{e_i\}_1^\infty is a \sqrt{2}-separated sequence sitting in the unit ball, and hence has no convergent subsequence. For metric spaces, sequential compactness and compactness are equivalent, so this shows the unit ball is not compact.
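(If you like seeing this numerically, here is a small Python check of the \sqrt{2}-separation, treating the standard basis of \mathbb{R}^N as truncations of the e_i in l^2. This is just an illustration of the computation above, nothing more.)

```python
import numpy as np

N = 10               # truncate l^2 to R^N purely for illustration
E = np.eye(N)        # row i is (a truncation of) the basis vector e_i
for i in range(N):
    for j in range(N):
        if i != j:
            assert np.isclose(np.linalg.norm(E[i] - E[j]), np.sqrt(2))
print("every pair of distinct basis vectors is exactly sqrt(2) apart")
```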

The fact that the unit ball of an infinite dimensional Hilbert space is not compact is unfortunate. We like compact spaces; they tend to make life easier! One thing we might try to do is define a topology on H in which the unit ball is compact. In fact, we can do something similar to this for any Banach space. Actually, we have to work in the dual space for an arbitrary Banach space, but since Hilbert spaces are naturally (anti)isomorphic to their duals via the Riesz representation theorem, we can think of it as a topology on H. For an arbitrary Banach space, this topology will be called the weak* topology (read “weak star”) on X^*. In this post, I’ll define the so-called weak topology on X. In the Hilbert space case, these two will coincide, but for an arbitrary Banach space they are different topologies on two distinct spaces. Finally, I’ll prove a cool fact about the weak topology on a Hilbert space. My next post will contain the definition of the weak* topology and Alaoglu’s theorem, which states that the unit ball in X^* is weak* compact.

Definition: Let X be a Banach space. The weak topology on X is defined as follows: a net \{x_\alpha\} converges to x if and only if for all f\in X^*, the net f(x_\alpha) converges to f(x).

Remark: The weak topology depends on the norm on X; it’s not intrinsic to the set X. We use the norm to define the continuous linear functionals that make up X^*, and in turn use these to define the weak topology on X. Also, note that we could have defined the weak topology to be the topology generated by X^*, i.e. the weakest (smallest, coarsest) topology on X making every f\in X^* continuous. In particular, since the norm topology on X satisfies this condition, it follows immediately that the weak topology on X is in fact weaker than the norm topology: i.e. ||x_n-x||\to 0 implies x_n converges to x in the weak topology (in this case we often say x_n converges to x weakly).

Now, moving back to the world of Hilbert spaces, what does it mean for a sequence (technically a net, but we’ll just think in terms of sequences for the sake of simplicity) to converge weakly in H? Well, recall that by the Riesz representation theorem, any bounded linear functional f\in H^* is given by f(x)=\langle x,y_f\rangle for some fixed vector y_f\in H (and conversely, the map x\mapsto\langle x,y\rangle defines a bounded linear functional on H). Translating this to weak convergence, we have x_n\to x weakly in H if and only if for all y\in H, we have \langle x_n,y\rangle\to \langle x,y\rangle. Thus for Hilbert spaces, we can think of weak convergence as “convergence of inner products”. One might ask if this is a genuinely weaker topology than the norm topology on H. That is, can we find a sequence x_n that converges weakly in H, but does not converge in the norm topology? I’ll end my post by proving the following cool

Proposition: Let \{e_i\}_1^\infty be an orthonormal set in H. Then \{e_i\} converges weakly to 0. Moreover, \{e_i\}_1^\infty does not converge in the norm topology on H.

Proof: The statement about norm convergence was shown above, so we need only prove the statement about weak convergence. To this end, let y\in H. By Bessel’s inequality (or, if you want, extend \{e_i\} to an ONB and apply Parseval’s identity), we have

\sum_{i=1}^\infty |\langle y,e_i\rangle|^2\leq ||y||^2<\infty

and so in particular the infinite sum converges. But this is just a convergent series of nonnegative real numbers, so its terms must go to zero as i\to \infty. But this is precisely the statement that \lim_{i\to\infty} |\langle y,e_i\rangle|^2=0. It is left to the reader to verify that this implies \langle e_i,y\rangle\to 0 as i\to\infty, i.e. e_i\to 0 weakly in H.
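(Here is a numerical caricature of the proposition, again truncating l^2 to \mathbb{R}^N; the particular vector y with y_i=1/i is just my choice of a square-summable example. The inner products \langle e_n,y\rangle=1/n shrink to 0, while ||e_n-0||=1 for every n.)

```python
import numpy as np

N = 1000
y = 1.0 / np.arange(1, N + 1)    # y_i = 1/i, a (truncated) square-summable vector
for n in [1, 10, 100, 1000]:
    e_n = np.zeros(N)
    e_n[n - 1] = 1.0             # truncation of the n-th basis vector of l^2
    print(n, np.dot(e_n, y), np.linalg.norm(e_n))
# prints inner products 1.0, 0.1, 0.01, 0.001 -> 0, but every e_n has norm 1:
# e_n -> 0 weakly, while ||e_n - 0|| = 1 never tends to 0.
```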

Posted in Analysis, Set theory | Leave a comment

AC, part III: Choice in context

Wow, it’s been a long time since my last post. Sorry about that. I intend (though I’ll make no promises) to post a bit more regularly now.

This final post in my ac series was advertised to be a post about surprising uses of choice, but it’s grown to be a general discussion of ac.

— 1. Important principles weaker than full ac —

To start developing a picture of how ac relates to other mathematical assertions, we need to describe some so-called choice principles that are not provable in zf but are weaker than ac. (See the previous two posts in this series (I and II) for some assertions equivalent to ac.)

One of the most natural ways to weaken ac is to limit the number of choices you’re making or the size of the bins you’re choosing from. By this sort of weakening we come to one of the mildest choice principles, the Axiom of Countable Choice:

Definition 1. The Axiom of Countable Choice (acω) asserts that every countable family of nonempty sets has a choice function. (That is, if {\emptyset\notin X} and {X} is countable, then there is a function that assigns to each member {x} of {X} a member of {x}.)

acω is substantially weaker than ac, but it’s sufficient to forbid many types of unpleasant sets. For example, it is a theorem of zf+acω that every infinite set has a countable subset, though this isn’t provable in zf alone (that is, there is a model of zf in which there is an infinite set with no countable subset). Many elementary facts about the real numbers can be established in zf+acω; for example, zf+acω suffices to prove that a countable union of countable sets is countable (though, again, some form of choice is necessary to prove this). In particular, in the presence of acω, {{\mathbb R}} is not a countable union of countable sets. (Apparently the assertion that a countable union of countable sets is countable is actually strictly weaker than acω. I think I once found a reference for this.)

We need slightly more firepower to do real analysis properly, though. Proofs in (baby) real analysis often require the construction of a sequence, which is typically built inductively by choosing the {n}th term from, e.g., a ball of radius {1/2^n}. The proof of the Baire Category Theorem is a good example of this style of argument, which uses the Axiom of Dependent Choice:

Definition 2. The Axiom of Dependent Choice (dc) is the following assertion: If {R} is a relation on a nonempty set {X} such that for every {x\in X} there is a {y\in X} such that {xRy}, then there is a sequence {(x_n : n\in{\mathbb N})} of members of {X} such that {x_nRx_{n+1}} for every {n\in{\mathbb N}}.
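(Here is a finite caricature of how dc gets used, a Python sketch of my own; real applications of dc make infinitely many dependent choices, which no program can literally do. Think of {R} as listing, for each element, its possible successors, and build a path by repeatedly choosing a successor of the point you just chose.)

```python
def dependent_choices(R, start, length):
    """R maps each element of X to a nonempty list of R-successors
    (the hypothesis 'for every x there is a y with xRy').
    Returns a finite prefix x_0, x_1, ... with x_k R x_{k+1} for each k."""
    seq = [start]
    for _ in range(length):
        successors = R[seq[-1]]
        seq.append(successors[0])   # the 'choice'; here we simply take the first one
    return seq

# A toy relation on X = {0, 1, 2} in which every element has a successor.
R = {0: [1, 2], 1: [2], 2: [0]}
print(dependent_choices(R, 0, 6))   # [0, 1, 2, 0, 1, 2, 0]
```

Each choice depends on the one made before it, which is exactly what distinguishes dc from plain acω.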

Apparently this paper gives a proof that the Baire Category Theorem is equivalent (in zf, of course) to dc.

It is straightforward to show that dc implies acω (exercise!). There are many choice principles (like dc and acω) that guarantee only the existence of choice functions for sets satisfying certain size restrictions. (acω restricts the size of the family; other useful choice principles restrict the size of each member of the family.) dc is sufficient to prove the results of basic analysis and is particularly appealing because it is too weak to prove that nasty things like nonmeasurable sets and Banach–Tarski decompositions of the sphere exist. (See section 2 below.)

Notice that acω and dc are local principles, in the sense that they are chiefly concerned with sets of cardinality near {\aleph_0} (this is less true of dc). For this reason, it shouldn’t be surprising that each of them is strictly weaker than full ac. Applications outside analysis often require global principles (for sets or spaces of arbitrary cardinality, e.g.), and a popular global choice principle weaker than full ac is the Ultrafilter Lemma.

Definition 3. The Ultrafilter Lemma (ul) asserts that every filter on a set can be extended to an ultrafilter.

The Prime Ideal Theorem (pit) asserts that every Boolean algebra has a prime ideal: a proper ideal {I} such that {x} or {-x} (the complement of {x}) is a member of {I} for every {x}.

Conveniently, the ul is equivalent to other interesting choice principles.

Theorem 4. In zf, the following are equivalent:

  1. The Ultrafilter Lemma;
  2. The Prime Ideal Theorem;
  3. The Compactness Theorem for first-order logic.

The proof of this theorem isn’t too difficult, so I won’t provide it here. We are taking on faith that the ul is strictly weaker than ac, but it would be nice to have a sense of just how much weaker it is. We have several different equivalent formulations of ac, each about a different sort of mathematical object:

Theorem 5. In zf, the following are equivalent:

  1. ac;
  2. Every set can be wellordered;
  3. Tychonoff’s Theorem: a product of compact spaces is compact;
  4. Every vector space has a basis.

(By the way, Exercise: prove that Tychonoff’s theorem implies ac. The proof I know of is short but requires a trick.)

By transferring choice principles over to new contexts (like topology) using the ideas from the proof that the assertions above are equivalent, we can provide a satisfying picture of how much weaker the ul is than ac.

Theorem 6. In zf, the following are equivalent:

  1. the ul;
  2. Every product of compact Hausdorff spaces is compact;
  3. Every product of copies of the two-point discrete space {2} is compact.

[Edit (Nov 10 2012): the ul is not equivalent to the assertion that every set can be linearly ordered or ac with finite `bins’; in fact,

ul \Longrightarrow every set can be linearly ordered \Longrightarrow ac with finite bins,

but none of those implications is reversible. It has recently come to my attention (I believe it’s proved in this paper) that the ul is equivalent to Tychonoff’s Theorem for sober spaces.]

Exercise: (Since so many of our authors and readers are analysts) The pit implies Hahn–Banach. (This is apparently difficult, so comment if you think you have a solution. But don’t give it away; I haven’t thought about it much yet.)

— 2. Math(s) without choice —

I’ll say a few words about two results whose proofs don’t require choice, the Cantor–Schroeder–Bernstein Theorem and Ramsey’s Theorem.

That the csb doesn’t require choice is surprising; its conclusion asserts the existence of a function constructed rather mysteriously from the functions in its hypothesis. Of course, it’s easy to say ‘Obviously it doesn’t require choice: here’s a proof that doesn’t use it!’, but when you suppress the intuition you gain by understanding the proof of the csb, the surprise returns. One reason to expect this surprise is that the easiest proof of the csb in zfc (namely, reduce to the case when your sets are wellordered by appealing to ac) makes unavoidable use of choice. And csb provides a mysterious function, which is what ac usually does.

The dual csb theorem, which asserts that if there are surjections {X\rightarrow Y} and {Y\rightarrow X} then there is a bijection {X\rightarrow Y} (and is a theorem of zfc), is not provable in zf. Apparently it is an open question whether the dual csb implies ac.

Maybe it’s not so surprising that Ramsey’s Theorem doesn’t require ac, but it can be made into a nontrivial choice principle by recasting it as ‘For every {2}-coloring of the {n}-subsets of an infinite set {X}, there is an infinite monochromatic subset {Y} of {X}.’ (‘But can’t you just prove this from the normal statement of Ramsey’s Theorem by restricting to a countable subset of {X}?’, you ask? Remember that it is not provable in zf that every infinite set has a countable subset!)

One final note: Solovay proved that it is consistent with zf+dc (granting the consistency of an inaccessible cardinal) that every subset of {{\mathbb R}} is (Lebesgue-)measurable. So we can do analysis somewhat familiarly in a setting where all of the standard theorems (Heine–Borel, etc.) are true, but pathological nasties like nonmeasurable sets don’t exist. What does such analysis look like? I don’t know, but I’d like to.

— 3. Obviously true? —

The temptation at this point, I think, is to point out theorems we are unable to prove without full ac, and then argue that any theory strictly weaker than zfc provides a somehow deficient concept of set, since it is unable to prove those theorems. My objection to this attitude is that there is a lot of choiceless mathematics to be understood, and there are alternative extensions of zf (e.g., by the Axiom of Determinacy) that are capable of proving quite powerful, interesting theorems. We simply gain nothing by asserting (or believing, or whatever) that some set of axioms is the gods-given ‘correct’ one and that we should blindly stick to it, in the same way (though more absurdly) that a commutative algebraist gains nothing by asserting with religious fervor that noncommutative rings simply don’t exist, in whatever metaphysical sense. Anyway, this is a standard platonism-versus-pluralism issue, and philosophers of mathematics spend a lot of time thinking about it.

— 4. The ‘magic wand’ factor —

Another reason to be careful about (though not necessarily avoid altogether) use of ac is that it is notorious for clouding the ideas in proofs. For example, consider the standard proof in zfc that every countable union of countable sets is countable: take countably many countable sets {A_n}, choose (by acω) an enumeration of each one, and zigzag through the enumerations just as in the proof that {{\mathbb N}\times{\mathbb N}} is countable. Contrast that statement and proof to the following:

Theorem 7. The counted union of counted sets is counted. That is, if {A_1,A_2,\dots} are countably many sets, and for each {n} the function {f_n\colon A_n\rightarrow {\mathbb N}} is a bijection, then there is a bijection {g\colon \bigcup_{n\in{\mathbb N}} A_n\rightarrow {\mathbb N}} (and we can write down a formula for it in terms of the {f_n}).

And, of course, the proof is the same as above, except that the enumerations are already chosen for us. My point is that the standard statement and proof hide the real idea, which is the construction of {g} from the {f_n}. If you want the stronger result, just take this theorem and snort the necessary amount of choice. Separating the choice-snorting from the real work gives us a deeper understanding of the result and its proof. Assuming ac carelessly is in the same spirit—though hardly as ridiculous—as assuming {0=1} would be: you can prove many things, but without understanding your subject as deeply as you would otherwise. (I should mention that I owe this example, related insight, and the expression ‘snorting choice’ to Thomas Forster.)
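(For what it’s worth, the formula for {g} really can be written out; here is a Python rendering on finite data, which is of course only a caricature of the infinite statement. Send each {a} in the union to the pair consisting of the least {n} with {a\in A_n} and {f_n(a)}, feed that pair to the usual zigzag (Cantor) pairing of {{\mathbb N}\times{\mathbb N}} with {{\mathbb N}}, and finally relabel the image in increasing order to land on an initial segment.)

```python
def cantor_pair(n, m):
    """The usual zigzag bijection N x N -> N."""
    return (n + m) * (n + m + 1) // 2 + m

def counted_union(families):
    """families is a list of pairs (A_n, f_n): A_n a set and f_n a dict
    enumerating A_n injectively into the naturals (the 'counted' hypothesis).
    Returns a bijection g from the union onto {0, ..., k-1}, written down
    purely in terms of the f_n -- no choices are made anywhere."""
    union = set().union(*(A for A, _ in families))
    inj = {}
    for a in union:
        n = min(i for i, (A, _) in enumerate(families) if a in A)
        inj[a] = cantor_pair(n, families[n][1][a])   # injection into N
    # Collapse the image onto an initial segment by listing it in increasing order.
    rank = {value: k for k, value in enumerate(sorted(inj.values()))}
    return {a: rank[inj[a]] for a in union}

# Two overlapping 'counted' sets.
A0, f0 = {"a", "b", "c"}, {"a": 0, "b": 1, "c": 2}
A1, f1 = {"c", "d"},      {"c": 0, "d": 1}
print(sorted(counted_union([(A0, f0), (A1, f1)]).items()))
# [('a', 0), ('b', 1), ('c', 3), ('d', 2)] -- a bijection onto {0, 1, 2, 3}
```

In the genuinely infinite case the same two formulas do the work, except that the final collapsing step enumerates the (infinite) image of the injection in increasing order rather than sorting a finite list.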

Recently I encountered another similar example, which I’ll quickly share. I mentioned earlier that the Baire Category Theorem is a theorem of zf+dc. The BCT for separable completely metrizable spaces is actually a theorem of zf. (This isn’t hard to prove, so you should do it if you’re interested. The point is that the countable dense set allows you to make your choices canonical, since you can always take from the countable dense set the lowest-indexed point that does what you need it to do. I think this is a standard trick for proving things about topological spaces without ac, but I don’t know of another example.) I don’t know of a way to get to full BCT from BCT for separable spaces by invoking dc, though, so this isn’t as good an example as the previous one.
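(The ‘lowest-indexed point that works’ trick is easy to write out explicitly. Here is a small Python sketch of my own; d stands for any fixed enumeration of the countable dense set and wants for whatever condition the inductive step of the proof needs, e.g. landing in a given open ball. The particular enumeration below is only a toy stand-in.)

```python
def canonical_choice(d, wants, search_limit=100_000):
    """d: a function k -> d_k enumerating a countable dense set.
    wants: a condition that some d_k is guaranteed to satisfy (by density).
    Returns the lowest-indexed point satisfying it -- no arbitrary choice needed."""
    for k in range(search_limit):
        if wants(d(k)):
            return d(k)
    raise RuntimeError("increase search_limit; density guarantees a witness exists")

# Toy example on the line: d_k runs through the grid 0, 0, 0.01, -0.01, 0.02, ...
# and we want a point in the open ball of radius 0.05 around 0.123.
pick = canonical_choice(lambda k: ((-1) ** k) * (k // 2) / 100,
                        lambda x: abs(x - 0.123) < 0.05)
print(pick)   # 0.08, the first point of the enumeration that lands in the ball
```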

— 5. Coming attractions —

I’ve recently gotten excited about infinitary combinatorics, so that will probably be the topic of my next major series. Combinatorial set theory should also show up soon, as will some set-theoretic topology.

I’ll almost certainly write about toposes [sic!] someday, too, but not until after I’ve written a post or two about general category theory.

In other news, Jay has promised to write a post in the next ten days, so stay tuned! P.S. This post is the first I’ve written using Luca Trevisan’s fantastic latex2wp. I recommend it to all math WordPress bloggers! (Quick test: Does WordPress render Erdős correctly? Yes.) Blue, no green!

Two standard references, from which most of this material comes:

Jech, Thomas J. The Axiom of Choice. 2008 Dover edition, originally published as: Studies in Logic and the Foundations of Mathematics, Vol. 75. North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973. xi+202 pp. (MR0396271)

Howard, Paul; Rubin, Jean E. Consequences of the axiom of choice. Mathematical Surveys and Monographs, 59. American Mathematical Society, Providence, RI, 1998. viii+432 pp. (MR1637107)

Posted in AC, Logic, Set theory | 6 Comments