## A quick, cool result to distract you from finals

If you are like me, you’ve probably got lots to do for final exams and the like as the semester (trimester, quarter, whatever your school uses) comes to a close. In an effort to take a bit of a break, I would like to present you with a cute little result I came across in my studies. It’s a nice little number theory fact with a cute combinatorial proof. I found the result in Hatcher’s book on algebraic topology, and I’ll present an adapted version of his proof (though I’m sure it appears in many other books).

First we need to remember a little about $p$-adic representations of integers. If $n$ is any integer (technically we should assume $n\geq0$, but it doesn’t really matter), and if $p$ is a prime number, then $n$ can be written (uniquely) as $n=\sum_{l} n_lp^l$, where the $0\leq n_l<p$ are integers. This is easy to prove using the fundamental theorem of arithmetic and the division algorithm. Without further ado I give you the following
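Extracting these digits is just repeated division with remainder; here is a quick sketch (the function name is mine):

```python
def p_adic_digits(n, p):
    """Return the base-p digits [n_0, n_1, ...] of n, least significant first."""
    digits = []
    while n > 0:
        n, r = divmod(n, p)  # peel off the lowest digit, shift the rest down
        digits.append(r)
    return digits or [0]
```

For example, `p_adic_digits(13, 2)` gives `[1, 0, 1, 1]`, since $13=1+0\cdot2+1\cdot4+1\cdot8$.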

Proposition: Let $p$ be a prime number and $n,k\in\mathbb{N}$. Then

$\binom{n}{k}\equiv\prod_l\binom{n_l}{k_l}\pmod p$

where $n=\sum_l n_lp^l$ and $k=\sum_l k_lp^l$ are the $p$-adic representations of $n$ and $k$.
Proof: Recall that in the polynomial ring $\mathbb{Z}_p[x]$ (coefficients taken mod $p$), we have the identity $(1+x)^{p^j}=1+x^{p^j}$, since $p$ divides every binomial coefficient $\binom{p^j}{i}$ with $0<i<p^j$. Working modulo $p$, we have

$(1+x)^n=(1+x)^{n_0}(1+x^p)^{n_1}(1+x^{p^2})^{n_2}\dots$

$=\left(1+\binom{n_0}{1}x+\binom{n_0}{2}x^2+\dots+\binom{n_0}{p-1}x^{p-1}\right)\times$

$\left(1+\binom{n_1}{1}x^p+\binom{n_1}{2}x^{2p}+\dots+\binom{n_1}{p-1}x^{(p-1)p}\right)\times\dots$

On one hand, we know that the coefficient of $x^k$ in this expansion is $\binom{n}{k}$. On the other hand, if we expand the product on the right, no two choices of terms from the factors produce the same power of $x$. Thus, to compute the coefficient of $x^k$ in the expansion, we need to choose a term $x^{l_i}$ from the $i$‘th factor so that $\sum_i l_i=k$. But we also know that the $i$‘th factor only contains terms of the form $x^{j_ip^i}$ with $0\leq j_i<p$. In other words, we simply pick the $j_i$ so that $\sum_i j_ip^i=k$, which is exactly the $p$-adic representation of $k$, forcing $j_i=k_i$. The coefficient on each of these terms is precisely $\binom{n_i}{k_i}$, and so taking the product gives the result.
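The proposition is easy to check numerically. Here is a small sketch (helper and function names mine) that computes $\binom{n}{k}\bmod p$ digit by digit, which you can compare against a direct computation:

```python
from math import comb

def p_adic_digits(n, p):
    """Base-p digits of n, least significant first."""
    digits = []
    while n > 0:
        n, r = divmod(n, p)
        digits.append(r)
    return digits

def lucas(n, k, p):
    """Compute binom(n, k) mod p as the product of digitwise binomials."""
    nd, kd = p_adic_digits(n, p), p_adic_digits(k, p)
    width = max(len(nd), len(kd))
    nd += [0] * (width - len(nd))
    kd += [0] * (width - len(kd))
    result = 1
    for ni, ki in zip(nd, kd):
        result = result * comb(ni, ki) % p  # comb(ni, ki) = 0 when ki > ni
    return result
```

For instance, `lucas(10, 3, 7)` agrees with `comb(10, 3) % 7`.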

And that’s it! I hope you enjoyed it, and now I’ve got to get back to work!


## Math in Elections Part 7 — The Complexity of Declaring a Victor.

Alright, folks.  Today’s the day.  Today the country votes on who will represent us in government.  But more importantly for the news organizations, today is the day when they get to hype the heck out of every exit poll, every demographically significant neighborhood, every weather report, and, of course, every new set of raw vote data.

And if the polls thus far are any indication, it will be late into the evening or early into tomorrow morning until we learn about the results for the day’s biggest race: Ohio (AKA: the presidency).

So how are votes counted and is that indeed the most efficient way possible?  What is the complexity of determining the winner of each state?

In a presidential election, each state will have many people on the ballot.  Of course Obama and Romney will be there, but Libertarian Gary Johnson will be on the ballot in every state and in DC except possibly Michigan and Oklahoma, where his ballot access has been challenged.  The Green Party’s Jill Stein will be on the ballot in 38 states and in DC.  The Constitution Party and Justice Party will also be on the ballot in a handful of states.  And in each state, many more may be on just for that specific state.  But more importantly, most states have a write-in option where you can write in a vote for anyone or anything you like, such as yourself or Daffy Duck or Whateversuitsyourboat or Rex Burkhead or Herman Cain.

Nebraska, for instance, has a write-in option.  And any name is acceptable for submission.  Meaning there is no (finite) bound on the number of possible write-in submissions.

Moreover, let’s assume for the sake of this problem that, although there may be a large collection of write-in submissions for who-knows-whom, in the end one of the candidates will get at least 50% of the vote.  (This is not a strong assumption for many elections since oftentimes, unless a candidate gets 50% of the vote, there is a runoff election between the two leading vote-getters anyway.  And hence there would be another election where getting 50% is actually required.)  But for the sake of the algorithm, we cannot and should not narrow down to a finite list who we think will win.  We may not immediately say that Obama or Romney will win, because that would put an immediate bias against Whateversuitsyourboat, which is just not right.

Lastly, for this problem we may even assume more broadly that the tally is “on-line”; meaning the votes are coming in one at a time, and we are viewing them only as they come in.  So we don’t receive them as a pre-arranged set or know how many will eventually be cast.

Ok, so that’s the setup.  Now here’s the question:  How many operations must be done in order to determine which candidate got 50% of the vote?  Can we find a strategy that is fast and is not dependent on the number of candidates submitted?

Say the candidates are enumerated 1 through $k$, but we don’t know how large $k$ is.  One option is to create an array.  Each time a candidate’s name comes in, we search the array to determine if (s)he is already listed.  If so, we increase his/her tally by one.  If not, then we create a new spot for that candidate and set his/her initial tally equal to 1.  And then, at the end of all of this, once every vote has come in, we search through this list and find the largest value.
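This first, naive approach might be sketched like so (a Python dictionary standing in for the array; names mine):

```python
from collections import defaultdict

def naive_winner(votes):
    """Tally every candidate, then scan for the largest count."""
    tally = defaultdict(int)
    for name in votes:               # one pass over the n votes...
        tally[name] += 1
    return max(tally, key=tally.get)  # ...plus a final scan over all k candidates
```

The final `max` scan is the step whose cost grows with the number of distinct candidates $k$.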

However, this process is not nearly as fast as possible, and the last step is highly dependent on $k$.  If $k$ is large, then determining the winner at the end could take many operations.

I claim there is an algorithm that will deterministically take the, say, $n$ votes off the line and spit out the winner in at most $n$ operations.

Note that we did not use in our last algorithm that one candidate will get 50% of the vote, so we definitely want to incorporate that in somewhere.  Also note that in the end, we do not care to know the exact vote tallies of all the losers, or even the winner.  So keeping track of them seems like we were doing more work than necessary.  This is a fun problem to think about.  So I encourage you to spend some time mulling it over on your way to the polling booth today, before you come back and read the solution.  But if you wish to know the answer right off the bat, well that’s ok too.  Here it is.

As your votes come in, put them in a list.  Proceed through the list keeping track of two things: one candidate’s name and one number.  Say the sequence started like this:

$A \ \ A \ \ A \ \ C \ \ B \ \ B \ \ B \ \ C \ \ C \ \ C \ \ B \ \ C \ \ C$

So after thirteen votes, 3 candidates have thus far been recorded.  After the first vote, the pair is $(A : 1)$.  This is because the first vote was for $A$, but more importantly it means that since the beginning (when no one had a majority), $A$ has had the majority of the votes, and he is in fact 1 vote above the majority (if one more vote went against him, he would lose the majority).  After step two the pair is $(A : 2)$, and after the third step it is $(A : 3)$.  So, again, since the starting position when no one was winning, $A$ has had the majority, and it would take three non-$A$ votes for him to lose it.

The first non-$A$ vote happens next.  After the $C$ the count is at $(A : 2)$, then it goes to $(A : 1)$ after the first $B$.  Note: another way to look at the count at this point is that the number of $A$s is one larger than the sum of the numbers of everything else.  Proceeding from here, after the second $B$ it is $(? : 0)$, meaning at this point no candidate has a majority of the votes.  After the next step, though, we are going to dictate that the count be $(B : 1)$.  Note that this pair does not mean that $B$ is one vote above the majority in the entire string thus far.  It only means that ever since the $(? : 0)$ point when no candidate had a majority, $B$ is winning.  Continuing in this way, the counts starting from here are

$(B : 1) \to (? : 0) \to (C : 1) \to (C : 2) \to (C : 1) \to (C : 2) \to (C : 3).$

Our conclusion from this is that $C$ must have won.  But why?  We noted that this $(C : 3)$ only means that since the last no-winner point, $C$ has been up by 3 against the majority.  It does not mean, however, that $C$ is 3 above the majority among all votes cast so far.  How do we know that we didn’t run up the tally on $A$ and $B$ early on, causing one of those to be ahead of $C$?

Here’s one way to see it.  Note that certainly $A$ is not the winner, since when he lost the majority, he never gained it back.  $B$ is not the winner for the same reason.  But wait!  We supposed that one of the candidates was going to get at least 50% of the vote.  Therefore, with that supposition and us ruling out $A$ and $B$ as winners, the winner must be $C$.

And how many steps did our algorithm take?  Only one step for each vote.  So with $n$ votes, we require a mere $n$ steps to determine the winner.
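The counting scheme above is in fact the Boyer–Moore majority vote algorithm; a minimal online sketch (names mine):

```python
def majority_winner(votes):
    """One pass over the votes, tracking one candidate name and one counter,
    as described above. Correct whenever some candidate has a true majority."""
    leader, count = None, 0
    for name in votes:
        if count == 0:              # the (? : 0) state: adopt this vote's candidate
            leader, count = name, 1
        elif name == leader:        # a vote for the current leader
            count += 1
        else:                       # a vote against the current leader
            count -= 1
    return leader
```

Running it on the example sequence $A\,A\,A\,C\,B\,B\,B\,C\,C\,C\,B\,C\,C$ returns $C$, using exactly one update per vote.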

Well, that’s it for this series.  I intend to post more often, but you know how that goes.  I stopped mid-series last spring.  Maybe I will pick that up again.  Or maybe I won’t.  Who knows.  Luckily Hotovy is back with some good analysis posts, Zach claims that he’ll post more this year, and Adam is planning on writing on the Nobel Prize in Economics soon.  Lots to look forward to!  Of course, even with our joined forces we still can’t even begin to compete with the posting rate over at Andy Soffer’s wonderful blog. But hey, we’re only human.

Until next time!  Now go watch CNN, all you political junkies!

## Math in Elections Part 5 — There are Alternatives!

“There have been 45 presidential elections since 1828. In at least five, the race went to the second-most-popular candidate because of a spoiler. That’s an 11 percent rate of catastrophic failure. Were the plurality vote a car or an airliner, it would be recognized for what it is — a defective consumer product, unsafe at any speed.”

— William Poundstone, Gaming the Vote

There is an argument to be made that Mr. Poundstone is right that the US voting system has failed profoundly at the biggest level a very significant number of times.  You can also make the argument that if you increase your sample size to include elections from all around the world with a similar system to our own, that the system fails with even greater likelihood.  But I won’t dwell on arguing this point.  We have talked about the issues with our primary and general election systems.  Now let’s talk about some alternatives.

In this post I want to talk about some alternative voting systems when there are several influential candidates in the field.  This includes the presidential primaries, PTA elections, governing board elections at your church, and all other emotional affairs where a voter could be drawn to tactical voting if necessary.

— State-Based Alternatives —

Let’s start with how we elect our president.  First, if we still want to follow the states-focused path of America’s (non-mathematician) founding fathers, then one state-based alternative is a weighted popular vote, where the weight of a vote is determined by which state it comes from.  One way to implement this is to let each state still have its fixed number of contributing electoral votes, but the votes are awarded proportionally to the candidates.  That still allows each American voter to be heard, yet would cap the influence of a state, since each state would have the same number of contributing electoral votes regardless of how many of its voters turned out on election day.  I don’t think this is a great replacement, but I do think it is an improvement.
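As a sketch of how proportional awarding might work, here is one possible rule (the largest-remainder method; the method and all names here are my choices for illustration, not something prescribed by any state):

```python
def proportional_electors(popular, electors):
    """Split a state's electoral votes proportionally to its popular vote,
    using the largest-remainder method (one apportionment rule among several)."""
    total = sum(popular.values())
    quotas = {c: electors * v / total for c, v in popular.items()}
    seats = {c: int(q) for c, q in quotas.items()}   # whole seats first
    leftover = electors - sum(seats.values())
    # hand out the remaining seats by largest fractional remainder
    for c in sorted(quotas, key=lambda c: quotas[c] - seats[c], reverse=True)[:leftover]:
        seats[c] += 1
    return seats
```

So a hypothetical 5-elector state splitting 60%/38%/2% would award 3, 2, and 0 electors, and the state's total influence stays capped at 5 regardless of turnout.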

I will note that two states, the first being The Great State Of Nebraska, and the second being Maine, have a system similar to this.  They award their electoral votes by congressional district, not as a whole state.  This caused Obama to win one of the five electoral votes from the very red state of Nebraska (second most consistently red state in the union over the past half century).  This was the furthest Obama reached into red territory in 2008.  The problem with this process, though, is seen in Nebraska GOP’s response to Obama winning this electoral vote.  They responded by redrawing district lines.  They shifted democratic neighborhoods from the second district, which Obama won, into the more republican first district, which was able to safely absorb these voters without risk of turning blue.

This process of strategically bunching together voters of the opposing party in order to contain their influence is called gerrymandering.  It leads to ridiculously shaped districts like North Carolina’s 12th congressional district, seen below.

State lines, however, will not be changed.  Therefore to avoid these blatantly non-democratic procedures, I believe that it is best to divide up the electoral votes dependent upon state vote tallies, not the tallies from these fluid congressional districts.

Govtrack.us has a nice feature where you can scroll through time to see how congressional districts have changed.  It’s pretty ridiculous.

A final method that maintains the state-focused viewpoint would be to have a dual system where half the vote was determined by some state-based system (like the electoral college, but ideally a better one), and the other half by a popular vote.  Our Congress is divided up into two houses, one with each representative speaking for roughly the same number of citizens, and the other with a pair of senators representing a single state.  So it would be consistent with the founders’ vision in that regard to have such a balance between the people and the states when electing our president.

Now I want to talk about two voting systems that can be used in any election setting.  They’d work well for the primaries, for instance.  These are Instant Runoff Voting (IRV), and Range Voting (RV).  I prefer the latter to the former.  And so I will save that for my grand finale.  Let’s start with IRV.

— Instant Runoff Voting —

The idea behind IRV is straightforward.  Each voter gets a list of all the candidates and ranks any of them that (s)he wishes to.  Like so.

We all agree that if a candidate is the top choice for over half of all citizens, then that candidate deserves to be elected.  Therefore, if a candidate is the number 1 choice for over half of the voters, then IRV dictates that that candidate is the winner.  However, if there is no such candidate, then we might agree that there should be some sort of runoff election.  IRV is then designed to immediately simulate a runoff election by people’s other preferences.

After the first round, we create a ranking of all of the candidates, as determined by the number of first place votes they received.  Since we have all this voter data provided to us, let’s talk about what is the best possible way to simulate a runoff election.  It might not make sense to eliminate everyone except the top two (like many runoff elections), since possibly all of the top three got very similar results, and maybe even the third-place candidate is ranked as people’s number 2 choice far more often than the first or second finishing candidates.

So let’s be safe and only eliminate from the race the candidate that finished in last place.  Then, let’s redistribute his/her votes according to the next preference on each ballot that placed him/her in first place.  Then we check again whether we have a candidate who has surpassed 50% of the vote.  If not, we repeat.  The flow chart looks like this:

We continue this process until we have a winner.  The algorithm (with some tiebreak rules) will always give us a single winner.
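The loop above can be sketched in code; this is a bare-bones version (names mine, and the tiebreak among last-place candidates is left as an arbitrary choice):

```python
from collections import Counter

def irv_winner(ballots):
    """Each ballot is a preference list, favorite first. Repeatedly eliminate
    the candidate with the fewest first-place votes until someone has a majority.
    Assumes at least one non-empty ballot at every round."""
    ballots = [list(b) for b in ballots]
    while True:
        firsts = Counter(b[0] for b in ballots if b)
        total = sum(firsts.values())
        leader, votes = firsts.most_common(1)[0]
        if votes * 2 > total:                    # someone passed 50%
            return leader
        loser = min(firsts, key=firsts.get)      # a real system needs tiebreak rules
        ballots = [[c for c in b if c != loser] for b in ballots]
```

For example, with ballots `[["A","B"], ["A","B"], ["B","A"], ["B","A"], ["C","B"]]`, no one has a majority in round one, $C$ is eliminated, $C$'s ballot transfers to $B$, and $B$ wins.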

This system has many advantages.  Its criterion for declaring a winner is a pretty good one.  And having many runoff elections (but without paying to keep running them, losing voter turnout, etc.), and only eliminating the worst candidate each time, seems like a good idea.  One possible point for discussion is whether having the fewest first place votes really qualifies you as “the worst”.  There could very well be many other more sensible metrics for determining which candidate is the worst representation of society.

There are, however, some unfortunate consequences of IRV.  The way votes are redistributed allows for some pathological behavior.  Here are some examples.

• It is possible for a candidate to be capable of beating every other candidate in a head-to-head election, yet fail to win under IRV.
• If you were to divide the electorate into two groups and run the election on each group separately, it is possible for both halves to elect the same candidate and yet for the election run on the entire electorate to select a different candidate.  (All preferential voting systems are susceptible to this pathology.)
• It is possible that placing a candidate higher on your preference list will actually hurt his/her chances of winning.  One way this can manifest itself is that it can change who the “last place” candidate is, which could change which ballots are redistributed.
• IRV is susceptible to strategic nomination, meaning that adding a candidate to the race who is incapable of winning can still alter the outcome.

IRV is currently being run in many places.  Many other countries use it as their primary voting system.  But it is also being used in America.  Many cities elect their mayor through IRV.  Oakland’s 2010 mayoral election ended in one semi-pathological instance of IRV (although there are many more examples abroad).  Ms. Jean Quan was elected mayor, yet was initially far from first place based on the first-place votes.  It took 10 runoff rounds to whittle down the field of 11 candidates until finally Ms. Quan obtained 50% of the votes and was crowned the winner.  Ms. Quan had not led in any round before the last one.  In fact, after round 9 she had earned just 30.94% of the vote, and in the final round just barely crossed the finish line, ending up with 50.96% of the vote.  The table below is a summary of the results; you can see them in better resolution here.

These facts don’t necessarily mean that the system was unfair or that she did not deserve to get elected, but the complexity of the decision process suggests that some very slight, seemingly reasonable and inconsequential adjustments to the voting system or candidate field could have generated a different result.  And therefore this and the pathological examples above suggest that although IRV is much better than our primary system, maybe it is still not best.

And if you suspect that something better exists, then, in my opinion, you’d be right.  I will conclude this post with one more (but almost surely still not the best) voting system.

— Range Voting —

For range voting, each voter is given a table containing each candidate’s name with the numbers 0 through 9 and NO OPINION following each one.  The voter must shade in how much (s)he supports that candidate.  A 0 means that you do not like that candidate at all, a 9 means you really like them.  I will leave it as an exercise to determine what NO OPINION means.  If a candidate is not given a rating, it counts as a NO OPINION. Below is an example of a completed ballot.

So how is a winner decided?  Among all numerical scores that a given candidate received, those scores are averaged.  The candidate with the highest average score is the winner.
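A sketch of this tallying rule (names mine; a ballot here is a map from candidate to score, with missing entries counting as NO OPINION):

```python
def range_winner(ballots):
    """Each ballot maps candidate -> score in 0..9; absent candidates are
    NO OPINION and simply don't count toward that candidate's average."""
    totals, counts = {}, {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
            counts[candidate] = counts.get(candidate, 0) + 1
    # the winner has the highest average among the scores actually given
    return max(totals, key=lambda c: totals[c] / counts[c])
```

Note that a candidate rated by only a handful of voters is averaged over only those ratings, which is exactly the Joe Schmo concern discussed below.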

Simple enough.  The astute reader may have immediately realized a problem with this system.  What about the random Joe Schmo that no one has ever heard of?  Won’t nearly everyone rate him NO OPINION except himself and his beer-drinking buddies?  Won’t that allow him to get a high average for no good reason?  With the minor candidates getting rated far less often than others, won’t there be an imbalance where some candidates’ friends and family have a huge influence on their averages?  Well, in theory, yes.  But I believe, in practice, no.  If you strongly want your candidate to win (which most people do), there is a good chance you’ll put a 0 down for everyone else, just to make sure.  In practice, I don’t think this concern would ever come about, especially in any notable races.

So what does this system have going for it?  Well, first of all, it reduces the “spoiler effect”, where similar candidates split their ideological vote, giving the win to a candidate who is not the most popular.  There have been many instances of this in past presidential elections, and certainly many, many more in lower-level American elections and in elections abroad.

Also, observe that in this system, voters do not submit a preference list.  They instead say how much they like or dislike each candidate.  And several candidates can be rated at the same level.  Note that this disqualifies us from applying the Gibbard-Satterthwaite theorem.  That theorem requires preference lists.  Well, fine, but the issue was not whether we could apply the theorem; the problem was the result of the theorem.  So must RV have a dictator, could a candidate be incapable of winning, or is the system susceptible to tactical voting?  Nope on all three counts!  The bad things we said we wanted to avoid are indeed avoided!

For these reasons, it is my belief that RV, or some variant of RV (such as one dealing with the NO OPINION option in a different way), would make a very good voting system.  If nothing else, it is worth discussing on the national level to determine whether it is indeed as much of an improvement as my first impressions suggest.

Well, that’s it for this post.  Turned out to be a longer post than I intended.  So thanks if you stuck with it all the way until here!  Stop by tomorrow to hear me tell you just how insignificant you are!

## To Converge or Not to Converge – Hilbert Spaces Part 4/Banach Spaces 1

Last time I introduced the notion of the weak topology on a Banach space $X$. I defined the topology in terms of net convergence, but remarked that this was equivalent to the weak topology generated by $X^*$. The purpose of this post will be to introduce a bunch of different topologies we can put on a Banach space, its dual, and even the set of linear operators on the Banach space. Before I do this however, I want to spend a bit of time reviewing the notion of the weak topology on a space generated by a collection of functions.

Definition: Let $Y$ be a topological space and $X$ be a set. If $\mathcal{F}=\{f_\alpha:X\to Y\}$ is a collection of functions, then the weak topology generated by $\mathcal{F}$ is the smallest topology on $X$ so that each $f_\alpha\in \mathcal{F}$ is continuous.

I’ll leave it as an exercise to show that this is equivalent to the topology generated by sets of the form $f_\alpha^{-1}(U)$, where $U\subseteq Y$ is open in the topology on $Y$. Now, suppose $X$ has the weak topology generated by $\mathcal{F}$. In analysis, it’s often useful to describe a topology in terms of convergent sequences, or in a more general setting in terms of convergent nets. I’ll now establish the following fact:

Proposition: Let $X,Y,\mathcal{F}$ be as in the above definition and endow $X$ with the weak topology generated by $\mathcal{F}$. Then a net $\{x_\beta\}$ in $X$ converges to $x\in X$ if and only if for every $f_\alpha\in\mathcal{F}$, the net $\{f_\alpha(x_\beta)\}$ converges to $f_\alpha(x)$ in $Y$.

Proof: If $\{x_\beta\}$ converges to $x$, then by the continuity of each $f_\alpha$, we get the result. Conversely, suppose $\{x_\beta\}$ does not converge to $x\in X$. I’ll exhibit some $f_\alpha\in\mathcal{F}$ so that $f_\alpha(x_\beta)$ does not converge to $f_\alpha(x)$. Since $x_\beta$ does not converge to $x$, there is some basic open set $U$ containing $x$ so that $x_\beta$ is not eventually in $U$. But now, by the comment above, $U=\bigcap_{i=1}^n f_i^{-1}(V)$, where $V\subseteq Y$ is open (technically we should take distinct sets $V_i$, but if we intersect these sets, the resulting smaller set works just fine). Now, since this is a finite intersection, the condition that $x_\beta$ is not eventually in $U$ implies that there must exist $i$ so that $x_\beta$ is not eventually in $f_i^{-1}(V)$. But this shows that $f_i(x_\beta)$ is not eventually in $V$, a neighborhood of $f_i(x)$, hence $f_i(x_\beta)$ does not converge to $f_i(x)$.

Okay, so that was a bit of a technical, not-very-enlightening proof, but at least it’s there. Now for an example: the most familiar example of a weak topology is the product topology on a Cartesian product of sets, generated by the natural projection maps $\pi_\alpha:\prod_{j\in J} X_j\to X_\alpha$. Typically in point-set topology we are concerned with the first characterization of the product topology, namely that it’s generated by finite intersections of cylinders $\pi_\alpha^{-1}(U)$, for $U$ open in $X_\alpha$. However, let’s see what happens if we consider this in terms of nets. For a finite product, this says a net $(x_\alpha,y_\alpha)$ converges to $(x,y)$ if and only if $\pi_1(x_\alpha,y_\alpha)=x_\alpha\to x$ and similarly $y_\alpha\to y$.

But what happens if we view $X\times Y$ as the set of all functions from $\{1,2\}$ to $X\cup Y$ with $f(1)\in X$ and $f(2)\in Y$? Well, this says that $f_\alpha\to f$ if and only if $f_\alpha(1)\to f(1)$ and $f_\alpha(2)\to f(2)$; in other words, it’s a type of pointwise convergence. Now suppose $X$ is any topological space, and recall that $C(X)$ denotes the continuous functions $f:X\to \mathbb{R}$ (or $\mathbb{C}$, either works). We can view $C(X)$ as a subspace of the set of all functions from $X$ to $\mathbb{R}$, i.e. $\mathbb{R}^X=\prod_{x\in X}\mathbb{R}$. If we give this space the product topology, then a net $f_\alpha$ in $C(X)$ converges to $f$ (which need not be in $C(X)$!) if and only if $\pi_x(f_\alpha)\to\pi_x(f)$ for each $x\in X$. But under the natural identification of functions with products of sets, this says that $f_\alpha\to f$ if and only if $f_\alpha(x)\to f(x)$ for all $x\in X$. Thus the product topology is often referred to as the topology of pointwise convergence. Clearly $C(X)$ need not be a closed subspace of $\mathbb{R}^X$, since, for example, it’s easy to construct continuous functions on $\mathbb{R}$ that converge pointwise to a discontinuous function.
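A standard concrete instance of that last remark: take $f_n(x)=x^n$ on $[0,1]$. Each $f_n$ is continuous, but

$f_n(x)=x^n\to 0 \text{ for } 0\leq x<1, \qquad\text{while}\qquad f_n(1)=1 \text{ for all } n,$

so the pointwise limit is the (discontinuous) indicator function of $\{1\}$.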

Now let’s get back to Banach spaces. I’ve already defined the weak topology in a previous post, so now I’ll define the weak* topology on the dual of a Banach space. Of course, since $X^*$ is itself a Banach space, we can consider the weak topology on $X^*$, i.e. the topology generated by $X^{**}$. However, a more natural topology to consider is the so-called weak* topology. Recall that we can naturally embed $X$ in $X^{**}$ by $x\mapsto \hat x$, where $\hat x(f)=f(x)$, i.e. $\hat x$ is the so-called evaluation functional. The weak* topology on $X^*$ is the topology generated by the set $\{\hat x:x\in X\}$. In terms of nets then, a net $\{f_\alpha\}$ in $X^*$ converges to $f$ if and only if for all $x\in X$, we have $\hat x(f_\alpha)\to \hat x(f)$, i.e. if and only if $f_\alpha(x)\to f(x)$ for all $x\in X$. In other words, this is precisely the topology of pointwise convergence on $X^*$! In general, the weak* topology is weaker than the weak topology on $X^*$, though of course if $X$ is reflexive, then the two coincide. As I hinted at in my previous post, the weak* topology is important because in this topology, the unit ball of $X^*$ (i.e. the set of all functionals with operator norm less than or equal to 1) is a compact set. I’ll state and prove this below, but first I just want to mention topologies on operators.

There are a few natural topologies to put on $B(X)$, the collection of bounded (i.e. continuous) operators $T:X\to X$. As usual, we have the norm topology (using the operator norm), and the topology of pointwise convergence, which is often called the strong operator topology. We can also consider the weak operator topology, which is best defined in terms of nets. We say $A_\alpha$ converges to $A$ in the weak operator topology if and only if for all $x\in X$, we have that $A_\alpha x$ converges weakly to $Ax$, so the weak operator topology might best be described as the topology of weak pointwise convergence. Now, without further ado, I give you (with proof adapted from Folland)

Alaoglu’s Theorem: Let $X$ be a Banach space. Then the unit ball of $X^*$ is compact in the weak* topology.

Proof: Let $B$ denote the unit ball of $X^*$. Given $x\in X$, define $D_x=\{z\in\mathbb{C}:|z|\leq ||x||\}$, i.e. to each $x\in X$ we associate a scaled copy of the closed complex unit disk. Let $D=\prod_{x\in X} D_x$ and note that $D$ is compact by Tychonoff’s theorem. Now, $D$ is the collection of all complex-valued functions $\psi$ on $X$ so that $|\psi(x)|\leq||x||$ for all $x\in X$, and so $B$ is simply the subset of $D$ whose elements are linear. Moreover, as noted above, the product topology on $D$ is the topology of pointwise convergence, which implies that the topology $B$ inherits as a subspace of $D$ is also the topology of pointwise convergence, and as I remarked this is the same as the weak* topology. Thus it suffices to prove that $B$ is a compact subset of $D$, and since $D$ is compact, it suffices to prove that $B$ is closed. If $f_\alpha$ is a net in $B$ converging to $f\in D$, all that remains to show is that $f$ is linear. But for any $x,y\in X$ and $a,b\in\mathbb{C}$, we have

$f(ax+by)=\lim f_\alpha(ax+by)=\lim\left(af_\alpha(x)+bf_\alpha(y)\right)=af(x)+bf(y)$

and the result is proven.

As you can see, once the proper identifications are made (and Tychonoff’s theorem is applied), this is actually a very clean proof. I’ll end my post, as usual, with an application of this theorem to prove a result that I think is pretty cool. The following is an exercise from Conway that shows, if we want, we can always view Banach spaces as continuous functions on some compact topological space.

Proposition: Let $X$ be a Banach space. Then there exists a compact space $K$ so that $X$ is isometrically isomorphic to a closed subspace of $C(K)$ equipped with the supremum norm.

Proof: Let $K=B$ be the unit ball of $X^*$ with the weak* topology, and identify $X$ with its natural image $\hat X$ in $X^{**}$, viewed as functions on $K$ simply by restriction of domain. I claim that $\hat X\subset C(K)$ (it’s not immediately clear that $\hat x$ remains a continuous map when $X^*$ is given the weak* topology instead of the norm topology). Fix $x\in X$ and suppose $f_\alpha$ is a net in $K$ converging to $f$. In the weak* topology, this simply says $f_\alpha\to f$ pointwise, so in particular $f_\alpha(x)\to f(x)$. But this is precisely the statement that $\hat x(f_\alpha)\to\hat x(f)$, so indeed $\hat x\in C(K)$. Clearly the map $x\mapsto \hat x$ is injective, and we have

$||\hat x||_K=\sup_{f\in B}|\hat x(f)|=||\hat x||_{X^{**}}=||x||_{X}$

by definition of the operator norm on $X^{**}$ and using the fact that $x\mapsto \hat x$ preserves the norm on $X$. This shows that $x\mapsto \hat x$ is an isometry (which implies that its inverse is also bounded, hence it is actually an isometric isomorphism onto its image). The last thing to prove is that $\hat X$ is actually a closed subspace of $C(K)$. Suppose $\hat x_\alpha$ is a net in $\hat X$ converging to $\Psi\in C(K)$. Then $x_\alpha$ is a Cauchy net in $X$, so it converges, say to $x$ (it’s a bit of an exercise, but one can show that in a first countable topological vector space, i.e. a vector space with a first countable topology so that addition and scalar multiplication are continuous, which certainly applies to normed spaces, if every Cauchy sequence converges then every Cauchy net converges as well). But then for every $f\in K$, we have

$\Psi(f)=\lim \hat x_\alpha(f)=\lim f(x_\alpha)=f(\lim x_\alpha)=f(x)=\hat x(f)$

so indeed $\Psi\in\hat X$ and the proof is complete.

Actually, one can make the result a bit better. The Banach-Mazur theorem states that if $X$ is a real separable Banach space, then we can take $K$ in the above proposition to be the interval $[0,1]$ and use real-valued functions instead of complex-valued functions. That’s pretty cool! So, as you can see, the idea of weak topologies is useful and in some sense quite natural when considering normed spaces. Anyways, that’s about all I’ve got for this post; hopefully you enjoyed it.

## Math in Elections Part 4 — Democracy and The Electoral College?

So far in this series we have seen why the Gibbard-Satterthwaite theorem implies that the primaries are unfair. I had said that I was going to talk about alternative voting systems today. But I changed my mind. I’ll instead do that tomorrow. Today let’s move on to the real deal: The General Election.

Most US general elections are not nearly as bad as the primaries because, essentially, the Gibbard-Satterthwaite theorem does not apply. One of the assumptions of Gibbard-Satterthwaite is that there be at least three candidates. In the primaries this is certainly true, but in most US general elections, at least 97% of votes cast are for either the Republican or the Democrat nominee. [Grammar break: “Democrat nominee”? “democratic nominee”? “Democrat’s nominee”? “republican” seems naturally to be an adjective while “Democrat” a noun! It makes it confusing!] Ok, back to math. Now, because the two major parties do not get 100% of the votes, the theorem still does technically apply. And the issues that we discussed last time can creep up.

At this moment there certainly are a number of congressional races which have three viable candidates, and hence the situation once again resembles the primaries. And even a presidential election can run into problems. Florida in 2000 is a classic example, and historically there have been many others. That 3% can certainly make a huge difference in a close race if the 3% are primarily being taken away from one of the two major candidates.

But let’s ignore these issues since at best they simply reduce the question to one resembling the primaries, which we have already discussed. Let’s assume that the presidential race is a 2-candidate race and focus our attention on the electoral college. As a preliminary statement, technically the members of each state’s electoral college are not bound to cast their votes for the winner of that state’s popular vote. But we will also ignore that since nowadays that rule is nothing more than a formality and every time (except a possible rare example here or there, like Florida 2000, once again) they do side with the people.

First of all, one advantage of the electoral college is that it maintains, to some extent, the notion that our nation is a collection of semi-autonomous states which blah blah blah.. Ok, fine, the founders liked states. But the question I want to discuss here is whether the electoral system is democratic. Americans are usually so proud to claim that we are a democratic nation. But in terms of electing our nation’s president, are we really?

Democracy is usually defined as a system of government where each person has equal say. Of course the founders excluded all sorts of people from this “each person” clause, but whatever. Eventually “land-owning, tax-paying, literate, christian, sufficiently old, non-criminal, white male American citizen” was reduced to “18 year old American citizen,” and Americans (probably at least as loudly as the land-owning, tax-paying, literate, christian, sufficiently old, non-criminal white male American citizens did back in 1800) proudly and loudly proclaim that we have a democracy.

So how do we measure whether “each person has equal say”? I propose the following reformulation: “each person’s vote has equal probability of deciding the election.” If everyone votes, but we dictate that the voters whose last names begin with F, O, or W get their votes to count 100 times more than everyone else’s, then most everyone would agree that not every person has an equal say. Likewise if votes were weighted based on how much land the voter owns or their knowledge of biblical scripture.

But saying “each person’s vote has equal probability of deciding the election” is nice since, first, it is easy to analyze mathematically, but also since it does seem to be a very reasonable way to, in my view, precisely quantify the more philosophical “each person has equal say” statement. It essentially means that each person’s vote is not only counted, but is worth the same in the count.

So I am going to assume from here on that Americans do indeed want to democratically elect their president and that this desire mathematically means that each person wants their vote to have the same probability of deciding the election as any other person’s vote.

With these assumptions, the electoral college is far from the right system. Consider a school with 35 teachers. For convenience we will refer to this school as Lincoln Southeast High School. Say the principal of Lincoln Southeast High School is holding a meeting in which the teachers must vote on some minor school issue. He plans to take a vote on it and democratically decide with the majority. Suppose 20 teachers support the proposal. Then we all agree that the issue should get passed.

But what if instead he first randomly puts the teachers into groups of 5 and has each group vote on the proposal?  He then chooses to accept the proposal if and only if at least 4 of the 7 groups collectively voted for it. Well, in this case it is possible that, although 20 teachers support it, the proposal will get rejected. This would happen if too many of the supporters of the proposal were clumped together. Then some groups of 5 may narrowly decide to reject the proposal by a 3 to 2 count, while others unanimously accept it. But the margins don’t matter: a 5-0 vote for the proposal in one group is worth just as much as a 3-2 vote against it in another.
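To see the numbers concretely, here is a quick sketch. The particular grouping below is one hypothetical arrangement I made up; any sufficiently “clumped” one works:

```python
# Each inner list is one group of 5 teachers; True means "supports the proposal".
# Hypothetical clumping: the 20 supporters are concentrated in three groups.
groups = [
    [True] * 5,                           # 5-0 for
    [True] * 5,                           # 5-0 for
    [True] * 5,                           # 5-0 for
    [True, True, False, False, False],    # 2-3 against
    [True, True, False, False, False],    # 2-3 against
    [True, False, False, False, False],   # 1-4 against
    [False] * 5,                          # 0-5 against
]

total_support = sum(sum(g) for g in groups)             # individual supporters
groups_won = sum(1 for g in groups if sum(g) > len(g) / 2)  # groups carried

print(total_support)  # 20 of 35 teachers support the proposal
print(groups_won)     # but only 3 of 7 groups vote for it, so it fails
```

A 20-15 majority loses 3 groups to 4: exactly the clumping effect described above.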

But it could get even worse. What if the principal grouped people together by department? The science teachers would be put into one group, math in another, special education in a third, etc. Putting similar people together further increases the chance that some groups will vote in unison.

Similarly, grouping people together based on their geographic location (which state they live in) oftentimes groups like-minded people together and makes it possible for a candidate to win the popular vote but lose the election.

But moreover, it causes certain states to (essentially) be a guaranteed win for the Republicans or a guaranteed win for the Democrats. A (Republican or Democratic) voter in Texas, therefore, has no chance of his/her vote deciding the election, since his/her vote does not even have a chance of flipping that state. And without any chance of being able to flip the state, the vote is therefore moot. A voter in Ohio or Nevada, though, because of either the criticalness and swinginess of the state or its unique significance in the electoral math, has a much greater chance of his/her vote being the deciding vote in the election.

And that’s even without mentioning that the “number of congressman” rule for deciding how many electoral votes a state gets causes another imbalance in the value of a voter, dependent on which state he/she claims residency in.

Nate Silver, a statistician and writer of the wonderful fivethirtyeight blog, has estimated the relative likelihood of an individual voter casting the vote that decides Tuesday’s election. He concluded that an Ohioan, for instance, has at least a couple hundred times more sway than a Texas voter (and likely an order of magnitude or two above that, but Silver is not too specific about how crappy a position the Texan is in. He only upper-bounds the crappiness).
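Silver’s numbers come from a detailed model, but the underlying idea — your vote only matters when it breaks an exact tie — can be sketched with a toy calculation. The state sizes and vote shares below are made-up round numbers, not Silver’s inputs:

```python
import math

def log10_p_decisive(n, p):
    """log10 of the probability the other n-1 voters split exactly evenly,
    assuming each independently votes for the leading candidate with
    probability p.  Tie probability = C(2m, m) * p**m * (1-p)**m with
    m = (n-1)//2; computed via lgamma to avoid underflow."""
    m = (n - 1) // 2
    log_comb = math.lgamma(2 * m + 1) - 2 * math.lgamma(m + 1)
    ln_tie = log_comb + m * math.log(p) + m * math.log(1 - p)
    return ln_tie / math.log(10)

# Hypothetical states of 5 million voters each: a 50/50 "Ohio" vs a 57/43 "Texas".
ohio = log10_p_decisive(5_000_000, 0.50)
texas = log10_p_decisive(5_000_000, 0.57)
print(ohio)   # about -3.4: roughly a one-in-a-few-thousand chance of a tie
print(texas)  # astronomically negative: effectively zero chance
```

Even a modest 57-43 lean makes the tie probability vanish exponentially in the number of voters, which is why the Texan’s vote is (essentially) worthless in this model while the Ohioan’s is not.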

So there you have it. Whether you think that Majority Rules is democratic, or whether some other, more general system is acceptable provided that “each person has equal say”, either way the electoral college ain’t the one you’re looking for.

Now, in closing I will comment that I do not want the electoral college to go away. The election would be less interesting if it were simply a popular vote, I think. Moreover, roughly a third of all Americans live in a “swing state”. So it’s not that there is not a fairly large representative sample of Americans within those states. A poll of 4,000 people is considered very accurate. With at least 40,000,000 voters in the swing states, it in some sense does amount to an awesomely good poll (although a somewhat geographically biased one..). Moreover, the country is too large for a presidential ticket duo to visit every nook and cranny of it. But when you restrict it down to 10 states, maybe those citizens are actually able to attend rallies and town halls with the candidates if they want, receive and digest the onslaught of information, and consequently make a more (but possibly less) educated vote. And the current swing states do tend to have a nice mixture of all types of America in them.
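The poll comparison above can be made precise with the standard margin-of-error formula; this is just a sketch using the usual worst-case 95% bound for a simple random sample:

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample of size n, at the worst case p = 0.5:
    1.96 * sqrt(0.25 / n) = 0.98 / sqrt(n)."""
    return 0.98 / math.sqrt(n)

print(margin_of_error(4_000))       # ~0.0155: about +/- 1.5 percentage points
print(margin_of_error(40_000_000))  # ~0.000155: absurdly precise, if unbiased
```

Of course, as noted above, the swing states are not a *random* sample of America, so the geographic bias matters far more than the sampling error does.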

So I am not necessarily saying that the electoral college isn’t fun to have, isn’t interesting to see unfold, or doesn’t pretty often determine the will of the people. But regardless, it is not democratic.

It might also be worth noting that asserting that I know a priori how Texas is going to vote, and concluding the value of a Texan’s vote as a result, may indeed be something that is worth discussing and not just carelessly assuming. Now, I personally think it’s fine to assert what I did, but at least one intelligent person I know has disagreed with me. Whateversuitsyourboat author Zach Norwood is indeed such a person. Perhaps in the next few days Zach will state his case in the comment section below.  But if not, feel free to put extra thought into the validity of that assertion yourself.

Tomorrow I will write about alternative voting systems. Then I will write about how much your vote matters. And finally, on election day, I will conclude this series with a post on the complexity of vote counting.  See you then.

## Math in Elections Part 3 — Delegates, Conventions and Kings.

Last time we talked about some problems with the primaries.  I wanted to mention one issue that can arise at the very end of the primary process because of a seemingly odd combination of convention rules.  The problem presents itself when the delegates convene to cast their votes to officially nominate their party’s candidate.

First consider the following very simplified election where there are three candidates: HotPants, ‘Zam and Z; and three voters:  Jaybert, GloryProne and Nick(i)Name.  Each voter ranked each of the candidates with a 1 for their favorite, 2 for eh, 3 for their worst enemy.  Here are the results:

So at this point, if everyone voted (assuming that we are using any semi-reasonable winner-declaring system), this race would of course end in a three-way tie.  But what if, after some snooping around UCSD, Nick(i)Name learns how GloryProne and Jaybert are planning to vote?  Then, although she is unable to get ‘Zam, the true Man that she really wanted, she does have “King (Queen) Maker” powers.  I.e., she is able to choose the winner from among Z and HotPants.  Since Z is her worst enemy, she settles for HotPants (but what’s new, am I right? [jk, HotPants!]).

So the question is, when is there a King Maker?  And are King Makers necessarily a bad thing?

Assume there are three candidates in some heated election.  The election resulted in candidates $A$ and $B$ each getting 40% of the vote, while candidate $C$ received 20% of the vote.  Assume that in order to win, one of the candidates must get 50% + 1 of the vote.  Candidate $C$ has no chance to win in any scenario.  But what if the rules allow candidate $C$ to give his votes to any other candidate?  This may seem like an odd rule, but let’s go ahead with the assumption.

In such a scenario, candidate $C$ now has the ability to effectively choose the winner; $C$ is the King Maker.  In some sense this makes $C$ the “dictator” (as defined in Post 1) in a “new election”, in the following sense.  We can view the first election as resulting in no winner, and provided neither candidate $A$ nor $B$ is willing to concede, there is essentially a second election where there are two candidates ($A$ and $B$) and just one voter that matters ($C$).  We have already said that such a system is unfair in the first post of this series.
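Here is a tiny sketch of the scenario with the 40/40/20 numbers above; the give-your-votes-away rule is encoded directly, and the candidate names are the generic $A$, $B$, $C$ from the text:

```python
def winners_c_can_make(votes, c, threshold):
    """Return the candidates that c could push past the threshold by
    transferring all of c's votes to them (the 'King Maker' test)."""
    return [name for name, v in votes.items()
            if name != c and v + votes[c] > threshold]

votes = {"A": 40, "B": 40, "C": 20}
threshold = 50  # "50% + 1" of 100 votes means strictly more than 50

print(winners_c_can_make(votes, "C", threshold))  # ['A', 'B']: C picks the king
```

Since $C$’s 20% pushes *either* frontrunner past the mark, $C$ alone decides the winner, which is exactly the dictator-in-a-new-election situation described above.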

Now, there are some subtleties going on that make it a little different than the original dictatorial scenario.  If candidate $C$ simply votes for the candidate most similar to himself, then it would seem like the system more or less did choose a winner that is most representative of the people.  But no one expects this last step to be so pure.  You’d have to expect there to be some back room deal made and, because of that, this system seems rather suspicious.

Now, it may come as a surprise that what I just described is exactly how the presidential primaries work.  Each state holds a primary or a caucus.  And based on the results, some candidates are awarded delegates for the concluding party convention.  A candidate must get 50% + 1 of the delegates to be declared the party’s nominee.  However, here’s the catch: a candidate more or less has complete control of how his/her delegates vote.  Of course you would usually expect that candidate to instruct the delegates to vote for himself/herself.  But in a situation like the one described above, a candidate with no chance of winning among a field of candidates, all of whom failed to reach the 50% mark, certainly could play King Maker if the number of their delegates is enough to push at least two of the other candidates past the 50% mark.

This was precisely Ron Paul’s goal this past year.  To collect enough delegates to be a player in the end, and hope that the field narrows down to Romney vs. A Single Conservative Alternative, in which case such a contested-convention scenario has a chance to arise.

The plan did not unfold as he hoped.  But for a while it was a slightly reasonable possibility.  And so, indeed, this is a possible consequence of our poorly set up primary system that could, in some election cycle, cause many Americans to feel very mistreated by this non-democratic process.

## Math in Elections Part 2 — A Primary Example.

What’s wrong with our current system?  Well, a lot.

But before I get you all depressed, I will mention one thing that our system theoretically does have going for it.

In the last post we mentioned how the Gibbard-Satterthwaite theorem shows that a three+ party system opens up the possibility for significant issues to arise.  However, since America at the moment has just two parties which are major players on the national stage, most general elections are nearly a two-candidate contest.  And consequently at least these elections are likely to avoid the potential unfairness issues that I outlined last time.

Ok, now let’s get sad.  In this post we’ll talk about the Republican and Democratic parties’ primaries.  Only voicing your opinion about one candidate has its drawbacks.  A primary is a classic sample case for the Gibbard-Satterthwaite theorem.  In a primary, similar candidates split the vote among the supporters of their ideology.  Others do not vote for their true preference because they view their candidate as being unable to win the primary and instead vote for another to avoid “wasting their vote”.

For instance, say there was a candidate that was ranked highly on a minority of people’s preference lists, but extremely low on others.  For convenience, let’s call this person, oh I don’t know, let’s go with “Mitt Romney”.  The other candidates, being more similar, split the vote among themselves, allowing Romney to become the presumptive nominee before he ever won a majority of the votes in any state.

If one candidate was ranked 1st on 30% of people’s preference lists but in last place at 7th on the rest, while another candidate is 1st place on 25% of people’s preference lists and in 2nd place on all the others’, then it might seem reasonable that this latter candidate is actually more representative of society.  But in a vote-for-one procedure, this is not considered.
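The two hypothetical profiles in that paragraph can be compared under a rank-aware rule such as the Borda count; this is just a sketch, and Borda is only one of the richer systems being hinted at here:

```python
NUM_CANDIDATES = 7  # a rank of r earns NUM_CANDIDATES - r Borda points

def borda_average(rank_shares):
    """Average Borda score per voter, given {rank: fraction of voters
    assigning that rank to the candidate}."""
    return sum(share * (NUM_CANDIDATES - rank)
               for rank, share in rank_shares.items())

# The paragraph's two profiles: a polarizing plurality leader vs. a consensus pick.
polarizing = borda_average({1: 0.30, 7: 0.70})  # 30% rank him 1st, 70% dead last
consensus  = borda_average({1: 0.25, 2: 0.75})  # 25% 1st, everyone else 2nd

print(polarizing)  # 1.8 points per voter
print(consensus)   # 5.25 points per voter: Borda strongly prefers the consensus candidate
```

So a rule that sees the full rankings reverses the plurality outcome here, which is precisely the information a vote-for-one procedure throws away.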

So which of Gibbard-Satterthwaite’s voting conditions do the primaries break?  Based on the constant flow of campaign trail gaffes among the conservatives and Jon Huntsman’s über-low polling numbers, you might suspect that most of the candidates must be working hard to lose the winnability property.  Or maybe you thought that Sheldon Adelson would attempt to satisfy the dictatorial property.  And yet, at least in theory, neither of those were it.  It was the manipulability property that was actually being broken.

How does this apply to the primaries?  Well, say in some primary that Santorum beat Romney by a small margin (which happened in Iowa this year).  Then, everyone whose voter preferences were Cain then Romney then Santorum could have swung the election by instead voting for Romney then Cain then Santorum.  If so, then although not a single person changed their minds about whether they prefer Santorum to Romney or vice versa, still our voting procedure decided that society did change its mind about preferring Santorum to Romney.
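Here is a minimal sketch of that manipulation with made-up vote totals (the counts are invented for illustration; they are not Iowa’s actual results):

```python
from collections import Counter

def plurality_winner(ballots):
    """Each ballot is a full preference tuple, but plurality only counts
    the top choice -- exactly the information loss discussed above."""
    counts = Counter(ballot[0] for ballot in ballots)
    return counts.most_common(1)[0][0]

# Invented electorate: Santorum edges Romney, with a Cain > Romney > Santorum bloc.
sincere = (
    [("Santorum", "Romney", "Cain")] * 30
    + [("Romney", "Cain", "Santorum")] * 28
    + [("Cain", "Romney", "Santorum")] * 5
)
print(plurality_winner(sincere))  # Santorum wins 30-28-5

# Three of the Cain > Romney > Santorum voters bury their favorite tactically.
tactical = (
    sincere[:-5]
    + [("Romney", "Cain", "Santorum")] * 3
    + [("Cain", "Romney", "Santorum")] * 2
)
print(plurality_winner(tactical))  # Romney wins 31-30-2
```

No voter changed their mind about Santorum versus Romney, yet the declared societal preference between those two flipped — the manipulability failure in miniature.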

By allowing voters to give the least amount of information possible (only their top choice), and by making it sensible to not vote for who you really prefer, we have indeed set up a very poor voting system.

So what would it look like if we allowed for a more complex voting system where each voter submitted their entire preference ranking?  Well, no matter how we did it, Gibbard and Satterthwaite showed that it would still be unfair in some way.  But it would still almost certainly be better than our current primary system.  Any reasonable one would reduce vote splitting, would improve the landscape for centrists, and likely, at least by some metric, help pick representatives whose views better match those of the electorate.  More on this next time.

In closing, I will give a couple more explicit examples of tactical voting altering or attempting to alter an election.  In this last election cycle Rick Santorum robocalled Michigan Democrats reminding them that they are able to vote in the primary, and many did.  The reasoning from the Democrats’ perspective is that Santorum would be much easier to beat in the general election than Romney.  So liberals showed up to the polls and cast votes for the more conservative candidate, the candidate that least resembled their viewpoints.  In an interview on NPR, one woman said it sent shivers down her spine to cast a vote for Santorum, who she disliked much more than Romney, but she did it because in the end it was what was best for Obama.

Now, although it exhibits the fault in the system, Romney did win in the end.  However, the 2003 California gubernatorial recall election is a very similar example where it is alleged that the result of the election was actually tipped due to tactical voting.  I’ll let you read up on that on your own, though.

Tomorrow I will discuss one more problem with the primary system.  In particular, one involving the delegates and the party conventions.  After that I will suggest some alternatives to our primary system.