British MPs do a lot of voting. In an average parliament (if there is such a thing), there can be upwards of 1500 separate votes, also known as divisions. The Public Whip is a brilliant site, dedicated to providing this data in an easy-to-use format.
The Public Whip also have a couple of nice little visualisations of the data, in particular the one here. This uses a mathematical technique known as multi-dimensional scaling (MDS) to convert the voting pattern of each MP into two numbers that can then be plotted onto a map. In other words, it lets us visualise the 1500 or so votes in a nice 2-dimensional plot.
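For readers who want to see the idea in code, here is a minimal sketch of classical (Torgerson) MDS in Python/NumPy. It assumes the votes have already been turned into numbers (the 1/-1/0 scheme introduced in the next paragraph) and uses plain Euclidean distance between MPs' vote vectors. The Public Whip's actual MDS variant and distance measure may well differ, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates whose Euclidean
    distances approximate the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Four hypothetical MPs voting on four divisions (1 = for, -1 = against, 0 = other).
votes = np.array([[1, 1, -1, 0],
                  [1, 1, -1, -1],
                  [-1, -1, 1, 1],
                  [-1, 0, 1, 1]], dtype=float)
# Distance between two MPs = Euclidean distance between their vote vectors
# (an assumption made for this illustration).
D = np.linalg.norm(votes[:, None, :] - votes[None, :, :], axis=-1)
coords = classical_mds(D)
print(coords.round(2))
```

With Euclidean distances, classical MDS produces the same coordinates as PCA up to sign flips, which is one reason the two techniques are so often mentioned together.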
To do this, one has to convert the votes into numbers. Let’s say, for argument’s sake, that we use the number ‘1’ if the MP voted for a particular motion, ‘-1’ if they voted against it and ‘0’ for everything else (abstention, laziness, etc). MDS isn’t the only technique that can be used for visualisation like this. Chris Lightfoot used Principal Components Analysis (PCA) – a technique similar in mission to MDS but resulting in a different 2-dimensional plot (I’ve added an accessible-ish guide to PCA below – apologies to the purists).
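As a concrete illustration of this encoding, the sketch below builds an MPs × divisions matrix in Python/NumPy. The record format (one dict per MP mapping a division id to "aye" or "no") and the names `encode_votes` and `VOTE_CODE` are made up for the example; they are not the Public Whip's export format.

```python
import numpy as np

# Hypothetical mapping from a recorded vote to a number; anything else
# (abstention, absence, an unrecognised value) falls through to 0.
VOTE_CODE = {"aye": 1, "no": -1}

def encode_votes(records, divisions):
    """Return an (n_mps, n_divisions) array: 1 = for, -1 = against, 0 = anything else."""
    col = {division: j for j, division in enumerate(divisions)}
    X = np.zeros((len(records), len(divisions)))
    for i, mp_votes in enumerate(records):
        for division, vote in mp_votes.items():
            X[i, col[division]] = VOTE_CODE.get(vote, 0)
    return X

# Three hypothetical MPs and three divisions; a division missing from an
# MP's record stays 0.
records = [
    {"d1": "aye", "d2": "no"},
    {"d1": "no"},
    {"d2": "aye", "d3": "no"},
]
X = encode_votes(records, ["d1", "d2", "d3"])
print(X)
```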
I find this kind of exploratory analysis interesting and so got a project student to build some software that would allow the user to perform this analysis under different conditions (different parliaments, different sets of MPs, etc). The result was a really nice bit of software that will hopefully be made freely available somewhere at some point. My interest didn’t dwindle, though, and I had a bit of free time the other day, so I exported some data from the application for the 2005 parliament to play with. I used Matlab to do a standard PCA on the 2005 parliament. Here is a plot of the MPs in their new world, coloured according to party (some of the colours aren’t very clear…sorry!):
The two dimensions chosen by PCA nicely split the three main parties into separate clouds. This isn’t surprising: MPs will often vote with their party and so we’d expect MPs within a party to vote similarly and MPs from different parties to vote differently.
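The party split can be seen in miniature with a few lines of NumPy, computing PCA via the singular value decomposition of the centred vote matrix. This is a minimal sketch on a made-up two-party parliament, not the Matlab analysis behind the plot, and `pca_2d` is a hypothetical helper name.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X (MPs x votes) onto the top two principal components."""
    Xc = X - X.mean(axis=0)                  # centre each vote (column)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                     # (n_mps, 2) coordinates to plot

# Toy parliament: two parties of 5 MPs who vote on opposite sides of
# 8 divisions with perfect party discipline.
rng = np.random.default_rng(0)
party_line = rng.choice([-1.0, 1.0], size=8)
X = np.vstack([np.tile(party_line, (5, 1)), np.tile(-party_line, (5, 1))])
coords = pca_2d(X)
print(coords.round(2))
```

With perfect discipline, every MP's first coordinate is ±√8 (one sign per party) and the second is zero, so each party collapses to a single point on opposite sides of the origin.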
However, there seems to be a strange centralising force! Something is pulling politicians from all three parties into the centre of the plot. This is particularly obvious for the Conservatives (blue), whose MPs seem to form a line pointing towards the point (0,0).
It would be nice to think that this was the result of PCA capturing some interesting diversity within the main parties. However, as Chris Lightfoot discusses in his original analysis, this isn’t always the case: it’s impossible to distinguish such real variation from artificial variation caused by the way we have represented the data.
Recall that we gave each MP one of three values for each vote: -1, 1 or 0. It is this last category that is the problem. An MP voting neither for nor against a particular motion could be doing so for a number of reasons, the most likely of which is that he or she wasn’t present on the day of the vote (play with this data for a while and you’ll be struck by how rarely some MPs vote). Encoding this as ‘0’ has the effect of pulling MPs who don’t vote very often towards the centre of the plot: the more zeros in an MP’s row, the less their few real votes can push them away from the middle.
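The centralising effect of the ‘0’ encoding can be reproduced on a toy example. The data below are invented, and `pca_2d` is the same hypothetical SVD-based helper as a plain PCA sketch would use: one MP who votes exactly with party A but attends only 2 of 10 divisions is compared with four full-attendance loyalists from each party.

```python
import numpy as np

def pca_2d(X):
    """Top-two principal component scores of the rows of X (MPs x votes)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(1)
party_line = rng.choice([-1.0, 1.0], size=10)   # party A's votes on 10 divisions

# Four MPs per party who attend every division and always follow the whip.
X = np.vstack([np.tile(party_line, (4, 1)), np.tile(-party_line, (4, 1))])

# A ninth MP (party A) who follows the whip but only turns up for 2 divisions;
# the other 8 are encoded as 0.
absentee = np.zeros(10)
absentee[:2] = party_line[:2]
X = np.vstack([X, absentee])

coords = pca_2d(X)
print(np.abs(coords[:, 0]).round(2))   # distance from the middle along the first axis
```

The absentee never disagrees with party A, yet its first coordinate comes out at a fraction of the loyalists' values, purely because of the zeros.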
Ideally, rather than being forced to pick a number, we would treat the vote as missing. However, PCA forces us to supply a number for each vote for every MP, and using ‘0’ (or indeed anything else) is a problem. Think of it this way: by using ‘0’, we are saying that the MP is sitting on the fence – halfway between for (1) and against (-1). In some cases this may be reasonable, but for the most part, it’s likely to be a gross over-simplification.
In statistical terms, these values are ‘missing’, and the problem of missing values is widely studied in statistics because they occur all over the place. Sometimes they occur randomly – imagine that the complete dataset exists somewhere and each value is kept or removed based on the toss of a coin. Other times there will be something systematic that causes values to be missing. Either way, there is a lot of literature devoted to this.
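The two kinds of missingness can be simulated in a few lines of NumPy. Everything here is invented for illustration: a ‘complete’ set of ±1 votes, a fair coin deciding which cells to hide, and per-MP attendance rates for the systematic case. Under the coin toss every MP loses about half their votes; under the attendance model the missing fraction depends on who the MP is, which is closer to what absence does to real voting records.

```python
import numpy as np

rng = np.random.default_rng(42)
n_mps, n_votes = 6, 1000
# A made-up "complete" dataset: every MP's vote on every division.
complete = rng.choice([-1.0, 1.0], size=(n_mps, n_votes))

# Random missingness: each cell is removed on the toss of a fair coin.
coin_mask = rng.random((n_mps, n_votes)) < 0.5

# Systematic missingness: each MP has their own (hypothetical) attendance
# rate, so whether a vote is missing depends on which MP it belongs to.
attendance = np.linspace(0.95, 0.2, n_mps)
systematic_mask = rng.random((n_mps, n_votes)) > attendance[:, None]

observed_random = np.where(coin_mask, np.nan, complete)
observed_systematic = np.where(systematic_mask, np.nan, complete)

# Fraction of each MP's votes that went missing under each mechanism.
print(np.isnan(observed_random).mean(axis=1).round(2))
print(np.isnan(observed_systematic).mean(axis=1).round(2))
```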
Sadly, there is no wonderful solution to the missing value problem using standard PCA. People use ad-hoc techniques like inserting average values or a constant value (that’s what I did with the ‘0’s). Chris Lightfoot tried a couple of things that were, by his own admission, hacks. They produced some interesting patterns but the problem with hacking around like this is that it’s impossible to be sure whether or not what we’re seeing means anything.
Fortunately, help is at hand. Probabilistic models are popular within the data analysis world, and one reason is their ability to handle missing values. In fact, they not only overcome them, but can also simultaneously make an educated guess at what the values should have been – how the MP would have voted, had they been present. One such model is Probabilistic PCA (PPCA), published by researchers from Microsoft Research in Cambridge (download it here – might be paywalled). There is also what looks like an older technical report from Sam Roweis (available here). The details are all in those papers – mathematical literacy required! Suffice it to say that we can leave lots of MP-vote combinations blank and let the algorithm do its thing.
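To make "let the algorithm do its thing" a little more concrete, here is a minimal NumPy sketch of the EM algorithm for PPCA in which `np.nan` marks a missing vote. It is written from the PPCA model (each MP's votes = W·z + mean + Gaussian noise), is not the Matlab code used for the plots, and the names `ppca_missing` and `_posterior` are hypothetical. Like the post, it treats the ±1 votes as continuous numbers, which is itself a simplification; it also assumes every vote has at least k + 1 observed values.

```python
import numpy as np

def _posterior(X, obs, W, mu, sigma2):
    """E-step: posterior mean and covariance of each row's latent z, using observed entries only."""
    n, k = X.shape[0], W.shape[1]
    Ez, covs = np.zeros((n, k)), np.zeros((n, k, k))
    for i in range(n):
        o = obs[i]
        Wo = W[o]
        cov = np.linalg.inv(np.eye(k) + Wo.T @ Wo / sigma2)
        Ez[i] = cov @ Wo.T @ (X[i, o] - mu[o]) / sigma2
        covs[i] = cov
    return Ez, covs

def ppca_missing(X, k=2, n_iter=100, seed=0):
    """EM for probabilistic PCA; returns latent coords (n, k), W (d, k), mu (d,), sigma^2."""
    n, d = X.shape
    obs = ~np.isnan(X)
    W = np.random.default_rng(seed).normal(size=(d, k))
    mu = np.nanmean(X, axis=0)
    sigma2 = 1.0
    for _ in range(n_iter):
        Ez, covs = _posterior(X, obs, W, mu, sigma2)
        # M-step for W and mu: per vote, a least-squares fit of the observed
        # values on [z, 1] using expected sufficient statistics.
        for j in range(d):
            A, b = np.zeros((k + 1, k + 1)), np.zeros(k + 1)
            for i in np.flatnonzero(obs[:, j]):
                ez1 = np.append(Ez[i], 1.0)
                S = np.outer(ez1, ez1)
                S[:k, :k] += covs[i]          # E[z z^T] = cov + mean mean^T
                A += S
                b += X[i, j] * ez1
            sol = np.linalg.solve(A, b)
            W[j], mu[j] = sol[:k], sol[k]
        # M-step for sigma^2: mean expected squared residual over observed cells.
        total = 0.0
        for i in range(n):
            o = obs[i]
            r = X[i, o] - W[o] @ Ez[i] - mu[o]
            total += r @ r + np.trace(W[o] @ covs[i] @ W[o].T)
        sigma2 = total / obs.sum()
    Ez, _ = _posterior(X, obs, W, mu, sigma2)   # final E-step with fitted parameters
    return Ez, W, mu, sigma2

# Usage on synthetic data (not real votes): 2-D latent structure, 20% of cells blanked.
rng = np.random.default_rng(123)
n, d = 60, 20
full = rng.normal(size=(n, 2)) @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))
mask = rng.random((n, d)) < 0.2
X = np.where(mask, np.nan, full)
Z, W, mu, sigma2 = ppca_missing(X, k=2)
guesses = Z @ W.T + mu     # educated guesses for every cell, including the blanks
```

The rows of `Z` play the role of the plotted coordinates, and `guesses[mask]` is the model's estimate of the votes that were never cast.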
A bit of googling uncovered some code courtesy of Jakob Verbeek that does PPCA (sensibly, à la Roweis) in Matlab. Here are the results:
The difference is striking – each party is now represented by a much tighter cluster. A tentative conclusion might be that most of the diversity observed in the previous plot was due to attendance, rather than voting patterns. This is a bit of a shame: I’d like my MPs to be a bit more independent, but that’s just my opinion.
There is some diversity in the plot, and we can probably be confident that this is real. Look at the little tail of Labour MPs (red) going towards the Lib Dems. The fact that they’re joined by SDLP members is reassuring. I need to spend some time identifying some of these people to be certain that it makes sense, but my guess is that it will. Other points of interest: in the first analysis, the DUP (black dots, hard to distinguish from blue ones) were intermingled with the Conservatives. It looks like this is just due to attendance: in the second plot, they form a nice little coherent group all by themselves – a kind of mini, more liberal Conservative party.
The only group now spread around is the independents (green dots), and there is no reason why they would vote together because, um, they’re independent.
At first glance, it looks like the PPCA is doing a great job – the grouping is incredibly clean. When I get some more time, I’d like to look into this further: PCA (and PPCA) can find more than two dimensions, and looking at the third, fourth, fifth etc might reveal some interesting patterns. It would be nice to also look at other parliaments (data from 1997, 2001 and 2010 is available). It would also be nice to look at the data for just one party. Inter-party differences will always be greater than intra-party differences (that’s why they are different parties!) and so they swamp the analysis.
If nothing else, it’s a nice dataset for the data analysis community to play with to test their algorithms for projection and missing value imputation.
Appendix: Hand-wavy introduction to PCA
Imagine that there are three votes, rather than the 1500 or so in a real parliament. We want to combine these three numbers (remember that the votes are now numbers: 0, -1 or 1) somehow to give us 2 numbers for each MP that we can use for plotting – an x-coordinate and a y-coordinate if you like.
One way in which we could combine them is to add them up. If a particular MP voted -1, -1 and 0, their new number would be -2. More generally, we can assign a weight to each vote, multiply the vote by the weight and add up the results. Say our weights were 1, -2 and 5; our number would become (1 × -1) + (-2 × -1) + (5 × 0) = 1.
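The same arithmetic, written as NumPy dot products. The second row of weights in `W` is invented purely to show how two sets of weights turn one MP's votes into an (x, y) pair.

```python
import numpy as np

votes = np.array([-1, -1, 0])       # the example MP's three votes
print(votes.sum())                  # plain sum: -2

weights = np.array([1, -2, 5])      # the example's weights
print(weights @ votes)              # (1 x -1) + (-2 x -1) + (5 x 0) = 1

# Two sets of weights, one per row, turn the three votes into two numbers.
# The second row is made up for illustration.
W = np.array([[1, -2, 5],
              [0, 1, 1]])
x, y = W @ votes
print(x, y)                         # 1 -1
```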
We could pick as many sets of weights as we like to give us as many new numbers as we like! This is exactly what PCA does – it picks 2 sets of weights such that for each MP, we can create 2 numbers from her voting pattern (actually PCA can get more than 2, but let’s not worry about that).
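I picked those weights randomly. PCA picks them systematically so that when we plot the MPs in their new 2-dimensional world, they are as spread out as possible. (More precisely, each set of weights is scaled to have unit length, the second set is perpendicular to the first, and within those constraints PCA chooses the weights that maximise the variance of the new numbers.) This is good, because it tends to reveal any group structure in the data. If MPs vote in groups (parties), these groups will be apparent when we do the PCA.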
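A quick numerical check of that claim, as a sketch with made-up data: compute PCA's first set of weights (the top right-singular vector of the centred vote matrix) and compare the variance of the resulting numbers with the variance obtained from 1000 random unit-length weight sets. The PCA weights always win, up to rounding, because the top singular vector maximises the spread of the projected points over all unit-length weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy vote matrix: 20 hypothetical MPs x 3 votes, two blocs voting oppositely,
# with roughly 20% of entries replaced by 0 (did not vote).
bloc = np.array([1.0, -1.0, 1.0])
X = np.vstack([np.tile(bloc, (10, 1)), np.tile(-bloc, (10, 1))])
X[rng.random(X.shape) < 0.2] = 0.0
Xc = X - X.mean(axis=0)

# PCA's first set of weights: the top right-singular vector (unit length).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_weights = Vt[0]
pc_var = np.var(Xc @ pc_weights)

# Variance of the new numbers for 1000 random unit-length weight vectors.
random_w = rng.normal(size=(1000, 3))
random_w /= np.linalg.norm(random_w, axis=1, keepdims=True)
random_var = np.var(Xc @ random_w.T, axis=0)

print(pc_var, random_var.max())
```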