In the Machine Learning course I teach (designed with Mark Girolami), I use Olympic 100m winning times to introduce the idea of making predictions from data (via linear regression). Here’s the data, with the linear regression line projected into the future. Black is men, red women (dashed lines represent the plus and minus three standard deviation region):
There are two reasons for using this data. The first is that it’s nice and visual and something students can relate to. The second is that it opens up a discussion about some of the bad things you can do when building predictive models, in particular what happens if we extrapolate further into the future. Here’s the same plot up to 2200:
Because women have historically been quicker at getting faster, the two lines cross at the 2156 Olympics. At about this point in the lectures a vocal student will pipe up and point out that assuming the trend continues this far into the future looks a bit iffy. And if we assume a linear trend indefinitely, we eventually get to the day where the 100m is won in a negative number of seconds:
Hopefully you’ll agree that we should be fairly sceptical of any analysis that says this could happen. So I was quite surprised to discover when reading David Epstein’s The Sports Gene that academics had done exactly this. And published it in Nature. NATURE! Arguably the top scientific journal IN THE WORLD. If you don’t believe me, here it is:
Momentous sprint at the 2156 Olympics.
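
For anyone who wants to play with the extrapolation themselves, here is a minimal sketch of the idea in plain Python: fit a straight line to each series by least squares, then solve for the year where the two lines cross and the year where a line hits zero seconds. The function names are my own, and the numbers are made up purely for illustration (they are NOT the real Olympic results), so the years it produces won't match the plots above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def crossing_year(line_a, line_b):
    """Year at which two fitted lines predict the same time."""
    (a0, a1), (b0, b1) = line_a, line_b
    # Solve a0 + a1 * year = b0 + b1 * year for year.
    return (b0 - a0) / (a1 - b1)


def zero_year(line):
    """Year at which a fitted line predicts a winning time of zero seconds."""
    intercept, slope = line
    return -intercept / slope


# Made-up, exactly linear placeholder data (NOT real winning times).
years = [1948, 1968, 1988, 2008]
men_times = [10.4, 10.2, 10.0, 9.8]      # falls 0.01 s per year
women_times = [12.0, 11.6, 11.2, 10.8]   # falls 0.02 s per year

men_line = fit_line(years, men_times)
women_line = fit_line(years, women_times)

print("lines cross in year", round(crossing_year(men_line, women_line)))
print("men's line hits zero in year", round(zero_year(men_line)))
print("women's line hits zero in year", round(zero_year(women_line)))
```

Nothing in the arithmetic stops you from asking for a year where the predicted time is zero or negative, which is exactly the problem: the model will happily answer questions it has no business answering.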


Visualising parliamentary data…again

In a previous post, I described a very general binary PCA algorithm for visualising voting data from the House of Commons.

To recap, we start with voting data which tells us how each of the 600 or so MPs voted in each of the 1400 or so votes of each parliament. An MP can vote for a particular motion or against it, or may not vote at all. The algorithm converts this into a two-dimensional plot where each MP is represented by a point. MPs close together have similar voting patterns, and those far apart don’t. See this for a better description.

I’ve now extended the algorithm a bit (no time to go into details now), but the outcome is that it produces a visualisation that incorporates a degree of uncertainty in the location of the MPs. This uncertainty can come from one of two sources:
1. Lack of data: if MPs don’t vote very much, we can’t be sure about where they should be plotted (a nice example is the three deputy speakers in 2010plain, shown as big ellipses near the centre).
2. Lack of conformity: it might be hard to place some MPs in two dimensions in such a way that most of their votes are modelled correctly.

The following plots show the results for the 1997, 2001, 2005 and 2010 (up to about May 2010) parliaments. Each ellipse is an MP, and they’ve been coloured according to their party (main parties are obvious, key at the bottom for the others). The ellipses roughly represent where the MP might lie — the bigger the ellipse, the less sure we are about the location.

The lines on the plots represent the votes. MPs on one side of the line voted for the motion and those on the other side voted against. It would be nice to label some of the votes; maybe I’ll do this soon.
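
To make the "which side of the line" idea concrete, here is a rough sketch in Python. It is not the algorithm itself, just the geometric reading of the plots: each vote is a straight line, and an MP's predicted vote depends on which side of it their point falls. The line parameterisation `(w1, w2, b)`, the function names and the MPs "A" to "D" are all hypothetical, invented for this example.

```python
def predicted_vote(position, line):
    """Predict a vote from which side of the line the MP's point lies on.

    position: (x, y) location of the MP in the 2D plot.
    line: (w1, w2, b), describing the line w1*x + w2*y + b = 0.
    """
    x, y = position
    w1, w2, b = line
    score = w1 * x + w2 * y + b
    return "for" if score > 0 else "against"


def agreement(positions, line, votes):
    """Fraction of recorded votes that the line reproduces.

    MPs whose vote is None did not vote and are ignored.
    Returns None if nobody voted.
    """
    recorded = [(mp, v) for mp, v in votes.items() if v is not None]
    if not recorded:
        return None
    hits = sum(predicted_vote(positions[mp], line) == v for mp, v in recorded)
    return hits / len(recorded)


# Hypothetical MPs and a hypothetical vote (illustration only).
positions = {"A": (1.0, 0.0), "B": (-1.0, 0.0), "C": (0.5, 2.0), "D": (-0.5, 1.0)}
vote_line = (1.0, 0.0, 0.0)   # the vertical line x = 0
votes = {"A": "for", "B": "against", "C": None, "D": "for"}

print(predicted_vote(positions["A"], vote_line))
print(agreement(positions, vote_line, votes))
```

In this toy example MP "D" sits on the "against" side but voted for the motion, which is the kind of vote a two-dimensional layout can't model correctly.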

Anyway, here are the files. They are a bit messy, and I’ve labelled some of the MPs who I thought might be interesting. Happy to label any others if anyone is interested.


Some things that I think are interesting:
1. Clare Short before and after (Clare Short2) resigning (2005plain).
2. Ken Livingstone before and after (Ken Livingstone 2) resigning (1997plain)
3. How much the nationalist parties who were previously close to the Conservatives and Lib Dems (e.g. DUP and SNP) have now deserted them (Compare everything with 2010plain)
4. How close Clegg and Cameron are (2010plain) (couldn’t find a font small enough to separate them)
5. There appear to be more rebellious Conservatives in the coalition (2010plain) than Lib Dems: more lines (votes) cut through the blue ellipses than the yellow ones.
6. In 1997, the Lib Dems seemed to vote almost equally with Labour and with the Conservatives (roughly the same number of lines splitting Lib&Lab from Con as there are splitting Lib&Con from Lab). In 2001 and 2005, the Lib Dems seem more aligned with the Conservatives than Labour (more lines splitting Lab from Lib&Con than splitting Lib&Lab from Con).

Red: Labour
Blue: Conservative
Yellow: Lib Dem
Magenta: DUP
Orange: Plaid Cymru
Weird pinky-orange colour: Scottish National Party (normally found next to Plaid Cymru)
Green: All the rest