In the Machine Learning course I teach (designed with Mark Girolami), I use Olympic 100m winning times to introduce the idea of making predictions from data (via linear regression). Here’s the data, with the linear regression line projected into the future. Black is men, red women (dashed lines represent the plus and minus three standard deviation region):
There are two reasons for using this data. The first is that it’s nice and visual and something students can relate to. The second is that it opens up a discussion about some of the bad things you can do when building predictive models. In particular, what happens if we move further into the future? Here’s the same plot up to 2200:
Because women have historically been quicker at getting faster, the two lines cross at the 2156 Olympics. At about this point in the lectures a vocal student will pipe up and point out that assuming the trend continues this far into the future looks a bit iffy. And if we assume a linear trend indefinitely, we eventually reach the day when the 100m is won in a negative number of seconds:
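For anyone who wants to see the arithmetic behind both claims, here is a minimal sketch in plain Python. It fits a straight line by ordinary least squares, then solves for the year where two fitted lines meet and the year where a line hits zero seconds. The function names (`fit_line`, `crossing_year`, `zero_year`) are my own, and the numbers are made-up placeholders chosen to lie exactly on two lines. They are NOT the real Olympic winning times, so the years they produce are not the ones quoted above.

```python
def fit_line(years, times):
    """Ordinary least-squares fit of time = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def crossing_year(line_a, line_b):
    """Year at which two fitted lines predict the same time."""
    (a0, a1), (b0, b1) = line_a, line_b
    # Solve a0 + a1 * x = b0 + b1 * x for x (undefined if the slopes are equal).
    return (b0 - a0) / (a1 - b1)


def zero_year(line):
    """Year at which a fitted line predicts a winning time of zero seconds."""
    intercept, slope = line
    return -intercept / slope


# Placeholder data only: invented values, not real race results.
years = [1900, 1950, 2000]
men_times = [11.0, 10.5, 10.0]    # falls 0.01 s per year
women_times = [13.0, 12.0, 11.0]  # falls 0.02 s per year

men_line = fit_line(years, men_times)
women_line = fit_line(years, women_times)

print("lines cross in year:", round(crossing_year(men_line, women_line)))
print("men reach zero seconds in year:", round(zero_year(men_line)))
print("women reach zero seconds in year:", round(zero_year(women_line)))
```

Nothing in the least-squares fit knows that a time cannot be negative, which is exactly why extrapolating it far beyond the data goes wrong: any line with a negative slope reaches zero at some finite year.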
Hopefully you’ll agree that we should be fairly sceptical of any analysis that says this could happen. So I was quite surprised to discover when reading David Epstein’s The Sports Gene that academics had done exactly this. And published it in Nature. NATURE! Arguably the top scientific journal IN THE WORLD. If you don’t believe me, here it is:
Momentous sprint at the 2156 Olympics.

