A colleague asked me the following probability question:
Suppose I have N empty boxes and in each one I can place a 0 or a 1. I randomly choose n boxes in which to place a 1. If I start from the first box, what is the probability that the first 1 I find will be in the mth box?
This is depicted in the following graphic:
and we’re interested in the probability that the first 1 (reading from the left) is in, say, the third box.
I’m sure there must be a really nice form to the solution, but I can’t come up with one. The best I can do is the following slightly clunky one – any better ideas?
The probability of the first 1 being in the mth box is equal to the number of configurations where the first 1 is in the mth box, divided by the total number of configurations. The total number of configurations is given by:

$$\binom{N}{n}$$
and the number of configurations that have their first 1 in the mth box is the number of configurations that start 0,0,0,…,0,1, with (m-1) leading 0s. This is the same as the number of ways of placing the remaining (n-1) 1s in the (N-m) boxes after the first 1:

$$\binom{N-m}{n-1}$$
So the probability that the first 1 is in the mth box is:

$$P(m) = \frac{\binom{N-m}{n-1}}{\binom{N}{n}}$$
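As a sanity check on the counting argument, here is a minimal Python sketch that compares the closed form with a brute-force enumeration of every configuration. The function names and the small test case (N=6, n=2) are my own choices for illustration, not part of the original question.

```python
from itertools import combinations
from math import comb


def first_one_probability(N, n, m):
    """Closed form: C(N-m, n-1) / C(N, n), with boxes numbered from 1."""
    return comb(N - m, n - 1) / comb(N, n)


def brute_force_probability(N, n, m):
    """Enumerate every way of placing n ones in N boxes and count the
    configurations whose leftmost 1 sits in box m."""
    placements = list(combinations(range(1, N + 1), n))
    hits = sum(1 for boxes in placements if min(boxes) == m)
    return hits / len(placements)


# Hypothetical small case: 6 boxes, 2 ones, every possible position m.
for m in range(1, 7):
    print(m, first_one_probability(6, 2, m), brute_force_probability(6, 2, m))
```

The enumeration is only practical for small N, but it checks the formula without relying on the argument that produced it.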
Here’s what it looks like for a few different values of n, when N=100:
which makes sense – the more 1s you have, the more likely you’ll get one early. So, can anyone produce a neater solution? I’m sure it’ll just involve transforming the problem slightly.
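The trend can also be checked numerically. The sketch below, for N=100, prints the probability that the first 1 is in box 1 and the probability that it falls within the first 10 boxes. The particular values of n and the cut-off of 10 boxes are arbitrary choices of mine, not necessarily the ones used for the plot.

```python
from math import comb


def first_one_probability(N, n, m):
    """P(first 1 is in box m) = C(N-m, n-1) / C(N, n)."""
    return comb(N - m, n - 1) / comb(N, n)


def prob_within(N, n, k):
    """Probability that the first 1 appears somewhere in boxes 1..k."""
    return sum(first_one_probability(N, n, m) for m in range(1, k + 1))


N = 100
for n in (1, 5, 10, 20):  # illustrative values only
    p_first_box = first_one_probability(N, n, 1)
    p_first_ten = prob_within(N, n, 10)
    print(f"n={n:2d}  P(m=1)={p_first_box:.3f}  P(m<=10)={p_first_ten:.3f}")
```

Both columns grow with n, which matches the intuition that more 1s make an early hit more likely.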
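Footnote: because the first 1 has to occur somewhere between the 1st and (N-n+1)th position (at the latest, the n 1s fill the last n boxes), we must have

$$\sum_{m=1}^{N-n+1} \binom{N-m}{n-1} = \binom{N}{n}$$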
which seems surprising to me, although I don’t know why.
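A quick way to convince yourself of the footnote is to add up the counts over every possible position of the first 1 and compare with the total number of configurations. This is a small sketch in exact integer arithmetic; the (N, n) pairs are arbitrary examples. The sum is an instance of what is usually called the hockey-stick identity for binomial coefficients.

```python
from math import comb


def count_first_one_at(N, n, m):
    """Number of configurations whose first 1 is in box m."""
    return comb(N - m, n - 1)


def total_by_position(N, n):
    """Sum the counts over every feasible position m = 1 .. N-n+1."""
    return sum(count_first_one_at(N, n, m) for m in range(1, N - n + 2))


for N, n in [(5, 2), (10, 3), (100, 7)]:  # arbitrary test cases
    print(N, n, total_by_position(N, n), comb(N, n))
```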
Footnote2: Here’s another way of computing it, still a bit clumsy. The problem is the same as if we had N balls in a bag, n of which are red and (N-n) of which are black. If we start pulling balls out of the bag (and not replacing them), the probability that the first red one appears on the mth draw is equal to the probability of drawing m-1 black ones and then drawing one red one. The first probability (m-1 blacks and no reds in the first m-1 draws) can be computed from the hypergeometric distribution as:

$$\frac{\binom{N-n}{m-1}\binom{n}{0}}{\binom{N}{m-1}} = \frac{\binom{N-n}{m-1}}{\binom{N}{m-1}}$$
and the second probability is just equal to the probability of picking one of the n reds from the remaining N-(m-1) balls:

$$\frac{n}{N-(m-1)}$$
So the full probability is:

$$P(m) = \frac{\binom{N-n}{m-1}}{\binom{N}{m-1}} \cdot \frac{n}{N-m+1}$$
which is arguably messier than the previous one.
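The two expressions should agree for every valid m. Here is a short sketch that checks this with exact fractions, so no floating-point tolerance is needed. The function names and the choice N=20, n=4 are mine, purely for illustration.

```python
from fractions import Fraction
from math import comb


def p_counting(N, n, m):
    """Configuration-counting form: C(N-m, n-1) / C(N, n)."""
    return Fraction(comb(N - m, n - 1), comb(N, n))


def p_sequential(N, n, m):
    """Ball-drawing form: (m-1 blacks in a row) times (then one red)."""
    p_blacks = Fraction(comb(N - n, m - 1), comb(N, m - 1))
    p_red = Fraction(n, N - (m - 1))
    return p_blacks * p_red


N, n = 20, 4  # arbitrary example
for m in range(1, N - n + 2):
    assert p_counting(N, n, m) == p_sequential(N, n, m)
print("both forms agree for N=20, n=4")
```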