In Sample Stalagmite Riddle #1, Dor asks "If you are searching for 512 different permutations, and you are guessing randomly, how many guesses should it take until you have found all of them?"
This is a very cool problem. After I spent a lot of time working on it, I found out that others have looked at it, too. Some guys at the University of Illinois developed lessons around a similar problem that they call the Cereal Box Problem. Below is some of the stuff I worked out before I looked at the Cereal Box problem.
Dor's theory is that "the number-of-attempts-until-success is the reciprocal of probability", where "success" in this case means guessing a permutation that hasn't been guessed before. If we do this 512 times and then add up all the guesses that were made, we will have the average number of guesses need to find all permutations.
Steve Gorodetsky proposed an elegant way to prove Dor's theory that the number of attempts until success is the reciprocal of the probability. I will present Steve's proposal and then complete the proof by mathematically finding sums of series.
As Dor suggested, Steve started by trying to solve the problem for 4-blocks instead of 9-blocks. This way, we only have 16 permutations to worry about instead of 512 (Why?).
Dor explained that on the first trial, any random permutation will be a permutation that we haven't picked yet, so we will always succeed in guess an unfound permutation on the first trial. In other words, it will always take one attempt to find the first permutation.
But what about the second permutation? Remember, we are making random guesses, so on our second guess, we might guess the permutation we already guessed (our one found permutation), or we might guess one of the unfound permutations.
Since there are 16 possible permutations, we have a 15/16 chance of picking an unfound permutation on our first guess (i.e., our first guess after finding the first permutation). But we might guess the found permutation on the first guess and an unfound permutation on the second guess. What would be the probability of that?
We need to multiply the probability of picking the found permutation times the probability of picking an unfound permutation. [need to add explanation of this].
We can make a table of probabilities of guessing the second permutation in a given number of guesses:
But what does this tell us about the number of guesses we should expect to make (after guessing the first permutation) in order to guess an unfound permutation (i.e., find a second permutation)?
We can apply the formula for expected value [need to add explanation of this]:
where X is the outcome "number-of-attempts-until-success". "Success" means guessing an unfound permutation after one of the permutations has already been guessed.
Steve plugged this sum into his calculator and found that it approaches 16/15.
Now, after two of the permutations have been found, there is a 2/16 chance that we will guess a permutation that has already been found, and a 14/16 chance that we will guess a permutation that hasn't been found. So the expected number-of-attempts-until-success after two permutations have already been found is:
Steve plugged this sum into his calculator and found that it approaches 16/14.
In this way, Steve found that the "number-of-attempts-until-success" did seem to be the reciprocal of the probability for the case of 16 permutations.
If we add together all of these expected values, we get the total number of guesses required to find all possible permutations. However, it is still required to prove that the expected value adds up to the "reciprocal of the probability" in the general case. In other words, we must prove that:
where m is the number of permutations, we have already found j-1 of the permutations, and X is the outcome "guessed an unfound permutation". From the equations related to E(X) for j = 1 and j = 2 above, we have:
So, we need to show that
Since we are summing over i, we can pull out the last product in the sum to give:
This sum looks kind of like the power series
where r = j/m and k = i-1, but not quite. It actually converts to this:
But if we remember the derivation of sum of a power series, this sum isn't too hard to obtain either.
Suppose that sn is the required sum. Then
Multiplying by r,
Subtracting, we get
But this is just the power series, which has the sum
Regrouping, we get
Substituting back in our original values of r=j/m, we get:
Finally, we see that
Now we have shown that equation (7) is true, so we know that equation (5) is true, which is the relation we wanted to prove.