I recently read about using physical dice to roll private Bitcoin keys, for example here. Intrigued, I ordered a pair of hexadecimal dice from GameStation.

Soon I began to wonder… How random are they? Do rolls result in an expected distribution? Can knowledge of the actual distribution characteristics of the dice be used to narrow private keys generated by this method into a practically searchable space, making such keys more susceptible to discovery?

I rolled my pair of hexadecimal dice 1,014 times for a total of 2,028 samples for the pair. Here are the results:

Already a severe underrepresentation of the values 3 and C is observed.

But how about a goodness of fit test? We begin with a null hypothesis that the dice are fair and distribute evenly. The data yield χ² ≈ 105. Using a chi-square table such as this one and looking at the row for 15 degrees of freedom (16 faces minus 1), we see that the value 105 is off the chart, so p < 0.01. In fact, using this calculator, we see that p < 0.0001. The result is strongly statistically significant, so we reject the null hypothesis: the dice were not observed to be fair over these 2,028 rolls.
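For anyone who wants to run the same kind of test on their own dice, here is a minimal Python sketch of the chi-square statistic against a uniform distribution. The counts in it are invented placeholders, not the rolls recorded for this post, and the critical value is the standard table entry for 15 degrees of freedom at the 1% level.

```python
# Pearson chi-square goodness-of-fit statistic for a 16-sided (hexadecimal) die.
# NOTE: the counts below are made-up placeholders, NOT the data from this post.

def chi_square_statistic(observed):
    """Return Pearson's chi-square statistic against a uniform distribution."""
    total = sum(observed)
    expected = total / len(observed)  # a fair die gives every face the same expected count
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts for faces 0 through F (assumption, for illustration only)
hypothetical_counts = [130, 128, 125, 95, 131, 127, 129, 126,
                       132, 128, 130, 127, 96, 129, 131, 127]

# Standard chi-square table value for 15 degrees of freedom at alpha = 0.01
CRITICAL_0_01_DF15 = 30.578

stat = chi_square_statistic(hypothetical_counts)
dof = len(hypothetical_counts) - 1  # 16 faces minus 1

print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
print("reject fairness at the 1% level:", stat > CRITICAL_0_01_DF15)
```

Swapping in real per-face counts is the only change needed; the degrees of freedom follow from the number of faces.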

Here’s a good article that examines the question of how many rolls give good results for the goodness of fit test.

However, commenter Matthew Neagley makes an interesting point in this post that “the larger your sample size, the more exactly you have to match a potential distribution to not reject a match. So eventually you hit a point where you’re ‘cursed’ to always reject your hypothesis.” In other words, for any die that is even slightly imperfect, the more rolls you record, the more likely the test is to reject it as unfair. (He mentions two terms for this phenomenon, “curse of significance” and “doomed to significance,” but I was unable to find any discussion or related use of these terms.)
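A small Python sketch can make this effect concrete. It assumes a hypothetical die with a tiny fixed bias (one face 2% too likely, another 2% too unlikely, both numbers invented) and computes only the part of the expected chi-square statistic that comes from that bias, which is n times the sum of (p − q)² / q over the faces. Random sampling noise is left out on purpose.

```python
# How a fixed, tiny bias inflates the chi-square statistic as the sample grows.
# The bias used here is a made-up assumption, not a measurement of any real die.

def systematic_chi_square(probs, n_rolls):
    """Bias-driven part of the expected chi-square statistic: n * sum((p - q)^2 / q)."""
    q = 1 / len(probs)  # probability of each face on a perfectly fair die
    return n_rolls * sum((p - q) ** 2 / q for p in probs)

k = 16
probs = [1 / k] * k
probs[0] *= 1.02  # hypothetical: face 0 is 2% more likely than fair
probs[1] *= 0.98  # hypothetical: face 1 is 2% less likely than fair

CRITICAL_0_01_DF15 = 30.578  # table value for 15 degrees of freedom, alpha = 0.01

for n in (1_000, 10_000, 100_000, 1_000_000):
    value = systematic_chi_square(probs, n)
    print(f"{n:>9} rolls: bias term = {value:.3f}, exceeds critical value: {value > CRITICAL_0_01_DF15}")
```

Because the bias term grows in direct proportion to the number of rolls, any nonzero bias eventually pushes the statistic past whatever critical value is chosen.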

He also says, “This is one of the reasons why some statisticians favor visual interpretation of plots.” From that perspective, the histogram above is pretty clear.

The dice didn’t show a fair distribution in my test… So what? It’s a fair (pun!) question.

At what point are dice “good enough”? Can these dice I’ve acquired, given the apparent wildly uneven distribution observed, be used to create private Bitcoin keys no more susceptible to discovery than keys generated by other means? Is the unevenness attributable to something specific or unique to these two dice or the rolling method/conditions? Or would a similar pattern be observed with other dice, that is, attributable to systemic or common causes in the manufacturing process? Finally, if they are truly fair and random dice, then certainly the exact sequence observed for this test is a valid possibility, however “unfair” it may seem.
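One way to start putting numbers on the "good enough" question is to estimate how many bits of entropy each roll actually contributes. The Python sketch below does this for a hypothetical biased die; the weights are invented (two faces come up half as often as the rest) and are not the measured distribution of my dice. Shannon entropy gives an average-case figure, while min-entropy is the more conservative measure when the concern is an attacker guessing the most likely values first. It also assumes a key is built from 64 hex digits (256 bits).

```python
import math

# Entropy per roll of a biased hexadecimal die, and the total for a 64-digit key.
# The weights below are a made-up assumption, NOT measurements from this post.

def shannon_entropy_bits(probs):
    """Average information per roll, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy_bits(probs):
    """Worst-case information per roll, set by the single most likely face."""
    return -math.log2(max(probs))

HEX_DIGITS_PER_KEY = 64  # 256-bit key = 64 hexadecimal digits

# Hypothetical die: faces 3 and C appear half as often as every other face
weights = [2] * 16
weights[0x3] = 1
weights[0xC] = 1
total = sum(weights)
probs = [w / total for w in weights]

shannon = shannon_entropy_bits(probs)
worst_case = min_entropy_bits(probs)

print(f"Shannon entropy per roll: {shannon:.4f} bits (fair die: 4 bits)")
print(f"Min-entropy per roll:     {worst_case:.4f} bits")
print(f"Estimated key entropy:    {HEX_DIGITS_PER_KEY * shannon:.1f} bits (Shannon), "
      f"{HEX_DIGITS_PER_KEY * worst_case:.1f} bits (min-entropy)")
```

Trying different placeholder weights gives a feel for how much bias it takes before the effective search space shrinks in a way that matters.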

Chuck,

Matt Neagley here: I wish I remembered the text from which I got the term “Curse of Significance” or “Doomed to Significance.” I also wish I could remember exactly which of those terms it was. Like you, I’m not finding any hits online except us. I suspect the author of the text was trying to be clever and coined a catchy term for something he couldn’t find an official name for.

I think the concept behind it is fairly intuitive though, and you’ve expressed it well above. No die is fair, and the higher the power of our test, the smaller the deviations we can detect. What’s important for dice used for casual purposes is their ability to feign an ideal distribution in short bursts. For gambling purposes, this is clearly not good enough, which is why casino dice are machined, not cast.

For your Bitcoin questions, I would offhand say (no math to back this up, it just “feels” correct, so take it with a grain of salt) that the uneven distribution would give an attacker a better chance of guessing the generated numbers, provided the attacker was familiar with a significant number of samples, because they could more easily pass over unlikely guesses in favor of more likely ones. I don’t know enough about Bitcoin to know if one could get access to such a sample.

P.S. Sorry I rezed your dead thread. Just saw it tonight. 🙂
