Now with 149295449 bits of high-quality random data!
What does that mean?
Before I say anything else, a disclaimer. I am not an expert on random numbers and cryptography. I am a computer science
student, and what I relate below is what I have learned in my studies and my own research in order to make this server.
Dice are basically just random number generators. It's easy to make random numbers with real dice, and actually fairly
difficult to do with computers. When a computer generates a random number, usually what it produces is a pseudo-random
number - one that seems to be random, but really isn't. This is most often done using some kind of mathematical
formula operating on a seed. Here's a simple example of the process, supposing that we want to roll 1d20.
- Choose a seed. A common way to choose a seed is just to use the current time. The current time is
16:13:46 GMT and 987.489 ms. We need a number that changes quickly, so let's use the milliseconds, including their
microsecond fraction. So now we have our seed number: 987.489.
- Use a mathematical operation to make it less predictable. Let's try a square root: 31.4243377019. Now
that looks a bit more "random".
- But we need an integer from 1 to 20! In that case, let's just use the decimal part of that number.
It will always be greater than or equal to 0, and less than 1. So if we multiply it by 20, we'll get a number greater
than or equal to 0, and less than 20. Then we can round down, to get a number from 0 to 19, and add one, to get a
number from 1 to 20.
- Let's try it. The decimal part of our number is: 0.4243377019. Multiplied by 20, and rounded
down, we get 8. Add 1, and our final value is: 9
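Here's roughly what those steps look like in Python. It's only a sketch of the toy method above, and the function name `roll_from_seed` is made up for illustration.

```python
import math

def roll_from_seed(seed, sides=20):
    """Toy pseudo-random roll: square root, keep the decimal part, scale, round down, add one."""
    scrambled = math.sqrt(seed)                   # e.g. sqrt(987.489) = 31.4243...
    fraction = scrambled - math.floor(scrambled)  # decimal part, always >= 0 and < 1
    return math.floor(fraction * sides) + 1       # 0..sides-1, then shifted to 1..sides

# The seed from the example: the millisecond part of the current time.
print(roll_from_seed(987.489))  # prints 9
```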
Now if you refresh the page a few times you'll see some different numbers come up, and they'll look kind of random. But
there's a catch: hypothetically, if you were to very carefully choose the time when you load the page, you could control
which "random" number comes up. In other words, once you know the method used for creating them, these numbers are quite
predictable. That makes them low-quality random numbers, because they're really not very random. But computers use (more
complicated) variations of this method to make their pseudo-random numbers.
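You can see the same predictability in a real pseudo-random generator. The sketch below uses Python's standard `random` module (a Mersenne Twister generator); the helper name `rolls_from_seed` and the seed value are just examples. Give it the same seed twice and you get exactly the same "random" rolls.

```python
import random

def rolls_from_seed(seed, count=5):
    """Roll 1d20 `count` times using a generator started from a known seed."""
    rng = random.Random(seed)  # private generator, so the global one is untouched
    return [rng.randint(1, 20) for _ in range(count)]

first = rolls_from_seed(987489)
second = rolls_from_seed(987489)

# Same seed, same sequence: anyone who knows the seed knows the rolls.
print(first == second)  # prints True
```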
How to make high-quality random numbers
Probably the best way is to measure the decay of a radioactive substance. No, wait. Probably the best way is to roll REAL
dice. But it's hard when, like a computer, you don't have hands. Measuring radioactive decay is something that a computer
can do pretty well, but it's not very practical for average computers like mine.
So here's what I do instead. Whenever you move the mouse on the rolling page, some information about your mouse movements
is recorded. You're not a computer, so when you move the mouse, it's not directly from place to place - there are many
small random variations in the motion. When you click a button, that information is sent to my server, which uses some
mathematical mumbo-jumbo to distill the randomness out of your actions. Then it's saved for future use, the next time
somebody rolls some dice.
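The exact math isn't described here, but one common way to distill randomness from noisy input is to run it through a cryptographic hash. The sketch below assumes that approach (SHA-256) and a made-up sample format of (x, y, timestamp) tuples. It's an illustration of the general idea, not necessarily what this server does.

```python
import hashlib
import struct

def distill_entropy(samples):
    """Hash mouse samples into 32 bytes.

    `samples` is a hypothetical format: (x, y, timestamp_ms) integer tuples.
    """
    digest = hashlib.sha256()
    for x, y, t in samples:
        # Pack each sample into a fixed binary layout before hashing.
        digest.update(struct.pack("<iiq", x, y, t))
    return digest.digest()

# Made-up mouse movements, for illustration only.
movements = [(102, 340, 1000), (105, 338, 1016), (111, 333, 1033)]
pool_bytes = distill_entropy(movements)
print(pool_bytes.hex())
```

Hashing doesn't create randomness. The output can only be as unpredictable as the movements that went in, which is why it still makes sense to quality-check the result afterwards.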
How good is your method?
Pretty good! All of the data is quality-checked before going into active use, using methods based on those of ENT, a
program that measures the quality of (putative) random data. There are 5 measures used to validate the data:
- Entropy: The information density of the data. In practical terms, this is a measure of how compressible the
data is. Perfectly compressed or perfectly random data have perfect information density.
- Chi-square distribution: How much the numerical distribution of the data differs from expectations. E.g.,
if we pick 10 numbers from 1-10, we'd expect (on average) each number to come up once.
- Mean value: You know what it means, don't you?
- Monte Carlo value for Pi: The data are treated as points on a plane. The proportion that fall inside a
circle on that plane can be used to calculate a value for Pi, if they're properly random.
- Serial correlation: How much successive data points are correlated with one another. In truly random data
there should be no correlation.
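To make those measures concrete, here's a minimal Python sketch of each one, computed over a string of bytes. These are simplified versions written for illustration, not ENT's actual code and not this server's checks. For instance, ENT builds its Monte Carlo points from 24-bit coordinates, while this sketch uses one byte per coordinate. The test data comes from `os.urandom`, standing in for the server's stored data.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy; 8.0 means every byte value is equally likely."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def chi_square(data):
    """Sum of squared deviations from the expected count of each byte value."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))

def mean_value(data):
    """Average byte value; about 127.5 for uniformly random bytes."""
    return sum(data) / len(data)

def monte_carlo_pi(data):
    """Treat byte pairs as (x, y) points in a unit square; needs at least 2 bytes."""
    points = len(data) // 2
    inside = 0
    for i in range(0, points * 2, 2):
        x = (data[i] + 0.5) / 256
        y = (data[i + 1] + 0.5) / 256
        if x * x + y * y <= 1.0:  # inside the quarter circle of radius 1
            inside += 1
    return 4 * inside / points

def serial_correlation(data):
    """Correlation between each byte and the one after it; near 0 if random."""
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return math.nan  # undefined for constant data
    return cov / math.sqrt(vx * vy)

sample = os.urandom(100_000)  # stand-in data source for this sketch
print("entropy:", entropy_bits_per_byte(sample))
print("chi-square:", chi_square(sample))
print("mean:", mean_value(sample))
print("pi estimate:", monte_carlo_pi(sample))
print("serial correlation:", serial_correlation(sample))
```

For good random bytes you'd expect entropy close to 8, a mean close to 127.5, a Pi estimate close to 3.14, and a serial correlation close to 0.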
To see how the current data holds up under all of these measures, as well as some derived/source measures, take a look
at the audit results.