π Statistics: Digit frequencies |
by JVSchmidt |
General | |
A frequency analysis (FA) simply counts how often each possible substring pattern appears in the complete digit sequence. For example, to analyze substrings of length k=2 we read the digit sequence as consecutive, non-overlapping digit pairs and sort every pair into its "home" box: 14 - 15 - 92 - 65 - 35 - 89 - 79 - 32 - 38 - ... In the end we know how many times "14", "15", "92" etc. were found. These counts are the basis for calculating the Chi^{2} value of the measured sequence, which is used to judge whether the digits of π behave randomly or not. If π is RANDOM, each number "XY" should appear with equal probability. | |
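The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used for the analysis: the function name `chi_square_for_blocks` is hypothetical, the digits are passed in as a plain string, and an incomplete block at the end of the string is simply dropped.

```python
from collections import Counter


def chi_square_for_blocks(digits, k):
    """Count non-overlapping blocks of length k and return (counts, chi2)."""
    n_blocks = len(digits) // k          # an incomplete trailing block is dropped
    counts = Counter(digits[i * k:(i + 1) * k] for i in range(n_blocks))
    n_classes = 10 ** k                  # number of "home" boxes
    expected = n_blocks / n_classes      # equal probability for every class
    # Sum of (observed - expected)^2 / expected over the classes that occurred.
    chi2 = sum((c - expected) ** 2 / expected for c in counts.values())
    # Every class that never occurred contributes (0 - expected)^2 / expected.
    chi2 += (n_classes - len(counts)) * expected
    return counts, chi2


# The first 18 decimals of pi, i.e. the nine pairs listed in the text.
sample = "141592653589793238"
counts, chi2 = chi_square_for_blocks(sample, 2)
print(counts["14"], counts["15"], counts["92"])
print(chi2)
```

Nine pairs spread over 100 boxes are of course far too few for a meaningful test; the sample only shows how the boxes are filled and how the Chi^{2} sum is formed.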
Results Overview | |
Digits analyzed: 4.2 * 10^{9}
Analysis started at digit: 1
Elapsed computer time for one class: 5 min - 7 min 30 sec |
Length of substrings L | Number of different substrings s (=10^{L}) | Expected frequency per class | Chi^{2} | z-value (approx. standard normal distribution) |
1 | 10 | 420,000,000 | 6.59 | -0.5680 |
2 | 100 | 21,000,000 | 116.47 | 1.2415 |
3 | 1,000 | 1,400,000 | 1,043.42 | 0.9938 |
4 | 10,000 | 105,000 | 10,124.29 | 0.8860 |
5 | 100,000 | 8,400 | 100,049.85 | 0.1137 |
6 | 1,000,000 | 700 | 1,003,038.06 | 2.1489 |
7 | 10,000,000 | 60 | 10,002,938.10 | 0.6572 |
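The page does not state how the z-values were obtained. They appear consistent with the usual normal approximation of the Chi^{2} distribution, z = (Chi^{2} - f) / sqrt(2f) with f = s - 1 degrees of freedom. The sketch below rests on that assumption; the helper name `z_from_chi2` is hypothetical.

```python
import math


def z_from_chi2(chi2, n_classes):
    """Normal approximation: Chi^2 has mean f and variance 2f, with f = n_classes - 1."""
    f = n_classes - 1
    return (chi2 - f) / math.sqrt(2 * f)


# (L, Chi^2) pairs copied from the first rows of the table above.
rows = [(1, 6.59), (2, 116.47), (3, 1043.42), (4, 10124.29)]
for length, chi2 in rows:
    print(length, round(z_from_chi2(chi2, 10 ** length), 4))
```

Under this approximation, values of z within roughly -2 to +2 are unremarkable for a random sequence.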
Why not test longer sequences directly? |
There is a serious problem when testing the frequencies of longer and longer chains: we run out of data very fast. When testing chains of length L=8 on a database of 4.2 billion digits, the expected frequency per class drops to a poor 5.25, which is close to the lower limit at which the Chi^{2} test is still usable (commonly taken as about 5). This average can be calculated as A = N / (L x 10^{L}), where L is the length of the tested substrings and N is the number of digits available for analysis. Even with the data from Yasumasa Kanada's latest record calculation of October 2002, with about 1.24 x 10^{12} digits, we can only test sequences up to L=10 (A = 12.4; for L=11, A is only about 1.1). |
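The formula for A can be checked with a short calculation. This is a minimal sketch: the function names are hypothetical, and the threshold of 5 for the smallest acceptable expected frequency is a common rule of thumb assumed here, not a value fixed by the text.

```python
def expected_frequency(n_digits, length):
    """A = N / (L * 10^L) for non-overlapping substrings of the given length."""
    return n_digits / (length * 10 ** length)


def max_testable_length(n_digits, min_expected=5.0):
    """Largest L whose expected frequency per class is still >= min_expected."""
    length = 0
    while expected_frequency(n_digits, length + 1) >= min_expected:
        length += 1
    return length


print(expected_frequency(4.2e9, 8))      # expected frequency for L=8 on 4.2 billion digits
print(max_testable_length(4.2e9))        # longest usable L for 4.2 billion digits
print(max_testable_length(1.24e12))      # longest usable L for Kanada's 2002 data
```

Because the number of classes grows by a factor of 10 with every extra digit, even a hundredfold increase in N extends the testable length by only about two.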