π Statistics: Digit frequencies by JVSchmidt

General

Doing a frequency analysis (FA) is nothing more than counting how many substrings of each possible pattern appear in the complete digit sequence. For example, to analyze the substrings of length L=2 we read the digit sequence in non-overlapping digit pairs, collecting every pair into its "home" box: 14 - 15 - 92 - 65 - 35 - 89 - 79 - 32 - 38 - ... In the end we know how many "14", "15", "92" etc. were found. This is the basis for calculating the Chi^2 value of the measured sequence, which lets us judge whether π is random or not. If π is RANDOM, each number "XY" should have an equal probability of appearing.

Results Overview

Digits analyzed: 4.2 * 10^9
Analysis started at digit: 1
Elapsed computer time for one class: 5 min to 7 min 30 sec
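The counting procedure described under General can be sketched in a few lines of Python. This is only an illustration, not the program used for the analysis reported here: the function names `count_blocks` and `chi_square` are made up for this sketch, and the sample is just the first 50 decimals of π, far too few for a meaningful test.

```python
from collections import Counter

# First 50 decimals of pi (after "3."), used only as a tiny sample.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"


def count_blocks(digits: str, length: int) -> Counter:
    """Count non-overlapping substrings of the given length ("home" boxes)."""
    n_blocks = len(digits) // length  # incomplete trailing block is dropped
    return Counter(
        digits[i * length:(i + 1) * length] for i in range(n_blocks)
    )


def chi_square(digits: str, length: int) -> float:
    """Chi^2 statistic against a uniform distribution over 10**length classes."""
    counts = count_blocks(digits, length)
    n_blocks = sum(counts.values())
    classes = 10 ** length
    expected = n_blocks / classes
    # Classes that were observed at least once.
    seen = sum((c - expected) ** 2 / expected for c in counts.values())
    # Classes that never occurred each contribute (0 - expected)^2 / expected.
    unseen = (classes - len(counts)) * expected
    return seen + unseen


if __name__ == "__main__":
    print(list(count_blocks(PI_DIGITS[:18], 2)))  # the pairs 14, 15, 92, ...
    print(chi_square(PI_DIGITS, 1))
```

With only 50 digits the expected frequency per class is 5 for single digits, so this sample is only good for checking that the counting works.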

Chi^2 values for the distributions of substrings of length L from 1 to 7

Length of     Number of different     Expected frequency    Chi^2           z-value (approx. standard
substrings L  substrings s (= 10^L)   per class                             normal distribution)
1             10                      420,000,000           6.59            -0.5680
2             100                     21,000,000            116.47           1.2415
3             1,000                   1,400,000             1,043.42         0.9938
4             10,000                  105,000               10,124.29        0.8860
5             100,000                 8,400                 100,049.85       0.1137
6             1,000,000               700                   1,003,038.06     2.1489
7             10,000,000              60                    10,002,938.10    0.6572
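The z-values listed here appear to follow the usual normal approximation of a Chi^2 distribution with s - 1 degrees of freedom, z = (Chi^2 - (s - 1)) / sqrt(2 (s - 1)). This formula is inferred from the numbers in the table, not stated in the text, so the sketch below should be read as a plausibility check under that assumption. The function name `z_value` is hypothetical.

```python
import math


def z_value(chi2: float, classes: int) -> float:
    """Normal approximation of a Chi^2 value with (classes - 1) degrees of freedom.

    Assumption: mean = df and variance = 2 * df, as for a Chi^2 distribution.
    """
    df = classes - 1
    return (chi2 - df) / math.sqrt(2 * df)


# (L, Chi^2) pairs taken from the table; s = 10**L classes.
TABLE = [(1, 6.59), (2, 116.47), (3, 1043.42), (4, 10124.29),
         (5, 100049.85), (6, 1003038.06), (7, 10002938.10)]

if __name__ == "__main__":
    for length, chi2 in TABLE:
        print(length, round(z_value(chi2, 10 ** length), 4))
```

Under this reading, a z-value well inside the range of about -2 to 2 gives no reason to doubt the uniform distribution; the largest entry in the table is the one for L=6.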

Why not test longer sequences directly?

There is a serious problem when testing the frequencies of longer and longer chains: we run out of data very fast. When testing chains with L=8 on a database of 4.2 billion digits, we get a poor expected average of 5.25 per class, which is near the lower limit for Chi^2 to be usable. This average can be calculated as A = N / (L x 10^L), where L is the length of the tested substrings and N is the number of digits available for analysis. Even with the data of the latest calculation record by Yasumasa Kanada from October 2002, with about 1.24 x 10^12 digits, we can only test sequences up to L=10.
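The arithmetic behind this limit is easy to reproduce. The sketch below evaluates A = N / (L x 10^L) and looks for the largest L whose expected average stays at or above a threshold. The threshold of 5 is an assumption (a common rule of thumb for Chi^2 tests), chosen because the text calls 5.25 "near the lower limit"; the function names are made up for this example.

```python
def expected_per_class(n_digits: int, length: int) -> float:
    """Expected average A = N / (L * 10**L) for non-overlapping substrings."""
    return n_digits / (length * 10 ** length)


def max_testable_length(n_digits: int, min_expected: float = 5.0) -> int:
    """Largest L whose expected average is still >= min_expected (assumed limit)."""
    length = 0
    while expected_per_class(n_digits, length + 1) >= min_expected:
        length += 1
    return length


if __name__ == "__main__":
    for n in (4_200_000_000, 1_240_000_000_000):
        best = max_testable_length(n)
        print(n, best, expected_per_class(n, best))
```

For 4.2 x 10^9 digits this gives L=8 with A = 5.25, and for 1.24 x 10^12 digits it gives L=10, matching the figures quoted in the text.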
