π Statistics: Digit frequencies
by JVSchmidt


General
A frequency analysis (FA) simply counts how many substrings of each possible pattern appear in the complete digit sequence. For example, to analyze substrings of length L = 2, we read the digit sequence in non-overlapping pairs and collect every pair into its "home" box:
14 - 15 - 92 - 65 - 35 - 89 - 79 - 32 - 38 - ...

In the end we will know how many "14", "15", "92" etc. were found.
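
The counting step can be sketched in a few lines of Python. This is only an illustration, not the program used for the analysis on this page: the function name count_groups and the short sample string (the first 18 decimals of π) are chosen here for the example.

```python
from collections import Counter

def count_groups(digits: str, length: int) -> Counter:
    """Count non-overlapping substrings of the given length (hypothetical helper)."""
    # Ignore an incomplete group at the end of the sequence.
    usable = len(digits) - len(digits) % length
    return Counter(digits[i:i + length] for i in range(0, usable, length))

# First 18 decimals of pi, read as pairs: 14 15 92 65 35 89 79 32 38
sample = "141592653589793238"
counts = count_groups(sample, 2)
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

Each pair lands in exactly one box, so the counts add up to the number of complete groups read.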
These counts are the basis for calculating the Chi2 value of the measured sequence, which is used to judge whether π behaves like a random sequence or not.
If π is random, each number "XY" should be equally likely to appear.
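
As a rough sketch of how such a Chi2 value can be obtained from the counts, the following Python function applies the usual Pearson formula, the sum of (observed - expected)^2 / expected over all 10^L classes, with the same expected count for every class. The function name chi_square and the tiny sample are assumptions made for illustration. Real tests need far more digits than this.

```python
from collections import Counter

def chi_square(counts: Counter, length: int) -> float:
    """Pearson Chi2 against a uniform distribution over 10**length classes."""
    classes = 10 ** length
    total = sum(counts.values())
    expected = total / classes  # equal probability for every class
    observed_part = sum((c - expected) ** 2 for c in counts.values()) / expected
    # Classes that never occurred each contribute (0 - expected)^2 / expected = expected.
    missing = classes - len(counts)
    return observed_part + missing * expected

# Single-digit counts of the first 18 decimals of pi (far too few for a real test)
single = Counter("141592653589793238")
print(chi_square(single, 1))
```

A perfectly even distribution gives a Chi2 of 0. The further the counts drift from the expected value, the larger the result.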


Results Overview
Digits analyzed: 4.2 × 10^9
Analysis started at digit: 1
Elapsed computer time for one class: 5 min to 7 min 30 sec


Chi2-values for the distributions of substrings with length L from 1 to 7

Length of     Number of different     Expected frequency   Chi2             z-value for approx.
substrings L  substrings s (= 10^L)   per class                             standard normal distribution
1             10                      420,000,000          6.59             -0.5680
2             100                     21,000,000           116.47           1.2415
3             1,000                   1,400,000            1,043.42         0.9938
4             10,000                  105,000              10,124.29        0.8860
5             100,000                 8,400                100,049.85       0.1137
6             1,000,000               700                  1,003,038.06     2.1489
7             10,000,000              60                   10,002,938.10    0.6572
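
The page does not state how the z-values were derived. The tabulated numbers are, however, consistent with the common normal approximation of the Chi2 distribution, z = (Chi2 - df) / sqrt(2 × df) with df = s - 1 degrees of freedom. The Python sketch below recomputes them under that assumption. The function name z_value is hypothetical, and the formula is inferred from the table, not taken from the author.

```python
import math

def z_value(chi2: float, classes: int) -> float:
    """Normal approximation of a Chi2 statistic with classes - 1 degrees of freedom."""
    df = classes - 1
    return (chi2 - df) / math.sqrt(2 * df)

# (L, Chi2) pairs taken from the table above
table = [(1, 6.59), (2, 116.47), (3, 1043.42), (4, 10124.29),
         (5, 100049.85), (6, 1003038.06), (7, 10002938.10)]
for length, chi2 in table:
    print(length, round(z_value(chi2, 10 ** length), 4))
```

Under this reading, a z-value near 0 means the measured Chi2 is close to what a random sequence would produce on average.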


Why not test longer sequences directly?
There is a serious problem when testing the frequencies for longer and longer chains:
We run out of data very fast.
When testing chains with L = 8 on a database of 4.2 billion digits, the expected frequency per class drops to a poor 5.25, which is near the lower limit for using the Chi2 test. This average value can be calculated as
A = N / (L × 10^L), where L is the length of the tested sequences and N is the number of digits available for analysis.
Even with the data from Yasumasa Kanada's most recent record calculation of October 2002, with about 1.24 × 10^12 digits, we can only test sequences up to L = 10.
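
The arithmetic behind these limits can be checked with a short Python sketch. The function names are illustrative. The minimum expected frequency of 5 is a commonly used rule of thumb and is an assumption here, since the text only says that 5.25 is "near the lower limit".

```python
def expected_frequency(n_digits: int, length: int) -> float:
    """A = N / (L * 10**L): expected count per class for non-overlapping groups."""
    return n_digits / (length * 10 ** length)

def max_testable_length(n_digits: int, min_expected: float = 5.0) -> int:
    """Largest L whose expected frequency still reaches min_expected (assumed threshold)."""
    length = 0
    while expected_frequency(n_digits, length + 1) >= min_expected:
        length += 1
    return length

print(expected_frequency(4_200_000_000, 8))        # data set analyzed on this page
print(max_testable_length(4_200_000_000))
print(max_testable_length(1_240_000_000_000))      # Kanada's 2002 record, approx.
```

Because the number of classes grows by a factor of ten with each step in L, even a data set about 300 times larger extends the testable length by only two.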


More details of digit frequency analysis
For single digit frequencies
For double digit frequencies
