π Statistics: Sum of Digits
by JVSchmidt


General
The idea of this test is to divide the data into substrings (chains) of length L, compute the sum of digits of each substring, and calculate the Chi2 value for the distribution of these sums.
Here is an example for the digits of π with L=5:
First sequence 14159 -> SUM = 20
Second sequence 26535 -> SUM = 21
Third sequence 89793 -> SUM = 36
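
The splitting step can be sketched in a few lines of Python. This is only an illustration, not the program used for the measurements; the function name chain_sums is hypothetical, and dropping an incomplete chain at the end of the data is an assumption.

```python
def chain_sums(digits: str, length: int) -> list[int]:
    """Split a digit string into consecutive chains and sum each chain."""
    # Assumption: only complete chains count; a shorter remainder is dropped.
    usable = len(digits) - len(digits) % length
    return [
        sum(int(ch) for ch in digits[start:start + length])
        for start in range(0, usable, length)
    ]


# First 15 decimal digits of pi, starting at digit 1 after the decimal point.
print(chain_sums("141592653589793", 5))  # [20, 21, 36]
```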

For single digits (L=1) the sum equals the digit's value, so every sum of digits from 0 to 9 has a probability of 1/10. If w(L,S) is the probability that a chain of length L has a sum of digits equal to S, then
w(1,y)=1/10 with y=0,1,..,9
Any further distribution can be calculated recursively:
w(L,y) = 1/10 * sum (for all i=0..9) w(L-1,y-i), with w(L-1,s) = 0 for s < 0 or s > 9*(L-1)
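
As a sketch of how the distribution can be built up, the following Python code appends one uniformly distributed digit at a time: each probability w(L-1,s) passes one tenth of its value on to w(L,s+d) for d = 0..9. The function name sum_distribution is hypothetical, and exact fractions are used here only to avoid rounding questions.

```python
from fractions import Fraction


def sum_distribution(length: int) -> list[Fraction]:
    """Return [w(length, 0), ..., w(length, 9*length)] as exact fractions."""
    w = [Fraction(1, 10)] * 10  # w(1, y) = 1/10 for y = 0..9
    for _ in range(length - 1):
        nxt = [Fraction(0)] * (len(w) + 9)  # the maximum sum grows by 9
        for s, p in enumerate(w):
            for d in range(10):  # the appended digit d is uniform on 0..9
                nxt[s + d] += p / 10
        w = nxt
    return w


print(sum_distribution(2)[9])  # 1/10
```

For L=2 the sum 9 can be formed by ten of the hundred possible digit pairs (0+9, 1+8, ..., 9+0), which gives the printed value 1/10.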

For any L the sum of digits lies between S=0 (all digits 0) and S=9*L (all digits 9). For longer and longer chains the minimum and maximum sums become extremely improbable, because the probability of a chain that consists of one particular digit repeated L times falls like 10^-L.
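
To make the 10^-L behaviour concrete, this small sketch estimates how many all-zero chains (S=0) one would expect among K = N/L chains. The function name is hypothetical, and N = 4.2 * 10^9 is taken from the results overview below.

```python
from fractions import Fraction


def expected_extreme_chains(num_digits: int, length: int) -> Fraction:
    """Expected number of chains with S=0 (or equally S=9*length)."""
    chains = num_digits // length  # K = N/L complete chains
    return Fraction(chains, 10 ** length)  # K * 10^-L


N = 4_200_000_000  # digits analyzed, from the results overview
for L in (2, 6, 10, 20):
    print(L, float(expected_extreme_chains(N, L)))
```

With these numbers about 700 all-zero chains are expected for L=6 but only about 0.042 for L=10, which fits the minimum sums in the results table (0 for L=6, 1 for L=10).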


Graph: distribution of the sum of digits for different chain lengths.




Results Overview
Digits analyzed: 4.2 * 10^9
Analysis started at digit: 1
Elapsed computer time for one class: 3 min 30 sec - 4 min


Chi2 values for the distributions of sums of digits for different chain lengths L

Columns:
L          = length of chains
K = N/L    = number of examined chains
D = 9*L+1  = number of different sum values
Chi2       = Chi2 value of the distribution of sums
Relevant   = number of statistically relevant subdivisions for the sum values
MIN / MAX  = smallest and largest sum found

L     K = N/L          D = 9*L+1   Chi2       Relevant   MIN / MAX
2     2,100,000,000    19          25.200     19         0 / 18
3     1,400,000,000    28          25.169     28         0 / 27
5     840,000,000      46          37.080     46         0 / 45
6     700,000,000      55          36.628     55         0 / 54
10    420,000,000      91          70.158     77         1 / 88
20    210,000,000      181         124.307    157        24 / 158
40    105,000,000      361         136.567    143        81 / 285
80    52,500,000       721         147.654    190        219 / 505

Please remember that the MIN / MAX sum depends on the starting position, which is always 1 in our measurements.
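
For readers who want to reproduce a Chi2 value of this kind, here is a minimal sketch of Pearson's Chi2 statistic for observed chain sums against the theoretical probabilities w(L,S). It is not the program used for the table. The function name chi_square and the threshold min_expected are hypothetical; skipping classes with a small expected count is only one possible reading of the "statistically relevant subdivisions", since the rule actually used is not stated here.

```python
from collections import Counter


def chi_square(observed_sums, probabilities, min_expected=5.0):
    """Pearson Chi2 of observed sums against probabilities[s] = w(L, s).

    Returns (chi2, number of classes used).
    """
    total = len(observed_sums)
    counts = Counter(observed_sums)
    chi2 = 0.0
    used = 0
    for s, p in enumerate(probabilities):
        expected = total * p
        # Assumption: classes with too few expected hits are left out.
        if expected < min_expected:
            continue
        chi2 += (counts.get(s, 0) - expected) ** 2 / expected
        used += 1
    return chi2, used


# L=1: every digit 0..9 has probability 1/10.
perfect = list(range(10)) * 10  # each digit observed exactly 10 times
print(chi_square(perfect, [0.1] * 10))  # (0.0, 10)
```

A perfectly uniform sample gives Chi2 = 0; the more the observed counts deviate from the expected ones, the larger the value becomes.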

Detailed results for this test can be found here: ChainSumDetails (EXCEL file)

