17.5.5.2 Algorithms (Two-Sample Kolmogorov-Smirnov Test)
The procedure below draws on NAG algorithms.
Consider two independent samples $X$ and $Y$ of sizes $n_1$ and $n_2$, denoted $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ respectively. Let $F(x)$ and $G(x)$ represent their respective, unknown distribution functions. Also let $S_1(x)$ and $S_2(x)$ denote the values of the empirical distribution functions of the two samples at the point $x$.
The null hypothesis $H_0$: $F(x) = G(x)$
The alternative hypothesis $H_1$: $F(x) \neq G(x)$; the associated p-value is a two-tailed probability,
or $H_2$: $F(x) > G(x)$; the associated p-value is an upper-tailed probability,
or $H_3$: $F(x) < G(x)$; the associated p-value is a lower-tailed probability.
For the first case, $H_1$, the statistic $D_{n_1,n_2}$ represents the largest absolute deviation between the two empirical distribution functions.
For the second case, $H_2$, the statistic $D^+_{n_1,n_2}$ represents the largest positive deviation between the empirical distribution function of the first sample, $S_1(x)$, and the empirical distribution function of the second sample, $S_2(x)$, that is $D^+_{n_1,n_2} = \max\{S_1(x) - S_2(x), 0\}$.
For the third case, $H_3$, the statistic $D^-_{n_1,n_2}$ represents the largest positive deviation between the empirical distribution function of the second sample, $S_2(x)$, and the empirical distribution function of the first sample, $S_1(x)$, that is $D^-_{n_1,n_2} = \max\{S_2(x) - S_1(x), 0\}$.
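The three statistics can be illustrated with a short Python sketch. It is not the KStest2 or NAG implementation; the function name `ks2_statistics` is hypothetical, and the code simply evaluates both empirical distribution functions at every observed value, which is sufficient because they are step functions that only change at data points.

```python
from bisect import bisect_right

def ks2_statistics(x, y):
    """Return (D, D_plus, D_minus) for two samples x and y (illustrative only)."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    d_plus = 0.0   # largest value of S1 - S2, floored at 0
    d_minus = 0.0  # largest value of S2 - S1, floored at 0
    # The deviations can only reach their extremes at observed values.
    for t in xs + ys:
        s1 = bisect_right(xs, t) / n1  # S1(t): fraction of x values <= t
        s2 = bisect_right(ys, t) / n2  # S2(t): fraction of y values <= t
        d_plus = max(d_plus, s1 - s2)
        d_minus = max(d_minus, s2 - s1)
    # The two-sided statistic is the larger of the two one-sided deviations.
    return max(d_plus, d_minus), d_plus, d_minus
```

For example, with x = [1, 2, 3, 4] and y = [3, 4, 5, 6] the first sample's empirical distribution function is never below the second's, so the sketch gives a zero value for the third statistic and the same value, 0.5, for the other two.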
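KStest2 also returns the standardized statistic $Z = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \times D$,
where $n_1$ and $n_2$ are the sample sizes and $D$ may be $D_{n_1,n_2}$, $D^+_{n_1,n_2}$, or $D^-_{n_1,n_2}$, depending on the choice of the alternative hypothesis.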
The distribution of this standardized statistic converges asymptotically to a distribution given by Smirnov as the sample sizes $n_1$ and $n_2$ increase. The probability $p$, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed is then computed.
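As a rough illustration of the limiting behaviour, the sketch below standardizes a statistic and evaluates the classical Kolmogorov-Smirnov limiting series $Q(z) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2z^2}$ for the two-sided tail. This is an assumption made for illustration: it is the textbook asymptotic formula, not the Kim and Jenrich procedure and not necessarily what KStest2 evaluates. The function names and the `terms` parameter are hypothetical.

```python
import math

def standardized_statistic(d, n1, n2):
    """Scale a KS statistic d by sqrt(n1*n2/(n1+n2))."""
    return math.sqrt(n1 * n2 / (n1 + n2)) * d

def asymptotic_two_sided_p(z, terms=100):
    """Limiting two-sided tail probability for a standardized statistic z."""
    if z < 0.2:
        # Below this point the tail probability is 1 to within about 1e-12,
        # and the alternating series converges too slowly to be useful.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    # Clamp to [0, 1] to absorb rounding error.
    return min(1.0, max(0.0, 2.0 * total))
```

The series is only a large-sample approximation, so for small samples its value can differ noticeably from an exact calculation.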
If $\max(n_1, n_2) \le 2500$ and $n_1 n_2 \le 10000$, an exact method given by Kim and Jenrich (1973) is used. Otherwise the probability $p$ is computed using the approximations suggested by Kim and Jenrich (1973).
Note that the method used is only exact for continuous theoretical distributions.
This method computes the two-sided probability. The one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small probabilities $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$.
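To see why halving degrades for larger probabilities, the following sketch computes exact two-sided and one-sided probabilities for small samples by counting lattice paths. It is a generic textbook construction, assumed here for illustration, and is not claimed to be Kim and Jenrich's algorithm. It relies on the fact that, for continuous distributions without ties, every ordering of the pooled sample is equally likely under the null hypothesis. The function names and the `alternative` labels are hypothetical, and `d` is assumed to be a value the statistic can actually attain, so that `d * n1 * n2` is an integer.

```python
from math import comb

def _count_paths(n1, n2, inside):
    """Count pooled orderings whose path never leaves the 'inside' region."""
    ways = [0] * (n2 + 1)  # ways[j]: admissible paths reaching (i, j)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if not inside(i, j):
                ways[j] = 0
            elif i == 0 and j == 0:
                ways[j] = 1
            else:
                from_left = ways[j - 1] if j > 0 else 0  # step from (i, j-1)
                from_below = ways[j] if i > 0 else 0     # step from (i-1, j)
                ways[j] = from_left + from_below
    return ways[n2]

def exact_p(d, n1, n2, alternative="two-sided"):
    """P(statistic >= d) under the null hypothesis, assuming no ties."""
    k = round(d * n1 * n2)  # work in integers: S1 - S2 = (i*n2 - j*n1)/(n1*n2)
    if alternative == "two-sided":
        inside = lambda i, j: abs(i * n2 - j * n1) < k
    else:  # "upper": largest positive deviation of S1 over S2
        inside = lambda i, j: i * n2 - j * n1 < k
    return 1.0 - _count_paths(n1, n2, inside) / comb(n1 + n2, n1)
```

With n1 = n2 = 2 and d = 1, the two-sided probability is 1/3 and the one-sided probability is 1/6, so halving is exact. With d = 0.5, the two-sided probability is 1 while the one-sided probability is 2/3, so halving (0.5) underestimates it.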
For more details of the algorithm, please refer to nag_2_sample_ks_test (g08cdc).
