17.5.4.2 Algorithms (Mann-Whitney Test)

Consider two independent samples drawn from populations with distribution functions F(x)\, and G(y)\,, of sizes n_1\,\! and n_2\,\! respectively. The sample data are denoted x_1,x_2,\ldots ,x_{n_1}\,\! and y_1,y_2,\ldots ,y_{n_2}\,\!.

The null hypothesis, H_0: F(x) = G(y)\,, is that the two distributions are the same. It is tested against one of the following alternative hypotheses H_1\,:

H_1: F(x) \neq G(y)\,; or
H_1: F(x) < G(y)\,\!, the x\,'s tend to be greater than the y\,'s; or
H_1: F(x) > G(y)\,\!, the x\,'s tend to be less than the y\,'s.

The test procedure includes the following steps:

  • Combine the x_i \,\! and y_j\,\! into a single group.
  • Rank the combined observations in ascending order. Ties receive the average of their ranks. Let r_{1i}\,\! be the ranks assigned to x_i \,\!, for i=1,2,\ldots ,n_1, and let r_{2j}\,\! be the ranks assigned to y_j\,\!, for j=1,2,\ldots ,n_2.
  • Calculate sum of ranks:
     S_1=\sum_{i=1}^{n_1}r_{1i}\,\!, and  S_2=\sum_{j=1}^{n_2}r_{2j}\,\!
  • The test statistic U\, is defined as follows:
     U=S_1-\frac{n_1(n_1+1)}2\,
  • The approximate Normal test statistic z\, is calculated as:
    z=\frac{U-M(U)\pm \frac 12}{\sqrt{Var(U)}} \,
    where
    M(U)=\frac{n_1n_2}2 \,
    and
    Var(U)=\frac{n_1n_2(n_1+n_2+1)}{12}-\frac{n_1n_2}{(n_1+n_2)(n_1+n_2-1)}\times TS \,
    where
    TS=\sum_{j=1}^\tau \frac{(t_j)(t_j-1)(t_j+1)}{12}\,.
     \tau \, is the number of groups of tied values in the sample and t_j\, is the number of tied observations in the jth group.
    Note that if no ties are present, TS = 0\, and the variance of U \, reduces to \frac{n_1n_2(n_1+n_2+1)}{12}\,.
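
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used by the software; the function names average_ranks and mann_whitney_z are hypothetical. The text does not say which sign of the \pm \frac 12 correction applies in each case, so the sketch assumes the common convention of moving U half a unit toward M(U).

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend [pos, end] over the run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_z(x, y):
    """Return (U, Var(U), z) following the formulas in this section."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)

    s1 = sum(ranks[:n1])               # S_1: rank sum of the x sample
    u = s1 - n1 * (n1 + 1) / 2         # U = S_1 - n1(n1+1)/2
    mean_u = n1 * n2 / 2               # M(U)

    # TS: each group of t tied values contributes t(t-1)(t+1)/12;
    # untied values (t = 1) contribute zero.
    ts = sum(t * (t - 1) * (t + 1) / 12 for t in Counter(combined).values())
    var_u = n1 * n2 * (n + 1) / 12 - n1 * n2 / (n * (n - 1)) * ts
    if var_u <= 0:
        raise ValueError("Var(U) is zero; all observations are tied")

    # Assumption: the continuity correction shrinks |U - M(U)| by 1/2.
    diff = u - mean_u
    if diff > 0:
        diff -= 0.5
    elif diff < 0:
        diff += 0.5
    return u, var_u, diff / sqrt(var_u)


if __name__ == "__main__":
    u, var_u, z = mann_whitney_z([1, 2, 2], [2, 3, 4])
    print(u, var_u, z)
```

In the usage example, the three tied values of 2 occupy ranks 2, 3 and 4, so each receives the average rank 3, giving S_1 = 7 and U = 1. The single tie group has t_1 = 3, so TS = 2 and the variance drops from 5.25 to 4.65.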

For more details on the algorithm, please refer to nag_mann_whitney (g08amc).