17.1.4.3 Algorithms (CrossTabs)
CrossTabs, also called contingency table analysis, is used to examine whether an association exists between categorical variables and how strong that association is.
CrossTabs Method
- Frequency Counts
- Marginal and Cell
- Chi-Square Tests Table
- Fisher's Exact Test Table (2 x 2 only)
- Measures of Association
- Measures of Agreement
- Odds Ratio and Relative Risk (2 x 2 only)
- Cochran-Mantel-Haenszel
Frequency Counts
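Define
- $X_i$ ($i = 1, \dots, R$) are the distinct values of the row variable in ascending order, i.e. $X_1 < X_2 < \dots < X_R$
- $Y_j$ ($j = 1, \dots, C$) are the distinct values of the column variable in ascending order, i.e. $Y_1 < Y_2 < \dots < Y_C$
- $f_{ij}$ is the frequency of cell $(i, j)$
- $r_i = \sum_{j=1}^{C} f_{ij}$ is the subtotal of the $i$th row
- $c_j = \sum_{i=1}^{R} f_{ij}$ is the subtotal of the $j$th column
- $N = \sum_{i=1}^{R} \sum_{j=1}^{C} f_{ij}$ is the total count.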
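The definitions above can be illustrated with a minimal Python sketch. The function name `crosstab` and the sample data are hypothetical and chosen only for illustration; the sketch assumes the observations arrive as two equal-length lists of category labels.

```python
from collections import Counter

def crosstab(rows, cols):
    """Build a contingency table from paired observations.

    Returns the sorted row levels, sorted column levels, the table of
    counts, the row subtotals, the column subtotals and the total count.
    """
    row_levels = sorted(set(rows))   # distinct row values, ascending
    col_levels = sorted(set(cols))   # distinct column values, ascending
    counts = Counter(zip(rows, cols))
    table = [[counts[(x, y)] for y in col_levels] for x in row_levels]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return row_levels, col_levels, table, row_totals, col_totals, total

# Hypothetical sample data: six paired observations.
rows = ["a", "b", "a", "b", "a", "a"]
cols = ["x", "x", "y", "y", "y", "x"]
levels_r, levels_c, f, r, c, n = crosstab(rows, cols)
```

Sorting the distinct values first keeps the table layout consistent with the ascending order used in the definitions.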
Marginal and Cell
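| Statistics | Formula and Explanation |
|---|---|
| Count | $f_{ij}$, the observed frequency of cell $(i, j)$ |
| Expected Count | $E_{ij} = \frac{r_i c_j}{N}$, where $r_i$ and $c_j$ are the row and column subtotals and $N$ is the total count |
| Row Percent | $100 \times f_{ij} / r_i$ |
| Column Percent | $100 \times f_{ij} / c_j$ |
| Total Percent | $100 \times f_{ij} / N$ |
| Residual | $R_{ij} = f_{ij} - E_{ij}$ |
| Std. Residual | $R_{ij} / \sqrt{E_{ij}}$ |
| Adj. Residual | $R_{ij} / \sqrt{E_{ij} (1 - r_i/N)(1 - c_j/N)}$ |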
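As a rough illustration of the marginal and cell statistics, the following sketch computes them with the standard textbook formulas (expected count as row subtotal times column subtotal over the total, and the usual standardized and adjusted residuals). The function name `cell_statistics` and the example table are assumptions made for this sketch, and every expected count is assumed to be positive.

```python
import math

def cell_statistics(f):
    """Per-cell statistics for a table of counts f (list of rows)."""
    r = [sum(row) for row in f]          # row subtotals
    c = [sum(col) for col in zip(*f)]    # column subtotals
    n = sum(r)                           # total count
    out = []
    for i, row in enumerate(f):
        out_row = []
        for j, fij in enumerate(row):
            e = r[i] * c[j] / n          # expected count under independence
            resid = fij - e
            out_row.append({
                "count": fij,
                "expected": e,
                "row_pct": 100 * fij / r[i],
                "col_pct": 100 * fij / c[j],
                "total_pct": 100 * fij / n,
                "residual": resid,
                "std_residual": resid / math.sqrt(e),
                "adj_residual": resid / math.sqrt(
                    e * (1 - r[i] / n) * (1 - c[j] / n)),
            })
        out.append(out_row)
    return out

# Hypothetical 2 x 2 example.
stats = cell_statistics([[10, 20], [30, 40]])
```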
Chi-Square Statistics
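| Statistics | Formula and Explanation | Degree of Freedom |
|---|---|---|
| Pearson Chi-Square | $\chi_P^2 = \sum_i \sum_j \frac{(f_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = r_i c_j / N$ is the expected count | $(R-1)(C-1)$ |
| Likelihood Ratio | $\chi_{LR}^2 = 2 \sum_i \sum_j f_{ij} \ln\frac{f_{ij}}{E_{ij}}$ | $(R-1)(C-1)$ |
| Linear Association | $\chi_{MH}^2 = (N-1) r^2$, where $r$ is the Pearson correlation coefficient. | 1 |
| Continuity Correction | $\chi_C^2 = \frac{N (\lvert f_{11} f_{22} - f_{12} f_{21} \rvert - N/2)^2}{r_1 r_2 c_1 c_2}$, which is calculated only for 2 x 2 tables | 1 |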
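The chi-square statistics can be sketched in a few lines of Python. This is a minimal illustration using the standard Pearson, likelihood-ratio and Yates continuity-corrected formulas, not necessarily the exact implementation used by the tool. The function name `chi_square_tests` is hypothetical, empty cells are assumed to contribute zero to the likelihood ratio, and the corrected statistic is assumed to be floored at zero.

```python
import math

def chi_square_tests(f):
    """Pearson, likelihood-ratio and (for 2 x 2) Yates chi-square."""
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    pearson = 0.0
    likelihood = 0.0
    for i, row in enumerate(f):
        for j, fij in enumerate(row):
            e = r[i] * c[j] / n
            pearson += (fij - e) ** 2 / e
            if fij > 0:                      # 0 * ln(0) is taken as 0
                likelihood += 2 * fij * math.log(fij / e)
    result = {
        "pearson": pearson,
        "likelihood_ratio": likelihood,
        "df": (len(r) - 1) * (len(c) - 1),
    }
    if len(r) == 2 and len(c) == 2:
        d = abs(f[0][0] * f[1][1] - f[0][1] * f[1][0])
        adj = max(d - n / 2, 0.0)            # assumption: floor at zero
        result["continuity"] = n * adj ** 2 / (r[0] * r[1] * c[0] * c[1])
    return result

# Hypothetical 2 x 2 example.
chi = chi_square_tests([[10, 20], [30, 40]])
```

Converting the statistics to p-values requires the chi-square distribution function, which is left out here to keep the sketch dependency-free.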
Fisher's Exact Test
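This test is useful when some expected cell counts are low (less than 5). It is calculated only for 2 x 2 tables. Suppose we have the following table:

| | $Y_1$ | $Y_2$ | Subtotal/Total |
|---|---|---|---|
| $X_1$ | $f_{11}$ | $f_{12}$ | $r_1$ |
| $X_2$ | $f_{21}$ | $f_{22}$ | $r_2$ |
| Subtotal/Total | $c_1$ | $c_2$ | $N$ |

Under the null hypothesis (independence), the count of the first cell, $n_{11}$, follows a hypergeometric distribution with probability given by

$$P(n_{11}) = \frac{\binom{r_1}{n_{11}} \binom{r_2}{c_1 - n_{11}}}{\binom{N}{c_1}}, \quad \max(0, c_1 - r_2) \le n_{11} \le \min(r_1, c_1).$$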
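One-Sided Test
The one-sided significance level is calculated by summing the hypergeometric probabilities $P(x)$ of the first-cell count $x$ on one side of the observed count $f_{11}$:
- p(left-sided test) = $\sum_{x \le f_{11}} P(x)$
- p(right-sided test) = $\sum_{x \ge f_{11}} P(x)$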
Two-Sided Test
The two-tail significance is
where
- , if
- , if
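Fisher's exact test for a 2 x 2 table can be sketched as follows. The one-sided values sum the hypergeometric probabilities on either side of the observed first-cell count. For the two-sided value, this sketch assumes the common convention of summing every table whose probability does not exceed that of the observed table, which may differ in detail from the rule used by the tool. The function name `fisher_exact` and the small relative tolerance are choices made for this illustration.

```python
from math import comb

def fisher_exact(f11, f12, f21, f22):
    """Return (left, right, two_sided) p-values for a 2 x 2 table."""
    r1, r2 = f11 + f12, f21 + f22
    c1 = f11 + f21
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)   # support of the first cell

    def prob(x):
        # Hypergeometric probability that the first cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[f11]
    left = sum(p for x, p in probs.items() if x <= f11)
    right = sum(p for x, p in probs.items() if x >= f11)
    # Assumed convention: tables no more likely than the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return left, right, min(two_sided, 1.0)

# Hypothetical example: rows (3, 1) and (1, 3).
p_left, p_right, p_two = fisher_exact(3, 1, 1, 3)
```

The tolerance guards against floating-point ties between tables that are equally probable in exact arithmetic.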
Measures of Association
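Define
- $C_{ij} = \sum_{h<i} \sum_{k<j} f_{hk} + \sum_{h>i} \sum_{k>j} f_{hk}$, the number of observations concordant with cell $(i, j)$
- $D_{ij} = \sum_{h<i} \sum_{k>j} f_{hk} + \sum_{h>i} \sum_{k<j} f_{hk}$, the number of observations discordant with cell $(i, j)$
- $P = \sum_i \sum_j f_{ij} C_{ij}$, twice the number of concordant pairs
- $Q = \sum_i \sum_j f_{ij} D_{ij}$, twice the number of discordant pairs
- $D_r = N^2 - \sum_i r_i^2$
- $D_c = N^2 - \sum_j c_j^2$
- $r_i$ is the subtotal of the $i$th row
- $c_j$ is the subtotal of the $j$th column
- $N$ is the total count.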
Statistics
|
Formula and Explanation
|
Standard Error
|
Phi Coefficient
|
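$\phi = \sqrt{\chi_P^2 / N}$ (with $\chi_P^2$ the Pearson chi-square) for a table that is not 2 x 2. For a 2 x 2 table, it is equal to $\frac{f_{11} f_{22} - f_{12} f_{21}}{\sqrt{r_1 r_2 c_1 c_2}}$.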
The value ranges from , where ,
|
|
Cramer's V
|
|
|
Contingency Coefficient
|
|
|
Gamma
|
|
|
Kendall
|
Tau-b
|
|
|
Tau-c
|
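$\tau_c = \frac{q (P - Q)}{N^2 (q - 1)}$, where $q = \min(R, C)$ and $P$, $Q$ are twice the numbers of concordant and discordant pairs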
|
|
Somers' D
|
CR
|
|
|
RC
|
|
|
Symmetric
|
|
|
Lambda
|
CR
|
$\lambda_{C \mid R} = \frac{\sum_i f_{im} - c_m}{N - c_m}$, where $f_{im}$ is the largest count in the $i$th row, and $c_m$ is the largest column subtotal.
|
,
where is the column index of , is the index of column subtotal for .
|
RC
|
$\lambda_{R \mid C} = \frac{\sum_j f_{mj} - r_m}{N - r_m}$,
where $f_{mj}$ is the largest count in the $j$th column, and $r_m$ is the largest row subtotal.
|
,
where is the row index of , is the index of row subtotal for .
|
Symmetric
|
|
where , , , and .
|
Uncertainty
|
CR
|
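$U_{C \mid R} = \frac{U(R) + U(C) - U(RC)}{U(C)}$, where $U(R) = -\sum_i \frac{r_i}{N} \ln\frac{r_i}{N}$, and $U(C) = -\sum_j \frac{c_j}{N} \ln\frac{c_j}{N}$, and $U(RC) = -\sum_i \sum_j \frac{f_{ij}}{N} \ln\frac{f_{ij}}{N}$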
|
, where
|
RC
|
|
|
Symmetric
|
|
|
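Several of the association measures can be illustrated with a short sketch based on their standard textbook definitions. The concordance sums below count each concordant or discordant pair twice, a convention assumed here so that the denominators of the form $N^2 - \sum r_i^2$ apply directly. The function name `association_measures` is hypothetical, and standard errors are not computed.

```python
import math

def association_measures(f):
    """Chi-square-based and ordinal association measures for table f."""
    n_rows, n_cols = len(f), len(f[0])
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    chi2 = sum((f[i][j] - r[i] * c[j] / n) ** 2 / (r[i] * c[j] / n)
               for i in range(n_rows) for j in range(n_cols))
    p = q = 0   # twice the concordant / discordant pair counts
    for i in range(n_rows):
        for j in range(n_cols):
            conc = sum(f[h][k] for h in range(n_rows) for k in range(n_cols)
                       if (h < i and k < j) or (h > i and k > j))
            disc = sum(f[h][k] for h in range(n_rows) for k in range(n_cols)
                       if (h < i and k > j) or (h > i and k < j))
            p += f[i][j] * conc
            q += f[i][j] * disc
    d_r = n * n - sum(x * x for x in r)
    d_c = n * n - sum(x * x for x in c)
    m = min(n_rows, n_cols)
    return {
        "cramers_v": math.sqrt(chi2 / (n * (m - 1))),
        "contingency": math.sqrt(chi2 / (chi2 + n)),
        "gamma": (p - q) / (p + q),
        "tau_b": (p - q) / math.sqrt(d_r * d_c),
        "tau_c": m * (p - q) / (n * n * (m - 1)),
        "somers_d_cr": (p - q) / d_r,   # column variable dependent
        "somers_d_rc": (p - q) / d_c,   # row variable dependent
    }

# Hypothetical 2 x 2 example.
measures = association_measures([[10, 20], [30, 40]])
```

For a 2 x 2 table, Kendall's tau-b has the same magnitude as Cramér's V, which gives a quick sanity check on the concordance sums.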
Measures of Agreement
This table is calculated only when two conditions are satisfied: (1) the table is square, i.e. $R = C$, and (2) the row variable and the column variable have the same set of values.
The Kappa statistic is calculated by

$$\kappa = \frac{N \sum_i f_{ii} - \sum_i r_i c_i}{N^2 - \sum_i r_i c_i}$$
The standard error is estimated by:
- .
where , ,
and .
The corresponding asymptotic standard error under the null hypothesis is given by
Another related statistic is Bowker, which is used to test for all pairs. If , the statistic is calculated as
For larger samples, the statistic asymptotically follows a chi-square distribution with $R(R-1)/2$ degrees of freedom.
Note that for a 2 x 2 table, Bowker's test is equivalent to McNemar's test, so only Bowker's test is given.
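A minimal sketch of the two agreement statistics follows, using the standard definitions of Cohen's kappa and Bowker's symmetry statistic. The function names are hypothetical. Off-diagonal pairs whose counts sum to zero are skipped in the Bowker sum, which is an assumption of this sketch, and standard errors are omitted.

```python
def cohen_kappa(f):
    """Cohen's kappa for a square table of counts."""
    k = len(f)
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    observed = sum(f[i][i] for i in range(k))       # agreeing counts
    chance = sum(r[i] * c[i] for i in range(k))     # N^2 times chance agreement
    return (n * observed - chance) / (n * n - chance)

def bowker_statistic(f):
    """Bowker's symmetry statistic and its degrees of freedom."""
    k = len(f)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = f[i][j] + f[j][i]
            if pair_total > 0:                      # assumption: skip empty pairs
                stat += (f[i][j] - f[j][i]) ** 2 / pair_total
    return stat, k * (k - 1) // 2

# Hypothetical 2 x 2 example of two raters.
table = [[20, 5], [10, 15]]
kappa = cohen_kappa(table)
bowker, bowker_df = bowker_statistic(table)
```

With a 2 x 2 table the Bowker statistic reduces to the uncorrected McNemar statistic $(f_{12} - f_{21})^2 / (f_{12} + f_{21})$.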
Odds Ratio and Relative Risk
These statistics are calculated only for 2 x 2 tables.
Odds Ratio
The odds ratio is calculated as

$$OR = \frac{f_{11} f_{22}}{f_{12} f_{21}}$$
Relative Risk
The Relative Risks are given by
-
-
-
-
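The following sketch shows the odds ratio together with one common definition of relative risk for a 2 x 2 table: the proportion of a chosen column outcome in the first row divided by the same proportion in the second row. That definition, the function names and the `column` parameter are assumptions of this illustration and may not match every variant the tool reports.

```python
def odds_ratio(f):
    """Odds ratio of a 2 x 2 table (cross-product ratio)."""
    return (f[0][0] * f[1][1]) / (f[0][1] * f[1][0])

def relative_risk(f, column=0):
    """Risk of the given column outcome in row 1 relative to row 2."""
    r1 = f[0][0] + f[0][1]
    r2 = f[1][0] + f[1][1]
    return (f[0][column] / r1) / (f[1][column] / r2)

# Hypothetical 2 x 2 example.
table = [[10, 20], [30, 40]]
or_value = odds_ratio(table)
rr_col1 = relative_risk(table, column=0)
rr_col2 = relative_risk(table, column=1)
```

Both functions divide by cell counts, so a table with a zero in the relevant cell needs separate handling.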
Cochran-Mantel-Haenszel
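Let
- $K$ be the number of layers
- $f_{ijk}$ be the frequency in the $i$th row, $j$th column and $k$th layer
- $c_{jk} = \sum_i f_{ijk}$ be the $j$th column, $k$th layer subtotal
- $r_{ik} = \sum_j f_{ijk}$ be the $i$th row, $k$th layer subtotal
- $n_k = \sum_i \sum_j f_{ijk}$ be the $k$th layer subtotal
- $E_{ijk} = \frac{r_{ik} c_{jk}}{n_k}$ be the expected frequency of the $i$th row, $j$th column, $k$th layer cell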
-
Mantel-Haenszel statistic
The Mantel-Haenszel statistic is given by
where sgn is the sign function .
Breslow-Day statistic
The Breslow-Day statistic is
where .
Tarone's Statistic
Tarone's statistic is
where .
Common Odds Ratio
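For a 2×2×K table, the odds ratio at the $k$th layer is $OR_k = \frac{f_{11k} f_{22k}}{f_{12k} f_{21k}}$, where $f_{ijk}$ is the count in row $i$, column $j$ of layer $k$.
Assuming that a true common odds ratio exists, that is, $OR_1 = OR_2 = \dots = OR_K$, the Mantel-Haenszel estimator of the common odds ratio is

$$\widehat{OR}_{MH} = \frac{\sum_k f_{11k} f_{22k} / n_k}{\sum_k f_{12k} f_{21k} / n_k}$$

where $n_k$ is the total count of layer $k$.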
The asymptotic variance for is:
The lower confidence limit(LCL) and upper confidence limit(UCL) for is:
- and
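The Mantel-Haenszel estimator of the common odds ratio can be sketched as below, using its standard definition as a ratio of layer-weighted cross-products. The function name `mantel_haenszel_odds_ratio` and the example layers are hypothetical, and the variance and confidence limits are not computed here.

```python
def mantel_haenszel_odds_ratio(layers):
    """Common odds ratio estimate from a list of 2 x 2 layer tables."""
    numerator = 0.0
    denominator = 0.0
    for f in layers:
        n_k = f[0][0] + f[0][1] + f[1][0] + f[1][1]   # layer total
        numerator += f[0][0] * f[1][1] / n_k
        denominator += f[0][1] * f[1][0] / n_k
    return numerator / denominator

# Hypothetical data: two layers of a 2 x 2 x 2 table.
layers = [
    [[10, 20], [30, 40]],
    [[5, 5], [5, 5]],
]
common_or = mantel_haenszel_odds_ratio(layers)
```

Weighting each layer's cross-products by $1/n_k$ keeps sparse layers from dominating the estimate, which is the usual motivation for this estimator over a simple average of the per-layer odds ratios.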