17.4.1.4 Algorithms (Two-Way ANOVA)


Theory of Two-Way ANOVA

Let y_{ij,k} denote the kth observation at level i of factor A and level j of factor B. The two-way ANOVA model can then be written as

y_{ij,k}=\mu +\alpha _i+\beta _j+\gamma _{ij}+\varepsilon _{ij,k}

where \mu is the overall mean of the response data, \alpha _i is the deviation at level i of factor A, \beta _j is the deviation at level j of factor B, \gamma _{ij} is the interaction term between the two factors, and \varepsilon _{ij,k} is the error term. The sample variation is then divided into parts attributable to factor A, factor B, and their interaction, so three hypothesis tests can be made:

For factor A, the null hypothesis is that the means of the r different populations are the same, and the alternative hypothesis is that the mean of at least one population differs from the others:

H_{01}:\alpha _1=\alpha _2=\ldots =\alpha _r=0;

H_{A1}:\alpha _p\neq \alpha _q, for some p and q, 1 ≤ p, q ≤ r;

For factor B, the null hypothesis is that the means of the s different populations are the same, and the alternative hypothesis is that the mean of at least one population differs from the others:

H_{02}:\beta _1=\beta _2=\ldots =\beta _s=0;

H_{A2}:\beta _p\neq \beta _q, for some p and q, 1 ≤ p, q ≤ s;

For the interaction term, the null hypothesis is that there is no interaction between the two factors:

H_{03}:\gamma _1=\gamma _2=\ldots =\gamma _{rs}=0;

H_{A3}:\gamma _p\neq \gamma _q, for some p and q, 1 ≤ p, q ≤ rs;

To test these hypotheses, partition the variation of the whole sample into four parts, each estimated from the sample:

SS_{Total}=SS_{Error}+SS_A+SS_B+SS_{AB}\,\!

where

SS_{Total}=\sum_{i=1}^r\sum_{j=1}^s\sum_{k=1}^t(y_{ij,k}-\bar y)^2

SS_{Error}=\sum_{i=1}^r\sum_{j=1}^s\sum_{k=1}^t(y_{ij,k}-\bar y_{ij\cdot })^2

SS_A=st\sum_{i=1}^r(\bar y_{i\cdot \cdot }-\bar y)^2

SS_B=rt\sum_{j=1}^s(\bar y_{\cdot j\cdot }-\bar y)^2

SS_{AB}=t\sum_{i=1}^r\sum_{j=1}^s(\bar y_{ij\cdot }-\bar y_{i\cdot \cdot }-\bar y_{\cdot j\cdot }+\bar y)^2

and the means, where a dot in a subscript denotes averaging over that index, are

\bar y=\frac 1{rst}\sum_{i=1}^r\sum_{j=1}^s\sum_{k=1}^ty_{ij,k}

\bar y_{ij\cdot }=\frac 1t\sum_{k=1}^ty_{ij,k}

\bar y_{i\cdot \cdot }=\frac 1{st}\sum_{j=1}^s\sum_{k=1}^ty_{ij,k}

\bar y_{\cdot j\cdot }=\frac 1{rt}\sum_{i=1}^r\sum_{k=1}^ty_{ij,k}

SS_{Total} is the total sum of squares, SS_A represents the variability due to differences among the level means of factor A, SS_B represents the variability due to differences among the level means of factor B, SS_{AB} represents the variability due to interaction, and SS_{Error} represents the variability of the individual observations within each cell. An F test can then be used to test the significance of each source of variation against the error variation. Under the corresponding null hypothesis we have:
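The decomposition above can be checked numerically. The following is a minimal sketch, assuming NumPy and a balanced layout stored as an array of shape (r, s, t); the function name two_way_ss and the data are made up for illustration and are not part of Origin.

```python
import numpy as np

def two_way_ss(y):
    """Sums of squares for a balanced two-way layout.

    y has shape (r, s, t): r levels of factor A, s levels of factor B,
    and t replicates per cell (hypothetical helper, not an Origin function).
    """
    y = np.asarray(y, dtype=float)
    r, s, t = y.shape
    grand = y.mean()                 # overall mean
    cell = y.mean(axis=2)            # cell means, shape (r, s)
    a_mean = y.mean(axis=(1, 2))     # factor A level means, shape (r,)
    b_mean = y.mean(axis=(0, 2))     # factor B level means, shape (s,)

    ss_total = ((y - grand) ** 2).sum()
    ss_error = ((y - cell[:, :, None]) ** 2).sum()
    ss_a = s * t * ((a_mean - grand) ** 2).sum()
    ss_b = r * t * ((b_mean - grand) ** 2).sum()
    ss_ab = t * ((cell - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum()
    return {"total": ss_total, "error": ss_error,
            "A": ss_a, "B": ss_b, "AB": ss_ab}

# Made-up data: r = 2, s = 2, t = 3
y = [[[4, 5, 6], [7, 8, 9]],
     [[6, 7, 8], [13, 14, 15]]]
ss = two_way_ss(y)
print(ss)
```

For this made-up data the four parts are 48 (factor A), 75 (factor B), 12 (interaction) and 8 (error), which add up to the total sum of squares of 143.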

F_A=\frac{MS_A}{MS_{Error}}=\frac{SS_A/(r-1)}{SS_{Error}/(rs(t-1))}\sim F(r-1,rs(t-1))

F_B=\frac{MS_B}{MS_{Error}}=\frac{SS_B/(s-1)}{SS_{Error}/(rs(t-1))}\sim F(s-1,rs(t-1))

F_{AB}=\frac{MS_{AB}}{MS_{Error}}=\frac{SS_{AB}/((r-1)(s-1))}{SS_{Error}/(rs(t-1))}\sim F((r-1)(s-1),rs(t-1))

Given a significance level \alpha , the null hypothesis H_0 is rejected if the F statistic exceeds the critical value F_\alpha or, equivalently, if the p-value associated with the F statistic is less than \alpha .
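As an illustration of the decision rule, the sketch below computes the F value, the p-value, and the critical value for each source. It assumes SciPy's central F distribution (scipy.stats.f) in place of the NAG routines, and the sums of squares and the helper name f_test are made up for the example.

```python
from scipy.stats import f as f_dist

def f_test(ss_effect, df_effect, ss_error, df_error, alpha=0.05):
    """F value, p-value, critical value, and decision for one source
    (hypothetical helper, not an Origin or NAG function)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    f_value = ms_effect / ms_error
    p_value = f_dist.sf(f_value, df_effect, df_error)    # P{F >= f_value}
    f_crit = f_dist.ppf(1 - alpha, df_effect, df_error)  # critical value
    return f_value, p_value, f_crit, p_value < alpha

# Made-up balanced design: r = 2, s = 2, t = 3
r, s, t = 2, 2, 3
ss_a, ss_b, ss_ab, ss_error = 48.0, 75.0, 12.0, 8.0
df_error = r * s * (t - 1)

results = {
    "A": f_test(ss_a, r - 1, ss_error, df_error),
    "B": f_test(ss_b, s - 1, ss_error, df_error),
    "A*B": f_test(ss_ab, (r - 1) * (s - 1), ss_error, df_error),
}
for source, (f_value, p_value, f_crit, reject) in results.items():
    print(source, f_value, p_value, f_crit, reject)
```

Comparing the F value with the critical value and comparing the p-value with \alpha always lead to the same decision, so either check can be used.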

The calculation of the two-way ANOVA table is summarized below:

| Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS) | F Value | Prob > F |
|---|---|---|---|---|---|
| Factor A | r - 1 | SS_A | MS_A | MS_A / MS_{Error} | P\{F\geq F_{(r-1,rs(t-1),\alpha )}\} |
| Factor B | s - 1 | SS_B | MS_B | MS_B / MS_{Error} | P\{F\geq F_{(s-1,rs(t-1),\alpha )}\} |
| Interaction | (r - 1)(s - 1) | SS_{AB} | MS_{AB} | MS_{AB} / MS_{Error} | P\{F\geq F_{((r-1)(s-1),rs(t-1),\alpha )}\} |
| Error | rs(t - 1) | SS_{Error} | MS_{Error} | | |
| Total | rst - 1 | SS_{Total} | | | |

Origin’s two-way analysis of variance makes use of several NAG functions. The NAG function nag_dummy_vars (g04eac) is used to create the necessary design matrices and the NAG function nag_regsn_mult_linear (g02dac) is used to perform the linear regressions of the design matrices. The results of the linear regressions are then used to construct the two-way ANOVA table. See the NAG documentation for more detailed information.
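The regression route can be illustrated with a minimal sketch. It assumes NumPy's least-squares solver as a stand-in for the NAG routines (it does not call g04eac or g02dac), and the helper names rss and dummies and the data are hypothetical. Each sum of squares is taken as the reduction in residual sum of squares when a term is added to the model in sequence, which for balanced data agrees with the direct formulas.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def dummies(levels):
    """Indicator columns for every level except the first (reference) level."""
    uniq = np.unique(levels)
    return np.column_stack([(levels == u).astype(float) for u in uniq[1:]])

# Made-up balanced data in long format: 2 x 2 design, 3 replicates per cell
a = np.repeat([0, 1], 6)                    # factor A level of each row
b = np.tile(np.repeat([0, 1], 3), 2)        # factor B level of each row
y = np.array([4, 5, 6, 7, 8, 9, 6, 7, 8, 13, 14, 15], dtype=float)

one = np.ones((len(y), 1))                  # intercept column
A, B = dummies(a), dummies(b)
# Interaction columns: products of each A dummy with each B dummy
AB = np.column_stack([A[:, i] * B[:, j]
                      for i in range(A.shape[1])
                      for j in range(B.shape[1])])

rss_mean = rss(one, y)                          # intercept only
rss_a = rss(np.hstack([one, A]), y)             # + factor A
rss_main = rss(np.hstack([one, A, B]), y)       # + factor B
rss_full = rss(np.hstack([one, A, B, AB]), y)   # + interaction

ss_a = rss_mean - rss_a
ss_b = rss_a - rss_main
ss_ab = rss_main - rss_full
ss_error = rss_full
print(ss_a, ss_b, ss_ab, ss_error)
```

With unbalanced data the sequential reductions depend on the order in which terms enter the model, so this sketch should only be read as an illustration for the balanced case.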

Multiple Means Comparisons

Given that a two-way ANOVA has determined that at least one level mean of a factor is statistically different from the other level means of that factor, a means comparison then compares all possible pairs of level means of that factor to determine which mean or means are significantly different. Origin offers various methods for multiple means comparison and uses the NAG function nag_anova_confid_interval (g04dbc) to perform them.

There are two types of multiple means comparison methods:

Single-step methods. These create simultaneous confidence intervals to show how the means differ. They include the Tukey-Kramer, Bonferroni, Dunn-Sidak, Fisher’s LSD, Scheffé, and Dunnett methods.

Stepwise methods. These perform the hypothesis tests sequentially. They include the Holm-Bonferroni and Holm-Sidak tests.
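The difference between a single-step and a stepwise procedure can be sketched on a set of pairwise p-values. The example below is a generic illustration in plain Python, not Origin's implementation: the function names and the p-values are made up, and it works directly on p-values rather than on mean differences and confidence intervals.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison level for m comparisons (Bonferroni)."""
    return alpha / m

def sidak_alpha(alpha, m):
    """Per-comparison level for m comparisons (Dunn-Sidak)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def single_step(p_values, alpha=0.05, sidak=False):
    """Single-step rule: every p-value is compared with the same threshold."""
    m = len(p_values)
    threshold = sidak_alpha(alpha, m) if sidak else bonferroni_alpha(alpha, m)
    return [p <= threshold for p in p_values]

def holm(p_values, alpha=0.05, sidak=False):
    """Step-down rule (Holm-Bonferroni, or Holm-Sidak if sidak=True).

    P-values are tested from smallest to largest; the threshold is relaxed
    at each step, and testing stops at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        remaining = m - rank   # hypotheses not yet rejected
        threshold = (sidak_alpha(alpha, remaining) if sidak
                     else bonferroni_alpha(alpha, remaining))
        if p_values[i] > threshold:
            break
        reject[i] = True
    return reject

# Made-up p-values for three pairwise comparisons
p = [0.01, 0.02, 0.04]
print(single_step(p))       # only the smallest p-value passes alpha / 3
print(holm(p))              # thresholds alpha/3, alpha/2, alpha: all pass
print(holm(p, sidak=True))
```

The stepwise rule rejects at least as many hypotheses as the corresponding single-step rule, because its threshold is never smaller than the single-step threshold.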

Power Analysis

The power analysis procedure calculates the actual power for the sample data, as well as the hypothetical power if additional sample sizes are specified.

The power of a two-way analysis of variance is a measurement of its sensitivity. Power is the probability that the ANOVA will detect differences in the population means when real differences exist. In terms of the null and alternative hypotheses, power is the probability that the test statistic F will be extreme enough to reject the null hypothesis when it should in fact be rejected (i.e., when the null hypothesis is not true).

The Origin Two-Way ANOVA dialog can compute power for the Factor A and Factor B sources. If the Interactions check box is selected, Origin can also compute power for the Interaction source A*B.

Power is defined by the equation:

power=1-probf(f,df,dfe,nc)\,\!

where f is the deviate from the non-central F-distribution with df and dfe degrees of freedom and nc = SS/MSE. Here SS is the sum of squares of the source A, B, or A*B; MSE is the mean square of the errors; df is the numerator degrees of freedom for the source A, B, or A*B; and dfe is the degrees of freedom of the errors. All values (SS, MSE, df, and dfe) are obtained from the ANOVA table. The value of probf( ) is obtained using the NAG function nag_prob_non_central_f_dist (g01gdc). See the NAG documentation for more detailed information.
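A minimal sketch of this calculation follows. It assumes SciPy's scipy.stats.ncf as a stand-in for the NAG non-central F routine, and it assumes that f is taken as the critical value of the central F distribution at significance level alpha, which the text above does not state explicitly. The function name anova_power and the input values are made up for illustration.

```python
from scipy.stats import f as f_dist, ncf

def anova_power(ss, df, mse, dfe, alpha=0.05):
    """Power for one source of variation (hypothetical helper).

    ss, df   : sum of squares and degrees of freedom of the source
    mse, dfe : mean square and degrees of freedom of the error
    """
    nc = ss / mse                              # non-centrality parameter
    # Assumption: f is the central-F critical value at level alpha
    f_crit = f_dist.ppf(1.0 - alpha, df, dfe)
    # power = 1 - probf(f, df, dfe, nc)
    return 1.0 - ncf.cdf(f_crit, df, dfe, nc)

# Made-up ANOVA table values: MSE = 1 with 8 error degrees of freedom
power_small = anova_power(ss=12.0, df=1, mse=1.0, dfe=8)
power_large = anova_power(ss=48.0, df=1, mse=1.0, dfe=8)
print(power_small, power_large)
```

A larger sum of squares relative to the error mean square gives a larger non-centrality parameter and therefore a higher power.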

The above is a brief outline of the algorithm for two-way analysis of variance. For details of the mathematical derivation, refer to the corresponding part of the user's manual and the NAG documentation.