Use of this document

This is a study note on using the \(stats\) package for ANalysis Of VAriance (ANOVA).

Prerequisites

library(tidyverse)  # data manipulation
library(stats)      # anova() for Type I (sequential) sums of squares
library(car)        # Anova() for Type II and Type III sums of squares
library(emmeans)    # post-hoc mean separation tests
library(knitr)      # kable() for tables
library(mvnormtest) # multivariate normality test
data(mtcars)
library(caret)      # for the Sacramento example data
data(Sacramento)

1. Introduction

Analysis of variance (ANOVA) is an analysis tool used in statistics that splits the observed aggregate variability found inside a data set into two parts: systematic variation between groups (explained by the factors) and random variation within groups (unexplained error).

1.1 Use of ANOVA

ANOVA has wide applicability in the analysis of experiments. By analyzing variances, ANOVA can make inferences about means, so it is used for two different purposes:

  • To estimate and test hypotheses about population means
  • To estimate and test hypotheses about population variances.

1.2 Why Use ANOVA and Not Repeated t-tests?

Evaluation of mean comparison methods:

  • t-test: tests whether two groups have different mean values. Approach: the t-statistic, \(t = \frac{difference \ between \ group \ means}{estimated \ standard \ error \ of \ the \ difference}\)
  • Multiple t-tests: compare each mean with each other mean, without any control of the familywise error rate (FWER), by repeating the t-test on each pair; this results in severe inflation of the Type I error rate, so it is NOT RECOMMENDED
  • ANOVA: uses data from all groups to estimate standard errors and tests whether at least one group mean differs among three or more groups, which increases the power of the analysis. Approach: the F-test statistic, \(F = \frac{explained \ variance}{unexplained \ variance}\) or \(F = \frac{between \ group \ variability}{within \ group \ variability}\)
  • Post-hoc analysis: pairwise comparisons while preventing severe inflation of the Type I error rate (false positives), by 1) adjusting the confidence level \(\alpha\) for each comparison, or 2) using a different reference distribution (e.g. the studentized range distribution in Tukey's HSD) (see the sketch below)
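To make the contrast concrete, here is a minimal sketch using the mtcars data loaded in the prerequisites; grouping mpg by the number of cylinders is purely illustrative.

mtcars$cyl_f = factor(mtcars$cyl)   # three groups: 4, 6, and 8 cylinders

# Repeated t-tests with NO adjustment: three pairwise tests, so the familywise
# Type I error rate is inflated well above the nominal alpha = 0.05
pairwise.t.test(mtcars$mpg, mtcars$cyl_f, p.adjust.method = "none")

# One overall F-test from ANOVA: does at least one group mean differ?
summary(aov(mpg ~ cyl_f, data = mtcars))

# Post-hoc pairwise comparisons that control the familywise error rate
pairwise.t.test(mtcars$mpg, mtcars$cyl_f, p.adjust.method = "bonferroni")  # adjusted alpha
TukeyHSD(aov(mpg ~ cyl_f, data = mtcars))  # studentized range distribution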

2. Method

2.1 Three Assumptions

The analysis of variance has three assumptions:

  • Normality of the dependent variable's distribution
  • Homogeneity of variance
  • Independent observations

2.1.1 Normality of the dependent variable's distribution

# Shapiro-Wilk test of normality; H0: the sample came from a normally distributed population
# Test univariate normality
shapiro.test(mtcars$mpg)
# Test multivariate normality (mshapiro.test expects variables in rows, hence t())
mvnormtest::mshapiro.test(t(mtcars[,3:5]))
# Normal Q-Q plot: significant departures from the line suggest violations of normality
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)

2.1.2 Homogeneity of variance

The variance in each cell should be similar. Check via Levene's test or another homogeneity-of-variance test; these are generally produced as part of the ANOVA statistical output. A sample size of more than 20 per cell is preferred: it aids robustness against violations of the first two assumptions, and a larger sample size also increases power.

# Levene's test for homogeneity of variance; H0: equal variances
# (less sensitive than the Bartlett test to departures from normality)
car::leveneTest(price ~ type, data = Sacramento)
# Bartlett test of homogeneity of variances; H0: equal variances
bartlett.test(price ~ type, data = Sacramento)
# Fligner-Killeen test of homogeneity of variances; H0: equal variances
fligner.test(price ~ type, data = Sacramento)
# Homogeneity of variance plot
library(HH)
HH::hovPlot(price ~ type, data = Sacramento)

2.1.3 Independent observations

Scores on one variable or for one group should not be dependent on another variable or group (usually guaranteed by the design of the study)

2.2 Elements in ANOVA

ANOVA is defined as a technique in which the total variation present in the data is partitioned into two or more components, each having a specific source of variation. In the analysis, it is possible to obtain the contribution of each of these sources of variation to the total variation. It is designed to test whether the means of more than two quantitative populations are equal. It consists of classifying and cross-classifying statistical results and helps in determining whether the given classifications are important in affecting the results.

(source: http://isoconsultantpune.com/analysis-of-variance-anova/)

[Figure: Analysis of Variance – ANOVA]

2.2.1 Null and Alternative Hypothesis

H0: the mean (average value of the dependent variable) is the same for all groups
H1: the mean is not the same for all groups (at least one group mean differs from the others)

2.2.2 Partitioning of the Sum of Squares *

  • Sum of squares between groups, or SS(B) (Between-Group Variation, Treatment, Factor, Model, Regression): measures the variation between the groups or samples. If the group means do not differ greatly from each other and the grand mean, SS(B) will be small.
    • A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. An intensity setting of a factor is a level.
    • When ANOVA is performed for a regression, the between-group component is called the regression or model sum of squares.
  • Sum of squares within groups, or SS(R) (Within-Group Variation, Error, Residual): measures the variation within each group or sample. If the observations in each group lie close to their own group mean, SS(R) will be small; large scatter within the groups makes SS(R) large.

2.2.3 Degrees of Freedom

Note that for k groups, the sum of squares between groups has k - 1 degrees of freedom. The between-groups variance is that variation, SS(B), divided by its degrees of freedom; we sometimes refer to the between-groups variance as \(s_b^2\). Likewise, the sum of squares within groups has N - k degrees of freedom, where N is the total number of observations, and the within-groups variance is SS(R) divided by N - k.

2.2.4 F-test

  • F-test: used to test, with the F-distribution, whether a reduced model (submodel) explains the data as well as the full model. H0: the reduced model is adequate, i.e. the groups are the same / there is no difference.
  • F-ratio: the ratio of two scaled sums of squares reflecting different sources of variability, \(F = \frac{between \ group \ variation}{within \ group \ variation}\). This is equivalent to \(\frac{treatment \ effect + error}{error}\). If the treatment effect goes to zero, the F-ratio becomes \(\frac{error}{error}\) and goes to 1. If the treatment effect increases toward infinity, the F-ratio also goes toward infinity. (A worked sketch follows this list.)
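To connect the pieces above, here is a minimal sketch that partitions the total sum of squares by hand and rebuilds the F statistic, using the mtcars data loaded in the prerequisites; grouping mpg by cylinder count is illustrative only.

# Partition the sums of squares by hand and compare with anova()
y     = mtcars$mpg
group = factor(mtcars$cyl)
k = nlevels(group)   # number of groups
N = length(y)        # total number of observations

grand_mean  = mean(y)
group_means = tapply(y, group, mean)
n_i         = tapply(y, group, length)

SS_B = sum(n_i * (group_means - grand_mean)^2)  # between-group SS
SS_W = sum((y - ave(y, group))^2)               # within-group (residual) SS

df_B = k - 1
df_W = N - k
F_stat = (SS_B / df_B) / (SS_W / df_W)          # F = MS_between / MS_within
p_val  = pf(F_stat, df_B, df_W, lower.tail = FALSE)
c(SS_B = SS_B, SS_W = SS_W, F = F_stat, p = p_val)

# The same quantities appear in the ANOVA table
anova(lm(mpg ~ factor(cyl), data = mtcars))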

2.3 Type I, II, and III Sums of Squares

Consider a model that includes two factors A and B; there are therefore two main effects, and an interaction, AB. The full model is represented by SS(A, B, AB).

Perhaps the most salient point for beginners is that SAS tends to use Type III by default, whereas R uses Type I with the anova function. In R, Type II and Type III tests are accessed through Anova in the car package, as well as through some other functions for other types of analyses. However, for Type III tests to be correct, the way R codes factors has to be changed from its default with the options(contrasts = ...) call. Changing this will not affect Type I or Type II tests.

Summary of the four types of sums of squares:

  • Type I: evaluated sequentially. Situations: balanced data, purely nested models, models with ordered variables (i.e. polynomial regressions). Advantage: lets the simpler components (e.g. linear) explain as much variation as possible. R function: stats::anova(model)
  • Type II: partial. Situations: balanced data, pure regression models, models with only main effects and no significant interaction. Advantage: allows lower-order terms to explain as much variation as possible. R function: car::Anova(model, type="II")
  • Type III (recommended for general use): partial. Situations: unbalanced data, non-nested models, models where the order in which terms appear should not matter. Advantage: tests for the presence of each main effect after the other main effect and the interaction, respectively. R function: car::Anova(model, type="III")
  • Type IV: partial. Situations: missing cells and missing nesting; designed primarily for situations with empty cells, also known as "radical" data loss. R function: -

(source: http://md.psych.bio.uni-goettingen.de/mv/unit/lm_cat/lm_cat_unbal_ss_explained.html)

2.3.1 Type I sum of squares

Type I sums of squares are determined by considering each source (factor) sequentially, in the order they are listed in the model. The Type I SS may not be particularly useful for analyses of unbalanced, multi-way structures but may be useful for balanced data and nested models. Type I SS are also useful for parsimonious polynomial models (i.e. regressions), allowing the simpler components (e.g. linear) to explain as much variation as possible before resorting to models of higher complexity (e.g. quadratic, cubic, etc.). Also, comparing Type I and other types of sums of squares provides some information regarding the magnitude of imbalance in the data. Types II and III SS are also known as partial sums of squares, in which each effect is adjusted for other effects.

  • Type I sum of squares (also called “sequential” sum of squares): the factors are evaluated against the remaining variation in the order they are listed in the model (see the order-dependence sketch after this list). It can be applied to models satisfying one of the following conditions:
    • the variables are intentionally ordered in the model, for a balanced data set
    • purely nested models
    • polynomial regression models with lower-order terms first and higher-order terms last, such as \(Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3\)
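A short sketch of the order dependence, using mtcars; the predictors wt and hp are illustrative and correlated, so the sequential sums of squares change with the order of entry.

# Type I SS depend on the order in which terms enter the model
anova(lm(mpg ~ wt + hp, data = mtcars))   # SS for hp is adjusted for wt
anova(lm(mpg ~ hp + wt, data = mtcars))   # SS for hp is NOT adjusted for wt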

2.3.2 Type II sum of squares

Type II partial SS are a little more difficult to understand. Generally, the Type II SS for an effect U, which may be a main effect or interaction, is adjusted for an effect V if and only if V does not contain U. Specifically, for a two-factor structure with interaction, the main effects A and B are not adjusted for the AB interaction because the interaction contains both A and B. Factor A is adjusted for B because the symbol B does not contain A. Similarly, B is adjusted for A. Finally, the AB interaction is adjusted for each of the two main effects because neither main effect contains both A and B. Put another way, the Type II SS are adjusted for all factors that do not contain the complete set of letters in the effect. In some ways, you could think of it as a sequential, partial SS; in that it allows lower-order terms to explain as much variation as possible, adjusting for one another, before letting higher-order terms take a crack at it.

  • Type II sum of squares: not sequential; it assumes no significant interaction. Computationally, it is equivalent to running a Type I analysis with different orders of the factors and taking the appropriate output (a short sketch of this equivalence follows the list). It can be applied to models satisfying one of the following conditions:
    • any model with balanced data
    • any model with only main effects, assuming no interaction effect
    • any pure regression model
    • a model with only one effect
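The sketch below illustrates that equivalence, using mtcars with two illustrative factors and no interaction term: the Type II SS for each main effect equals the Type I SS obtained when that effect enters the model last.

# Type II SS (no interaction in the model)
car::Anova(lm(mpg ~ factor(cyl) + factor(am), data = mtcars), type = "II")

# The same SS appear in a Type I analysis when each factor is entered last
anova(lm(mpg ~ factor(cyl) + factor(am), data = mtcars))   # am adjusted for cyl
anova(lm(mpg ~ factor(am) + factor(cyl), data = mtcars))   # cyl adjusted for am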

2.3.3 Type III sum of squares

Type III is also a partial SS approach, but it’s a little easier to explain than Type II; so we’ll start here. In this model, every effect is adjusted for all other effects. The Type III SS will produce the same SS as a Type I SS for a data set in which the missing data are replaced by the least-squares estimates of the values. The Type III SS correspond to Yates’ weighted squares of means analysis. One use of this SS is in situations that require a comparison of main effects even in the presence of interactions (something the Type II SS does not do and something, incidentally, that many statisticians say should not be done anyway!). In particular, the main effects A and B are adjusted for the interaction A*B, as long as all these terms are in the model. If the model contains only main effects, then you will find that the Type II and Type III analyses are the same.

  • Type III sum of squares: not sequential; tests for the presence of each main effect after the other main effect and the interaction, respectively. It is similar to Type II, but takes the interaction into consideration. If the interactions are not significant, Type II gives a more powerful test. It is good for unbalanced data, because Type III adjusts for the effect of the imbalance. It can be applied to models satisfying one of the following conditions:
    • models with unbalanced data
    • non-nested models
    • models where the effects do not depend on the order in which they appear

2.3.4 Type IV sum of squares

The Type IV functions were designed primarily for situations where there are empty cells, also known as “radical” data loss. The principles underlying the Type IV sums of squares are quite involved and can be discussed only in a framework using the general construction of estimable functions. It should be noted that the Type IV functions are not necessarily unique when there are empty cells but are identical to those provided by Type III when there are no empty cells.

  • Type IV sum of squares: most of the time similar to Type III, but when there are missing cells and nesting, Type IV and Type III can differ.

(source:https://stats.stackexchange.com/questions/60362/choice-between-type-i-type-ii-or-type-iii-anova)

2.3.5 Comparing Type I, II, III ANOVA

# Example of different SS results for Type I, II, III 
options(contrasts = c("contr.sum", "contr.poly")) # needed for type III tests
# Small, unbalanced example data set
A = c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b")
B = c("x", "y", "x", "y", "x", "y", "x", "y", "x", "x", "x", "x")
C = c("l", "l", "m", "m", "l", "l", "m", "m", "l", "l", "l", "l")
Z = c( 14,  30,  15,  35,  50,  51,  30,  32,  51,  55,  53,  55)

model = lm(Z ~ A + B + C + A:B + A:C + B:C)
Type_I <- stats::anova(model)       # Type I tests
Type_II <- car::Anova(model, type="II")   # Type II tests
Type_III <-car::Anova(model, type="III")  # Type III tests

tibble(c('INTERCEPT', 'A', 'B', 'C', 'A:B', 'A:C', 'B:C', 'Residuals'),
       c(NA,Type_I$`Pr(>F)`),
       c(NA,Type_II$`Pr(>F)`),
       Type_III$`Pr(>F)`) %>% 
  setNames(c('Effect', 'Type I p-value', 'Type II p-value', 'Type III p-value'))%>% 
  knitr::kable()
# Effects and p-values from a hypothetical linear model.  While in this example the p-values are relatively similar, the B effect would not be significant with Type I sum of squares at the alpha = 0.05 level, while it would be with Type II or Type III tests. 

(source:https://rcompanion.org/rcompanion/d_04.html)

3. Classes of ANOVA Models *

Classification of ANOVA models:

  • Parameters are fixed (non-random) quantities: One-way Fixed-effect ANOVA (one factor) or Factorial Fixed-effect ANOVA (two or more factors)
  • Model parameters are random variables: One-way Random-effect ANOVA (one factor) or Factorial Random-effect ANOVA (two or more factors)

Summary of common ANOVA methods (independent variables, dependent variables, and notes still to be filled in):

  • One-way ANOVA
  • Two-way ANOVA
  • Analysis of variance in randomized block designs
  • Factorial design analysis of variance
  • Crossover design analysis of variance
  • Analysis of variance in split-plot (split zone) designs
  • Nested design analysis of variance
  • Analysis of variance of repeated measures data
  • Orthogonal design analysis of variance
  • Latin square design analysis of variance
  • Star point design analysis of variance

These models differ mainly in whether the parameters are treated as fixed or random quantities:

  • Fixed-effects models assume that each trial represents a random sample (colored curves) of a single population with a single response to treatment.
  • Random-effects models assume that the different trials’ results (colored curves) may come from different populations with varying responses to treatment.

[Figure: one-way fixed-effects ANOVA (Model I) versus one-way random-effects ANOVA (Model II)]

3.1 Fixed-effects Models

3.1.1 Cell Mean Models

If the model to be fit is just the mean of each group, then it is called the cell means model:

\[ y_{ij} = \mu_i + \varepsilon_{ij} \]

which can be written: \[ \begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix}1 & 0 & 0 \\1 &0 &0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix}\mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{bmatrix} \]

It should be emphasized that in this model \(\mu_i\) represents the mean of the \(i\)th group.

3.1.2 One-way Fixed-effects Models

The ANOVA model could be equivalently written as each group parameter \(\tau_i\) being an offset from some overall reference. Typically this reference point is taken to be one of the groups under consideration. This makes sense in the context of comparing multiple treatment groups to a control group and the control group is considered the “reference”. In this example, group 1 was chosen to be the reference group. As such the model to be fit is:

\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \]

with the constraint that \(\tau_1\) is zero: \[ \begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix}1 &0 &0 \\1 &0 &0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1\end{bmatrix} \begin{bmatrix}\mu \\ \tau_2 \\ \tau_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{bmatrix} \]

In this model \(\mu\) is the mean of the reference group and \(\tau_i\) is the difference from group \(i\) to the reference group. \(\tau_1\) is not included in the matrix because its difference from the reference group (itself) is necessarily zero.
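A minimal sketch of these two design matrices with model.matrix(), using a hypothetical three-group factor and seven observations that mirror the matrices above; contr.treatment is requested explicitly because the default contrasts were changed to contr.sum earlier in this note for the Type III tests.

# Hypothetical grouping: three groups, seven observations
group = factor(c(1, 1, 1, 2, 2, 3, 3))

# Cell means coding: one indicator column per group mean, no intercept
model.matrix(~ 0 + group)

# Reference-group (treatment) coding: intercept = mean of group 1,
# remaining columns are the offsets tau_2 and tau_3
model.matrix(~ group, contrasts.arg = list(group = "contr.treatment"))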

# Fixed-effects (no random term) fit with gls() from the nlme package;
# Data_rand_block is the example data used in the random-block sections below
model.fixed = gls(Length ~ Hand,
                  data=Data_rand_block, method="REML")

3.1.3 Factorial ANOVA (N-way ANOVA)

(source: https://rcompanion.org/handbook/G_09.html)

Two way factorial ANOVA is a special case of factorial ANOVA.

Factorial ANOVA involves testing of differences between group means based on two or more categorical independent variables, with a single, continuous dependent variable. In other words, a factorial ANOVA could involve:

  • Two or more between-subjects categorical/ordinal independent variables
  • One interval or ratio dependent variable

The results of interest are:

  • Main effect for treatment 1
  • Main effect for treatment 2
  • Interaction between treatment 1 and 2

Factorial designs can have more than 2 factors. N way refers to the number of factors in a factorial ANOVA design:

  • A two-way ANOVA would have a factor A (with 2 or more levels or groups) ‘crossed with’ a factor B (also with 2 or more levels or groups).
  • A three-way factorial would have three factors (A, B, and C).
# Data_full_factorial is the hypothetical weight-gain example data (from the rcompanion handbook)
model_full_factorial = lm(Weight_change ~ Country + Diet + Country:Diet,
           data = Data_full_factorial)
car::Anova(model_full_factorial, type = "II")
# Interaction plot of the cell means (phia package)
IM = phia::interactionMeans(model_full_factorial)
plot(IM)
# The plot shows that mean weight gain for each diet was lower for the UK than for the USA,
# and that this difference was relatively constant across diets (the lines are roughly
# parallel), suggesting no large or significant interaction effect.

Because the main effects were significant, we will want to perform post-hoc mean separation tests for each main effect factor variable; see [3. Afterword: Post-hoc comparisons] and the emmeans sketch below.
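A hedged sketch of such post-hoc tests with the emmeans package loaded in the prerequisites, assuming model_full_factorial has been fit from the hypothetical Data_full_factorial data above.

# Estimated marginal means and Tukey-adjusted pairwise comparisons
# for each significant main effect
emmeans::emmeans(model_full_factorial, pairwise ~ Country, adjust = "tukey")
emmeans::emmeans(model_full_factorial, pairwise ~ Diet,    adjust = "tukey")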

3.2 Random-effects Model

Notice the syntax in the lme function that defines the model: the option random = ~1|Individual is added to the model to indicate that Individual is the random term.

The methodology for working out sums of squares is identical to that used for fixed-effects ANOVA. Again we are not assuming equal sample sizes in each group.

The \(nlme\) package is used to produce an Analysis of Deviance with p-values for model effects.

3.2.1 One-way ANOVA with Random Blocks

(source: https://rcompanion.org/handbook/I_07.html)

library(nlme)
model.rand_block = lme(Length ~ Hand, 
                       random = ~1|Individual, 
                       data=Data_rand_block)
# ANOVA table for the fixed effect (sequential tests by default;
# use anova(model.rand_block, type="marginal") for marginal, Type-III-style tests)
anova(model.rand_block)

3.2.2 Compare the Random Block Model with the Fixed-effects Model and Null Models

The \(nlme\) package has a function gls() that creates model objects without random effects.

# Test the random effects in the model
model.fixed = gls(Length ~ Hand,
                  data=Data_rand_block, method="REML")
anova(model.rand_block, model.fixed)
rcompanion::nagelkerke(fit = model.rand_block, 
                       null = model.fixed) 

The null model with which to compare the model has to be specified explicitly. One approach is to define the null model as one with no fixed effects except for an intercept, indicated with a 1 on the right side of the ~, while still including the random effects.

model.null_rand = lme(Length ~ 1, random = ~1|Individual,
                      data = Data_rand_block, method="REML")
anova(model.rand_block, model.null_rand)
rcompanion::nagelkerke(fit = model.rand_block, 
                       null = model.null_rand) 

A second approach to determining the p-value and pseudo R-squared for the mixed model is to compare it to a null model with only an intercept and neither the fixed nor the random effects.

model.null_1 = gls(Length ~ 1, 
                   data = Data_rand_block, method="REML")
anova(model.rand_block,  model.null_1)
rcompanion::nagelkerke(fit = model.rand_block, 
                       null = model.null_1)

3.3 Blocking

In analysis of variance, blocking variables are often treated as random variables. This is because the blocking variable represents a random selection of levels of that variable.

The analyst wants to:

  • take the effects of the blocking variable into account, using random = ~1|blockVariable in lme() from the \(nlme\) package

but does not want to:

  • treat the identity of the specific levels of the blocks as a quantity of interest
  • model the interaction between the other independent variables and the blocking variable

3.4 Repeated Measures ANOVA

For ANOVA models involving repeated measures, there are also the assumptions of:

  • Sphericity: the difference scores between each pair of within-subject conditions have similar variances (see the sketch below)
  • Homogeneity of covariance matrices of the dependent variables: the null hypothesis is that the observed covariance matrices of the dependent variables are equal across groups (see Box's M test)
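A hedged sketch of checking sphericity, using simulated wide-format data (one row per subject, one column per time point); the variable names are hypothetical and the data are random, so this only illustrates the mechanics.

# Simulated repeated measures: 12 subjects measured at three time points
set.seed(123)
d = data.frame(t1 = rnorm(12, 10), t2 = rnorm(12, 11), t3 = rnorm(12, 12))

# Multivariate linear model with the repeated measures as the responses
mlm_fit = lm(cbind(t1, t2, t3) ~ 1, data = d)

# Within-subject design: a single three-level factor "time"
idata = data.frame(time = factor(c("t1", "t2", "t3")))

# car::Anova for repeated measures; summary() reports Mauchly's test of
# sphericity along with Greenhouse-Geisser and Huynh-Feldt corrections
rm_anova = car::Anova(mlm_fit, idata = idata, idesign = ~ time)
summary(rm_anova, multivariate = FALSE)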

3.5 ANOVA for Regression

Multiple linear regression attempts to fit a regression line for a response variable using more than one explanatory variable. The ANOVA calculations for multiple regression are nearly identical to the calculations for simple linear regression, except that the degrees of freedom are adjusted to reflect the number of explanatory variables included in the model.

3.5.1 Hypothesis

  • H0: all of the regression slopes are equal to zero
  • H1: at least one of the slopes differs from zero (see the example below)
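A minimal sketch of this F-test using mtcars (loaded in the prerequisites); the choice of wt and hp as predictors is illustrative only.

# ANOVA table for a multiple regression: sequential SS, df, and F-tests
reg_model = lm(mpg ~ wt + hp, data = mtcars)
anova(reg_model)

# Overall F-test of H0 that all slopes are zero (bottom of the summary output)
summary(reg_model)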

3.6 ANOVA for Comparing Regression

To compare the fits of two models, you can use the anova() function with the regression objects as two separate arguments. The anova() function will take the model objects as arguments, and return an ANOVA testing whether the more complex model is significantly better at capturing the data than the simpler model. If the resulting p-value is sufficiently low (usually less than 0.05), we conclude that the more complex model is significantly better than the simpler model, and thus favor the more complex model. If the p-value is not sufficiently low (usually greater than 0.05), we should favor the simpler model.

(source:https://bookdown.org/ndphillips/YaRrr/comparing-regression-models-with-anova.html)

3.6.1 Hypothesis

  • H0: the complex model explains the same amount of variation as the simple model
  • H1: the complex model explains significantly more variation than the simple model (see the example below)
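A minimal sketch using mtcars (loaded in the prerequisites); the choice of predictors, and which model counts as "complex", are illustrative only.

# Compare a simpler model with a more complex nested model
model_simple  = lm(mpg ~ wt, data = mtcars)
model_complex = lm(mpg ~ wt + hp + disp, data = mtcars)

# F-test of H0: the complex model explains no more variation than the simple one
anova(model_simple, model_complex)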