Use of this document

This is a study note on using the \(lmerTest\) package for Linear Mixed-Effects Models/Regression (LMER).

1. Introduction

The modeling framework for random effects is Linear Mixed-Effects Models (LMM), known by equivalent names in different disciplines, such as multilevel models or hierarchical linear models:

(source: Xie (2010) Regression Analysis (Chinese), CH16)

1.1 Random effect for Nested data

Let’s quickly introduce random effects and fixed effects, then explain why we need LMM to deal with nested data.

  • Nested data: each level of one random effect occurs within only one level of a higher-level factor. Nesting often occurs in the design of experiments, in particular in randomized experiments and randomized controlled trials. (Wikipedia)

Without a random effect for each experimental unit, a one-to-one correspondence between observations and experimental units is assumed. Including random effects in a model is one way to account for a lack of independence among observations that is expected based on the design of the experiment. Here are the definitions of fixed effect and random effect:

  • Fixed effect: the effect whose parameters we want to test
  • Random effect: controls for non-independence arising from a nested or hierarchical structure, e.g. blocking, replicated sampling from individuals, or a subset of the many possible groups that could be sampled.

1.2 Simpson’s paradox

In probability and statistics, Simpson's paradox refers to a trend that appears in several different groups of data but disappears or reverses when these groups are combined. This result is often encountered in social-science and medical-science statistics and is particularly problematic when frequency data are unduly given causal interpretations. When the grouping level is ignored for data whose samples are not independent across levels, the within-group regression, between-group regression, and total regression can disagree, as shown in the following:

(Figure: Simpson’s paradox)
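To make this concrete, here is a small simulation sketch (all variable names are hypothetical): within each group the slope of y on x is positive, but pooling the groups reverses its sign.

set.seed(1)
sim <- data.frame(g = rep(1:3, each = 30))            # three groups
sim$x <- rnorm(90, mean = 2 * sim$g)                  # group means increase in x
sim$y <- -3 * sim$g + 0.8 * sim$x + rnorm(90)         # within-group slope is +0.8
coef(lm(y ~ x, data = sim))["x"]                      # pooled slope: negative
sapply(split(sim, sim$g),
       function(d) coef(lm(y ~ x, data = d))["x"])    # per-group slopes: positive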

When levels are improperly considered in a model, two kinds of statistical fallacies may occur:

  • Ecological fallacy: a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong. It can occur when only the cluster (lv.2) is used as the unit of analysis.
  • Atomistic fallacy: the fallacy of drawing inferences about variability across units defined at a higher level based on data collected for units at a lower level. It can occur when only individual samples (lv.1) are used as the unit of analysis.

1.3 Linear Mixed-Effects Models

Linear Mixed-Effects Models assume that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy.

  • Can improve parameter estimates: because random effects are drawn from a common distribution with a mean and standard deviation, information from all groups can be used to improve the parameter estimates for each individual group, a process called partial pooling or shrinkage (see the sketch below). (source: https://www.youtube.com/watch?v=QCqF-2E86r0 )
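A minimal sketch of shrinkage, assuming the salary data frame df built in section 3.1: the group-level intercepts from lmer sit between each department's own mean and the grand mean.

library(lme4)
m0 <- lmer(salary ~ 1 + (1 | department), data = df)
cbind(no_pooling      = tapply(df$salary, df$department, mean),
      partial_pooling = coef(m0)$department[, "(Intercept)"],
      grand_mean      = mean(df$salary))
# Each partially pooled estimate is pulled from the department mean
# toward the grand mean; the pull is stronger for noisier groups.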

1.4 Minimum sample size by degrees of freedom

Sample size requirement

| Reference    | Minimum number of groups / group size |
|--------------|---------------------------------------|
| Kreft (1996) | 30/30                                 |
| Hox (1998)   | 100/10                                |

1.5 Intraclass Correlation Coefficient (ICC)

The ICC, or Intraclass Correlation Coefficient, can be very useful in many statistical situations, but especially so in Linear Mixed Models. The definition of ICC is the following:

\[ ICC = \rho = \frac{\text{Intraclass Variance}}{\text{Total Variance}} = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}} \]

Why is the ICC useful?

  • It can help you determine whether or not a linear mixed model is even necessary. If you find that the correlation is zero, that means the observations within clusters are no more similar than observations from different clusters. Go ahead and use a simpler analysis technique.
  • It can be theoretically meaningful to understand how much of the overall variation in the response is explained simply by clustering. For example, in a repeated measures psychological study you can tell to what extent mood is a trait (varies among people, but not within a person on different occasions) or state (varies little on average among people, but varies a lot across occasions).
  • It can also be meaningful to see how the ICC (as well as the between- and within-cluster variances) changes as variables are added to the model.

(Source: https://www.theanalysisfactor.com/the-intraclass-correlation-coefficient-in-mixed-models/)
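A minimal sketch of computing the ICC by hand from the null model's variance components (the response Y, grouping factor g, and data frame df are hypothetical placeholders):

library(lme4)
m0 <- lmer(Y ~ 1 + (1 | g), data = df)      # unconditional (null) model
vc <- as.data.frame(VarCorr(m0))            # variance components
tau00  <- vc$vcov[vc$grp == "g"]            # between-group variance
sigma2 <- vc$vcov[vc$grp == "Residual"]     # within-group (residual) variance
tau00 / (tau00 + sigma2)                    # ICC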

Luke (2004)’s criteria for whether to use LMM:

  • Theoretical view: the research question is hierarchical;
  • Statistical view: the analysis violates the independence assumption of the residuals (underfit due to missing variables);
  • Empirical view: the ICC is high (too much variance is explained by the grouping to ignore the random effect of the grouping variable).
Range of intraclass correlation for deciding whether to apply LMM (Cohen, 1988)

| Intraclass correlation | ICC                 | Apply LMM?    |
|------------------------|---------------------|---------------|
| low                    | ICC < 0.059         | Not necessary |
| moderate               | 0.059 < ICC < 0.138 | Yes           |
| high                   | ICC > 0.138         | Yes           |

ICC for unconditional and conditional models:

  • Usually, the ICC is calculated for the null model (“unconditional model”); this is called ICC(1).
  • However, according to Raudenbush and Bryk (2002) or Rabe-Hesketh and Skrondal (2012), it is also feasible to compute the ICC for full models with covariates (“conditional models”) and compare how much of the variation in the grouping structure (random intercept) a level-2 variable explains; this is called ICC(2).

(source: https://www.rdocumentation.org/packages/sjstats/versions/0.17.4/topics/icc)

2. Fit model in R

In R, both the \(lme4\) and \(lmerTest\) packages can fit linear mixed-effects models/regressions. We will use the \(lmerTest\) package because it outputs significance tests for the fixed effects.

m = lmerTest::lmer(data = , formula = DV ~ Fixed_Factor_1 + (Random_intercept + Random_slope | Random_Factor_1), ...)
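For instance, a minimal sketch with hypothetical variable names (reaction time RT, a condition factor, and subject as the grouping factor):

library(lmerTest)
# Random intercept and random slope of condition for each subject
m <- lmer(RT ~ condition + (1 + condition | subject), data = dat)
summary(m)   # lmerTest adds Satterthwaite df and p-values for fixed effects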


(source: https://github.com/usplos/Eye-movement-related/blob/master/Linear%20mixed%20model%20(Hierarchical%20linear%20model)%20using%20R%20and%20JAMOVI.md; https://cran.r-project.org/web/packages/lmerTest/lmerTest.pdf)

2.1 R formula composition

R formula composition:

  • Fixed intercept + Fixed slope. Full: lm(Y ~ 1 + X1, …); simplified: lm(Y ~ X1, …)
  • Random intercept + Fixed slope. Full: lmer(Y ~ 1 + X1 + (1 | group), …); simplified: lmer(Y ~ X1 + (1 | group), …)
  • Fixed intercept + Random slope. Full: lmer(Y ~ 1 + X1 + (0 + X1 | group), …); simplified: lmer(Y ~ X1 + (0 + X1 | group), …)
  • Random intercept + Random slope. Full: lmer(Y ~ 1 + X1 + (1 + X1 | group), …); simplified: lmer(Y ~ X1 + (X1 | group), …)

2.2 Parameter Extraction

# Linear regression: tidy coefficient table
broom::tidy(m1)
# Linear mixed model: fixed effects (level-1 coefficients)
broom.mixed::tidy(m, effects = "fixed")
# Linear mixed model: group-level random effects (conditional modes)
broom.mixed::tidy(m, effects = "ran_vals")
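Two related extractors may also help (a sketch, assuming the same fitted object m):

# Variance and correlation parameters of the random effects
broom.mixed::tidy(m, effects = "ran_pars")
# Combined (fixed + random) coefficients per group
coef(m)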

2.3 Solutions when the model fails to converge

  • Not enough degrees of freedom: when the number of observations is less than the number of random effects, the model cannot converge. Solution 1: simplify the model (i.e. remove the problematic random slopes). Solution 2: increase the number of iterations:
    m = lmerTest::lmer(data, formula, control = lmerControl(optCtrl = list(maxfun = 20000)))

  • Singular fit: the data cannot support the complex random-effects structure. Solution: remove the most complex term in the random effects (i.e. the random slope), as sketched below.
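A hedged sketch of a common simplification path (hypothetical response Y, predictor X, and group g):

m_full <- lmerTest::lmer(Y ~ X + (1 + X | g), data = df)    # correlated intercept + slope
lme4::isSingular(m_full)                                    # TRUE signals a singular fit
m_diag <- lmerTest::lmer(Y ~ X + (1 + X || g), data = df)   # drop the correlation
m_int  <- lmerTest::lmer(Y ~ X + (1 | g), data = df)        # random intercept only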

2.4 Model diagnostics

  • residuals vs. fitted values for the entire model: no trends, normally distributed with equal variance;
  • residuals vs. each independent variable: no trends, normally distributed with equal variance;
  • residuals vs. fitted values within each random-effect group: normal with equal variance within and between groups;
  • conditional means of the random effects: normally distributed (a sketch of these checks follows).
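A minimal sketch of these checks for a fitted lmer model m:

plot(m)                                     # residuals vs. fitted values
qqnorm(resid(m)); qqline(resid(m))          # normality of residuals
lattice::qqmath(ranef(m, condVar = TRUE))   # normality of conditional modes (needs lme4 loaded)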

2.5 Other considerations

  • Random effects should have many levels (> 5 is often said to be needed);
  • Variables that are strongly collinear (highly correlated) can cause problems;
  • The structure of the random effects should be specified correctly, which requires a good understanding of your data and study design;
  • Watch out for warning messages about model convergence.

3. Types of LMM

When talking about LMM, a random intercept is usually included, so the list below distinguishes only fixed-slope and random-slope models.

Types of LMM:

  • Fixed slope model (null model, intercept-only model, unconditional model): random intercept with fixed mean. Formula: lmer(Y ~ (1 | g), …); equivalent: lmer(Y ~ 1 + (1 | g), …)
  • Fixed slope model: random intercept with a priori means. Formula: lmer(Y ~ 0 + offset(o) + (1 | g), …); equivalent: lmer(Y ~ -1 + offset(o) + (1 | g), …)
  • Fixed slope model (crossed random effects): intercept varying among g1 and g2. Formula: lmer(Y ~ (1 | g1) + (1 | g2), …)
  • Fixed slope model (3-level nested model): intercept varying among g1 (lv.3) and among g2 (lv.2) within g1. Formula: lmer(Y ~ (1 | g1/g2), …); equivalent: lmer(Y ~ (1 | g1) + (1 | g1:g2), …)
  • Random slope model: uncorrelated random intercept and slope. Formula: lmer(Y ~ X + (X || g), …); equivalent: lmer(Y ~ X + (1 | g) + (0 + X | g), …)
  • Random intercepts and slopes model (recommended if possible): correlated random intercept and slope. Formula: lmer(Y ~ X + (X | g), …)


3.1 Modeling

In the following, I use the example demonstrated in this Link, specifically using experience to predict salary with/without department as the grouping (lv.2) variable.

library(tidyverse)
library(lme4)
library(modelr)
library(broom)
library(broom.mixed)
library(ggplot2)

create_data <- function() {
  tibble(ids = 1:100,
         department = rep(c("sociology", "biology", "english", "informatics", "statistics"), 20),
         bases = rep(c(40000, 50000, 60000, 70000, 80000), 20) * runif(100, .9, 1.1),
         experience = floor(runif(100, 0, 10)),
         raises = rep(c(2000, 500, 500, 1700, 500), 20) * runif(100, .9, 1.1)) %>%
    mutate(salary = bases + experience * raises)
}

df <- create_data()
df %>% head(10) %>% knitr::kable(format = "markdown") 
| ids | department  |    bases | experience |    raises |   salary |
|----:|:------------|---------:|-----------:|----------:|---------:|
|   1 | sociology   | 38059.49 |          6 | 2150.1902 | 50960.63 |
|   2 | biology     | 47062.50 |          6 |  476.4946 | 49921.47 |
|   3 | english     | 57341.60 |          5 |  466.0628 | 59671.91 |
|   4 | informatics | 71526.43 |          9 | 1759.8572 | 87365.15 |
|   5 | statistics  | 75302.47 |          5 |  483.7902 | 77721.42 |
|   6 | sociology   | 37071.41 |          2 | 2010.3158 | 41092.04 |
|   7 | biology     | 47754.50 |          4 |  501.3143 | 49759.76 |
|   8 | english     | 58306.45 |          7 |  511.0082 | 61883.51 |
|   9 | informatics | 72928.48 |          0 | 1710.6474 | 72928.48 |
|  10 | statistics  | 85608.22 |          1 |  462.1660 | 86070.38 |
# Fixed intercept + Fixed slope: model without respect to grouping
m1 <- lm(salary ~ experience, data = df)
df_pred_m1 <- df %>% modelr::add_predictions(m1)
# Random intercept + Fixed slope: model with varying intercept
m2 <- lmer(salary ~ experience + (1 | department), data = df)
df_pred_m2 <- df %>% modelr::add_predictions(m2)
# Fixed intercept + Random slope: model with varying slope
m3 <- lmer(salary ~ experience + (0 + experience | department), data = df)
df_pred_m3 <- df %>% modelr::add_predictions(m3)
# Random intercept + Random slope: model with varying slope and intercept
m4 <- lmer(salary ~ experience + (1 + experience | department), data = df)
df_pred_m4 <- df %>% modelr::add_predictions(m4)

3.2 Regression Visualization

p1 <- df_pred_m1 %>% 
  ggplot(aes(x = experience, y = salary)) +
  geom_point() +
  geom_line(aes(x = experience, y = pred)) +
  labs(x = "Experience", y = "Predicted Salary") +
  ggtitle("Fixed intercept + Fixed slope") +
  scale_colour_discrete("Department")

p2 <- df_pred_m2 %>% 
  ggplot(aes(x = experience, y = salary, color = department)) +
  geom_point() +
  geom_line(aes(x = experience, y = pred)) +
  labs(x = "Experience", y = "Predicted Salary") +
  ggtitle("Random intercept + Fixed slope") +
  scale_colour_discrete("Department") + 
  theme(legend.position = "none")

p3 <- df_pred_m3 %>% 
  ggplot(aes(x = experience, y = salary, color = department)) +
  geom_point() +
  geom_line(aes(x = experience, y = pred)) +
  labs(x = "Experience", y = "Predicted Salary") +
  ggtitle("Fixed intercept + Random slope") +
  scale_colour_discrete("Department") + 
  theme(legend.position = "none")

p4 <- df_pred_m4 %>%
  ggplot(aes(x = experience, y = salary, color = department)) +
  geom_point() +
  geom_line(aes(x = experience, y = pred)) + 
  labs(x = "Experience", y = "Predicted Salary") + 
  ggtitle("Random intercept + Random slope") +
  scale_colour_discrete("Department") + 
  theme(legend.position = "none")

gridExtra::grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)

4. Effect analysis

Workflow of an LMM analysis (a code sketch follows the list):

  1. Collect and prepare the data; consider whether it is nested (Theoretical View).
  2. Select the proper distribution for the response variable so the independence assumption of the residuals can hold. This leads to a linear mixed-effects model for a Gaussian response, or a generalized linear mixed-effects model for other distributions (Statistical View).
  3. Build the null model and check the ICC to determine the suitability of LMM (Empirical View).
  4. Include lv.1 variables to add fixed slopes, then compare the lv.1 effect size with the previous model.
  5. Include lv.2 variables to add random slopes, then compare the lv.2 effect size with the previous model.
  6. Evaluate the assumptions of the full model (Statistical View).
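A hedged sketch of steps 3 to 5 with hypothetical variables (response Y, lv.1 predictor X1, grouping factor g):

m0 <- lmerTest::lmer(Y ~ 1 + (1 | g), data = df)         # step 3: null model
sjstats::icc(m0)                                         # check the ICC
m1 <- lmerTest::lmer(Y ~ X1 + (1 | g), data = df)        # step 4: lv.1 fixed slope
m2 <- lmerTest::lmer(Y ~ X1 + (1 + X1 | g), data = df)   # step 5: random slope
anova(m0, m1, m2)                                        # likelihood-ratio comparisons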

4.1 Random effect

  • use sjstats::icc() to calculate the Intraclass Correlation Coefficient from the null model, which is used to determine whether the intraclass variance is large enough to warrant LMM;
# nullmodel = lmerTest::lmer(DV ~ 1+ (1|g), ...)
# sjstats::icc(nullmodel)
  • use summary() to view the variance of the random effects;
  • use lmerTest::ranova() to produce an ANOVA-like table for random effects to test the significance of lv.2 variables. It checks whether the model becomes significantly worse if a certain random effect is dropped (formally, likelihood ratio tests); if this is not the case, the random effect is not significant.
# summary(fit1)
# lmerTest::ranova(fit1)

(source: https://www.rensvandeschoot.com/tutorials/lme4/)

4.2 Fixed effect: Main/Interaction effect

Similar to GLM/OLS, use anova() to check whether the main effects and interaction effects are significant with F-tests.

# anova(fit1)

4.3 Cross-level Interaction effects

In a multilevel analysis (i.e. with at least two levels), two types of cross-level interactions can occur between the higher and lower levels:

  • Compositional effects: inter-group (or inter-context) differences in an outcome (for example, disease rates) are attributable to differences in group composition (that is, in the characteristics of the individuals of which the groups are comprised)
  • Contextual effects: intra-group differences are attributable to the effects of group-level variables or properties.

(source: https://jech.bmj.com/content/56/8/588)

4.4 Effect size

Effect size is the proportional reduction in residual variance from the base model to the improved model; effect size × 100% is the percentage of improvement gained by adding variables to the new model (a computation sketch follows the table below).

\[ ES = \frac{variance_{base} - variance_{new}}{variance_{base}} \]

Range of effect size corresponding to levels of improvement as variables are added to the model (Cohen, 1988)

| Effect size | Improvement |
|-------------|-------------|
| 0.02 ~ 0.15 | weak        |
| 0.15 ~ 0.35 | moderate    |
| > 0.35      | strong      |
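A minimal sketch of the computation, assuming fitted base and extended models m_base and m_new:

var_base <- sigma(m_base)^2          # residual variance of the base model
var_new  <- sigma(m_new)^2           # residual variance of the improved model
(var_base - var_new) / var_base      # effect size ES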

5. Application of LMM in Randomized statistical experiments

In many research areas, LMM is usually applied to the following common randomized statistical experiments. The dependent variable, random effects, and fixed effects are listed for each type of experiment. During model selection, it is important to consider the experimental design (formula sketches for all three designs follow section 5.3).

5.1 Completely Randomized Design (CRD)

The main advantage of this design is randomization. The post-test comparison with randomized subjects controls for the main effects of history, maturation, and pre-testing; because no pre-test is used, there can be no interaction effect of pre-test and treatment. Another advantage is that the design can be extended to include more groups than a single treatment and control if necessary.

(Figure: Completely Randomized Design)

  • Dependent variable: Observational unit
  • Random effect: None
  • Fixed effect: Treatment

5.2 Randomized Complete Block Design (RCBD)

The RCBD is the standard design for agricultural experiments where similar experimental units are grouped into blocks or replicates.

  • The number of blocks is the number of replications.
  • Treatments are assigned at random within blocks of adjacent subjects, each treatment once per block.
  • Any treatment can be adjacent to any other treatment, but not to the same treatment within the block.
(Figure: Randomized Complete Block Design)

  • Dependent variable: Observational unit
  • Random effect: Blocking
  • Fixed effect: Treatment

5.3 Generalized Randomized Block Design (GRBD)

Like a randomized complete block design (RCBD), a GRBD is randomized. Within each block, treatments are randomly assigned to experimental units: this randomization is also independent between blocks. In a (classic) RCBD, however, there is no replication of treatments within blocks.

  • Dependent variable: Observational unit
  • Random effect: Experimental unit (individual), Blocking
  • Fixed effect: Treatment
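Hedged lmer formula sketches for the three designs, following the listings above (the column names treatment, block, and unit are hypothetical):

# CRD: no random effect, an ordinary linear model suffices
m_crd  <- lm(Y ~ treatment, data = df)
# RCBD: blocking as a random intercept
m_rcbd <- lmerTest::lmer(Y ~ treatment + (1 | block), data = df)
# GRBD: experimental units nested within blocks as an extra random term
m_grbd <- lmerTest::lmer(Y ~ treatment + (1 | block) + (1 | block:unit), data = df)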