3  Linear mixed model

NoteRecap

Marginal model:

  • Focus attention on the model for the mean, treat correlation structure as a “nuisance”

  • Validity of inferences about regression parameters protected by using “robust” SEs (for large # clusters)

  • Useful mainly when data are balanced (number of measurements similar across clusters)

Linear Mixed Models: models the entire joint distribution of the observations, including covariance structure

  • “Full probability model” : Likelihood-based estimation

  • Estimators fully efficient when model is correct (i.e. ‘smallest possible SEs’)

  • Useful when variance components or correlation structure is of interest

3.1 Data

Longitudinal data: Potthoff and Roy (1964):

  • Measured distance from centre of the pituitary gland to the pterygomaxillary fissure by x-ray (important in orthodontics)

  • Measured for boys and girls with aim to understand changes with age and by sex

  • Measured at ages 8, 10, 12, 14

  • Rescaled to show time since age 8

library(mice)
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

data <- potthoffroy |> 
  pivot_longer(cols = -c(id,sex)) |> 
  mutate(age = str_extract(name,'\\d+') |> as.numeric(),
         sex = factor(sex,labels = c("Female","Male")),
         time = age-8,
         y = value)

data |> 
  ggplot(aes(x = time, y = y))+
  geom_line(aes(group = id))+
  geom_point()+
  facet_wrap(~sex)+
  labs(x = "Years since age 8", y = "Distance")+
  theme_bw()

3.2 Random intercepts model

In this example, we only consider the male group, and focus on within cluster comparison (change over time). In linear mixed model, we extend to compare the variation between individuals or subjects by adding one level to the simple linear regression \(Y_j = \beta_0 + \beta_1 t_j\).

The “two level” or “hierarchical” model:

  • Level 1: Describes variation within individuals

    • Straight line (linear regression) for each subject

    • \(\beta_{0i}\) is subject-specific intercept

    • \(\beta_{1}\) is the common slope

  • Level 2: Describes variation in intercepts between individuals

    • \(\beta_{0i} = \beta_0 + b_{0i}\) with \(b_{0i} \sim N(0,\sigma^2_{b0})\)

    • \(\beta_{0}\) is average intercept

The combined model: \(Y_{ij} = (\beta_0 + b_{0i}) + \beta_1 t_{ij} + e_{ij}, \ e_{ij} \sim N(0,\sigma^2_e)\)

Then the marginal model for Y will be:

  • \([Y_{ij}] = \beta_0 + \beta_1 t_{ij}\)

  • \(Var(Y_{ij}) = \sigma^2_{b0} + \sigma^2_e\)

  • \(Corr(Y_{ij},Y_{ik}) = \frac{\sigma^2_{b0}}{\sigma^2_{b0} + \sigma^2_{e}}\)

In words:

  • Average growth is linear with slope β1

  • Variances are constant over time (and equal to the sum of the variance of the intercept and the error)

  • Correlation between different observations from the same individuals is constant (exchangeable correlation structure)

R code

library(lme4)

random_intecept_model <- lmer(y ~ time + (1 | id),data = data, subset = (sex == "Male"))

summary(random_intecept_model)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ time + (1 | id)
   Data: data
 Subset: (sex == "Male")

REML criterion at convergence: 273.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.00805 -0.64069  0.00783  0.53448  3.05295 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 2.641    1.625   
 Residual             2.816    1.678   
Number of obs: 64, groups:  id, 16

Fixed effects:
            Estimate Std. Error t value
(Intercept) 22.61563    0.53690  42.123
time         0.78438    0.09382   8.361

Correlation of Fixed Effects:
     (Intr)
time -0.524

Plotting how random intercept model fit the data

data |> 
  filter(sex == "Male") |> 
  mutate(fitted = predict(random_intecept_model)) |> 
  ggplot(aes(x = time, y = fitted))+
  geom_line(aes(group = id))+
  geom_point(aes(y = value))+
  labs(x = "Years since age 8", y = "Distance")+
  theme_minimal()

Summarise the output

library(gtsummary)
random_intecept_model |> 
    tbl_regression(
      intercept = TRUE,
      tidy_fun = broom.mixed::tidy,
      estimate_fun = purrr::partial(style_ratio, digits = 3)
    )
Characteristic Beta 95% CI
(Intercept) 22.62 21.56, 23.67
time 0.784 0.601, 0.968
id.sd__(Intercept) 1.625
Residual.sd__Observation 1.678
Abbreviation: CI = Confidence Interval

Model output:

  • \(\beta_0 = 22.62\)

  • \(\beta_1 = 0.784\)

  • \(b_{0i} \sim N(0,\sigma^2_{b0}), \ SD: \sigma_{b0} = 1.625, \ \sigma^2_{b0} = 2.641\)

  • \(e_{ij} \sim N(0,\sigma^2_e), \ SD: \sigma_e = \ 1.678, \ \sigma^2_e = 2.816\)

3.3 Random intercepts and slopes model

Based on random intercepts models \(Y_{ij} = (\beta_0 + b_{0i}) + \beta_1 t_{ij} + e_{ij}\). We extend the slopes \(\beta_1 = \beta_1 + b_{1i}\):

  • Level 1: Describes variation within individuals

    • \(Y_{ij} = \beta_{0i} + \beta_{1i}t_{ij} + e_{ij}\)

    • Straight line (linear regression) for each subject

    • \(\beta_{0i}\) is individual-specific intercept

    • \(\beta_{1i}\) is the individual-specific slope

  • Level 2: Describes variation in intercepts AND slopes between individuals

    • Also allows correlation between them

    • \(\beta_{0i} = \beta_{00}+b_{0i}\), with \(b_{0i} \sim N(0,\sigma^2_{b0})\)

    • \(\beta_{1i} = \beta_{10}+b_{1i}\), with \(b_{0i} \sim N(0,\sigma^2_{b1})\)

    • \(Corr(b_{0i},b_{1i}) = \rho_b\)

Combined model: \(Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})t_{ij} + e_{ij}, \ e_{ij} \sim N(0,\sigma^2_e)\)

R code

random_slope_intecept_model <- lmer(y ~ time + (1 + time | id),data = data, 
                                    subset = (sex == "Male"))

summary(random_slope_intecept_model)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ time + (1 + time | id)
   Data: data
 Subset: (sex == "Male")

REML criterion at convergence: 273.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.65543 -0.59382  0.01946  0.57552  3.15773 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 id       (Intercept) 3.04625  1.7454        
          time        0.03562  0.1887   -0.34
 Residual             2.58906  1.6091        
Number of obs: 64, groups:  id, 16

Fixed effects:
            Estimate Std. Error t value
(Intercept)  22.6156     0.5511  41.041
time          0.7844     0.1016   7.722

Correlation of Fixed Effects:
     (Intr)
time -0.558

Plotting how random intercept model fit the data

data |> 
  filter(sex == "Male") |> 
  mutate(fitted = predict(random_slope_intecept_model)) |> 
  ggplot(aes(x = time, y = fitted))+
  geom_line(aes(group = id))+
  geom_point(aes(y = value))+
  labs(x = "Years since age 8", y = "Distance")+
  theme_minimal()

Summarise the output

random_slope_intecept_model |> 
    tbl_regression(
      intercept = TRUE,
      tidy_fun = broom.mixed::tidy,
      estimate_fun = purrr::partial(style_ratio, digits = 3)
    )
Characteristic Beta 95% CI
(Intercept) 22.62 21.54, 23.70
time 0.784 0.585, 0.983
id.sd__(Intercept) 1.745
id.cor__(Intercept).time -0.339
id.sd__time 0.189
Residual.sd__Observation 1.609
Abbreviation: CI = Confidence Interval
NoteModel output and interpretation
  • \(\beta_0 = 22.62\): Average value at age 8 = 22.6

  • \(\beta_1 = 0.784\): Average rate of growth is 0.784 mm/ year

  • \(b_{0i} \sim N(0,\sigma^2_{b0}), \ SD: \sigma_{b0} = \ 1.745, \sigma^2_{b0} = 3.046\): Variations in intercepts between individuals

  • \(b_{1i} \sim N(0,\sigma^2_{b1}), \ SD: \sigma_{b1} = \ 0.189, \sigma^2_{b1} = 0.036\): Variations in slopes between individuals

  • \(Corr(b_{0i},b_{1i}) = \rho_b = -0.339\): Correlation between value at age 8 and later time points is -0.339

  • \(e_{ij} \sim N(0,\sigma^2_e), \ SD: \sigma_e = \ 1.609, \sigma^2_e = 2.589\)

3.4 Differences between groups

In the previous section, we only consider the male group, and focus on within cluster comparison (change over time). In this section, we will explore the following question:

  • Is average rate of growth the same for males and females?

  • What is the variability in rate of growth within each sex?

  • Is rate of growth correlated with size/distance at age 8?

We fit random intercepts and slopes model, and allow subject-specific intercept and slope to depend on sex

  • Level 1: Describes variation within individuals

    • \(Y_{ij} = \beta_{0i} + \beta_{1i}t_{ij} + e_{ij}\)
  • Level 2: Describes variation in intercepts and slopes between individuals

    • \(\beta_{0i} = \beta_{00} + b_{0i} + \beta_{01}Sex\)

    • \(\beta_{1i} = \beta_{10} + b_{1i} + \beta_{11}Sex\)

Combined model: \(Y_{ij} = (\beta_{00} + b_{0i} + \beta_{01}Sex) + (\beta_{10} + b_{1i} + \beta_{11}Sex)t_{ij} + e_{ij}\)

R code

random_slope_intecept_model2 <- lmer(y ~ sex*time + (1 + time | id),data = data)

summary(random_slope_intecept_model2)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ sex * time + (1 + time | id)
   Data: data

REML criterion at convergence: 432.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1681 -0.3859  0.0071  0.4452  3.8495 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 id       (Intercept) 3.23394  1.7983        
          time        0.03252  0.1803   -0.09
 Residual             1.71621  1.3100        
Number of obs: 108, groups:  id, 27

Fixed effects:
             Estimate Std. Error t value
(Intercept)   21.2091     0.6350  33.401
sexMale        1.4065     0.8249   1.705
time           0.4795     0.1037   4.623
sexMale:time   0.3048     0.1347   2.262

Correlation of Fixed Effects:
            (Intr) sexMal time  
sexMale     -0.770              
time        -0.396  0.305       
sexMale:tim  0.305 -0.396 -0.770

Plotting

data |> 
  mutate(fitted = predict(random_slope_intecept_model2)) |> 
  ggplot(aes(x = time, y = fitted))+
  geom_line(aes(group = id))+
  geom_point(aes(y = value))+
  facet_wrap(~sex)+
  labs(x = "Years since age 8", y = "Distance")+
  theme_bw()

Summary table

random_slope_intecept_model2 |> 
    tbl_regression(
      intercept = TRUE,
      tidy_fun = broom.mixed::tidy,
      estimate_fun = purrr::partial(style_ratio, digits = 3)
    )
Characteristic Beta 95% CI
(Intercept) 21.21 19.96, 22.45
sex

    Female
    Male 1.407 -0.210, 3.023
time 0.480 0.276, 0.683
sex * time

    Male * time 0.305 0.041, 0.569
id.sd__(Intercept) 1.798
id.cor__(Intercept).time -0.091
id.sd__time 0.180
Residual.sd__Observation 1.310
Abbreviation: CI = Confidence Interval

Model

\[Y_{ij} = (\beta_{00} + b_{0i} + \beta_{01}Sex) + (\beta_{10} + b_{1i} + \beta_{11}Sex)t_{ij} + e_{ij}\]

NoteModel output and interpretation
  • \(\beta_{11} = 0.305\): Different in growth rate of Male vs Female

  • \(\beta_{00} + \beta_{10} = 0.48\): Average growth rates when sex = 0 (Female): 0.48, Male (0.48+0.3) = 0.78

  • \(b_{1i} \sim N(0,\sigma^2_{b1}), \ SD: \sigma_{b1} = \ 0.18, \sigma^2_{b1} = 0.03\):

    • 95% of Female have growth rate 0.48 +/- 1.96*0.18 = 0.13 to 0.83

    • 95% of Male have growth rate 0.78 +/- 1.96*0.18 = 0.43 to 1.13

  • \(Corr(b_{0i},b_{1i}) = \rho_b = -0.09\): Little correlation between initial size at age 8 and growth rate between ages 8-14 years (assumes common correlation for males and females)

data |> 
  mutate(fitted = predict(random_slope_intecept_model2),
         pred.pop = predict(
           random_slope_intecept_model2,
           re.form = NA   
         )) |> 
  ggplot(aes(x = time))+
  geom_line(aes(y = pred.pop,group = id),size = 1)+
  geom_line(aes(y = value,group = id),alpha = .5)+
  geom_point(aes(y = value))+
  facet_wrap(~sex)+
  labs(x = "Years since age 8", y = "Distance")+
  theme_bw()

The lines that connected the points are the raw data. The bold lines are the outcome of the model