Regression for t-test and ANOVA with unequal variances

Welch’s t-test or Welch’s ANOVA, used in the case of heteroskedasticity (unequal variances), is “equivalent” to a likelihood ratio test (LRT) using a generalized least squares (GLS) model.

Code

library(tidyverse)
library(rstatix)
library(nlme)

Simulate heteroskedastic data from three groups

Code

set.seed(0)
n = 40
groups = c("A", "B", "C")
means  = c(0, 0, 2)
sds    = c(1, 3, 2)
x = sample(groups, n, replace = T)
r = rnorm(n, 0, sds[match(x, groups)])
y = means[match(x, groups)] + r
df = data.frame(y = y, 
                group = x)
df %>%
  ggplot(aes(x = group, y = y)) +
  geom_point(position = position_jitter(width = 0.25, height = 0)) +
  stat_summary(fun = mean, color = "red", size = 1.5) +
  theme_bw()

Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).

ANOVA and LM

First, let’s run ANOVA and an F-test for a LM, both assuming equal variances. These give the same exact p-values (and we can show they are mathematically equivalent).

Code

oneway.test(y ~ group, data = df, var.equal = TRUE)


    One-way analysis of means

data:  y and group
F = 2.4555, num df = 2, denom df = 37, p-value = 0.09969

Code

anova(lm(y ~ group, data = df))

Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value  Pr(>F)  
group      2  18.272  9.1358  2.4555 0.09969 .
Residuals 37 137.659  3.7205                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Welch’s ANOVA and GLS

Here we run the same comparison, but assuming unequal variances. The GLS model allows us to estimate weights for each for each group, corresponding to the different variances. Note, the GLS is fit using REML by default. This shows results when fitting by ML.

Code

oneway.test(y ~ group, data = df, var.equal = FALSE)


    One-way analysis of means (not assuming equal variances)

data:  y and group
F = 3.6425, num df = 2.000, denom df = 16.451, p-value = 0.04902

Code

anova(gls(y ~ group, data = df, weights = varIdent(form = ~ 1|group)))

Denom. DF: 37 
            numDF  F-value p-value
(Intercept)     1 1.302945  0.2610
group           2 3.790149  0.0318

Code

anova(gls(y ~ group, data = df, weights = varIdent(form = ~ 1|group), method = "ML"))

Denom. DF: 37 
            numDF  F-value p-value
(Intercept)     1 1.351793  0.2524
group           2 3.838852  0.0306

Kruskal-Wallis rank sum test and LM on ranks

Finally, what if we use a nonparametric test? Well, fitting a LM on the ranks of y gives similar results to the Kruskal-Wallis rank sum test.

Code

kruskal.test(y ~ group, data = df)


    Kruskal-Wallis rank sum test

data:  y by group
Kruskal-Wallis chi-squared = 5.2532, df = 2, p-value = 0.07232

Code

anova(lm(I(rank(y)) ~ group, data = df))

Analysis of Variance Table

Response: I(rank(y))
          Df Sum Sq Mean Sq F value Pr(>F)  
group      2  717.9  358.97  2.8798 0.0688 .
Residuals 37 4612.1  124.65                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

https://lindeloev.github.io/tests-as-linear/

Simulate heteroskedastic data from three groups

ANOVA and LM

Welch’s ANOVA and GLS

Kruskal-Wallis rank sum test and LM on ranks

Read more