Information criteria for MAIHDA models

Reports relative-fit information criteria for one or more MAIHDA models to help choose between model structures (different covariate sets, strata definitions, or families), a question that the VPC/ICC and PCV do not address. The criteria reported depend on the engine: AIC and BIC for the likelihood engines (lme4 and ordinal::clmm), and the Bayesian WAIC and LOOIC (leave-one-out information criterion) for brms. For all four criteria, lower is better.

Usage

maihda_ic(..., model_names = NULL)

Arguments

...: One or more maihda_model objects (from fit_maihda) or maihda_analysis objects (from maihda). A maihda_analysis contributes its null model and, when present, its adjusted model as separate rows.
model_names: Optional character vector of names, one per ... argument. For a maihda_analysis argument, the name labels both of its rows, with a "(Null)" or "(Adjusted)" suffix appended (e.g. "Model1 (Null)" and "Model1 (Adjusted)").

Value

A data.frame of class maihda_ic with one row per model and the columns that apply: model, n (analytic sample size), estimator, df (number of parameters; likelihood engines), logLik, AIC, BIC (likelihood engines), WAIC, LOOIC (brms), and – when more than one model is supplied – delta (the difference from the best model on the primary criterion: AIC for the likelihood engines, LOOIC for brms). Columns that are entirely NA across the supplied models are dropped.
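
The relationship between the df, logLik, AIC, BIC and delta columns can be sketched in base R. This is an illustration only, not the package's implementation: it assumes plain lm() fits on the built-in mtcars data rather than MAIHDA models, and the object names (fits, ic) are hypothetical.

```r
# Two nested linear models on a built-in data set (illustrative only;
# these are ordinary lm() fits, not maihda_model objects).
fits <- list(
  Null     = lm(mpg ~ 1,  data = mtcars),
  Adjusted = lm(mpg ~ wt, data = mtcars)
)

ic <- do.call(rbind, lapply(names(fits), function(nm) {
  ll <- logLik(fits[[nm]])
  k  <- attr(ll, "df")       # number of estimated parameters
  n  <- nobs(fits[[nm]])     # analytic sample size
  data.frame(
    model  = nm,
    n      = n,
    df     = k,
    logLik = as.numeric(ll),
    AIC    = 2 * k - 2 * as.numeric(ll),       # same value as stats::AIC()
    BIC    = k * log(n) - 2 * as.numeric(ll)   # same value as stats::BIC()
  )
}))

# delta: distance from the best (lowest-AIC) model; the best row gets 0.
ic$delta <- ic$AIC - min(ic$AIC)
ic
```

Because delta is measured from the lowest AIC, the best-supported model always has delta = 0 and every other row is positive.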

Details

REML vs ML. lmer fits Gaussian models by REML by default, and a REML log-likelihood (and hence its AIC/BIC) is not comparable across models with different fixed effects, which is exactly the situation in the canonical MAIHDA null-vs-adjusted comparison. When more than one model is supplied, maihda_ic() therefore refits any REML lmer model by maximum likelihood (refitML) before computing AIC/BIC, matching the behaviour of anova() on lme4 models; the estimator column records when this happened (it then reads "ML (refit from REML)"). For a single model the criteria are reported as fitted, and the estimator column then reads "REML".
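
The refit step described above can be reproduced directly with lme4. This is a minimal sketch that assumes lme4 is installed and uses its bundled sleepstudy data instead of a MAIHDA fit; the model names (m0, m1) are hypothetical, and the code shows the general lme4 mechanism rather than the internals of maihda_ic().

```r
library(lme4)

# lmer() uses REML by default.
m0 <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
isREML(m0)                  # TRUE: fitted by REML

# Refit with maximum likelihood so that models with different
# fixed effects can be compared on AIC/BIC.
m0_ml <- refitML(m0)
m1_ml <- refitML(m1)
isREML(m0_ml)               # FALSE: now an ML fit

AIC(m0_ml, m1_ml)
BIC(m0_ml, m1_ml)

# anova() performs the same ML refit by default (refit = TRUE).
anova(m0, m1)
```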

Comparability. Like the VPC, information criteria are only comparable across models fitted to the same analytic sample (same rows and outcome). AIC/BIC additionally require the same response distribution – they are not comparable across families (e.g. a Gaussian vs a Poisson fit), nor between the likelihood engines and brms (AIC/BIC vs WAIC/LOOIC are different scales). maihda_ic() does not enforce this; compare_maihda warns when the supplied models differ in outcome, sample, or family.
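
A common way for analytic samples to drift apart is missing data in a covariate that only one model uses. The sketch below illustrates the problem and one possible remedy with plain lm() fits on the built-in airquality data. It is an assumption-laden illustration, not part of the package, and the object names are hypothetical.

```r
# airquality has missing values in Ozone and Solar.R, so adding Solar.R
# as a covariate silently drops additional rows.
m_small <- lm(Ozone ~ Wind,           data = airquality)
m_large <- lm(Ozone ~ Wind + Solar.R, data = airquality)
c(small = nobs(m_small), large = nobs(m_large))  # sample sizes differ

# One remedy: restrict the data to rows that are complete on every
# variable used by any of the models, then refit all of them.
vars <- c("Ozone", "Wind", "Solar.R")
cc   <- airquality[complete.cases(airquality[, vars]), ]

m_small_cc <- lm(Ozone ~ Wind,           data = cc)
m_large_cc <- lm(Ozone ~ Wind + Solar.R, data = cc)
stopifnot(nobs(m_small_cc) == nobs(m_large_cc))

# Now the criteria refer to the same rows and the same outcome.
AIC(m_small_cc, m_large_cc)
```

The same idea applies to MAIHDA fits: building the complete-case data once and passing it to every model keeps the rows identical across the models being compared.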

Design-weighted fits. For the wemix (design-weighted) engine the criteria are reported as NA: a pseudo-likelihood with sampling weights does not define a standard AIC/BIC.

Examples

# \donttest{
strata <- make_strata(maihda_sim_data, vars = c("gender", "race"))
null_model <- fit_maihda(health_outcome ~ 1 + (1 | stratum), data = strata$data)
adj_model  <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata$data)

# AIC/BIC for two nested structures (REML lmer fits are ML-refitted first)
maihda_ic(null_model, adj_model, model_names = c("Null", "Adjusted"))
#> MAIHDA Information Criteria
#> ===========================
#> 
#>     model   n            estimator df logLik  AIC  BIC delta
#>      Null 500 ML (refit from REML)  3  -1918 3843 3855 41.68
#>  Adjusted 500 ML (refit from REML)  4  -1897 3801 3818  0.00
#> 
#> delta = difference from the best model on AIC (lower is better).
#> REML lmer fit(s) were refitted with ML so AIC/BIC are comparable across different fixed effects.
#> Information criteria are only comparable across models fitted to the same analytic sample (and, for AIC/BIC, the same family).
#> 

# Or straight from a one-call maihda() analysis (null + adjusted rows)
a <- maihda(health_outcome ~ age + gender + race + (1 | gender:race),
            data = maihda_sim_data)
maihda_ic(a)
#> MAIHDA Information Criteria
#> ===========================
#> 
#>              model   n            estimator df logLik  AIC  BIC delta
#>      Model1 (Null) 500 ML (refit from REML)  4  -1897 3801 3818 11.36
#>  Model1 (Adjusted) 500 ML (refit from REML)  8  -1887 3790 3823  0.00
#> 
#> delta = difference from the best model on AIC (lower is better).
#> REML lmer fit(s) were refitted with ML so AIC/BIC are comparable across different fixed effects.
#> Information criteria are only comparable across models fitted to the same analytic sample (and, for AIC/BIC, the same family).
#> 
# }